Tag Archives: Amazon Bedrock

AWS Weekly Roundup: EC2 C8gn instances, Amazon Nova Canvas virtual try-on, and more (July 7, 2025)

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-bedrock-api-keys-amazon-nova-canvas-virtual-try-on-and-more-july-7-2025/

Every Monday we tell you about the best releases and blogs that caught our attention last week.

Before continuing with this AWS Weekly Roundup, I’d like to share that last month I moved with my family to San Francisco, California, to start a new role as Developer Advocate/SDE, GenAI.

This excites me because I’ll have the opportunity to connect with new communities in the Bay Area while tackling exciting new challenges. If you’re part of a community focused on building generative AI and agentics applications, or know of one, I’d love to connect. Let’s connect!

Last week’s launches
Here are the launches from last week:

  • New Amazon EC2 C8gn instances powered by AWS Graviton4 offering up to 600Gbps network bandwidth – Amazon Elastic Compute Cloud (Amazon EC2) C8gn instances are now generally available, powered by AWS Graviton4 processors and 6th generation AWS Nitro Cards. These network-optimized instances deliver up to 600 Gbps network bandwidth. This represents the highest bandwidth among EC2 network-optimized instances, with up to 192 vCPUs and 384 GiB memory. They provide 30% higher compute performance than C7gn instances and are ideal for network-intensive workloads like virtual appliances, data analytics, and cluster computing jobs.
  • Build the highest resilience apps with multi-Region strong consistency in Amazon DynamoDB global tables – Amazon DynamoDB global tables now supports multi-Region strong consistency (MRSC) for applications requiring zero Recovery Point Objective (RPO). This capability ensures applications can read the latest data from any Region during outages, addressing critical needs in payment processing and financial services. MRSC requires three AWS Regions configured as either three full replicas or two replicas plus a witness, providing the highest level of application resilience for mission-critical workloads.
  • Amazon Nova Canvas update: Virtual try-on and style options now available – Amazon Nova Canvas introduces virtual try-on capabilities that help you visualize how clothing looks on a person by combining two images, plus eight new pre-trained style options (3D animation, design sketch, vector illustration, graphic novel, etc.) for generating images with improved artistic consistency. Available in three AWS Regions, these features enhance AI-powered image generation capabilities for retailers and content creators seeking realistic product visualizations.
  • Amazon Q in Connect now supports 7 languages for proactive recommendations – Amazon Q in Connect, a generative AI-powered assistant for customer service, now provides proactive recommendations in seven languages: English, Spanish, French, Portuguese, Mandarin, Japanese, and Korean. The AI-powered customer service assistant detects customer intent during voice and chat interactions to help agents resolve issues quickly and accurately.
  • Amazon Aurora MySQL and Amazon RDS for MySQL integration with Amazon SageMaker is now available – This integration provides near real-time data availability for analytics. It automatically extracts MySQL data into lakehouses with Apache Iceberg compatibility. You can then access this data seamlessly through various analytics engines and machine learning tools.
  • Amazon Aurora DSQL is now available in additional AWS RegionsAmazon Aurora DSQL expands to Asia Pacific (Seoul) and now supports multi-Region clusters across Asia Pacific and European regions. This serverless, distributed SQL database offers unlimited scalability, highest availability, and zero infrastructure management with AWS Free Tier access.

Other AWS blog posts

  • Optimize RAG in production environments using Amazon SageMaker JumpStart and Amazon OpenSearch Service – Learn how to optimize Retrieval Augmented Generation (RAG) in production environments using Amazon SageMaker JumpStart and Amazon OpenSearch Service. This comprehensive guide demonstrates implementing RAG workflows with LangChain, covers OpenSearch optimization strategies, provides setup instructions, and explains benefits of combining these AWS services for scalable, cost-effective generative AI applications.v
  • Agentic GenAI App Using Bedrock, MCP servers on EKS – This post shows how to build a scalable AI chat application using Amazon Bedrock, Strands Agent, and Model Context Protocol (MCP) servers deployed on Amazon Elastic Kubernetes Service (Amazon EKS). The architecture combines agentic workflows with containerized microservices for intelligent, auto-scaling conversations with multiple foundation models.
  • Enforce table level access control on data lake tables using AWS Glue 5.0 with AWS Lake Formation – AWS Glue 5.0 introduces Full-Table Access (FTA) control for Apache Spark with AWS Lake Formation, providing table-level security without fine-grained access overhead. This feature supports native Spark SQL/DataFrames for Lake Formation tables. It enables read/write operations on Iceberg and Hive tables with improved performance and lower costs.

Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events:

  • AWS re:Invent – Register now to get a head start on choosing your best learning path, booking travel and accommodations, and bringing your team to learn, connect, and have fun. Early-career professionals can apply for the All Builders Welcome Grant program, designed to remove financial barriers and create diverse pathways into cloud technology. Applications are now open and close on July 15, 2025.
  • AWS NY Summit – You can gain insights from Swami’s keynote featuring the latest cutting-edge AWS technologies in compute, storage, and generative AI. My News Blog team is also preparing some exciting news for you. If you’re unable to attend in person, you can still participate by registering for the global live stream. Also, save the date for these upcoming Summits in July and August near your city.
  • AWS Builders Online Series – If you’re based in one of the Asia Pacific time zones, join and learn fundamental AWS concepts, architectural best practices, and hands-on demonstrations to help you build, migrate, and deploy your workloads on AWS.
  • Join AWS Gen AI Lofts – Experience AWS Gen AI Lofts across San Francisco, Berlin, Dubai, Dublin, Bengaluru, Manchester, Paris, Tel Aviv, and additional locations – hands-on workshops, expert guidance, investor networking, and collaborative spaces designed to accelerate your generative AI startup journey.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Eli

Amazon Nova Canvas update: Virtual try-on and style options now available

Post Syndicated from Matheus Guimaraes original https://aws.amazon.com/blogs/aws/amazon-nova-canvas-update-virtual-try-on-and-style-options-now-available/

Have you ever wished you could quickly visualize how a new outfit might look on you before making a purchase? Or how a piece of furniture would look in your living room? Today, we’re excited to introduce a new virtual try-on capability in Amazon Nova Canvas that makes this possible. In addition, we are adding eight new style options for improved style consistency for text-to-image based style prompting. These features expand Nova Canvas AI-powered image generation capabilities making it easier than ever to create realistic product visualizations and stylized images that can enhance the experience of your customers.

Let’s take a quick look at how you can start using these today.

Getting started
The first thing is to make sure that you have access to the Nova Canvas model through the usual means. Head to the Amazon Bedrock console, choose Model access and enable Amazon Nova Canvas for your account making sure that you select the appropriate regions for your workloads. If you already have access and have been using Nova Canvas, you can start using the new features immediately as they’re automatically available to you.

Virtual try-on
The first exciting new feature is virtual try-on. With this, you can upload two pictures and ask Amazon Nova Canvas to put them together with realistic results. These could be pictures of apparel, accessories, home furnishings, and any other products including clothing. For example, you can provide the picture of a human as the source image and the picture of a garment as the reference image, and Amazon Nova Canvas will create a new image with that same person wearing the garment. Let’s try this out!

My starting point is to select two images. I picked one of myself in a pose that I think would work well for a clothes swap and a picture of an AWS-branded hoodie.

Matheus and AWS-branded hoodie

Note that Nova Canvas accepts images containing a maximum of 4.1M pixels – the equivalent of 2,048 x 2,048 – so be sure to scale your images to fit these constraints if necessary. Also, if you’d like to run the Python code featured in this article, ensure you have Python 3.9 or later installed as well as the Python packages boto3 and pillow.

To apply the hoodie to my photo, I use the Amazon Bedrock Runtime invoke API. You can find full details on the request and response structures for this API in the Amazon Nova User Guide. The code is straightforward, requiring only a few inference parameters. I use the new taskType of "VIRTUAL_TRY_ON". I then specify the desired settings, including both the source image and reference image, using the virtualTryOnParams object to set a few required parameters. Note that both images must be converted to Base64 strings.

import base64


def load_image_as_base64(image_path): 
   """Helper function for preparing image data."""
   with open(image_path, "rb") as image_file:
      return base64.b64encode(image_file.read()).decode("utf-8")


inference_params = {
   "taskType": "VIRTUAL_TRY_ON",
   "virtualTryOnParams": {
      "sourceImage": load_image_as_base64("person.png"),
      "referenceImage": load_image_as_base64("aws-hoodie.jpg"),
      "maskType": "GARMENT",
      "garmentBasedMask": {"garmentClass": "UPPER_BODY"}
   }
}

Nova Canvas uses masking to manipulate images. This is a technique that allows AI image generation to focus on specific areas or regions of an image while preserving others, similar to using painter’s tape to protect areas you don’t want to paint.

You can use three different masking modes, which you can choose by setting maskType to the correct value. In this case, I’m using "GARMENT", which requires me to specify which part of the body I want to be masked. I’m using "UPPER_BODY" , but you can use others such as "LOWER_BODY", "FULL_BODY", or "FOOTWEAR" if you want to specifically target the feet. Refer to the documentation for a full list of options.

I then call the invoke API, passing in these inference arguments and saving the generated image to disk.

# Note: The inference_params variable from above is referenced below.

import base64
import io
import json

import boto3
from PIL import Image

# Create the Bedrock Runtime client.
bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")

# Prepare the invocation payload.
body_json = json.dumps(inference_params, indent=2)

# Invoke Nova Canvas.
response = bedrock.invoke_model(
   body=body_json,
   modelId="amazon.nova-canvas-v1:0",
   accept="application/json",
   contentType="application/json"
)

# Extract the images from the response.
response_body_json = json.loads(response.get("body").read())
images = response_body_json.get("images", [])

# Check for errors.
if response_body_json.get("error"):
   print(response_body_json.get("error"))

# Decode each image from Base64 and save as a PNG file.
for index, image_base64 in enumerate(images):
   image_bytes = base64.b64decode(image_base64)
   image_buffer = io.BytesIO(image_bytes)
   image = Image.open(image_buffer)
   image.save(f"image_{index}.png")

I get a very exciting result!

Matheus wearing AWS-branded hoodie

And just like that, I’m the proud wearer of an AWS-branded hoodie!

In addition to the "GARMENT" mask type, you can also use the "PROMPT" or "IMAGE" masks. With "PROMPT", you also provide the source and reference images, however, you provide a natural language prompt to specify which part of the source image you’d like to be replaced. This is similar to how the "INPAINTING" and "OUTPAINTING" tasks work in Nova Canvas. If you want to use your own image mask, then you choose the "IMAGE" mask type and provide a black-and-white image to be used as mask, where black indicates the pixels that you want to be replaced on the source image, and white the ones you want to preserve.

This capability is specifically useful for retailers. They can use it to help their customers make better purchasing decisions by seeing how products look before buying.

Using style options
I’ve always wondered what I would look like as an anime superhero. Previously, I could use Nova Canvas to manipulate an image of myself, but I would have to rely on my good prompt engineering skills to get it right. Now, Nova Canvas comes with pre-trained styles that you can apply to your images to get high-quality results that follow the artistic style of your choice. There are eight available styles including 3D animated family film, design sketch, flat vector illustration, graphic novel, maximalism, midcentury retro, photorealism, and soft digital painting.

Applying them is as straightforward as passing in an extra parameter to the Nova Canvas API. Let’s try an example.

I want to generate an image of an AWS superhero using the 3D animated family film style. To do this, I specify a taskType of "TEXT_IMAGE" and a textToImageParams object containing two parameters: text and style. The text parameter contains the prompt describing the image I want to create which in this case is “a superhero in a yellow outfit with a big AWS logo and a cape.” The style parameter specifies one of the predefined style values. I’m using "3D_ANIMATED_FAMILY_FILM" here, but you can find the full list in the Nova Canvas User Guide.

inference_params = {
   "taskType": "TEXT_IMAGE",
   "textToImageParams": {
      "text": "a superhero in a yellow outfit with a big AWS logo and a cape.",
      "style": "3D_ANIMATED_FAMILY_FILM",
   },
   "imageGenerationConfig": {
      "width": 1280,
      "height": 720,
      "seed": 321
   }
}

Then, I call the invoke API just as I did in the previous example. (The code has been omitted here for brevity.) And the result? Well, I’ll let you judge for yourself, but I have to say I’m quite pleased with the AWS superhero wearing my favorite color following the 3D animated family film style exactly as I envisioned.

What’s really cool is that I can keep my code and prompt exactly the same and only change the value of the style attribute to generate an image in a completely different style. Let’s try this out. I set style to PHOTOREALISM.

inference_params = { 
   "taskType": "TEXT_IMAGE", 
   "textToImageParams": { 
      "text": "a superhero in a yellow outfit with a big AWS logo and a cape.",
      "style": "PHOTOREALISM",
   },
   "imageGenerationConfig": {
      "width": 1280,
      "height": 720,
      "seed": 7
   }
}

And the result is impressive! A photorealistic superhero exactly as I described, which is a far departure from the previous generated cartoon and all it took was changing one line of code.

Things to know
Availability – Virtual try-on and style options are available in Amazon Nova Canvas in the US East (N. Virginia), Asia Pacific (Tokyo), and Europe (Ireland). Current users of Amazon Nova Canvas can immediately use these capabilities without migrating to a new model.

Pricing – See the Amazon Bedrock pricing page for details on costs.

For a preview of virtual try-on of garments, you can visit nova.amazon.com where you can upload an image of a person and a garment to visualize different clothing combinations.

If you are ready to get started, please check out the Nova Canvas User Guide or visit the AWS Console.

Matheus Guimaraes | @codingmatheus

Simplifying sustainability reporting using AWS and generative AI in banking

Post Syndicated from Sachin Kulkarni original https://aws.amazon.com/blogs/architecture/simplifying-sustainability-reporting-using-aws-and-generative-ai-in-banking/

European banks face a new challenge with the European Commission’s transition from the Non-Financial Reporting Directive (NFRD) to the Corporate Sustainability Reporting Directive (CSRD) regulations. This transition represents an expansion in sustainability reporting scope that will affect approximately 50,000 companies, a significant increase from the previous 11,700.

This means that banks themselves need to file sustainability reports because they will now be one of those 50,000 companies, but for their own reporting, they also need to assess their clients’ sustainability reports because they lend or finance those companies.

In this post, you learn how you can use generative AI services on Amazon Web Services (AWS) to automate your sustainability reporting requirements, reduce manual effort, and improve accuracy. You do this by implementing an automated solution for extracting, processing, and validating data from corporate reports.

The challenge

Financial institutions and sustainability teams managing sustainability reporting face three critical challenges:

  • Scale and complexity: Banks and financial institutions must process thousands of annual reports and sustainability documents, often spanning hundreds of pages each. This process requires extensive data extraction, complex EU Taxonomy alignment calculations, and resource-intensive validation steps. Manual processing introduces significant risks of errors and consumes valuable team resources.
  • Regulatory compliance: Banks must now implement detailed CSRD requirements, track specific metrics for turnover, capital expenditure (CapEx), and operating expenses (OpEx), and calculate their Green Asset Ratio (GAR) as well as environmental risks that come with their loans, debt, or equity investments. These new requirements demand robust data collection and processing capabilities.
  • Data management: Processing Green House Gas (GHG) emissions data across Scope 1, 2, and 3 categories requires analyzing complex lending and investment activities. With strict reporting deadlines, organizations need efficient tools to process this expanding volume of sustainability data.

The sustainability team point of view

Banks finance a large variety of counterparties and economic activities. While their carbon footprint is primarily linked to the greenhouse gas (GHG) emissions of their counterparties (Scope 3), The direct GHG emissions (Scope 1) of financial institutions or the GHG emissions linked to their energy consumption (Scope 2) are usually limited. For banks, the most critical key performance indicator (KPI) is the GAR, which measures the proportion of a bank’s taxonomy-aligned balance sheet exposures versus its total eligible exposures, as shown in the following figure.

To calculate their GAR, banks must obtain and use sustainability data from annual reports or sustainability reports of up to 50,000 companies (many of which are subject to NFRD and CSRD reporting), and understand how much of their activities are linked to EU Taxonomy.

The manual process

In the example that follows, we use the Amazon 2023 Annual Report. Some of the data that teams would have to manually extract includes: Revenue, Scope 1, Scope 2, and Scope 3 emissions.

Amazon Annual Report

As you can see from the page count at the top of the preceding figure, people manually searching for this data would have to go through 92 pages to find the parameters they’re looking for. Next, we might determine that some of the data we need (Scope 1, Scope 2, Scope 3) isn’t available in the annual report, so we need to analyze the sustainability report. As shown in the following figure, to manually retrieve the relevant data from this report, we would have to go through 98 pages of information.

amazon sustainability report

To prepare a GAR, we would have to repeat this process across hundreds or even thousands of companies.

A solution using AWS and generative AI

To address these challenges, we propose an automated approach using AWS services. This approach can help banks streamline their sustainability reporting processes.

high level flow

Here’s how this solution works— as shown in the preceding figure:

  1. Upload your counterparties’ reports to Amazon Simple Storage Service (Amazon S3).
  2. Amazon Bedrock automatically:
    1. Determines NFRD eligibility.
    2. Extracts relevant sustainability data.
    3. Organizes information for GAR calculations.
  3. Review and validate the extracted data.
  4. Generate required regulatory reports.

Architecture

We divide the architecture into two areas:

  1. Data ingestion flow
  2. Report generation flow

Data ingestion flow

We use Amazon Bedrock Knowledge Bases to build an automated data ingestion flow. See Prerequisites for your Amazon Bedrock knowledge base data to understand supported document formats and limits for knowledge base data.

Data Ingestion flow

The workflow, shown in the preceding figure, is:

  1.  Annual reports or sustainability reports are uploaded into an S3 bucket.
  2. On the S3 bucket, we enable event notifications for events such as addition, change, or deletion of the reports.
  3. These events are sent to Amazon Event Bridge, which trigger an AWS Lambda function.
  4. The Lambda function syncs the data source to an Amazon Bedrock knowledge base.
  5. Amazon Bedrock Knowledge Bases processes the documents and converts it into vector embeddings. For more information, see Amazon Bedrock Knowledge Bases supports advanced parsing, chunking, and query reformulation giving greater control of accuracy in RAG based applications
  6. Amazon Bedrock Knowledge Bases stores the vector embeddings in the vector database of your choice, such as in an Amazon OpenSearch Serverless collection.

Now the data is read, broken into chunks, converted to embeddings and stored in a vector store. You use a report generation flow to ask questions about the information in the knowledge base.

Report generation flow

To automate the report generation for sustainability teams, we created the report generation flow shown in the following figure.

report generation flow

The report generation flow includes the following steps:

  1. When user uploads an annual report, the data from the report is ingested into the knowledge base, as shown in the data ingestion flow.
  2. A Lambda function—Invoke Bedrock Agent—is triggered to invoke an Amazon Bedrock agent.
  3. The Amazon Bedrock agent determines NFRD or CSRD applicability based on various parameters such as employee numbers and annual revenues. This agent then passes on what kind of regulation to apply to a Lambda function.
  4. The Lambda function Retrieve Sustainability Metrics retrieves various parameters needed for NFRD or CSRD from the annual report.
    1. The function receives NFRD or CSRD applicability from the Amazon Bedrock agent.
    2. Based on NFRD or CSRD applicability, there are specific sustainability metrics that need to be retrieved. For NFRD, there are about 15 metrics that need to be retrieved, and for CSRD, there are about 30 metrics.
    3.  The function iteratively sends {variable} to the Amazon Bedrock flow. For example, if the metric to be retrieved is Scope 1 emission, then the Lambda function will send variable=‘Scope 1 emission’
    4. The function gets the metric value from the Amazon Bedrock flow and when the required metrics are retrieved, creates a CSV file with the details.
  5. Amazon Bedrock flow:
    1. Retrieve {variable} (for example, ‘Scope 1 emission’) from the annual report. For this, we create a prompt, as shown in the following diagram.
    2. Use the prompt to fetch the value from the knowledge base.
        • Prompt:
          <query> You are an intelligent agent that helps retrieve information from a knowledgebase. Please find {{variable}}. Please return only a number and not any additional text. I only need the value so you will return one word</query>

    3. Return the value to the Lambda function in Step 4.

Breakdown of key components

Amazon S3 is used for storing annual statements and sustainability reports, providing highly durable and secure object storage that facilitate immediate access when needed for processing.

Amazon Bedrock Knowledge Bases enables using Retrieval-Augmented Generation (RAG) to optimize the output of a large language model by giving it the context of companies’ annual reports and regulatory requirements. It does so by creating chunks and vector embeddings from the annual reports to enable efficient information retrieval from a vector database of your choice.

Amazon Bedrock foundation models (FMs) extract information from an Amazon Bedrock knowledge base and generate standardized PDF reports for regulators, providing consistent formatting and alignment with CSRD requirements. We encourage you to choose the best foundational model for your use case through the flexibility and enterprise-grade controls of Amazon Bedrock. For this solution, we used Anthropic’s Claude Sonnet 3.5 as the model, but by using Amazon Bedrock, you can choose from over 50 different models to see which one best fits your use case.

Amazon Bedrock Flows orchestrates the document processing pipeline, coordinating between services to automatically extract required sustainability metrics and validate compliance requirements. This feature helps us manage the workflow from initial document ingestion through to final report generation.

Amazon Bedrock Prompt Management creates and helps manage precise prompts that help retrieve multiple sustainability metrics from reports for example: turnover, Scope 1, Scope 2, and Scope 3 emissions data. These structured prompts facilitate consistent data extraction across different document formats.

Amazon Bedrock Agents evaluates each uploaded document to determine NFRD or CSRD eligibility by analyzing company revenue, employee count, and incorporation details. The agents retrieve these parameters by using a Lambda function that’s part of the actions the agent can perform.

Lambda handles event-driven processing when new documents are uploaded. Lambda functions are also used by the agent to retrieve data from companies’ annual reports and trigger the appropriate workflows based on document type.

Amazon EventBridge is used to build event-driven applications at scale across AWS and manages workflow orchestration, automatically initiating document processing when new reports are uploaded through S3 event notifications.

This architecture enables banks to process thousands of sustainability reports efficiently. The solution scales automatically to handle increasing document volumes while keeping security a top priority.

Additional considerations

You can use the following additional AWS service to help further increase the accuracy of information retrieval from sustainability documents.

Amazon Bedrock Guardrails to make sure that the solution caters to responsible AI policies. Specifically, we have added contextual grounding checks to reduce hallucinations. This is important for the solution because we’re trying to find a few specific values in a large document, and these checks make sure that the metrics retrieved are based on the documents.

Automated reasoning checks which help to verify the metrics returned by the solution. Consider the metric Number of employees. There can be multiple places in the annual report where the number of employees is mentioned; for example, temporary workers, part-time employees, employees from various departments, employees from a company that was taken over last year, and so on. To arrive at the right number, automated reasoning checks help.

Benefits

This sustainability reporting solution cuts document processing time from 8—10 weeks to few hours. Banks get clear audit trails showing exactly how they extracted and validated sustainability data. When regulations are updated, the system adapts through its knowledge base without disrupting operations. Built-in security protects company data through the entire process. Access controls and encryption are in place to secure information. The output delivers standardized, accurate reports. This automation lets sustainability teams concentrate on environmental improvements rather than paperwork. Teams can instead analyze trends and develop initiatives instead of hunting through reports for data points.

Conclusion

As sustainability reporting requirements evolve, having a flexible and automated solution will become crucial. While we focused on NFRD reporting, the same pattern can be adapted for CSRD compliance reporting, SFDR reporting requirements, and Internal sustainability metrics, or EU Taxonomy alignment.

Customers looking to build their products in the Financial Services industry have access to industry and domain AWS specialists; contact us for help in your cloud journey.

You can also learn more about AWS services and solutions for financial services by visiting AWS for Financial Services and Generative AI on AWS.


About the authors

Amazon Bedrock baseline architecture in an AWS landing zone

Post Syndicated from Abdel-Rahman Awad original https://aws.amazon.com/blogs/architecture/amazon-bedrock-baseline-architecture-in-an-aws-landing-zone/

As organizations increasingly adopt Amazon Bedrock to build and deploy large-scale AI applications, it’s important that they understand and adopt critical network access controls to protect their data and workloads. These generative AI-enabled applications might have access to sensitive or confidential information within their knowledge bases, Retrieval Augmented Generation (RAG) data sources, or models themselves, which could pose a risk if exposed to unauthorized parties. Additionally, organizations might want to limit access to certain AI models to specific teams or services, making sure only authorized users can use the most powerful capabilities. Another important consideration is cost optimization, because organizations need to be able to monitor and control access to manage various aspects of their cloud spending.

In this post, we explore the Amazon Bedrock baseline architecture and how you can secure and control network access to your various Amazon Bedrock capabilities within AWS network services and tools. We discuss key design considerations, such as using Amazon VPC Lattice auth policies, Amazon Virtual Private Cloud (Amazon VPC) endpoints, and AWS Identity and Access Management (IAM) to restrict and monitor access to your Amazon Bedrock capabilities.

By the end of this post, you will have a better understanding of how to configure your AWS landing zone to establish secure and controlled network connectivity to Amazon Bedrock across your organization using VPC Lattice.

Solution overview

Addressing the aforementioned challenges requires a well-designed network architecture and security controls. For this, we use the standard AWS Landing Zone Accelerator networking configuration. It provides a good starting point for managing network communication across multiple accounts. On top of the AWS Landing Zone Accelerator network design, we add two shared accounts.

In this solution design, we create a centralized architecture for managing organization AI capabilities across different accounts. The architecture consists of three main parts that work together to provide secure and controlled access to AI services:

  • Service network account – This account serves as the central networking hub for the organization, managing network connectivity and access policies. Through this account, network administrators can centrally manage and control access to AI services across the organization. The account follows AWS Landing Zone Accelerator networking practices that scale with enterprise organizational needs.
  • Generative AI account – This account hosts the organization’s Amazon Bedrock capabilities and serves as the central point for AI/ML management. The organization’s AI/ML scientists and prompt engineers will centrally build and manage Amazon Bedrock capabilities. The account provides access to various large language models (LLMs) through Amazon Bedrock by using VPC interface endpoints, while also enabling centralized monitoring of cost consumption and access patterns.
  • Workload accounts (dev, test, prod) – These accounts represent different environments where teams develop and deploy applications that consume AI services. Through secure network connections established through the service network account, these workload accounts can access the AI capabilities hosted in the generative AI account. This separation enforces proper isolation between development, testing, and production workloads while maintaining secure access to AI services.
Amazon Bedrock baseline architecture in an AWS landing zone

Amazon Bedrock baseline architecture in an AWS landing zone

The following diagram illustrates the solution architecture.

The service network account has its own VPC Lattice service network—a centralized networking construct that enables service-to-service communication across your organization, which is shared with workload accounts using AWS Resource Access Manager (AWS RAM) to enable VPC Lattice Service network sharing.

Workload accounts (dev, test, prod) establish VPC associations with the shared VPC Lattice service network by creating a service network association in their VPC. When an application in these accounts makes a request, it first queries the VPC resolver for DNS resolution. The resolver routes the traffic to the VPC Lattice service network.

Access control is implemented through an VPC Lattice auth policy. The service network policies determine which accounts can access the VPC Lattice service network, and service-level policies control access to specific AI services and define what actions each account can perform.

In the central AI services account, we find the proxy layer, we create a VPC Lattice service that points to a proxy layer, which acts as a single entry point, providing workload accounts access to Amazon Bedrock. This proxy layer then connects to Amazon Bedrock through VPC endpoints. Through this setup, the AI team can configure which foundation models (FMs) are available and manage access permissions for different workload accounts. After the necessary policies and connections are in place, workload accounts can access Amazon Bedrock capabilities through the established secure pathway. This setup enables secure, cross-account access to AI services while maintaining centralized control and monitoring.

Network components

We use VPC Lattice, which is a fully managed application networking service that helps you simplify network connectivity, security, and monitoring for service-to-service communication needs. With VPC Lattice, organizations can achieve a centralized connectivity pattern to control and monitor access to the services required for building generative AI applications.

For details about VPC Lattice, refer to the Amazon VPC Lattice User Guide. The following is an overview of the constructs you can use in setting up the centralized pattern in this solution:

  • VPC Lattice service network – You can use the VPC Lattice service network to provide central connectivity and security to the central AI services account. The service network is a logical grouping mechanism that simplifies how you can enable connectivity across VPCs or accounts, and apply common security policies for application communication patterns. You can create a service network in an account and share it with other accounts within or outside AWS Organizations using AWS RAM.
  • VPC Lattice service – In a service network, you can associate a VPC Lattice service, which consists of a listener (protocol and port number), routing rules that allow for control of the application flow (for example, path, method, header-based, or weighted routing), and target group, which defines the application infrastructure. A service can have multiple listeners to meet various client capabilities. Supported protocols include HTTP, HTTPS, gRPC, and TLS. The path-based routing allows control to various high-performing FMs and other capabilities you would need to build a generative AI application.
  • Proxy layer – You use a proxy layer for the VPC Lattice service target group. The proxy layer can be built based on your organization’s preference of AWS services, such as AWS Lambda, AWS Fargate, or Amazon Elastic Kubernetes Service (Amazon EKS). The purpose of the proxy layer is to provide a single entry point to access LLMs, knowledge bases, and other capabilities that are tested and approved according to your organization’s compliance requirements.
  • VPC Lattice auth policies – For security, you use VPC Lattice auth policies. VPC Lattice auth policies are specified using the same syntax as IAM policies. You can apply an auth policy to VPC Lattice service network as well as to the VPC Lattice service.
  • Fully Qualified Domain Names –To facilitate service discovery, VPC Lattice supports custom domain names for your services and resources, and maintains a Fully Qualified Domain Name (FQDN) for each VPC Lattice service and resource you define. You can use these FQDNs in your Amazon Route 53 private hosted zone configurations, and empower business units or teams to discover and access services and resources.
  • Service network VPC – Business units or teams can access generative AI services in a service network using service network VPC associations or a service network VPC endpoint.
  • Monitoring – You can choose to enable monitoring at the VPC Lattice service network level and VPC Lattice service level. VPC Lattice generates metrics and logs for requests and responses, making it more efficient to monitor and troubleshoot applications

The preceding guidance takes a “secure by default” approach—you must be explicit about which features, models, and so on should be accessed by which business unit. The setup also enables you to implement a defense-in-depth strategy at multiple layers of the network:

  • The first level of defense is that business team needs to connect to the service network in order to get access to the generative AI service through the central AI service account.
  • The second level includes network-level security protections in the business team’s VPC for the service network, such as security groups and network access control lists (ACLs). By using these, you can allow access to specific workloads or teams in a VPC.
  • The third level is through the VPC Lattice auth policy, which you can apply at two layers: at the service network level to allow authenticated requests within the organization, and at the service level to allow access to specific models and features.

VPC Lattice auth policy

This solution makes it possible to centrally manage access to Amazon Bedrock resources across your organization. This approach uses an VPC Lattice auth policy to centrally control Amazon Bedrock resources and manage it from one location across all the organization accounts.

Typically, the auth policy on the service network is operated by the network or cloud administrator. For example, allowing only authenticated requests from specific workloads or teams in your AWS organization. In the following example, access is granted to invoke the generated AI service for authenticated requests and to principals that are part of the o-123456example organization:

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Effect": "Allow",
         "Principal": "*",
         "Action": "vpc-lattice-svcs:Invoke",
         "Resource": "*",
         "Condition": {
            "StringEquals": {
               "aws:PrincipalOrgID": [ 
                  "o-123456example"
               ]
            }
         }
      }
   ]
}

The auth policy at the service level is managed by the central AI service team to set fine-grained controls, which can be more restrictive than the coarse-grained authorization applied at the service network level. For example, the following policy restricts access to claude-3-haiku for only business-team1:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect": "Allow",
         "Principal": {
            "AWS": [
               "arn:aws:iam::<account-number>:role/businss-team1"
            ]
         },
         "Action": "vpc-lattice-svcs:Invoke",
         "Resource": [
            "arn:aws:vpc-lattice:<aws-region>:<account-number>:service/svc-0123456789abcdef0/*"
         ],
         "Condition": {
            "StringEquals": {
               "vpc-lattice-svcs:RequestQueryString/modelid": "claude-3-haiku"            }
         }
      }
   ]
}

Monitoring and tracking

This design employs three monitoring approaches, using Amazon CloudWatch, AWS CloudTrail, and VPC Lattice access logs. This strategy provides a view of service usage, security, and performance.

CloudWatch metrics offer real-time monitoring of VPC Lattice service performance and usage. CloudWatch tracks metrics such as request counts and response times for Amazon Bedrock related endpoints, allowing for the setup of alarms for proactive management of service health and capacity. This enables monitoring of overall usage patterns of Amazon Bedrock models across different business units, facilitating capacity planning and resource allocation. CloudTrail provides detailed API-level auditing of Amazon Bedrock related actions. It logs cross-account access attempts and interactions with Amazon Bedrock services, providing a compliance and security audit trail. This tracking of who is accessing which Amazon Bedrock models, when, and from which accounts helps organizations adhere to their organizational policies.VPC Lattice access logs provide detailed insights into HTTP/HTTPS requests to Amazon Bedrock services, capturing specific usage patterns of AI models by different business teams. These logs contain client-specific information, which for example can be used to enable organizations to implement capabilities such as charge-back models. This allows for accurate attribution of AI service usage to specific teams or departments, facilitating fair cost allocation and responsible resource utilization across the organization. These services work together to enhance security, optimize performance, and provide valuable insights for managing cross-account Amazon Bedrock access.

Conclusion

In this post, we explored the importance of securing and controlling network access to Amazon Bedrock capabilities within an organization’s AWS landing zone. We discussed the key business challenges, such as the need to protect sensitive information in Amazon Bedrock knowledge bases, limit access to AI models, and optimize cloud costs by monitoring and controlling Amazon Bedrock capabilities. To address these challenges, we outlined a multi-layered network solution that uses AWS networking services, including a VPC Lattice auth policy to restrict and monitor access to Amazon Bedrock capabilities. Try out this solution for your own use case, and share your feedback in the comments.


About the authors

Empower AI agents with user context using Amazon Cognito

Post Syndicated from Abrom Douglas original https://aws.amazon.com/blogs/security/empower-ai-agents-with-user-context-using-amazon-cognito/

Amazon Cognito is a managed customer identity and access management (CIAM) service that enables seamless user sign-up and sign-in for web and mobile applications. Through user pools, Amazon Cognito provides a user directory with strong authentication features, including passkeys, federation to external identity providers (IdPs), and OAuth 2.0 flows for secure machine-to-machine (M2M) authorization.

Amazon Cognito issues standard JSON Web Tokens (JWTs) and supports the customization of identity and access tokens for user authentication by using the pre token generation Lambda trigger. Learn more about this in How to customize access tokens in Amazon Cognito user pools. Amazon Cognito has extended token customization capabilities to support access token customization for M2M and the ability to pass metadata from the client during M2M authorization. Application builders can use these two features to support multiple use cases, including customizing access tokens based on unique runtime policies, entitlements, environment, or passed metadata. This can simplify and enrich M2M authentication and authorization scenarios and opens up new possibilities for emerging use cases, such as identity and access management for AI agents.

This post demonstrates how Amazon Cognito enables AI agents to perform authorized actions on behalf of users through user-contextualized access tokens for OAuth 2.0-enabled resource servers. AI agents represent a class of autonomous services that require robust identity management and precise access control, especially when acting on behalf of users. By using the Amazon Cognito client credentials flow with access token customization, you can establish distinct identities for AI agents that carry critical information about their capabilities, scope of access, and intended use cases. This approach provides a foundation for more secure, auditable AI agent operations while maintaining clear boundaries around their authorized activities.

The identity of an AI agent can be represented within Amazon Cognito as an app client. The AI agent obtains an access token (JSON Web Token (JWT)) through an OAuth 2.0 client credentials grant. This JWT can be customized to contain claims that represent the authenticated human user whom the AI agent is acting on behalf of. This token can then be used to authorize access to other services that has established trust with the Amazon Cognito user pool by trusting the issuer and audience of the token. For example, this third-party service could be a claims processor, a travel agency service, or a scheduling service acting on behalf of a user. The focus of this post is on foundational building blocks using Amazon Cognito for AI agents and how to obtain a customized access token with user context.

Solution overview and reference architecture

Looking at an example architecture (Figure 1), a user signs in to a web or mobile application using an Amazon Cognito user pool, and tokens for the user are returned to the client. Here, the application could be a serverless digital assistant using an Amazon Bedrock agent that needs to gather and process data residing in a third-party cross-domain service. The AI agent obtains its own access token by performing an OAuth 2.0 client credentials grant while passing the user’s access token as context using the aws_client_metadata request parameter. The AI agent receives the user contextualized access token and calls an external, third-party, or cross-domain service that trusts the issuer and audience of the AI agent’s access token issued from an Amazon Cognito user pool. The cross-domain service can obtain the JSON Web Key Set (JWKS) to verify the token and extract claims presenting both the AI agent and most importantly, the underlying user. Authorization takes place within the cross-domain service using the claims of the customized access token and for fine-grain authorization, Amazon Verified Permissions is used. See Figure 1 for a detailed flow of this example.

Figure 1: AI agent identity reference architecture

Figure 1: AI agent identity reference architecture

  1. The user navigates to the application through the client.
  2. There is no existing session or token for the user, so the user authentication flow with the Amazon Cognito user pool begins.
  3. After a successful sign-in, Amazon Cognito returns access, ID, and refresh tokens to the client for the user.
  4. As the user interacts with AI agent through the application, the client sends the user’s access token to an Amazon API Gateway endpoint.
  5. The API gateway integrates with the AI agent, which is using an Amazon Bedrock agent. As an example, this can use several AWS Lambda functions interacting with an Amazon Bedrock Knowledge Base or a Retrieval-Augmented Generation (RAG) process.
  6. The AI agent obtains its own access token from an Amazon Cognito user pool using an OAuth 2.0 client credentials grant. The user’s access token, obtained in step 1, is sent with the token request in the aws_client_metadata request parameter.

Note: You can use different Amazon Cognito user pools for user authentication and for agent (machine) authentication. This promotes separation and provides the ability to apply different settings and controls on each user pool if needed to meet security requirements.

  1. Amazon Cognito validates the client ID and secret from the AI agent and invokes the pre token generation Lambda trigger to customize the access token for the AI agent.

Note: Within the pre token generation Lambda trigger, the user’s access token is verified before returning a customized access token to the AI agent using the aws-jwt-verify library.

  1. The customized access token is returned to the AI agent, including custom claims representing the user.
  2. The AI agent, using its own access token, calls the cross-domain service to perform the requested action on behalf of the user. For example, this can be a third-party reservation system or a photo sharing service.
  3. The resource server in the cross-domain service verifies that the access token from the AI agent is valid. The resource server must be pre-configured to trust the user pool that issued the agent access token.
  4. Coarse- and fine-grained authorization can happen either locally in the service code or using Verified Permissions.
  5. A response from the cross-domain service flows back to the AI agent, if necessary.
  6. A response from the AI agent to the user application or client is returned, if necessary.
  7. Actions that take place throughout the flow are logged in AWS CloudTrail, providing end-to-end logging and auditing.

Implementation details

Let’s take a deeper look into the three core components of this scenario:

  1. The AI agent obtaining its own OAuth 2.0 access token
  2. The Amazon Cognito pre token generation Lambda trigger used to enrich the AI agent’s access token with user context
  3. The cross-domain resource server performing fine-grained authorization

AI agent

Figure 2: AI agent obtaining a user access token from the frontend application through API Gateway

Figure 2: AI agent obtaining a user access token from the frontend application through API Gateway


Amazon Bedrock Agents is used in this solution with a
custom orchestration configured to use Lambda. When the application interacts with the Amazon Bedrock agent, the custom orchestrator initiation begins with the agent passing the user’s access token to a Lambda function as part of the custom orchestration (shown in Figure 2). The Lambda function validates the user’s token to verify that it’s not expired and hasn’t been tampered with. This custom orchestrator begins the process for the agent to obtain its own OAuth access token and to access downstream and cross-domain resources on behalf of the user. The human user’s access token is included in the call from the application through the client. To learn more about Amazon Bedrock Agents custom orchestrator, see
Getting started with Amazon Bedrock Agents custom orchestrator. The following is an example of what a human user’s decoded access token provided through an API Gateway REST API might look like.

{
  sub: "user-identity-4e4c-example-7cede8e609a2",
  cognito:groups: 
    [
    "exampleChatApplicationAccess"
    ]
  ,
  iss: https://cognito-idp.<region>.amazonaws.com/<region>_example,
  version: 2,
  client_id: "1example23456789",
  origin_jti: "",
  token_use: "access",
  scope: "openid profile email",
  auth_time: 499192140,
  exp: 1445444940,
  iat: 499192140,
  jti: "",
  username: "my-example-username"
}

The following is a Node.js code sample that an AI agent can use to obtain its own access token from Amazon Cognito. This can be the Lambda function part of the custom orchestration for the Amazon Bedrock agent. Notice the clientMetadata variable being set, which will be passed to the Cognito /token endpoint using the aws_client_metadata request parameter. This request parameter is where the user’s access token is provided. In the following code example, you will find an attribute called callerApp, which is set to ExampleChatApplication, which serves as a unique identifier for the application. The callerApp value is preconfigured in the backend of the solution. This unique application identifier is included in the customized access token for the agent and used for additional authorization checks later. It’s a security best practice to use AWS Secrets Manager to store the client ID and client secret and obtain these credentials at runtime. As a security best practice, the user’s access token should be verified prior to passing it to the AI agent backend.

async function getAccessToken() {
    const clientId = 'exampleAiAgentClientId'; // use Secrets Manager
    const clientSecret = 'exampleAiAgentClientSecret'; // use Secrets Manager
    const tokenEndpoint = 'https://mydomain.auth.<region>.amazoncognito.com/oauth2/token';
    const scope = 'crossDomainService/read userData/read';
    const clientMetadata = '{"onBehalfOfToken":"<HUMAN-USER-ACCESS-TOKEN>", "callerApp":"ExampleChatApplication"}';
  
    const basicAuth = Buffer.from(`${clientId}:${clientSecret}`).toString('base64');
  
    const body = new URLSearchParams({
      grant_type: 'client_credentials',
      scope,
      aws_client_metadata: clientMetadata
    });
  
    const res = await fetch(tokenEndpoint, {
      method: 'POST',
      headers: {
        'Authorization': `Basic ${basicAuth}`,
        'Content-Type': 'application/x-www-form-urlencoded'
      },
      body
    });
  
    if (!res.ok) {
      const error = await res.text();
      throw new Error(`Token request failed: ${res.status} ${error}`);
    }
  
    const { access_token } = await res.json();
    console.log('Access Token:', access_token);
  
    return access_token;
  }
  
  getAccessToken().catch(err => console.error('Error:', err.message));

The access token for the AI agent is returned only if the client ID and secret are correct and the provided user access token is valid. However, before it’s returned, the AI agent’s access token is customized by the Amazon Cognito pre token generation Lambda trigger.

Amazon Cognito pre token generation Lambda trigger

Figure 3: AI agent access token customization with Cognito pre token generation Lambda trigger

Figure 3: AI agent access token customization with Cognito pre token generation Lambda trigger

After the AI agent’s action calls the Amazon Cognito /token endpoint with a valid client ID and secret, Cognito invokes the pre token generation Lambda trigger. The following is an example Lambda function that takes the aws_client_metadata request parameter, which contains the access token of the user and the callerApp attribute that was defined while the user was authenticating. In the following Lambda function, the access token provided from the user is verified (shown in Figure 3). The aws-jwt-verify library is used to verify the token is not expired, the token has not been tampered with by verifying the signature, and it’s making sure that an access token was provided. The Lambda function is also pre-configured to accept user tokens from a specific issuer and audience, this protects against malicious context injection risks. This is also an opportunity to perform additional authorization. For example, check if the user is a member of certain groups.

After the token is verified, the Lambda function customizes the access token to be returned to the AI agent.

import { CognitoJwtVerifier } from "aws-jwt-verify";

// Initialize the JWT verifier to verify the user’s access token
// Provide the user pool ID, token use, and client ID 
const jwtVerifier = CognitoJwtVerifier.create({
  userPoolId: process.env.USER_POOL_ID,  // user pool for user authentication
  clientId: process.env.CLIENT_ID,
  // groups: "exampleChatApplicationAccess", // optional group membership authorization
  tokenUse: 'access'
});

export const handler = async function(event, context) {
  try {
    const onBehalfOfToken = event.request.clientMetadata?.onBehalfOfToken || '';
    // It’s recommended that the provided “callerApp” value from the application is authorized for use with the app client for the AI agent
    const callerApp = event.request.clientMetadata?.callerApp || '';

    // The below console log will display the authenticated user’s JWT
    // Keep this logging with caution in a production environment
    console.log('Original event:', event);

    // Verify the access token from the human user
    // You could optionally also perform some authorization checks here as well
    // Example: check for the membership of a group
    let decodedJWT;
    if (onBehalfOfToken) {
      try {
        decodedJWT = await jwtVerifier.verify(onBehalfOfToken);
        console.log('Decoded JWT:', decodedJWT);
      } catch (err) {
        console.error('Token verification failed:', err);
        throw new Error('Token verification failed');
      }
    }

    // Create the onBehalfOf claim structure
    const behalfOfClaim = decodedJWT ? {
      sub: decodedJWT.sub,
      username: decodedJWT.username,
      groups: decodedJWT['cognito:groups'] || []
    } : {};

    // Customized token returned to client
    event.response = {
      "claimsAndScopeOverrideDetails": {
        "accessTokenGeneration": {
          "claimsToAddOrOverride": {
            "onBehalfOf": behalfOfClaim,
            "callerApp": callerApp
          },
        }
      }
    };

    return event;

  } catch (error) {
    console.error('Error in Lambda execution:', error);
    throw error;
  }
};

Notice in the preceding Lambda function that two custom claims are being dynamically created within the event.response: onBehalfOf and callerApp. The onBehalfOf claim contains nested claims that were extracted from the human user’s access token. The callerApp is carried forward from the frontend application and provided alongside the user access token. It’s recommended for the callerApp value to also be verified against some custom logic to add additional layer of protection. The return AI agent’s access token would look like the following JWT.


{    
	"sub": "agent-identity-4e4c-example-7cede8e609a2",
	"onBehalfOf": {
		"sub": "user-identity-4e4c-example-7cede8e609a2",
		"username": "my-example-username",
		"groups": [
			"readaccess"        
				]    
		},    
		"iss": "https://cognito-idp..amazonaws.com/_example",
		"version": 2,
		"client_id": "1example23456789",
		"callerApp": "ExampleChatApplication",
		"token_use": "access",
		"scope": "crossDomainService123/read userData/read",
		"auth_time": 499192140,
		"exp": 1445444940,
		"iat": 499192140,
		"jti": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
}

Cross-domain resource server authorization check

At this point, shown in Figure 4, the human user has successfully authenticated to the web application, the human user’s access token was sent as context to the backend, an AI agent obtained its own customized access token containing the human user context, and now the agent is ready to call an external cross-domain service.

Figure 4: Cross-domain resource server performing fine-grained authorization with Amazon Verified Permissions

Figure 4: Cross-domain resource server performing fine-grained authorization with Amazon Verified Permissions

As shown in Figure 4, the cross-domain service is the resource server and therefore needs to perform an authorization check. For this example, we’ll keep things straightforward and make sure that three core things are verified:

  1. The AI agent’s OAuth access token is valid
  2. The AI agent is authorized to access this service
  3. The AI agent is authorized to interact with the user data

Depending on your use case and requirements, you might also need to verify that the user’s consent has been obtained prior to the AI agent acting on their behalf. Ultimately, you want to verify that the AI agent can access a user’s data on their behalf and only for the purpose for which consent has been provided by the user.

For the token verification, use the aws-jwt-verify library again. The following is a Node.js example to verify the AI agent’s access token.

import { CognitoJwtVerifier } from "aws-jwt-verify";

// add custom logic to verify that AI agent is authorized to perform this action on behalf of the user

// Verifier that expects valid access tokens:
const verifier = CognitoJwtVerifier.create({
  userPoolId: "<user_pool_id>", // user pool for AI agent authentication
  tokenUse: "access",
  clientId: "<client_id>",
});

try {
  const payload = await verifier.verify(
    "eyJraWQeyJhdF9oYXNoIjoidk..." //this will be the AI agent's access token
  );
  console.log("Token is valid. Payload:", payload);
} catch {
  console.log("Token not valid!");
}

Fine-grained authorization with Verified Permissions

As a security best practice, the zero trust principle of enforcing fine-grained identity-based authorization should take place using Verified Permissions. The preceding Node.js code sample is a basic validation of the AI agents access token that can happen within the application logic. Instead of keeping authorization logic within the resource server, you can use Verified Permissions to offload the authorization policies to a managed service. The following is an example Cedar policy for this use case.

permit(
    principal == Agent::"agent-identity-4e4c-example-7cede8e609a2",
    action == Action::"readOnly",
    resource == Resource::"crossDomainService123::userData"
)
when {
    resource.scope == Scope::"crossDomainService123/read" &&
    resource.owner == User::" user-identity-4e4c-example-7cede8e609a2" &&
    context.onBehalfOf.sub == " user-identity-4e4c-example-7cede8e609a2" &&
    context.callerApp == "ExampleChatApplication"
};

With the preceding Cedar policy example, you are permitting the AI agent to read userData from the crossDomainService123 resource. This is only permitted when the AI agent’s access token contains the crossDomainService/read scope and when the resource owner and the onBehalfOf user (from the access token) are the same—the human user in this case. There’s also an additional when clause in the policy to make sure that this interaction initiated from ExampleChatApplication.

The cross-domain resource server would use the AI agent’s access token and call the Verified Permissions IsAuthorizedWithToken API. To learn more, see Simplify fine-grained authorization with Amazon Verified Permissions and Amazon Cognito.

The following is a Node.js example using the IsAuthorizedWithToken API from Verified Permissions using the AWS SDK for JavaScript v3.

import { VerifiedPermissionsClient, IsAuthorizedWithTokenCommand } from "@aws-sdk/client-verifiedpermissions";

const client = new VerifiedPermissionsClient({ region: "<region>" });

// Dynamically provided token 
const jwtToken = "eyJraWQiOiJrMWtleSIsInR..."; //AI agent's access token

async function checkAccess() {
  const input = {
    policyStoreId: "ps-abc123example", // your AVP policy store ID
    accessToken: jwtToken,
    action: {
      actionType: "Action",
      actionId: "readOnly"
    },
    resource: {
      entityType: "crossDomainService123",
      entityId: "userData"
    }
  };

  const command = new IsAuthorizedWithTokenCommand(input);

  try {
    const response = await client.send(command);
    console.log("Authorization Decision:", response.decision);
  } catch (err) {
    console.error("Authorization error:", err);
  }
}

Based on the preceding examples of the AI agent’s access token (with user context), the Cedar policy, and the IsAuthorizedWithToken API call, the resource server would get an Allow decision for this action to take place. The following is an example of the authorization decision response.

{
    "decision": "Allow",
    "determiningPolicies": [{
        "determiningPolicyId": "ps-abc123example"
    }],
    "errors": []
}

Before this policy can be evaluated, you must define a schema that includes the relevant entity types (Agent, User, Resource, Scope, and so on), and create corresponding entities in your policy store that match the IDs used in the policy and request.

Bringing it all together, the requested data from the AI agent, on behalf of the user, is returned from the cross-domain service to the AI agent. This additional data can now be used within the context of the AI agent workload. The entire solution can be used for a chat application, such as the one described in Protect sensitive data in RAG applications with Amazon Bedrock.

Conclusion

Amazon Cognito M2M access token customization and support for passing client metadata provides you the extensibility to solve complex use cases and enables emerging ones like AI agent identity and access management. For example, passing contextual client metadata and customizing access tokens at runtime can help software as a service (SaaS) and multi-tenant service providers scale to an unlimited number of resource servers, because these can be dynamically determined at runtime. As organizations increasingly explore the use of AI agents, having a secure, scalable identity management solution becomes crucial for maintaining control and accountability. By using these new features, you can build more secure and scalable solutions with Amazon Cognito to prepare for the future of autonomous AI agent use cases.

Use the comments section to leave feedback about this post. If you have questions about this post, start a new thread on Amazon Cognito re:Post or contact AWS Support.

Abrom Douglas

Abrom Douglas III

Abrom is a Senior Solutions Architect within AWS Identity with nearly 20 years of software engineering and security experience, specializing in the identity and access management space. He loves speaking with customers about how identity and access management can provide secure outcomes that enable both business and technology initiatives. In his free time, he enjoys cheering for Arsenal FC, photography, travel, volunteering, and competing in duathlons.

Analyze media content using AWS AI services

Post Syndicated from Jack Bradham original https://aws.amazon.com/blogs/architecture/analyze-media-content-using-aws-ai-services/

Organizations managing large audio and video archives face significant challenges in extracting value from their media content. Consider a radio network with thousands of broadcast hours across multiple stations and the challenges they face to efficiently verify ad placements, identify interview segments, and analyze programming patterns.

In this post, we demonstrate how you can automatically transform unstructured media files into searchable, analyzable content. By combining Amazon Transcribe, Amazon Bedrock, Amazon QuickSight, and Amazon Q, organizations can achieve the following:

  • Process and transcribe media files upon upload
  • Identify commercials, interviews, and program segments
  • Extract insights using foundation models (FMs)
  • Create a searchable knowledge base
  • Generate rich visualizations for decision-making
  • Enable natural language queries across their media archive
  • Visualize complex information with intuitive graphics

In the following sections, we explore how these AWS services work together to help organizations unlock the full potential of their media content, whether for advertising compliance, content analysis, or discovering specific segments within thousands of hours of recordings.

Solution overview

This solution provides an event-driven media analysis pipeline that transforms how you manage and extract value from your content:

  • Streamline content management – Automatically process media files the moment they’re uploaded, saving time and reducing manual work
  • Unlock deeper insights – Generate accurate transcriptions that capture not just words, but the full context of your content—including speakers, timing, and key moments
  • Harness AI – Automatically extract meaningful insights and uncover hidden patterns in your media without extensive manual review
  • Build a searchable knowledge base – Turn scattered media files into a discoverable catalog that your entire team can use
  • Build a customizable interface – Create a customizable UI to search the catalog
  • Create powerful visualizations – Bring your insights to life with intuitive visualizations that make complex information immediately understandable

The following diagram illustrates our architecture.

Media Analysis Architecture

This event-driven architecture automatically processes and analyzes multimedia content using AWS services. The workflow consists of the following steps:

  1. A user uploads media files to an Amazon Simple Storage Service (Amazon S3) bucket. A “New Media” event triggers the first AWS Step Functions workflow. This workflow handles the initial cataloging based on values in the file name and launches the transcription process.
  2. Amazon Transcribe converts the audio into accurate, readable text. The transcribed content is securely saved to an S3 bucket for further analysis. A “Transcription Complete” event triggers the next step.
  3. A second Step Functions workflow processes the transcription. Using predefined prompts, Amazon Bedrock analyzes the transcripts to extract meaningful information. Key insights extracted from the transcript are stored in an S3 data lake.
  4. The processed results are organized systematically, structured by date (year/month/day) and tagged with relevant attributes. This organized data enables natural language queries through Amazon Q when used as a knowledge base, interactive visualizations using QuickSight, and straightforward content discovery and analysis.
  5. Amazon Athena serves as the data exploration tool to query the data lake. Athena is used as the data source in QuickSight, which turns complex data into clear, compelling visuals.

This architecture automatically transforms raw media content into searchable, analyzable data while maintaining an organized hierarchy for efficient access and analysis. The event-driven design provides automatic processing of new content as it arrives, and the combination of AWS AI services enables deep content understanding and insight extraction. Each AWS service plays a crucial role in transforming your media content:

  • Amazon Bedrock – Reviews content after transcription for insights and entity extraction:
    • Uses advanced FMs to analyze transcripts
    • Identifies commercials, interviews, and program segments
    • Extracts meaningful insights from content
  • Amazon EventBridge – Triggers actions in the cataloging workflow:
    • Monitors for new media files and completed transcriptions
    • Automatically triggers Step Functions workflows
  • AWS Lambda – Handles custom code actions needed in the workflow:
    • Facilitates interaction with Amazon Bedrock
    • Executes custom prompts on transcripts
    • Enables flexible, scalable processing
  • Amazon Q – Serves as the frontend and Retrieval Augmented Generation (RAG) engine:
    • Addresses enterprise generative AI needs by providing a turnkey solution with built-in security features like single sign-on (SSO) integration and responsible AI governance policies
    • Allows businesses to quickly deploy AI assistance while maintaining compliance, data privacy, and security standards
    • Enables natural language queries across the media archive
    • Links results to the source media files
    • Provides conversational access to content
  • Amazon QuickSight – Turns insights in beautiful visualization for better consumption:
    • Creates interactive dashboards and visualizations
    • Displays comprehensive media analytics
    • Helps track advertising, programming, and content patterns
  • Amazon S3 – Stores assets and the catalog:
    • Securely stores raw media files, transcripts, and processed data
    • Automatically triggers processing when new content is uploaded
  • AWS Step Functions – Orchestrates the entire content processing workflow:
    • Manages transcription and AI analysis steps
    • Provides robust error handling and automatic retries
  • Amazon Transcribe – Converts speech to accurate, readable text:
    • Identifies speakers and timestamps
    • Provides accurate transcriptions of audio content

Security considerations

Although this post focuses on the technical implementation of media content analysis, it’s important to acknowledge that production deployments should include comprehensive security measures:

  • Data storage security (Amazon S3):
    • Enable server-side encryption using AWS Key Management Service (AWS KMS) keys
    • Apply bucket policies restricting access to authorized principals only
    • Enable Amazon S3 Block Public Access at account and bucket levels
    • Enable versioning for data recovery
    • Implement lifecycle policies for data retention
    • Enable S3 access logging
    • Use presigned URLs for temporary access
  • Identity and Access Management (IAM):
    • Create dedicated service roles with minimum required permissions for:
      • Step Functions execution
      • Amazon Transcribe jobs
      • Amazon Bedrock API calls
      • Athena queries
    • Implement role-based access control
    • Regularly rotate credentials
    • Enable multi-factor authentication (MFA) for all users
    • Use AWS Organizations for multi-account management
  • Network security:
    • Deploy virtual private cloud (VPC) endpoints for:
      • Amazon S3
      • Athena
      • QuickSight
    • Implement network access control lists (ACLs) and security groups
    • Enable VPC Flow Logs
    • Use AWS PrivateLink where applicable
    • Configure route tables to control traffic flow
  • Data encryption:
    • Implement AWS KMS encryption for S3 objects
    • Use TLS 1.2+ for all API communications
    • Enable automatic key rotation in AWS KMS
    • Implement envelope encryption for sensitive data
  • Monitoring and detection:
    • Enable AWS CloudTrail for API activity logging
    • Configure Amazon GuardDuty for threat detection
    • Set up Amazon CloudWatch:
      • Metrics for service health
      • Alarms for security events
      • Log groups for application logs
    • Enable S3 server access logging
    • Configure VPC Flow Logs
  • Access controls:
    • Implement fine-grained access controls for:
      • Amazon Bedrock model access
      • Athena query permissions
      • QuickSight dashboard sharing
    • Conduct regular access reviews

Additionally, compliance requirements and data governance policies might impact how you implement this solution in your environment.

These security considerations are crucial but beyond the scope of this post. We recommend consulting AWS security best practices and working with your security team to implement appropriate measures for your specific use case. For more information on AWS security best practices, refer to Best Practices for Security, Identity, & Compliance.

The following sections walk you through setting up each component of the architecture to help you transform raw media into actionable insights.

Prerequisites

The following are the prerequisites to follow along this post:

Create S3 buckets

For this solution, we create three distinct buckets to support the media analytics workflow:

  • Raw media bucket for incoming files
  • Transcription outputs bucket
  • Processed insights bucket

For instructions on creating buckets, refer to Creating a general purpose bucket.

Configure EventBridge

You can enable event notifications on the raw media bucket to trigger your automated workflow through EventBridge. Establish your automation backbone by monitoring S3 bucket activities. When new media arrives or transcription completes, EventBridge will trigger the appropriate workflow, providing continuous processing. For further instructions, refer to Creating rules that react to events in Amazon EventBridge.

The following are two example triggers that can be used to filter events and trigger Step Functions workflows. The following is an example filter for new media files:

{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["rawinputbucket"]
    }
  }
}

The following is an example filter for new transcripts added to the data lake:

{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["business-data-lake"]
    },
    "object": {
      "key": [{
        "suffix": ".transcription"
      }]
    }
  }
}

Create Step Functions workflows

We design the orchestration layer with two key workflows. The first handles media intake and transcription, and the second manages AI analysis. Each workflow includes safeguards for potential failures and retry mechanisms. For further instructions, refer to Learn how to get started with Step Functions.

The following diagram shows an example of processing new media uploads for indexing and transcription.

Media Analysis - Step Function Example

The following diagram shows an example of the Step Functions workflow that analyzes the transcription.

Set up Amazon Transcribe

To create an Amazon Transcribe job, you need permissions to do so. You can implement a speech-to-text conversion with powerful features like language detection, speaker identification, and custom vocabulary support to provide accurate transcription of your media content. For further instructions, refer to How Amazon Transcribe works.

Configure Amazon Bedrock

You can power your AI analysis engine by setting up precise prompts that extract meaningful insights. Amazon Bedrock processes transcripts to identify key segments, speakers, and topics, transforming raw text into structured data. For instructions, refer to Design a prompt.

The following is a sample prompt:

You will be reviewing a radio transcription to identify advertisements and extract relevant details. Your task is to analyze the provided transcript and output the results in a specific JSON format based on a given schema.
Please follow these steps to complete the task:
1. Carefully read through the entire transcript.
2. Identify all advertisements within the transcript. Look for clear indicators such as product mentions, promotional language, or transitions from regular content to commercial content.
3. For each advertisement you identify, determine the following information:
    - Company: The name of the company being advertised
    - Start time: The timestamp in the transcript where the ad begins
    - End time: The timestamp in the transcript where the ad ends
    - Product: The specific product or service being advertised
4. Format your findings into a JSON object that follows the provided schema. Each advertisement should be a separate object within an array.
5. Ensure these fields in your response are provided for each advertisement.. All are required fields: company, starttime, endtime, product. 
6. Use precise timestamps for start and end times. If exact times are not available, make a best estimate based on the transcript's context.
7. If a particular field is unclear or not explicitly mentioned in the transcript, you may use "Unknown" as the value.
8. Only respond with json and nothing else. Do not provide comments or explain your answer. 
9. Surround the JSON response with standard ```json markers
Here's an example of how your output should be formatted:
{
"advertisements": [
        {
            "company": "TechGadgets Inc.",
            "starttime": "00:05:30",
            "endtime": "00:06:15",
            "product": "SmartHome Hub"
        },
        {
            "company": "FreshFoods Market",
            "starttime": "00:15:45",
            "endtime": "00:16:30",
            "product": "Organic Produce Delivery Service"
        }
    ]
}
Do not add any fields that are not specified in the schema, and ensure all required fields are present for each advertisement.

Create a structured data lake

We create a hierarchical data organization strategy that enables efficient access and analysis. You can use AWS Glue crawlers to automatically discover and catalog your media metadata. For instructions, refer to Using crawlers to populate the Data Catalog. Configure Athena tables to enable SQL-based querying of your media insights:

CREATE OR REPLACE VIEW "commercials_view" AS 
SELECT
  metadata.market market
, metadata.station_call station_call
, metadata.format_type format_type
, CAST(metadata.timestamp AS timestamp) timestamp
, ads.company adCompany
, ads.product adProduct
, ads.starttime
, ads.endtime
FROM
  (commercials
CROSS JOIN UNNEST(advertisements) t (ads))

Set Up Amazon Q

You can enable natural language interaction with your media archive using Amazon Q Business. Configure the knowledge base and metadata to make your content searchable and accessible through conversational queries. Use the processed insights S3 buckets to configure the knowledge base. For instructions, refer to Getting started with Amazon Q Business.

The following screenshot shows example conversations with an AI assistant.

Build QuickSight dashboards

With QuickSight, you can create visual analytics that bring your data to life. Connect to Athena views to display advertising patterns, content analysis, and performance metrics in interactive dashboards. For more information, refer to Tutorial: Create an Amazon QuickSight dashboard.

The following screenshots are a few examples of dashboards created for a fictitious radio station as part of our use case.

Validate and optimize your media analytics solution

After you implement your media analytics architecture, follow these critical steps to achieve robust performance and alignment with your organization’s needs. First, configure a comprehensive testing approach. Imagine you’re preparing to launch your media analysis solution. Your testing journey begins with accuracy validation:

  • Compare transcription outputs against original media
  • Verify AI-generated insights for precision
  • Use representative sample sets from your content library

You start by taking a recently processed radio show and comparing its transcription against the original broadcast. Your team meticulously reviews the AI-generated insights, checking if key moments like ad transitions or interview segments are correctly identified. To make sure your system works across all content types, you select diverse samples from your library—perhaps a morning talk show, an evening news segment, and a weekend sports broadcast. Next, you delve into performance benchmarking:

  • Measure processing time for different media types
  • Evaluate resource utilization across AWS services
  • Identify potential bottlenecks in the workflow

Time how long it takes to process different types of media files, from short commercial segments to lengthy program recordings. As you watch how your AWS services respond under various loads, you can monitor resource consumption patterns. This helps you identify processing bottlenecks—for instance, you might discover that certain file types take longer to transcribe or that concurrent processing needs optimization. Finally, you put yourself in your users’ shoes for a thorough user experience assessment:

  • Test natural language queries with Amazon Q
  • Validate search result relevance
  • Gather feedback from potential end-users

Team members can interact with Amazon Q, asking questions they would naturally pose when searching for specific content. For example, you can test whether searching for “interviews about climate change last week” returns relevant results. Gathering feedback from potential users—perhaps a content manager with different needs than a compliance officer—provides invaluable insights. Their real-world experiences guide your refinements and make sure the system serves its intended purpose. This comprehensive testing approach, combining structured evaluation with real-world scenarios, sets the stage for a robust and user-friendly media analysis solution. As your media analysis solution moves from initial deployment to production, optimizing its performance becomes crucial for both cost-efficiency and user satisfaction. A radio network processing thousands of hours of content weekly might find that even small improvements in transcription accuracy or processing speed can lead to significant cost savings and better content discoverability. Similarly, a marketing team analyzing ad placements across multiple stations needs precise insights to make data-driven decisions about advertising effectiveness. With these business imperatives in mind, consider the following configuration optimization strategies:

  • Transcription refinement:
    • Adjust language models for domain-specific terminology
    • Fine-tune speaker identification settings
    • Implement custom vocabularies for improved accuracy
  • AI insight generation:
    • Refine prompts for more targeted analysis
    • Experiment with different AI models
    • Align extraction parameters with business objectives
  • Scalability considerations:
    • Test workflow performance with increasing media volumes
    • Implement appropriate auto scaling configurations
    • Monitor cost-effectiveness of your architecture
  • Continuous improvement:
    • Establish regular review cycles
    • Track key performance metrics
    • Iterate on your solution based on real-world usage

We recommend starting with a pilot implementation and gradually expanding your media analytics capabilities.

Clean up

To avoid incurring ongoing charges, clean up the resources you created as part of this solution:

  1. Delete QuickSight resources:
    1. Delete dashboards created for media analytics.
    2. Delete the datasets connected to Athena.
    3. If no longer needed, delete the QuickSight Enterprise subscription.
  2. Delete S3 buckets:
    1. Empty and delete the raw media bucket, transcription outputs bucket, and processed insights bucket.
  3. Remove EventBridge rules:
    1. Delete the rules created for monitoring S3 bucket activities.
    2. Remove targets associated with these rules.
  4. Delete Step Functions workflows:
    1. Delete the media intake and transcription workflow.
    2. Delete the AI analysis workflow.
  5. Remove Lambda functions:
    1. Delete Lambda functions created for interaction with Amazon Bedrock.
    2. Remove associated IAM roles and policies.
  6. Clean up data lake components:
    1. Delete Athena views and tables.
    2. Remove AWS Glue crawlers and databases.
    3. Delete stored query results.
  7. Remove Amazon Q configurations:
    1. Delete knowledge bases created.
    2. Remove custom configurations.
  8. Remove Amazon Bedrock settings:
    1. Remove custom prompts.
    2. Disable access to FMs if no longer needed.
  9. Delete Amazon Transcribe settings:
    1. Remove custom vocabularies.
    2. Delete stored transcription jobs.
  10. Remove IAM resources:
    1. Delete custom IAM roles created for this solution.
    2. Remove associated IAM policies.
  11. Complete additional cleanup:
    1. Delete CloudWatch Logs groups associated with Lambda functions.
    2. Remove CloudWatch alarms or metrics created for monitoring.
    3. Delete saved queries in Athena.

Common use cases

Organizations in different sectors can use this architecture to unlock value from their audio and video content. You can adapt this solution to meet your specific needs, such as managing broadcast media, corporate communications, educational materials, and more. Let’s explore how different industries might apply this technology:

  • Media and broadcasting:
    • Track advertising compliance
    • Verify media placement accuracy
    • Analyze broadcast content at scale
  • Corporate and enterprise:
    • Convert meeting recordings into searchable knowledge bases
    • Identify key decisions and action items
    • Enhance organizational knowledge management
  • Education and training:
    • Create comprehensive, topic-based course catalogs
    • Index training materials for quick retrieval
    • Support continuous learning initiatives
  • Legal services:
    • Generate precise, timestamped transcripts
    • Develop searchable legal proceeding archives
    • Improve document review efficiency
  • Healthcare:
    • Extract critical medical insights from consultations
    • Categorize patient interaction data
    • Support clinical documentation processes
  • Government and public sector:
    • Build comprehensive public meeting archives
    • Implement automated topic categorization
    • Enhance transparency and accessibility
  • Customer service:
    • Analyze call recordings for quality improvement
    • Identify service trends and customer pain points
    • Drive continuous customer experience enhancement

This media analytics architecture demonstrates notable versatility. By using AI, organizations can transform raw audio and video content into structured, meaningful insights that drive decision-making across industries.

Conclusion

In this post, we demonstrated how to use AWS services to convert unstructured media content into actionable intelligence. By combining Amazon Transcribe, Amazon Bedrock, QuickSight, and Amazon Q, you can create a scalable, automated solution for media analysis that adapts to your organizational needs.

This solution offers the following key architectural advantages:

  • Automated media file processing at scale
  • AI-powered insight generation
  • Natural language search capabilities
  • Interactive decision-making visualizations
  • Flexible, maintainable infrastructure

Organizations can now convert content into searchable knowledge, extract insights automatically, develop data-driven content strategies, and enhance operational efficiency through automation.

As audio and video content generation continues to accelerate, the ability to efficiently process and extract value becomes increasingly critical. This architecture provides a robust foundation for current needs while remaining adaptable to future technological innovations.

We invite you to explore how this media analytics solution can address your organization’s unique challenges. Consider your specific use cases and unlock the insights waiting to be discovered in your media archives.


About the authors

Optimizing fleet operations using Amazon SageMaker AI and Amazon Bedrock

Post Syndicated from Manny Sidhu original https://aws.amazon.com/blogs/architecture/optimizing-fleet-operations-using-amazon-sagemaker-ai-and-amazon-bedrock/

Every year in the United States, distracted driving claims thousands of lives and causes immense financial damage. More than 1.6 million accidents annually are caused by cell phone use while driving, and another 1.5 million result from drowsy drivers falling asleep at the wheel. These devastating—and preventable—accidents have sparked a major push for enhanced driver safety.

This initiative is particularly crucial in the commercial fleet industry, as accidents involving a large truck are often more dangerous and can cost hundreds of thousands of dollars. This post explores an innovative solution that leverages Amazon SageMaker AI and Amazon Bedrock to revolutionize driver coaching and enhance fleet efficiency. By harnessing the power of machine learning and artificial intelligence, we demonstrate how fleet operators can transform raw dashcam footage into actionable insights, empowering real-time driver monitoring and proactive safety measures – reducing costly accidents. Our approach combines AWS Artificial Intelligence (AI) and Internet of Things (IoT) services to create a comprehensive solution that not only detects distracted driving but also continuously improves its performance over time. Through this solution, we aim to show how fleet managers can significantly reduce distracted driving incidents, improve operational efficiency, and ultimately drive down costs in their commercial vehicle operations.

The Challenge: Effectively managing multiple dashcam feeds from commercial vehicle fleet

Today’s commercial vehicles are equipped with multi-camera systems that provide comprehensive coverage: inward-facing cameras monitor driver behavior, outward-facing cameras track oncoming traffic, and side/rear cameras detect cross-traffic and potential rear-end collisions. The sheer volume of video data generated by thousands of vehicles daily creates significant management and analysis challenges. While fleet operators traditionally use this dashcam footage for reactive purposes – such as law enforcement reporting, insurance claims, and driver exoneration – many organizations are missing a significant opportunity to leverage this data. As commercial fleets accumulate more miles, they generate rich datasets that can be used to train AI models capable of facilitating proactive safety improvements.

In this post, we’ll explore how to maximize the value of dashcam footage through best practices for implementing and managing Computer Vision systems in commercial fleet operations. We’ll demonstrate how to build and deploy edge-based machine learning models that provide real-time alerts for distracted driving behaviors, while effectively collecting, processing, and analyzing footage to train these AI models. This approach transforms fleet operations from reactive incident management to proactive safety enhancement, helping organizations convert raw video data into actionable insights that reduce safety incidents and improve overall fleet operational efficiency and cost-effectiveness.

Solution overview

A Distracted Driving Incident can occur when drivers engage in unsafe behaviors such as speeding, rolling stops, harsh braking, and aggressive acceleration. Fleet managers need to understand not just what happened during these incidents, but also the driver’s state of attention – whether they were focused on the road or distracted by activities like using a cellphone, eating, drinking, or experiencing fatigue common in long-haul driving.

Our solution leverages AWS services to create an end-to-end workflow capable of detecting and mitigating distracted driving. The steps involved include:

  1. Incident capture, ingestion, and labeling
  2. Model training, optimization, and deployment
  3. Continuous testing and improvement

Solution deep dive

This solution relies on a mix of AWS IoT, AI and generative AI services to build a scalable and cost-effective solution. Let’s start by looking at high level solution architecture and build the solution step-by-step.

Incident capture, ingestion, and labeling

To start the process of ingesting videos from a driver’s dashboard camera into the cloud, we capture the dashcam’s feed using the IoT Greengrass Kinesis Video Streamer Component. The video is streamed into the AWS Cloud using Kinesis Video Streams and stored in Amazon S3 by leveraging Kinesis Firehose. The videos are then converted into individual frames, analyzed by the Amazon Bedrock Nova Pro model to determine driver distraction, and sorted by an AWS Lambda function into an S3 bucket based on the analysis results. The sorted frames will next be used to train an AI model for edge deployment to detect distracted driving.

From a security perspective, it’s good practice to encrypt data in Amazon S3 buckets using AWS Key Management Service (KMS). You can enforce this by setting up SSE-KMS as the default encryption method to automatically encrypt uploaded objects. We also recommend implementing fine-grained AWS Identity & Access Management (IAM) roles to grant scoped access to images and videos. For data in transit between the edge and the cloud, you can use AWS IoT Greengrass certificates to encrypt your data and enforce identity verification. These measures can help protect against unauthorized access.

Edge-to-cloud architecture for real-time driver monitoring using AWS IoT, Kinesis, and ML services

With this process in place, we are continually collecting data from our fleet of commercial vehicles (while keeping security in mind). This data is automatically categorized and labeled based on the analysis from our Nova Pro model, and conveniently stored in S3, enabling us to seamlessly train an AI model – a process which we will describe next.

Model training, optimization, and deployment

The following diagram illustrates the process of training and deploying a distracted driver detection model. The process runs inside of an Amazon SageMaker Pipelines Workflow, which allows for seamless orchestration of other Amazon SageMaker AI services. This workflow begins with labeled driver images stored in Amazon S3, generated from the previously described workflow. This labeled dataset – consisting of driver images labeled as “distracted” or “not distracted’ – is used to train a ResNet50 model using Amazon SageMaker Training Jobs running on a Trn1 instance for price performance. As we train, the model learns how to identify distracted drivers. Once complete, the trained model is then quantized to INT8 using SageMaker Processing Jobs, and optimized for our specific type of edge hardware using SageMaker Neo. The optimized model is then stored in the SageMaker Model Registry for version control and governance (this will be helpful later when we iterate on our model with new training data). Finally, the model is pushed to S3 where AWS IoT Greengrass can initiate a deployment to the fleet of edge devices.

Running on the edge, the model performs inference multiple times a second on frames from the inward facing dashcam. (Inference speed calculated assuming edge compute has specs comparable to a Raspberry-Pi class of device.) If the driver is found to be distracted, the system alerts the driver by means of a noise. (ex. driver was falling asleep, and alert awakens them).

End-to-end AWS architecture for distracted driver detection: from model training to edge deployment

With this process in place, we have successfully leveraged the dataset we generated in the first diagram to train, optimize, and deploy our custom model to the ‘edge’ – in this case, to each vehicle in our fleet. Our model is now alerting drivers of dangerous behavior and helping to proactively prevent collisions. But our model likely isn’t perfect – perhaps it misses a dangerous behavior that wasn’t in the training dataset, or alerts unnecessarily. To validate our model is working well and further improve it to reduce errors, we implement continuous testing and improvement procedures.

Continuous testing and improvement

We need to continue to ingest driver dashcam data and compare our edge model’s predictions with our original source of ‘ground truth’ – Nova Pro.

The system collects frames for model validation in two scenarios: when vehicle telemetry detects incidents (hard braking, crashes) or when the edge model identifies distracted driving. These frames are sent to Amazon Bedrock for a ‘fact check’ to see if the edge model performed optimally. The comparative results between Amazon Bedrock and the edge model are stored in a dedicated S3 bucket for model evaluation. When sufficient new validated data is collected, or when the model’s agreement with Amazon Bedrock falls below a threshold, Amazon EventBridge triggers the previously described SageMaker Pipelines Workflow to fine tune, optimize, and re-deploy the improved model to the edge, now powered by our newly collected ‘disagreement data’.

Edge-to-cloud feedback loop for ML model validation using AWS IoT, Bedrock, and SageMaker services

We should also perform comparative analysis of our new model against our historical models stored in the Amazon SageMaker Model Registry to validate that our latest model actually performs better than historical models, verifying we don’t see a regression. If our latest model doesn’t outperform historical models, we should not deploy it, and instead investigate if we are suffering from overfitting or bad training data. In summary, we now have a model running inside fleet vehicles capable of alerting drivers to unsafe behavior. This could effectively reduce drowsy driving accidents by keeping drivers awake and alert, while also warning drivers about unsafe decisions like eating or using a cell phone while driving. This system is also self-training and self-improving, so it will continue to get better over time. Additionally, fleet management companies could aggregate safety data and reward top drivers to further incentivize safe driving habits.

Conclusion

In this post, we’ve explored an innovative solution that leverages AWS services to revolutionize driver coaching and fleet operations. By combining the power of Amazon SageMaker and Amazon Bedrock with AWS IoT and edge computing capabilities, we’ve demonstrated how to create a comprehensive, scalable solution for monitoring and improving driver behavior in real-time. This solution addresses the challenges of managing vast amounts of dashcam footage from commercial vehicle fleets, transforming raw video data into actionable insights. By implementing an end-to-end workflow that includes incident capture, categorization, model training, deployment, and continuous improvement, fleet operators can shift from reactive incident management to proactive safety enhancement. The benefits of this approach include:

  1. Enhanced safety: Real-time detection of distracted driving behaviors allows for immediate intervention and coaching.
  2. Improved efficiency: Automated analysis of dashcam footage reduces manual review time and costs.
  3. Scalability: The solution can handle large fleets and growing datasets with ease.
  4. Continuous improvement: The system learns and adapts over time, becoming more accurate and effective.
  5. Cost-effectiveness: By leveraging edge computing and optimized models, the solution minimizes compute costs.

As the transportation industry continues to evolve, solutions like this will play a crucial role in improving road safety, reducing operational costs, and enhancing overall fleet performance. By harnessing the power of AI and cloud computing, fleet operators can create safer, more efficient driving environments that benefit not only their businesses but also society as a whole. The future of fleet operations is here, and it’s driven by intelligent, data-driven systems that turn every mile driven into an opportunity for improvement and innovation.

Learn more by exploring AWS code samples to build hands-on SageMaker expertise. See the service in action through practical examples that demonstrate how to optimize model training and deployment across various use cases. Understand the financial advantages by conducting a cloud economics TCO analysis comparing traditional infrastructure against SageMaker’s managed services. This exercise reveals how SageMaker alleviates hidden costs while accelerating your ML development cycle.

Ready to take the next step? Connect with your AWS Solutions Architect to arrange a SageMaker AI Immersion Day tailored to your team’s specific challenges. These expert-led sessions provide personalized guidance that will help you implement SageMaker effectively within your organization’s unique context. For deeper dive into other relevant services Amazon Kinesis Video Streams, AWS IoT Greengrass, Amazon Bedrock


About the authors

AWS Weekly Roundup: Claude 4 in Amazon Bedrock, EKS Dashboard, community events, and more (May 26, 2025)

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-claude-4-in-amazon-bedrock-eks-dashboard-community-events-and-more-may-26-2025/

As the tech community we continue to have many opportunities to learn and network with other like-minded folks. This past week AWS customers attended the AWS Summit Dubai for an action-packed day featuring live demos, hands-on experiences with cutting-edge AI/ML tools, and more. Right here in South Africa I attended the Data & AI Community in Durban for a day of inspiration and learning from the community. In India, the AWS Community Day Bengaluru brought together hundreds of passionate tech enthusiasts for a day of learning and networking.

Last week’s launches
Here are the launches that got my attention:

For a full list of AWS announcements, be sure to keep an eye on the What’s New with AWS? page.

Additional updates
Here are some additional projects, blog posts, and news items that you might find interesting:

Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events:

  • AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Tel Aviv (May 28), Singapore (May 29), Stockholm (June 4), Sydney (June 4–5), Washington (June 10-11), and Madrid (June 11).
  • AWS re:Inforce – Mark your calendars for AWS re:Inforce (June 16–18) in Philadelphia, PA. AWS re:Inforce is a learning conference focused on AWS security solutions, cloud security, compliance, and identity.
  • AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Milwaukee, USA (June 5), and Nairobi, Kenya (June 14).

That’s all for this week. Check back next Monday for another Weekly Roundup!

Veliswa.

Introducing Claude 4 in Amazon Bedrock, the most powerful models for coding from Anthropic

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/claude-opus-4-anthropics-most-powerful-model-for-coding-is-now-in-amazon-bedrock/

Anthropic launched the next generation of Claude models today—Opus 4 and Sonnet 4—designed for coding, advanced reasoning, and the support of the next generation of capable, autonomous AI agents. Both models are now generally available in Amazon Bedrock, giving developers immediate access to both the model’s advanced reasoning and agentic capabilities.

Amazon Bedrock expands your AI choices with Anthropic’s most advanced models, giving you the freedom to build transformative applications with enterprise-grade security and responsible AI controls. Both models extend what’s possible with AI systems by improving task planning, tool use, and agent steerability.

With Opus 4’s advanced intelligence, you can build agents that handle long-running, high-context tasks like refactoring large codebases, synthesizing research, or coordinating cross-functional enterprise operations. Sonnet 4 is optimized for efficiency at scale, making it a strong fit as a subagent or for high-volume tasks like code reviews, bug fixes, and production-grade content generation.

When building with generative AI, many developers work on long-horizon tasks. These workflows require deep, sustained reasoning, often involving multistep processes, planning across large contexts, and synthesizing diverse inputs over extended timeframes. Good examples of these workflows are developer AI agents that help you to refactor or transform large projects. Existing models may respond quickly and fluently, but maintaining coherence and context over time—especially in areas like coding, research, or enterprise workflows—can still be challenging.

Claude Opus 4
Claude Opus 4 is the most advanced model to date from Anthropic, designed for building sophisticated AI agents that can reason, plan, and execute complex tasks with minimal oversight. Anthropic benchmarks show it is the best coding model available on the market today. It excels in software development scenarios where extended context, deep reasoning, and adaptive execution are critical. Developers can use Opus 4 to write and refactor code across entire projects, manage full-stack architectures, or design agentic systems that break down high-level goals into executable steps. It demonstrates strong performance on coding and agent-focused benchmarks like SWE-bench and TAU-bench, making it a natural choice for building agents that handle multistep development workflows. For example, Opus 4 can analyze technical documentation, plan a software implementation, write the required code, and iteratively refine it—while tracking requirements and architectural context throughout the process.

Claude Sonnet 4
Claude Sonnet 4 complements Opus 4 by balancing performance, responsiveness, and cost, making it well-suited for high-volume production workloads. It’s optimized for everyday development tasks with enhanced performance, such as powering code reviews, implementing bug fixes, and new feature development with immediate feedback loops. It can also power production-ready AI assistants for near real-time applications. Sonnet 4 is a drop-in replacement from Claude Sonnet 3.7. In multi-agent systems, Sonnet 4 performs well as a task-specific subagent—handling responsibilities like targeted code reviews, search and retrieval, or isolated feature development within a broader pipeline. You can also use Sonnet 4 to manage continuous integration and delivery (CI/CD) pipelines, perform bug triage, or integrate APIs, all while maintaining high throughput and developer-aligned output.

Opus 4 and Sonnet 4 are hybrid reasoning models offering two modes: near-instant responses and extended thinking for deeper reasoning. You can choose near-instant responses for interactive applications, or enable extended thinking when a request benefits from deeper analysis and planning. Thinking is especially useful for long-context reasoning tasks in areas like software engineering, math, or scientific research. By configuring the model’s thinking budget—for example, by setting a maximum token count—you can tune the tradeoff between latency and answer depth to fit your workload.

How to get started
To see Opus 4 or Sonnet 4 in action, enable the new model in your AWS account. Then, you can start coding using the Bedrock Converse API with model IDanthropic.claude-opus-4-20250514-v1:0 for Opus 4 and anthropic.claude-sonnet-4-20250514-v1:0 for Sonnet 4. We recommend using the Converse API, because it provides a consistent API that works with all Amazon Bedrock models that support messages. This means you can write code one time and use it with different models.

For example, let’s imagine I write an agent to review code before merging changes in a code repository. I write the following code that uses the Bedrock Converse API to send a system and user prompts. Then, the agent consumes the streamed result.

private let modelId = "us.anthropic.claude-sonnet-4-20250514-v1:0"

// Define the system prompt that instructs Claude how to respond
let systemPrompt = """
You are a senior iOS developer with deep expertise in Swift, especially Swift 6 concurrency. Your job is to perform a code review focused on identifying concurrency-related edge cases, potential race conditions, and misuse of Swift concurrency primitives such as Task, TaskGroup, Sendable, @MainActor, and @preconcurrency.

You should review the code carefully and flag any patterns or logic that may cause unexpected behavior in concurrent environments, such as accessing shared mutable state without proper isolation, incorrect actor usage, or non-Sendable types crossing concurrency boundaries.

Explain your reasoning in precise technical terms, and provide recommendations to improve safety, predictability, and correctness. When appropriate, suggest concrete code changes or refactorings using idiomatic Swift 6
"""
let system: BedrockRuntimeClientTypes.SystemContentBlock = .text(systemPrompt)

// Create the user message with text prompt and image
let userPrompt = """
Can you review the following Swift code for concurrency issues? Let me know what could go wrong and how to fix it.
"""
let prompt: BedrockRuntimeClientTypes.ContentBlock = .text(userPrompt)

// Create the user message with both text and image content
let userMessage = BedrockRuntimeClientTypes.Message(
    content: [prompt],
    role: .user
)

// Initialize the messages array with the user message
var messages: [BedrockRuntimeClientTypes.Message] = []
messages.append(userMessage)

// Configure the inference parameters
let inferenceConfig: BedrockRuntimeClientTypes.InferenceConfiguration = .init(maxTokens: 4096, temperature: 0.0)

// Create the input for the Converse API with streaming
let input = ConverseStreamInput(inferenceConfig: inferenceConfig, messages: messages, modelId: modelId, system: [system])

// Make the streaming request
do {
    // Process the stream
    let response = try await bedrockClient.converseStream(input: input)

    // Iterate through the stream events
    for try await event in stream {
        switch event {
        case .messagestart:
            print("AI-assistant started to stream"")

        case let .contentblockdelta(deltaEvent):
            // Handle text content as it arrives
            if case let .text(text) = deltaEvent.delta {
                self.streamedResponse + = text
                print(text, termination: "")
            }

        case .messagestop:
            print("\n\nStream ended")
            // Create a complete assistant message from the streamed response
            let assistantMessage = BedrockRuntimeClientTypes.Message(
                content: [.text(self.streamedResponse)],
                role: .assistant
            )
            messages.append(assistantMessage)

        default:
            break
        }
    }

To help you get started, my colleague Dennis maintains a broad range of code examples for multiple use cases and a variety of programming languages.

Available today in Amazon Bedrock
This release gives developers immediate access in Amazon Bedrock, a fully managed, serverless service, to the next generation of Claude models developed by Anthropic. Whether you’re already building with Claude in Amazon Bedrock or just getting started, this seamless access makes it faster to experiment, prototype, and scale with cutting-edge foundation models—without managing infrastructure or complex integrations.

Claude Opus 4 is available in the following AWS Regions in North America: US East (Ohio, N. Virginia) and US West (Oregon). Claude Sonnet 4 is available not only in AWS Regions in North America but also in APAC, and Europe: US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Hyderabad, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), and Europe (Spain). You can access the two models through cross-Region inference. Cross-Region inference helps to automatically select the optimal AWS Region within your geography to process your inference request.

Opus 4 tackles your most challenging development tasks, while Sonnet 4 excels at routine work with its optimal balance of speed and capability.

Learn more about the pricing and how to use these new models in Amazon Bedrock today!

— seb

Empower financial analytics by creating structured knowledge bases using Amazon Bedrock and Amazon Redshift

Post Syndicated from Nita Shah original https://aws.amazon.com/blogs/big-data/empower-financial-analytics-by-creating-structured-knowledge-bases-using-amazon-bedrock-and-amazon-redshift/

Traditionally, financial data analysis could require deep SQL expertise and database knowledge. Now with Amazon Bedrock Knowledge Bases integration with structured data, you can use simple, natural language prompts to query complex financial datasets. By combining the AI capabilities of Amazon Bedrock with an Amazon Redshift data warehouse, individuals with varied levels of technical expertise can quickly generate valuable insights, making sure that data-driven decision-making is no longer limited to those with specialized programming skills.

With the support for structured data retrieval using Amazon Bedrock Knowledge Bases, you can now use natural language querying to retrieve structured data from your data sources, such as Amazon Redshift. This enables applications to seamlessly integrate natural language processing capabilities on structured data through simple API calls. Developers can rapidly implement sophisticated data querying features without complex coding—just connect to the API endpoints and let users explore financial data using plain English. From customer portals to internal dashboards and mobile apps, this API-driven approach makes enterprise-grade data analysis accessible to everyone in your organization. Using structured data from a Redshift data warehouse, you can efficiently and quickly build generative AI applications for tasks such as text generation, sentiment analysis, or data translation.

In this post, we showcase how financial planners, advisors, or bankers can now ask questions in natural language, such as, “Give me the name of the customer with the highest number of accounts?” or “Give me details of all accounts for a specific customer.” These prompts will receive precise data from the customer databases for accounts, investments, loans, and transactions. Amazon Bedrock Knowledge Bases automatically translates these natural language queries into optimized SQL statements, thereby accelerating time to insight, enabling faster discoveries and efficient decision-making.

Solution overview

To illustrate the new Amazon Bedrock Knowledge Bases integration with structured data in Amazon Redshift, we will build a conversational AI-powered assistant for financial assistance that is designed to help answer financial inquiries, like “Who has the most accounts?” or “Give details of the customer with the highest loan amount.”

We will build a solution using sample financial datasets and set up Amazon Redshift as the knowledge base. Users and applications will be able to access this information using natural language prompts.

The following diagram provides an overview of the solution.

For building and running this solution, the steps include:

  1. Load sample financial datasets.
  2. Enable Amazon Bedrock large language model (LLM) access for Amazon Nova Pro.
  3. Create an Amazon Bedrock knowledge base referencing structured data in Amazon Redshift.
  4. Ask queries and get responses in natural language.

To implement the solution, we use a sample financial dataset that is for demonstration purposes only. The same implementation approach can be adapted to your specific datasets and use cases.

Download the SQL script to run the implementation steps in Amazon Redshift Query Editor V2. If you’re using another SQL editor, you can copy and paste the SQL queries either from this post or from the downloaded notebook.

Prerequisites

Make sure your meet the following prerequisites:

  1. Have an AWS account.
  2. Create an Amazon Redshift Serverless workgroup or provisioned cluster. For setup instructions, see Creating a workgroup with a namespace or Create a sample Amazon Redshift database, respectively. The Amazon Bedrock integration feature is supported in both Amazon Redshift provisioned and serverless.
  3. Create an AWS Identity and Access Management (IAM) role. For instructions, see Creating or updating an IAM role for Amazon Redshift ML integration with Amazon Bedrock.
  4. Associate the IAM role to a Redshift instance.
  5. Set up the required permissions for Amazon Bedrock Knowledge Bases to connect with Amazon Redshift.

Load sample financial data

To load the finance datasets to Amazon Redshift, complete the following steps:

  1. Open the Amazon Redshift Query Editor V2 or another SQL editor of your choice and connect to the Redshift database.
  2. Run the following SQL to create the finance data tables and load sample data:
    -- Create table
    CREATE TABLE accounts (
        id integer ,
        account_id integer PRIMARY KEY,
        customer_id integer,
        account_type character varying(256),
        opening_date date,
        balance bigint,
        currency character varying(256)
    );
    
    CREATE TABLE customer (
        id integer,
        customer_id integer PRIMARY KEY ,
        name character varying(256) ,
        age integer,
        gender character varying(256) ,
        address character varying(256) ,
        phone character varying(256) ,
        email character varying(256)
    );
    
    CREATE TABLE investments (
        id integer ,
        investment_id integer PRIMARY KEY,
        customer_id integer ,
        investment_type character varying(256) ,
        investment_name character varying(256) ,
        purchase_date date ,
        purchase_price bigint ,
        quantity integer 
    );
    
    
    CREATE TABLE loans (
        id integer ,
        loan_id integer PRIMARY KEY,
        customer_id integer ,
        loan_type character varying(256) ,
        loan_amount bigint ,
        interest_rate integer ,
        start_date date ,
        end_date date 
    );
    
    CREATE TABLE orders (
        id integer ,
        order_id integer PRIMARY KEY,
        customer_id integer ,
        order_type character varying(256) ,
        order_date date ,
        investment_id integer ,
        quantity integer ,
        price integer 
    );
    
    CREATE TABLE transactions (
        id integer ,
        transaction_id integer PRIMARY KEY ,
        account_id integer REFERENCES accounts(account_id),
        transaction_type character varying(256) ,
        transaction_date date ,
        amount integer ,
        description character varying(256) 
    );

  3. Download the sample financial dataset to your local storage and unzip the zipped folder.
  4. Create an Amazon Simple Storage Service (Amazon S3) bucket with a unique name. For instructions, refer to Creating a general purpose bucket.
  5. Upload the downloaded files into your newly created S3 bucket.
  6. Using the following COPY command statements, load the datasets from Amazon S3 into the new tables you created in Amazon Redshift. Replace <<your_s3_bucket>> with the name of your S3 bucket and <<your_region>> with your AWS Region.
    -- Load sample data
    COPY accounts FROM 's3://<<your_s3_bucket>>/accounts.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';
    
    COPY customer FROM 's3://<<your_s3_bucket>>/customer.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';
    COPY investments FROM 's3://<<your_s3_bucket>>/investments.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';
    COPY loans FROM 's3://<<your_s3_bucket>>/loans.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';
    COPY orders FROM 's3://<<your_s3_bucket>>/orders.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';
    COPY transactions FROM 's3://<<your_s3_bucket>>/transactions.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';

Enable LLM access

With Amazon Bedrock, you can access state-of-the-art AI models from providers like Anthropic, AI21 Labs, Stability AI, and Amazon’s own foundation models (FMs). These include Anthropic’s Claude 2, which excels at complex reasoning and content generation; Jurassic-2 from AI21 Labs, known for its multilingual capabilities; Stable Diffusion from Stability AI for image generation; and Amazon Titan models for various text and embedding tasks. For this demo, we use Amazon Bedrock to access the Amazon Nova FMs. Specifically, we use the Amazon Nova Pro model, which is a highly capable multimodal model designed for a wide range of tasks like video summarization, Q&A, mathematical reasoning, software development, and AI agents, including high speed and accuracy for text summarization tasks.

Make sure you have the required IAM permissions to enable access to available Amazon Bedrock Nova FMs. Then complete the following steps to enable model access in Amazon Bedrock:

  1. On the Amazon Bedrock console, in the navigation pane, choose Model access.
  2. Choose Enable specific models.
  3. Search for Amazon Nova models, select Nova Pro, and choose Next.
  4. Review the selection and choose Submit.

Create an Amazon Bedrock knowledge base referencing structured data in Amazon Redshift

Amazon Bedrock Knowledge Bases uses Amazon Redshift as the query engine to query your data. It reads metadata from your structured data store to generate SQL queries. There are different supported authentication methods to create the Amazon Bedrock knowledge base using Amazon Redshift. For more information, refer to the Set up query engine for your structured data store in Amazon Bedrock Knowledge Bases.

For this post, we create an Amazon Bedrock knowledge base for the Redshift database and sync the data using IAM authentication.

If you’re creating an Amazon Bedrock knowledge base through the AWS Management Console, you can skip the service role setup mentioned in the previous section. It automatically creates one with the necessary permissions for Amazon Bedrock Knowledge Bases to retrieve data from your new knowledge base and generate SQL queries for structured data stores.

When creating an Amazon Bedrock knowledge base using an API, you must attach IAM policies that grant permissions to create and manage knowledge bases with connected data stores. Refer to Prerequisites for creating an Amazon Bedrock Knowledge Base with a structured data store for instructions.

Complete the following steps to create an Amazon Bedrock knowledge base using structured data:

  1. On the Amazon Bedrock console, choose Knowledge Bases in the navigation pane.
  2. Choose Create and choose Knowledge Base with structure data store from the dropdown menu.
  3. Provide the following details for your knowledge base:
    1. Enter a name and optional description.
    2. Select Amazon Redshift as the query engine.
    3. Select Create and use a new service role for resource management.
    4. Make note of this newly created IAM role.
    5. Choose Next to proceed to the next part of the setup process.
    6. Configure the query engine:
      • Select Redshift Serverless (Amazon Redshift provisioned is also supported).
      • Choose your Redshift workgroup.
      • Use the IAM role created earlier.
      • Under Default storage metadata, select Amazon Redshift databases and for Database, choose dev.
      • You can customize settings by adding specific contexts to enhance the accuracy of the results.
      • Choose Next.
    7. Complete creating your knowledge base.
    8. Record the generated service role details.
    9. Next, grant appropriate access to the service role for Amazon Bedrock Knowledge Bases through the Amazon Redshift Query Editor V2. Update <your Service Role name> in the following statements with your service role, and update the value for <your schema>.
      CREATE USER "IAMR:<your Service Role name>" WITH PASSWORD DISABLE;
      SELECT * FROM PG_USER; -- To verify that the user is created.
      GRANT SELECT ON ALL TABLES IN SCHEMA <your schema> TO "IAMR:<your Service Role name>";
      --You can also Restricting access to certain tables for finer-grained control on the tables that can be accessed as shown below
      GRANT SELECT ON TABLE customer to "IAMR:<your Service Role name>";
      GRANT SELECT ON TABLE loan to "IAMR:<your Service Role name>";

Now you can update the knowledge base with the Redshift database.

  1. On the Amazon Bedrock console, choose Knowledge Bases in the navigation pane.
  2. Open the knowledge base you created.
  3. Select the dev Redshift database and choose Sync.

It may take a few minutes for the status to display as COMPLETE.

Ask queries and get responses in natural language

You can set up your application to query the knowledge base or attach the knowledge base to an agent by deploying your knowledge base for your AI application. For this demo, we use a native testing interface on the Amazon Bedrock Knowledge Bases console.

To ask questions in natural language on the knowledge base for Redshift data, complete the following steps:

  1. On the Amazon Bedrock console, open the details page for your knowledge base.
  2. Choose Test.
  3. Choose your category (Amazon), model (Nova Pro), and inference settings (On demand), and choose Apply.
  4. In the right pane of the console, test the knowledge base setup with Amazon Redshift by asking a few simple questions in natural language, such as “How many tables do I have in the database?” or “Give me list of all tables in the database.

The following screenshot shows our results.

  1. To view the generated query from your Amazon Redshift based knowledge base, choose Show details next to the response.
  2. Next, ask questions related to the financial datasets loaded in Amazon Redshift using natural language prompts, such as, “Give me the name of the customer with the highest number of accounts” or “Give the details of all accounts for customer Deanna McCoy.

The following screenshot shows the responses in natural language.

Using natural language queries in Amazon Bedrock, you were able to retrieve responses from the structured financial data stored in Amazon Redshift.

Considerations

In this section, we discuss some important considerations when using this solution.

Security and compliance

When integrating Amazon Bedrock with Amazon Redshift, implementing robust security measures is crucial. To protect your systems and data, implement essential safeguards including restricted database roles, read-only database instances, and proper input validation. These measures help prevent unauthorized access and potential system vulnerabilities. For more information, see Allow your Amazon Bedrock Knowledge Bases service role to access your data store.

Cost

You incur a cost for converting natural language to text based on SQL. To learn more, refer to Amazon Bedrock pricing.

Use custom contexts

To improve query accuracy, you can enhance SQL generation by providing custom context in two key ways. First, specify which tables to include or exclude, focusing the model on relevant data structures. Second, supply curated queries as examples, demonstrating the types of SQL queries you expect. These curated queries serve as valuable reference points, guiding the model to generate more accurate and relevant SQL outputs tailored to your specific needs. For more information, refer to Create a knowledge base by connecting to a structured data store.

For different workgroups, you can create separate knowledge bases for each group, with access only to their specific tables. Control data access by setting up role-based permissions in Amazon Redshift, verifying each role can only view and query authorized tables.

Clean up

To avoid incurring future charges, delete the Redshift Serverless instance or provisioned data warehouse created as part of the prerequisite steps.

Conclusion

Generative AI applications provide significant advantages in structured data management and analysis. The key benefits include:

  • Using natural language processing – This makes data warehouses more accessible and user-friendly
  • Enhancing customer experience – By providing more intuitive data interactions, it boosts overall customer satisfaction and engagement
  • Simplifying data warehouse navigation – Users can understand and explore data warehouse content through natural language interactions, improving ease of use
  • Improving operational efficiency – By automating routine tasks, it allows human resources to focus on more complex and strategic activities

In this post, we showed how the natural language querying capabilities of Amazon Bedrock Knowledge Bases when integrated with Amazon Redshift enables rapid solution development. This is particularly valuable for the finance industry, where financial planners, advisors, or bankers face challenges in accessing and analyzing large volumes of financial data in a secured and performant manner.

By enabling natural language interactions, you can bypass the traditional barriers of understanding database structures and SQL queries, and quickly access insights and provide real-time support. This streamlined approach accelerates decision-making and drives innovation by making complex data analysis accessible to non-technical users.

For additional details on Amazon Bedrock and Amazon Redshift integration, refer to Amazon Redshift ML integration with Amazon Bedrock.


About the authors

Nita Shah is an Analytics Specialist Solutions Architect at AWS based out of New York. She has been building data warehouse solutions for over 20 years and specializes in Amazon Redshift. She is focused on helping customers design and build enterprise-scale well-architected analytics and decision support platforms.

Sushmita Barthakur is a Senior Data Solutions Architect at Amazon Web Services (AWS), supporting Strategic customers architect their data workloads on AWS. With a background in data analytics, she has extensive experience helping customers architect and build enterprise data lakes, ETL workloads, data warehouses and data analytics solutions, both on-premises and the cloud. Sushmita is based in Florida and enjoys traveling, reading and playing tennis.

Jonathan Katz is a Principal Product Manager – Technical on the Amazon Redshift team and is based in New York. He is a Core Team member of the open source PostgreSQL project and an active open source contributor, including PostgreSQL and the pgvector project.

AWS Weekly Roundup: Strands Agents, AWS Transform, Amazon Bedrock Guardrails, AWS CodeBuild, and more (May 19, 2025)

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-strands-agents-aws-transform-amazon-bedrock-guardrails-aws-codebuild-and-more-may-19-2025/

Many events are taking place in this period! Last week I was at the AI Week in Italy. This week I’ll be in Zurich for the AWS Community Day – Switzerland. On May 22, you can join us remotely for AWS Cloud Infrastructure Day to learn about cutting-edge advances across compute, AI/ML, storage, networking, serverless technologies, and global infrastructure. Look for events near you for an opportunity to share your knowledge and learn from others.

What got me particularly excited last Friday was the introduction of Strands Agents, an open source SDK that you can use to build and run AI agents in just a few lines of code. It can scale from simple to complex use cases, including local development and production deployment. By default, it uses Amazon Bedrock as model provider, but many others are supported, including Ollama (to run models locally), Anthropic, Llama API, and LiteLLM (to provide a unified interface for other providers such as Mistral). With Strands, you can use any Python function as a tool for your agent with the @tool decorator. Strands provides many example tools for manipulating files, making API requests, and interacting with AWS APIs. You can also choose from thousands of published Model Context Protocol (MCP) servers, including this suite of specialized MCP servers that help you get the most out of AWS. Multiple teams at AWS already use Strands for their AI agents in production, including Amazon Q Developer, AWS Glue, and VPC Reachability Analyzer. Read it all in Clare’s post.

Strands Agents SDK agentic loop

Last week’s launches
Here are the other launches that got my attention:

Additional updates
Here are some additional projects, blog posts, and news items that you might find interesting:

  • Securing Amazon S3 presigned URLs for serverless applications – Focusing on the security ramifications of using Amazon S3 presigned URLs, explaining mitigation steps that developers can take to improve the security of their systems using S3 presigned URLs, and walking through an AWS Lambda function that adheres to the provided recommendations.
    Architectural diagram.
  • Running GenAI Inference with AWS Graviton and Arcee AI Models – While large language models (LLMs) are capable of a wide variety of tasks, they require compute resources to support hundreds of billions and sometimes trillions of parameters. Small language models (SLMs) in contrast typically have a range of 3 to 15 billion parameters and can provide responses more efficiently. In this post, we share how to optimize SLM inference workloads using AWS Graviton based instances.
    AWS Graviton processors.

Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events:

  • AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Dubai (May 21), Tel Aviv (May 28), Singapore (May 29), Stockholm (June 4), Sydney (June 4–5), Washington (June 10-11), and Madrid (June 11)
  • AWS Cloud Infrastructure Day – On May 22, discover the latest innovations in AWS Cloud infrastructure technologies at this exclusive technical event.
  • AWS re:Inforce – Mark your calendars for AWS re:Inforce (June 16–18) in Philadelphia, PA. AWS re:Inforce is a learning conference focused on AWS security solutions, cloud security, compliance, and identity.
  • AWS Partners Events – You’ll find a variety of AWS Partner events that will inspire and educate you, whether you’re just getting started on your cloud journey or you’re looking to solve new business challenges.
  • AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Zurich, Switzerland (May 22), Bengaluru, India (May 23), Yerevan, Armenia (May 24), Milwaukee, USA (June 5), and Nairobi, Kenya (June 14)

That’s all for this week. Check back next Monday for another Weekly Roundup!

Danilo

How Smartsheet boosts developer productivity with Amazon Bedrock and Roo Code

Post Syndicated from Rony Blum original https://aws.amazon.com/blogs/architecture/how-smartsheet-boosts-developer-productivity-with-amazon-bedrock-and-roo-code/

This post was co-written with JB Brown from Smartsheet.

Organizations are often seeking ways to enhance developer productivity while maintaining cost-efficiency. AI-powered coding assistants are effective solutions to accelerate development cycles, but implementing these solutions at scale while managing costs presents unique challenges. Roo Code, an AI-powered autonomous coding agent located directly in developers’ editors, represents the latest evolution in AI-assisted development. It helps developers code faster and smarter by offering capabilities ranging from code generation and refactoring to documentation writing and code base analysis through natural language interaction. This post explores how Smartsheet successfully deployed Roo Code with Amazon Bedrock and Anthropic’s Claude, achieving significant improvements in developer efficiency while optimizing costs through innovative caching strategies.

The Amazon Bedrock prompt caching capability, which became generally available in April 2025, represents a significant advancement in optimizing AI model interactions. With this feature, organizations can cache frequently used prompts across multiple API calls, reducing costs by up to 90% and lowing latency by up to 85%. The technology is particularly valuable for applications that repeatedly use similar context, such as coding assistants that need to maintain context about code files. Prompt caching supports multiple foundation models (FMs), including Anthropic’s Claude 3.5 Haiku and Anthropic’s Claude 3.7 Sonnet, as well as the Amazon Nova family of models. The cached context remains available for up to 5 minutes after each access, with each cache hit resetting this countdown. For organizations implementing AI-assisted development workflows, they can use this capability to balance performance with cost-efficiency.

Benefits of Amazon Bedrock prompt caching

Smartsheet is a leading cloud-based enterprise work management service, enabling millions of users worldwide to plan, manage, track, automate, and report on work at scale. Their engineering team identified an opportunity to enhance developer productivity by implementing AI-assisted coding capabilities across their organization. The solution needed to scale efficiently while maintaining reasonable costs, leading them to choose Roo Code with Amazon Bedrock with Anthropic’s Claude as their FM provider.

“Our engineering team has achieved a 60% reduction in operational costs and 20% improvement in response latency when using Roo Code with Amazon Bedrock prompt caching. These improvements directly translate to enhanced developer productivity across our organization,” says JB Brown, VP of Engineering at Smartsheet.

This fact is underscored by the team’s ability to rapidly generate comprehensive code documentation and architecture diagrams in a mere four hours, a task that previously consumed 2 weeks of manual effort. This newfound efficiency extends to critical areas like AWS cost management, where a vital analysis tool was built in just 30 minutes, and is projected to save Smartsheet $450,000 annually. Furthermore, the speed and accessibility of information have been significantly amplified, as evidenced by the 6–8 hours of senior engineer time saved by quickly explaining complex Terraform/Terragrunt infrastructure. These examples highlight how Amazon Bedrock prompt caching is not just about incremental gains, but about unlocking substantial time and resource savings, empowering our developers to focus on innovation rather than time-consuming manual processes.

Solution overview

When implementing AI-assisted coding at scale, Smartsheet faced several key challenges around managing costs associated with large language model (LLM) API calls, reducing latency for developer interactions and handling repetitive elements in prompts efficiently. The development team recognized that a significant portion of the prompts sent to Amazon Bedrock were repeated elements: the system prompt and the continuously building message history, presenting an opportunity for optimization.

To address these challenges, Smartsheet implemented a comprehensive solution centered around Roo Code integration with Amazon Bedrock and prompt caching. Smartsheet deployed Roo Code, powered by Amazon Bedrock and Anthropic’s Claude 3.7, across the organization and integrated seamlessly into existing development workflows. This provided developers with immediate access to AI assistance for code writing, reviewing, and debugging tasks.

The team then contributed a sophisticated Amazon Bedrock prompt caching system to Roo Code that identifies and stores common prompt elements, significantly reducing redundant data transmission to Bedrock. This caching layer proved crucial in optimizing both cost and performance metrics. The caching decision flow incorporates two key checks: first, it verifies if the selected model supports prompt caching, then it makes sure the system prompt meets a minimum token threshold to make caching worthwhile.

The solution integrates Roo Code directly into developers’ integrated development environments (IDE), using the Amazon Bedrock prompt caching capabilities. The architecture separates content into static elements (like system instructions and code context) and dynamic elements (like user queries). When a developer makes a request, static content is automatically marked with cache checkpoints and stored for 5 minutes, with each access refreshing this window.

For subsequent queries, the system checks for matching cached content before processing. When matches are found, it bypasses recomputation of these sections, significantly reducing both response time and costs.

The following diagram shows the integration points between Roo Code and Amazon Bedrock.

Technical workflow diagram showing data flow between user query, cache decision logic, and Amazon Bedrock with response handling

The following screenshot highlights the model provider settings and connection parameters required for implementation.

Amazon Bedrock configuration interface showing AWS authentication, region settings, and Claude model with caching enabled

Optimizing performance with prompt caching

When analyzing code files, much of the context—such as file contents and environment details—remains consistent across multiple queries. By caching this content, subsequent requests can reuse it from the cache, significantly reducing cost and latency.

In our example, we first asked Roo Code to analyze a Python file. We then sent multiple follow-up queries about different aspects of the code. The first query explained the overall structure, followed by questions about specific classes, unused modules, and potential improvements.

The usage section of the responses showed significant improvements through caching. During the first query, the model processed the input and wrote the cached content. For subsequent queries, 90% of the input was served from the cache. Because cache reads are 90% cheaper than processing new input tokens, this resulted in an 83% cost reduction for follow-up queries.

Here’s how the caching worked across queries:

  • First query – The model processed the entire file content and environment details, writing them to cache
  • Subsequent queries – The model loaded the cached content, only processing the new question text

To illustrate this, let’s look at the usage statistics from a series of queries:

Query 1: "Explain my @/src/models/product.py code"
{
    "inputTokens": 1051,
    "outputTokens": 902,
    "totalTokens": 19374,
    "cacheReadInputTokenCount": 0,
    "cacheWriteInputTokenCount": 12421
}

Query 2: "Describe what is ProductUpdate class used for"
{
    "inputTokens": 1078,
    "outputTokens": 694,
    "totalTokens": 21664,
    "cacheReadInputTokenCount": 19892,
    "cacheWriteInputTokenCount": 0
}

Query 3: "Which module is not in use?"
{
    "inputTokens": 4,
    "outputTokens": 774,
    "totalTokens": 22753,
    "cacheReadInputTokenCount": 19892,
    "cacheWriteInputTokenCount": 2083
}

Query 4: "Any improvement to implement to my code making it more efficient?"
{
    "inputTokens": 1097,
    "outputTokens": 1270,
    "totalTokens": 24342,
    "cacheReadInputTokenCount": 21975,
    "cacheWriteInputTokenCount": 0
} 

We can see from the results that the first query didn’t return cacheReadInputTokenCount but wrote to the cache the whole input. For subsequent queries, most of the input was read from the cache, except for the specific question.

The results clearly demonstrate the benefits of prompt caching:

  • 70% of total input was served from cache across queries
  • 90% of input for follow-up queries came from cache
  • Overall input costs reduced by 60%
  • Follow-up query costs reduced by 83%

This approach is particularly effective for development workflows where developers frequently query the same code base with different questions. The cached content provides necessary context for each query while significantly reducing processing overhead.

The following screenshot displays Roo Code’s interface, showing detailed metrics including token usage, cache hit rates, and associated API costs. The interface presents the cost savings achieved through caching and provides comprehensive usage analytics.

IDE showing Roo Code analysis with metrics, API costs, and detailed Python code explanation

Results and future innovation

The following line graph demonstrates the cost reduction trend over multiple queries runs, having the first query uncached, and the subsequent input using significant query responses from the cache. The y-axis shows costs in dollars, and the x-axis shows the cost per query. The graph shows a clear downward trend, with costs decreasing by 60% from the baseline.

Dual-axis graph displaying per-query and cumulative costs, demonstrating cost savings through caching over 4 queries

The implementation delivered remarkable improvements across key metrics. The prompt caching system achieved a 60% reduction in operational costs while simultaneously improving response latency by 20%.

The choice of Amazon Bedrock prompt caching proved particularly effective for Smartsheet’s coding assistance workflows. The 5-minute cache Time to Live (TTL) aligns perfectly with the natural rhythm of developer interactions with Roo Code, where multiple related queries typically occur within short timeframes. During active coding sessions, developers engage in rapid back-and-forth exchanges with the AI assistant—reviewing code, asking follow-up questions, and requesting modifications—while maintaining context through the cache. This agentic workflow, where each interaction builds upon previous context, takes full advantage of the prompt caching mechanism—each query refreshes the cache timer while preserving valuable context about the code being discussed.

Best practices and lessons learned

Through their implementation journey, Smartsheet developed several key best practices for organizations looking to implement similar solutions. Early implementation of prompt caching proved crucial, allowing the team to analyze prompt patterns and design efficient caching strategies from the start. The team found that focusing on developer experience through response time optimization and seamless tool integration was essential for successful adoption.

Continuous monitoring and analysis of usage patterns became a cornerstone of their optimization strategy. By tracking key metrics like response times and costs, the team could identify opportunities for further optimization and regularly adjust their caching strategies to maintain optimal performance.

Looking ahead

Smartsheet continues to innovate with Amazon Bedrock and Roo Code, exploring new ways to enhance developer productivity. Their engineering team is investigating advanced caching strategies and evaluating new FMs as they become available on Amazon Bedrock. The success of this implementation has established a strong foundation for future AI-assisted development initiatives.

Conclusion

Smartsheet’s implementation of Roo Code with Amazon Bedrock and prompt caching demonstrates how organizations can successfully deploy AI-assisted coding solutions while maintaining cost-efficiency. Their approach provides a blueprint for others looking to enhance developer productivity through AI while optimizing operational costs. The combination of strategic implementation, innovative caching solutions, and continuous optimization has enabled Smartsheet to achieve significant improvements in performance and cost metrics.

To learn more about implementing AI solutions at scale, refer to the Amazon Bedrock documentation and explore the AWS Machine Learning Blog. The frameworks and strategies outlined in this post can help organizations of varying size implement efficient, cost-effective AI-assisted development workflows.


About the Authors

Revolutionizing agricultural knowledge management using a multi-modal LLM: A reference architecture

Post Syndicated from Nitin Eusebius original https://aws.amazon.com/blogs/architecture/revolutionizing-agricultural-knowledge-management-using-a-multi-modal-llm-a-reference-architecture/

Handwritten documents are still an important form of data capture in agribusiness. Paper-based handwritten documents can be the result of business culture, lack of internet connectivity, lack of mobile devices or computers, or environmental conditions in the field or in an industrial setting. Because of the physical nature of the document, there might be a delay in transcription or even no transcription into a digital system for enterprise reporting, causing critical information to be unavailable. Using generative AI, handwritten notes can be scanned to record and analyze the document and establish automated workflows for product procurement, the supply chain, and entry into customer relationship management (CRM), enterprise resource planning (ERP), and farm management information systems (FMIS).

Multi-modal large language models (LLMs) are transforming the agriculture industry by integrating diverse data types such as text, images, video, and audio. This approach enhances AI’s understanding and decision-making in farming contexts. For example, a multi-modal LLM can analyze images to identify crop issues, then generate targeted recommendations for irrigation or pest control. Combining handwritten documents and satellite imagery with the power of LLMs can lead to better crop analytics and better yields.

In this blog post, we introduce a reference architecture that offers an intelligent document digitization solution that converts handwritten notes, scanned documents, and images into editable, searchable, and accessible formats. Powered by Anthropic’s Claude 3 on Amazon Bedrock, the solution uses the sophisticated vision capabilities of LLMs to process a wide range of visual formats, preserving the original formatting while extracting text, tables, and images. This enables businesses to digitize their knowledge bases, facilitate seamless collaboration, and integrate the digitized content into their existing digital workflows, enhancing productivity and unlocking the full potential of their information assets.

A comprehensive solution and reference architecture

This reference architecture helps agricultural companies to automatically capture, analyze, and process handwritten notes and images with data and reports that are generated by individuals working in farm fields. This is an example of how to create an end-to-end solution to ingest these documents in image format with Amazon Bedrock. The processed information can be consumed by downstream systems such as CRM, ERP, and FMIS to make better data driven decisions.

The solution uses Anthropic’s Claude 3 multi modal model hosted in Amazon Bedrock. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Claude is Anthropic’s state-of-the-art LLM that offers important features for enterprises such as advanced reasoning, generating text from images, code generation, and multilingual processing. Claude 3 models have sophisticated vision capabilities and can process a wide range of visual formats, including photos, charts, graphs and technical diagrams. You can also use other models such as Llama 3.2 11B and 90B, which also support vision tasks.

The following diagram illustrates the reference solution.

Revolutionizing agricultural knowledge management using a multi-modal LLM: A reference architecture

The process includes the following steps:

  1. A field worker uploads handwritten notes in an image format using a static website on their mobile device. The static website is accessed through Amazon CloudFront and hosted in Amazon Simple Storage Service (Amazon S3).
  2. The worker is securely authenticated using Amazon Cognito.
  3. After the worker is authenticated, the uploaded handwritten notes are sent to Amazon Bedrock for processing using Amazon API Gateway.
  4. An AWS Lambda function stores and reads the image from Amazon S3. It sends the uploaded image and associated prompt information to Anthropic’s Claude 3 hosted in Amazon Bedrock.
  5. Anthropic’s Claude 3 processes the image. It recognizes the handwritten text and analyzes the converted text based on the given prompt.
  6. The converted digital text and analyzed information provided by Anthropic’s Claude 3 are stored in Amazon DynamoDB for further downstream processing.
  7. The field worker uses an app to access the converted digital text and newly processed information stored in Amazon DynamoDB through API Gateway.
  8. The processed information is published to Amazon Simple Notification Service (Amazon SNS) and is consumed by downstream systems.
  9. The field worker’s location details and processed image information are consumed by two different Amazon Simple Queue Service (Amazon SQS) queues to be stored in downstream systems.
  10. The downstream systems can include CRM, FMS, and FMIS.

Additionally, using this solution, geospatial information such as GPS and GIS information can be sent to the FMIS. This can help farmers in many ways including crop monitoring, soil health and nutrient management, pest control, water management, farm mapping, and much more.

Best practices and implementation guidelines

To implement a production-ready system, it’s important to consider the following best practices.

Responsible AI: Deployment of customer facing generative AI solutions raises concerns about responsible AI practices. To mitigate risks such as biased outputs, exposure of sensitive information, or misuse for malicious purposes, it’s crucial to implement robust safeguards and validation mechanisms. Amazon Bedrock Guardrails is a set of tools and services provided by AWS that you can use to implement safeguards and responsible AI practices when building applications with generative AI models.

Security: Follow secure coding practices throughout the development lifecycle to minimize vulnerabilities. Protect your web applications from common exploits by integrating with AWS WAF. The OWASP Top 10 for Large Language Model Applications is a set of guidelines that address the unique security risks associated with generative AI solutions. It covers vulnerabilities such as model inversion, membership inference, and adversarial attacks—all of which can compromise the confidentiality, integrity, and availability of LLMs.

Observability: Monitor all layers of a generative AI solution, including the application, prompt, LLM, knowledgebase, and response provided by the LLM. You can monitor health and performance using Amazon CloudWatch.

LLMOps: Implementing LLM operations (LLMOps) will help to scale your GenAI solutions. See FMOps/LLMOps: Operationalize generative AI and differences with MLOps for additional information.

Conclusion

In this post, we introduced a reference architecture for an intelligent document digitization solution in agriculture. This system uses Amazon Bedrock and the multi-modal capabilities of LLMs such as Anthropic’s Claude 3 to transform handwritten notes and multi-modal data into searchable, digital formats. We explored how this architecture bridges the gap between traditional field documentation and modern digital systems, enhancing data accessibility and decision-making in agribusiness.

The possibilities for customization and expansion are vast. For specific use cases, you can fine-tune the multi-modal model on your unique agricultural business data. You can also implement a combination of multi-modal processing and a specialized knowledge base using Amazon Bedrock Knowledge Bases, further enhancing the system’s accuracy and relevance.


About the Authors

AWS Weekly Roundup: Amazon Nova Premier, Amazon Q Developer, Amazon Q CLI, Amazon CloudFront, AWS Outposts, and more (May 5, 2025)

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-nova-premier-amazon-q-developer-amazon-q-cli-amazon-cloudfront-aws-outposts-and-more-may-5-2025/

Last week I went to Thailand to attend the AWS Summit Bangkok. It was an energizing and exciting event. We hosted the Developer Lounge, where developers can meet, discuss ideas, enjoy lightning talks, win SWAGs at AWS Builder ID Prize Wheel, take a challenge at Amazon Q Developer Coding Challenge, or learn Generative AI at Learn Amazon Bedrock booth.

Here’s a quick look:

Thank you to AWS Heroes, AWS Community Builders, AWS User Group leaders and developers for your collaboration.

Coming up next in ASEAN is AWS Summit Singapore—make sure you don’t miss it by registering now.

Last Week’s Launches
Here are some launches last week that caught my attention:

  • Amazon Nova Premier Now Generally Available — Amazon Nova Premier, our most capable model for complex tasks and teacher for model distillation, is now generally available in Amazon Bedrock. It excels at complex tasks requiring deep context understanding and multistep planning, while processing text, images, and videos with a 1M token context length. With Nova Premier and Amazon Bedrock Model Distillation, you can create highly capable, cost-effective, and low-latency versions of Nova Pro, Lite, and Micro, for your specific needs.

  • Amazon Q Developer elevates the IDE experience with new agentic coding experience — This new interactive, agentic coding experience for Visual Studio Code allows Q Developer to intelligently take actions on behalf of the developer. Amazon Q Developer introduces an interactive coding experience in Visual Studio Code, offering real-time collaboration for coding, documentation, and testing. It provides transparent reasoning, and supports automated or step-by-step changes in multiple languages.

  • New Foundation Models in Amazon Bedrock — Amazon Bedrock expands its model offerings with two significant additions:
    • Writer’s Palmyra X5 and X4 models feature extensive context windows (1M and 128K tokens respectively) and excel in complex reasoning for enterprise applications. They support multistep tool-calling and adaptive thinking with high reliability standards.
    • Meta’s Llama 4 Scout 17B and Maverick 17B models offer natively multimodal capabilities using mixture-of-experts architecture for enhanced reasoning and image understanding. They support multiple languages and extended context processing, with simplified integration through the Bedrock Converse API.
  • Second-Generation AWS Outposts Racks Released AWS announces the general availability of second-generation Outposts racks with significant enhancements including the latest x86 EC2 instances, simplified networking, and accelerated networking options. These improvements deliver doubled vCPU, memory, and network bandwidth, 40% better performance, and support for ultra-low latency workloads, making them ideal for demanding on-premises deployments.

  • Amazon CloudFront SaaS Manager Launches — Amazon CloudFront SaaS Manager helps SaaS providers and web hosting platforms efficiently manage content delivery across multiple customer domains. The service dramatically reduces operational complexity while providing high-performance content delivery and enterprise-grade security for every customer domain.

  • Amazon Aurora Now Supports PostgreSQL 17 — Amazon Aurora now supports PostgreSQL 17.4, offering community improvements and Aurora-specific enhancements like optimized memory management and faster failovers. The release includes new features for Babelfish, security fixes, and updated extensions, available in all AWS Regions.
  • CloudWatch Introduces Tiered Pricing for Lambda Logs — Amazon CloudWatch launches tiered pricing for AWS Lambda logs and new delivery destinations. Pricing in US East starts at $0.50/GB for CloudWatch and $0.25/GB for S3 and Firehose, both tiering down to $0.05/GB. This update enhances flexibility in log management across all supporting Regions.
  • RDS for MySQL Updates Minor VersionsAmazon RDS for MySQL now supports minor versions 8.0.42 and 8.4.5, delivering security fixes, bug fixes, and performance improvements. Users can upgrade automatically during maintenance windows or use Blue/Green deployments for safer updates.
  • Amazon Bedrock Model Distillation Generally AvailableAmazon Bedrock Model Distillation is now generally available, supporting new models like Amazon Nova and Claude 3.5. It enables smaller models to accurately predict function calling for Agents, delivering up to 500% faster responses and 75% lower costs with minimal accuracy loss for RAG use cases. The service includes automated workflows for data synthesis and student model training.
  • AI Search Flow Builder for Amazon OpenSearch Service Amazon OpenSearch Service now offers an AI search flow builder for OpenSearch 2.19+ domains. This low-code designer enables creation of sophisticated AI-enhanced search flows using AWS and third-party services, supporting use cases like RAG, query rewriting, and semantic encoding.

From Community.AWS
Here’s my personal favorites posts from community.aws:

Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events:

  • AWS Summit — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Poland (6 May), Bengaluru (May 7 – 8), Hong Kong (May 8), Seoul (May 14-15), Singapore (May 29), and Sydney (June 4–5).
  • AWS re:Inforce – Mark your calendars for AWS re:Inforce (June 16–18) in Philadelphia, PA. AWS re:Inforce is a learning conference focused on AWS security solutions, cloud security, compliance, and identity. You can subscribe for event updates now!
  • AWS Partners Events – You’ll find a variety of AWS Partner events that will inspire and educate you, whether you are just getting started on your cloud journey or you are looking to solve new business challenges.
  • AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Yerevan, Armenia (May 24), Zurich, Switzerland (May 25), and Bengaluru, India (May 25).

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!


How is the News Blog doing? Take this 1 minute survey!

(This survey is hosted by an external company. AWS handles your information as described in the AWS Privacy Notice. AWS will own the data gathered via this survey and will not share the information collected with survey respondents.)

Use an Amazon Bedrock powered chatbot with Amazon Security Lake to help investigate incidents

Post Syndicated from Madhunika Reddy Mikkili original https://aws.amazon.com/blogs/security/use-an-amazon-bedrock-powered-chatbot-with-amazon-security-lake-to-help-investigate-incidents/

In part 2 of this series, we showed you how to use Amazon SageMaker Studio notebooks with natural language input to assist with threat hunting. This is done by using SageMaker Studio to automatically generate and run SQL queries on Amazon Athena with Amazon Bedrock and Amazon Security Lake. The Security Lake service team and the Open Cybersecurity Schema Framework (OCSF) community continue to add additional log sources and OCSF mappings to enable Security Lake to provide a consolidated source for customers to conduct security investigation.

Because security logging data sources continually grow, organizations need to provide a mechanism for their security teams to understand and query those data sources. You might have existing investigation and response playbooks that your security teams need to be well-versed in and know when to use. It can take security teams an extended period of time to onboard and understand the available security data sources and playbooks and how to efficiently use them to reduce the mean time to respond.

In this post, we show you how to extend the functionality from the previous post. You will learn how to deploy a security chatbot with a graphical user interface (GUI) and a serverless backend powered by an Amazon Bedrock agent that incorporates existing playbooks to investigate or respond to a security event. The chatbot demonstrates purpose-built Amazon Bedrock agents that help address security concerns depending on the user’s natural language input. The solution has a single GUI that provides a direct interface with the Amazon Bedrock agent to create and invoke SQL queries or provide recommendations for internal incident response playbooks to investigate or respond to possible security events.

Security chatbot sample solution overview

Figure 1: Security chatbot sample solution architecture diagram

Figure 1: Security chatbot sample solution architecture diagram

Application flow as shown in Figure 1:

  1. User submits a query through the React UI.

    Note: The React UI used in this solution doesn’t have authentication built in. It’s recommended that you add authentication capabilities that follow your organization’s security requirements. You can add authentication capabilities by using Amazon Cognito and AWS Amplify UI.

  2. The user’s query is sent to an Amazon API Gateway REST API, which invokes the Invoke Agent AWS Lambda function.
  3. The Lambda function invokes the Amazon Bedrock agent with the user’s query.
  4. The Amazon Bedrock agent (using Anthropic’s Claude 3 Sonnet) processes the query and decides between retrieving information from the playbooks or by querying Security Lake using Amazon Athena.

For playbook knowledge base queries:

  1. The Amazon Bedrock agent queries the playbooks knowledge base and retrieves relevant results.

For Security Lake data queries:

  1. The Amazon Bedrock agent queries the schema knowledge base and retrieves the Security Lake table schemas to create an SQL query.
  2. The Amazon Bedrock agent invokes the SQL query action from the Amazon Bedrock action group, passing the SQL query as a parameter.
  3. The action group invokes the Execute SQL on Athena Lambda function, which executes the query on Athena and returns the results to the Amazon Bedrock agent.

After retrieving results from the knowledge base or action group:

  1. The Amazon Bedrock agent uses the retrieved information to formulate the final response and sends it back to the Invoke Agent Lambda function.
  2. The Lambda function sends the response back to the client using an API Gateway WebSocket API.
  3. API Gateway delivers the response to the React UI using a WebSocket connection to the client.
  4. The agent’s response is displayed to the user in the chat interface.

Prerequisites

Before deploying the sample solution, complete the following prerequisites:

  1. Enable Security Lake in your organization in AWS Organizations and specify a delegated administrator account to manage the Security Lake configuration for all member accounts in your organization. Configure Security Lake with the appropriate log sources: Amazon Virtual Private Cloud (Amazon VPC) Flow Logs, AWS Security Hub, AWS CloudTrail, and Amazon Route53.
  2. Create subscriber query access from the source Security Lake AWS account to the subscriber AWS account.
  3. Accept a resource share request in the subscriber AWS account in AWS Resource Access Manager (AWS RAM).
  4. Create a database link in AWS Lake Formation in the subscriber AWS account and grant access for the Athena tables in the Security Lake AWS account.
  5. Grant Anthropic’s Claude v3 model access for Amazon Bedrock in the AWS subscriber account where you will deploy the solution. If you try to use a model before you enable it in your AWS account, you will get an error message.

With the prerequisites in place, the sample solution architecture provisions the following resources:

  1. Amazon CloudFront with an Amazon Simple Storage Service (Amazon S3) origin.
  2. An Amazon S3 static website for the chatbot UI.
  3. An API Gateway to call a Lambda function.
  4. A Lambda function to invoke the Amazon Bedrock agent.
  5. An Amazon Bedrock agent with a knowledge base.
    1. An Amazon Bedrock agent action group to generate and invoke SQL queries on Athena.
      1. An Amazon Bedrock knowledge base to reference example Athena table schemas in Security Lake. Although the Amazon Bedrock agent can get rows directly from the Athena table, providing example table schemas improves SQL query generation accuracy for table columns in Security Lake.
      2. An Amazon Bedrock knowledge base to reference existing incident response playbooks. By incorporating this knowledge base, the Amazon Bedrock agent can suggest actions for investigation or response based on existing playbooks that have already been approved by your organization.

Cost

Before deploying the sample solution and walking through this post, it’s important to understand the cost of the AWS services being used. The cost will largely depend on the amount of data you interact with in Amazon Bedrock and by querying Security Lake with Athena.

  1. Security Lake costs are determined by the volume of log and event data ingested from AWS services. Security Lake orchestrates other AWS services on your behalf, which incur separate charges. You can find more information on pricing for the respective services: Amazon S3, AWS GlueAmazon EventBridgeAWS LambdaAmazon Simple Query Service (Amazon SQS), and Amazon Simple Notification Service (Amazon SNS).
  2. Amazon Bedrock on-demand pricing is based on the selected large language model (LLM) and the number of input and output tokens. A token is comprised of a few characters and refers to the basic unit of text that a model learns to understand the user input and prompts. For more details, see Amazon Bedrock pricing.
  3. The SQL queries generated by Amazon Bedrock are invoked using Athena. Athena cost is based on the amount of data scanned within Security Lake for that query. For more details, see Athena pricing.

Deploy the sample chatbot

You can deploy the sample solution by using AWS Cloud Development Kit (AWS CDK). For instructions and more information on using the AWS CDK, see Get Started with AWS CDK.

  1. Clone the sample-generative-ai-chatbot-for-amazon-security-lake repository.
  2. Navigate to the project’s root folder.
  3. Install the project dependencies.
  4. Build and deploy the app using the following commands:
    npm install -g aws-cdk
    npm install 
    cdk synth
    

  5. Run the following commands in your terminal while signed in to your subscriber AWS account. Replace <INSERT_AWS_ACCOUNT> with your account number and replace <INSERT_REGION> with the AWS Region that you want the solution deployed to.
    cdk bootstrap aws://<INSERT_AWS_ACCOUNT>/<INSERT_REGION>
    cdk deploy –all
    

As part of the CDK deployment, there is an Output value for the React Application URL (FrontendAppStack.ReactAppUrl). You will use this value to interact with the GenAI application. Wait up to 5 mins for the URL to be live.

Post-deployment configuration steps

Now that you’ve deployed the solution, you need to add permissions to allow the Lambda function’s AWS Identity and Access Management (IAM) role and Amazon Bedrock to interact with your Security Lake data.

Grant permission to the Security Lake database

  1. Copy the Lambda’s role ARN from the “BedrockAppStack” CloudFormation stack. The resource in the stack is named “athenaAgentSecurityLakeActionGroupLambdaServiceRole********”.
  2. Go to the Lake Formation console.
  3. Select the amazon_security_lake_glue_db_<YOUR-REGION> database. For example, if your Security Lake is in us-east-1, the value would be amazon_security_lake_glue_db_us_east_1
  4. For Actions, select Grant.
  5. In Grant Data Permissions, select SAML Users and Groups.
  6. Paste the Lambda function IAM role ARN from Step 1.
  7. In Database Permissions, select Describe, and then choose Grant.

Grant permission to Security Lake tables

You must repeat the following steps for each source configured within Security Lake. For example, if you have four sources configured within Security Lake, you must grant permissions for the Lambda function IAM role to each table. If you have multiple sources that are in separate Regions and you don’t have a rollup Region configured in Security Lake, you must repeat the steps for each source in each Region.

The following example grants permissions to the Security Hub table within Security Lake. For more information about granting table permissions, see the AWS Lake Formation user guide.

  1. Copy the Lambda’s role ARN from the “BedrockAppStack” CloudFormation stack. The resource in the stack is named as “athenaAgentSecurityLakeActionGroupLambdaServiceRole********”.
  2. Go to the Lake Formation console.
  3. Select the amazon_security_lake_glue_db_<YOUR-REGION> database.
    For example, if your Security Lake database is in us-east-1, the value would be amazon_security_lake_glue_db_us_east-1
  4. Choose View Tables.
  5. Select the amazon_security_lake_table_<YOUR-REGION>_sh_findings_1_0 table.
    For example, if your Security Lake table is in us-east-1, the value would be amazon_security_lake_table_us_east_1_sh_findings_1_0

    Note: Each table must be granted access individually. Selecting All Tables won’t grant the access needed to query Security Lake.

  6. For Actions, select Grant.
  7. In Grant Data Permissions, select SAML Users and Groups.
  8. Paste the Lambda function IAM role ARN from Step 1.
  9. In Table Permissions, select Describe, and then Grant.

Sync data sources

After you deploy the infrastructure, you need to sync the data sources in the Amazon Bedrock knowledge bases so that the data in Amazon S3 can be vectorized and made available in Amazon OpenSearch Serverless, which is the service used as a vector source by the knowledge bases in this solution.

  • In the Amazon Bedrock console, select Knowledge base and find the two Amazon Bedrock knowledge bases deployed in this solution: gen-ai-sec-lake-table-schema and gen-ai-sec-lake-runbooks. Navigate to each knowledge base and its data source. Then choose Sync for each data source.

Get the CloudFront distribution URL

As part of the sample solution, the chatbot uses an externally available CloudFront distribution URL. It’s recommended that you add appropriate security controls that align to your organization’s security requirements to the sample solution. For example, you might want to add authentication to CloudFront using Amazon Cognito and Lambda@Edge to help prevent unauthorized users from accessing this chatbot. You can also configure secure access and restrict access to the content.

  1. Navigate to CloudFormation in the console.
  2. In the Stacks section, select the FrontendAppStack.
  3. Select the Outputs tab.
  4. Copy the value ReactAppUrl.

Investigate with your security chatbot

Now that you’ve deployed the sample solution and configured the appropriate permissions, you’re ready to use natural language input to generate and invoke SQL queries and to recommend internal incident response playbooks.

Generate and invoke SQL queries

Although most security professionals already know how to create and invoke SQL queries, time is extremely important during a possible security event. Adding natural-language-to-SQL capability to the security agent can help reduce your mean time to respond, which can reduce the overall impact of a security event. Furthermore, you can use these same techniques for threat hunting in your AWS environment.

Question 1: Which AWS accounts have the most security findings for EC2 instances in the last 7 days?

Answer 1: The query results show the top two AWS accounts with the most security findings related to EC2 instances in the last 7 days. The account with ID XXXXXX10 had the highest number of findings at 252, followed by account XXXXXX28 with 197 findings.

The query covered the key requirements, so I provided the final answer summarizing the top accounts and their finding counts.

The question and response are shown in Figure 2.

Figure 2: Response for most EC2 security findings

Figure 2: Response for most EC2 security findings

The more context you provide to the chatbot, the better the query that will be generated. Providing a time window also gives a smaller data set to review and give quicker results.

Question 2: In those AWS accounts, what were the top EC2 security finding types?

Answer 2: Based on the query results, the top EC2 security finding types in the AWS accounts with the most EC2 instance findings in the last 7 days were:

For account XXXXXX10:

  1. “Compliance Finding: Update” with 252 findings

For account XXXXXX28:

  1. “Compliance Finding Update” with 166 findings
  2. “Compliance Finding Create” with 24 findings
  3. “Vulnerability Finding: Update” with 2 findings
  4. “Vulnerability Finding: Create” with 2 findings.

The question and response are shown in Figure 3.

Figure 3: Response for top EC2 security finding types

Figure 3: Response for top EC2 security finding types

The results show that the issues are primarily around Security Hub controls mapped to the type_name OCSF column. If you want to look at a different column, you can provide additional instructions or update the question.

Question 3: Which EC2 instances had vulnerability security findings?

Answer 3: Based on the query results, the following EC2 instances had vulnerability security findings in the AWS account with the most EC2 instance findings in the last 7 days:

In account XXXXXX28:

  1. arn:aws:ec2:us-east-1: XXXXXX28:instance/<instance-id>
  2. arn:aws:ec2:us-east-1: XXXXXX28:instance/<instance-id>

There were no vulnerability findings for EC2 instances in account XXXXXX10 during this time period.

The question and response are shown in Figure 4.

Figure 4: Response for EC2 instances with vulnerability security findings

Figure 4: Response for EC2 instances with vulnerability security findings

In this question, the search is narrowed down to two Amazon Elastic Compute Cloud (Amazon EC2) instances within a specific AWS account to take a deeper look into.

Recommend incident response playbooks

As part of this sample solution, we used the AWS Incident Response Playbook Samples GitHub repository and uploaded them to an S3 bucket. Alternatively, you can create or reference your own playbooks to investigate and respond to various security events to refer to. After each question, the chatbot provides an answer along with an Amazon S3 URL to the playbook. The security analyst must have the appropriate permissions to download the playbook from Amazon S3.

Question 1: Do I have any playbooks that would help investigate a compromised EC2 instance for crypto mining or malware?

Answer 1: Yes, the search results include a relevant playbook for investigating a potentially compromised EC2 instance that could be used for crypto mining, malware, or ransomware. The key steps outlined in the playbook are:

  1. Acquire, preserve, and document evidence – Identify indicators of compromise, determine attack vector, document impacted applications/users
  2. Contain the incident – Isolate the compromised instance if possible without service disruption, remove from auto-scaling group if applicable
  3. Eradicate the incident – Follow the steps to remove the malware/crypto miner, restore from known good backups/AMIs
  4. Recovery – Spin up new instances from trusted AMIs, restore the data, and monitor for persistence of the threat
  5. Lessons learned – Analyze the incident, identify root causes, and implement preventative measures

The playbook provides detailed guidance on each of these phases in the context of an EC2 ransomware incident, but the overall process can be adapted for investigating crypto mining, malware, or other threats on EC2 as well.

The question and response are shown in Figure 5.

Figure 5: Response for EC2 IR playbooks

Figure 5: Response for EC2 IR playbooks

The previous threat-hunting exercise identified two EC2 instances that had vulnerabilities present. After further investigation, if a security analyst believes the EC2 instance is compromised, they can ask for an IR playbook to respond to the security event. Once they determine which playbook makes the most sense, they can either click on the S3 URL or look at other playbooks to use.

Question 2: Do I have any playbooks that would help respond to ransomware in my S3 bucket?

Answer 2: Yes, the search results include a playbook that can help respond to a ransomware incident involving an S3 bucket. The “Incident Response Playbook Template for Unintended Access to an Amazon S3 Bucket” outlines steps for:

  1. Gathering evidence of the unintended access
  2. Containing the incident by restricting access to the affected bucket(s)
  3. Eradicating the issue by removing any malicious data/objects and restoring from backup
  4. Recovering by re-enablement access to the bucket for legitimate users
  5. Conducting post-incident activities like reviewing logs, determining root cause, and implementing preventative measures

While focused on generic unintended access rather than ransomware specifically, many of these steps could be adapted and applied to a ransomware scenario targeting data in S3. The playbook provides a solid framework for incident response on S3.

Additionally, the general “Ransomware Incident Response Playbook” template covers common steps for any ransomware incident, such as:

  1. Acquiring and preserving evidence of the ransomware infection
  2. Containing the incident by isolating affected systems
  3. Identifying and remove the ransomware
  4. Recovering data from backups or known good sources
  5. Conducting post-incident review

While not S3-specific, this playbook provides complementary guidance that could be used in conjunction with the S3 access playbook for an S3 ransomware incident.

The question and response are shown in Figure 6.

Figure 6: Response for S3 ransomware IR playbooks

Figure 6: Response for S3 ransomware IR playbooks

The response provides a high-level overview of the specific playbook that’s being referenced to provide the analyst with context before diving deeper into any single playbook.

Adding functionality to the security chatbot

This sample solution was developed to show the art of the possible. Each customer uses AWS resources to address their business needs in their own way, and security teams must be appropriately equipped to help secure their respective environments. Here are some possible enhancements that you can incorporate into the sample solution to align to your security use-cases and needs.

  1. Incorporate an Amazon DynamoDB table to use as part of reporting interactions tied to a specific event or finding GUID. By incorporating an audit trail, you can tie actions taken by the agent and associated resources to a security event and validate the outcome of the investigation before taking action.
  2. Tuning the backend chatbot agent to query Amazon Linux Security Center’s Common Vulnerabilities and Exposures (CVE) list or MITRE’s CVE list to see which AWS resources might be in scope and send out consolidated messages to resource owners with recommended actions.
  3. Tuning the backend chatbot agent to take natural language requests and respond with detectors or correlation rules for Amazon OpenSearch or query language for custom detections in your security information and event management (SIEM) tool.
  4. Adding a new data source to Athena, such as AWS Config, to provide the analyst with additional capabilities to query AWS resource configuration across the AWS environment that might have been impacted by a security event. For example, if a security finding shows that an S3 bucket has been made public, querying what and when other configuration changes were made to the S3 bucket.
  5. Incorporating multi-agent-orchestration to scale the use of multiple Amazon Bedrock agents that can be tuned towards niche security use cases by respective teams. The chatbot can speak directly to a classifier or controller, which then addresses the user’s natural language request and orchestrates across one or more agents to generate a response. For example, if a user asks which EC2 instances might have been impacted by a security event and which playbook to use to respond, the classifier agent could direct the initial query to the agent in this sample solution. In the same chat window, the analyst could ask if there are any open CVEs for the EC2 instances in scope to get a list of CVEs to address within the AWS account.
  6. For long running Athena queries, you can incorporate an AWS Step Function to the workflow and incorporate a task token to wait for the Athena results to return.

Clean up

If you deployed the security chatbot sample solution by using the Launch Stack button and the console with the CloudFormation template security_genai_chatbot_cfn, do the following to clean up:

  1. In the CloudFormation console for the account and Region where you deployed the solution, choose the SecurityGenAIChatbot stack.
  2. Choose the option to Delete the stack.

If you deployed the solution by using the AWS CDK, run the command cdk destroy --all.

Conclusion

The sample solution demonstrates how you can use task-oriented Amazon Bedrock agents and natural language input to help accelerate investigation and analysis and increase your overall security posture. We provided an example of a sample solution with a user interface that is powered by an Amazon Bedrock agent, which you can extend to add additional task-oriented agents, each with their own instructions, knowledge bases, and models. By extending the use of AI-powered agents, you can help your security team operate more efficiently across multiple security domains within your AWS environment.

The backend for the chatbot to investigate security events uses Security Lake, which normalizes data into Open Cybersecurity Schema Framework (OCSF); as long as the data schema is normalized, the solution can be applied to other data lakes within your AWS environment.

To learn more, see the other posts in this series:

Use the comments section to provide feedback. If you have questions about this post, start a new thread on the Generative AI on AWS re:Post or contact AWS Support.

Madhunika-Reddy-Mikkili

Madhunika Reddy Mikkili

Madhunika is a Data and Machine Learning Engineer at AWS. She is passionate about helping customers achieve their goals using data analytics and machine learning.

Author

Jonathan Nguyen

Jonathan is a Principal Security Solution Architect at AWS. He helps large financial services customers develop a comprehensive security strategy and solutions to meet their security and compliance requirements in AWS.

Harsh Asnani

Harsh Asnani

Harsh is a Machine Learning Engineer at AWS specializing in ML theory, MLOPs, and production generative AI frameworks. His background is in applied data science with a focus on operationalizing AI workloads in the cloud at scale.

Michael Massey

Michael Massey

Michael is a Cloud Application Architect at AWS, where he specializes in building frontend and backend cloud-centered applications. He designs and implements scalable and highly-available solutions and architectures that help customers achieve their business goals.t

Amazon Nova Premier: Our most capable model for complex tasks and teacher for model distillation

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-nova-premier-our-most-capable-model-for-complex-tasks-and-teacher-for-model-distillation/

Today we’re expanding the Amazon Nova family of foundation models announced at AWS re:Invent with the general availability of Amazon Nova Premier, our most capable model for complex tasks and teacher for model distillation.

Nova Premier joins the existing Amazon Nova understanding models available in Amazon Bedrock. Similar to Nova Lite and Pro, Premier can process input text, images, and videos (excluding audio). With its advanced capabilities, Nova Premier excels at complex tasks that require deep understanding of context, multistep planning, and precise execution across multiple tools and data sources. With a context length of one million tokens, Nova Premier can process extremely long documents or large code bases.

With Nova Premier and Amazon Bedrock Model Distillation, you can create highly capable, cost-effective, and low-latency versions of Nova Pro, Lite, and Micro, for your specific needs. For example, we used Nova Premier to distill Nova Pro for complex tool selection and API calling. The distilled Nova Pro had a 20% higher accuracy for API invocations compared to the base model and consistently matched the performance of the teacher, with the speed and cost benefits of Nova Pro.

Amazon Nova Premier benchmark evaluation
We evaluated Nova Premier on a broad range of benchmarks across text intelligence, visual intelligence, and agentic workflows. Nova Premier is the most capable model in the Nova family as measured across 17 benchmarks as shown in the table below.

Amazon Nova Premier Benchmark Evaluations

Nova Premier is also comparable to the best non-reasoning models in the industry and is equal or better on approximately half of these benchmarks when compared to other models in the same intelligence tier. Details of these evaluations are in the technical report.

Nova Premier is also the fastest and the most cost-effective model in Amazon Bedrock for its intelligence tier. For further details and comparison on pricing, please refer to the Bedrock pricing page.

Nova Premier can also be used as a teacher model for distillation, which means you can transfer its advanced capabilities for a specific use case into smaller, faster, and more efficient models like Nova Pro, Micro, and Lite for production deployments.

Using Amazon Nova Premier
To get started with Nova Premier, you first need to request access to the model in the Amazon Bedrock console. Navigate to Model access in the navigation pane, find Nova Premier, and toggle access.

Console screenshot.

Once you have access, you can use Nova Premier through the Amazon Bedrock Converse API providing in input a list of messages from the user and the assistant. Messages can include text, images, and videos. Here’s an example of a straightforward invocation using the AWS SDK for Python (Boto3):

import boto3
import json

AWS_REGION = "us-east-1"
MODEL_ID = "us.amazon.nova-premier-v1:0"

bedrock_runtime = boto3.client('bedrock-runtime', region_name=AWS_REGION)
messages = [
    {
        "role": "user",
        "content": [
            {
                "text": "Explain the differences between vector databases and traditional relational databases for AI applications."
            }
        ]
    }
]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages
)

response_text = response["output"]["message"]["content"][-1]["text"]

print(response_text)

This example shows how Nova Premier can provide detailed explanations for complex technical questions. But the real power of Premier comes with its ability to handle sophisticated workflows.

Multi-agent collaboration use case
Let’s explore a more complex scenario that showcases how Nova Premier works a multi-agent collaboration architecture for investment research.

The equity research process typically involves multiple stages: identifying relevant data sources for specific investments, retrieving required information from those sources, and synthesizing the data into actionable insights. This process becomes increasingly complex when dealing with different types of financial instruments like stock indices, individual equities, and currencies.

We can build this type of application using multi-agent collaboration in Amazon Bedrock, with Nova Premier powering the supervisor agent that orchestrates the entire workflow. The supervisor agent analyzes the initial query (for example, “What are the emerging trends in renewable energy investments?”), breaks it down into logical steps, determines which specialized subagents to engage, and synthesizes the final response.

For this scenario, I’ve created a system with the following components:

  1. A supervisor agent powered by Nova Premier
  2. Multiple specialized subagents powered by Nova Pro, each focusing on different financial data sources
  3. Tools that connect to financial databases, market analysis tools, and other relevant information sources

Multi-agent architectural diagram

When I submit a query about emerging trends in renewable energy investments, the supervisor agent powered by Nova Premier does the following:

  1. Analyzes the query to determine the underlying topics and sources to cover
  2. Selects the appropriate subagents specific to those topics and sources
  3. Each subagent retrieves their relevant economic indicators, technical analysis, and market sentiment data
  4. The supervisor agent synthesizes this information into a comprehensive report for review by a financial professional

Utilizing Nova Premier in a multi-agent collaboration architecture such as this streamlines the financial professional’s work and helps them formulate their investment analysis faster. The following video provides a visual description of this scenario.

The key advantage of using Nova Premier for the supervisor role is its accuracy in coordinating complex workflows, so that the right data sources are consulted in the optimal sequence and each subagent receives in input the correct information for their work, resulting in higher quality insights.

Multi-agent collaboration with model distillation
Although Nova Premier provides the highest level of accuracy of its family of models, you might want to optimize latency and cost in production environments. This is where the strength of Nova Premier as a teacher model for distillation becomes interesting. Using Amazon Bedrock Model Distillation, we can customize Nova Micro from the results of Nova Premier for this specific investment research use case.

Unlike traditional fine-tuning that requires human feedback and labeled examples, with model distillation you can generate high-quality training data by having a teacher model produce the desired outputs, streamlining the data acquisition process.

Amazon Bedrock Model Distillation diagram

The process to distill a model involves:

  1. Generating synthetic training data by capturing input and output from Nova Premier runs across multiple financial instruments
  2. Using this data as a reference to train a customized version of Nova Micro through custom fine-tuning tools
  3. Evaluating the difference in latency and performance of the customized Micro model
  4. Deploying the customized Micro model as the supervisor agent in production

With Amazon Bedrock, you can further streamline the process and use invocation logs for data preparation. To do that, you need to set the model invocation logging on and set up an Amazon Simple Storage Service (Amazon S3) bucket as the destination for the logs.

Customer voices
Some of our customers had early access to Nova Premier. This is what they shared with us:

“Amazon Nova Premier has been outstanding in its ability to execute interactive analysis workflows, while still being faster and nearly half the cost compared to other leading models in our tests,” said Curtis Allen, Senior Staff Engineer at Slack, a company bringing conversations, apps, and customers together in one place.

“Implementing new solutions built on top of Amazon Nova has helped us with our mission of democratizing finance for all,” said Dev Tagare, Head of AI and Data at Robinhood Markets, a company on a mission to democratize finance for all. “We’re particularly excited about the ability to explore new avenues like complex multi-agent collaborations that are not just highly performing but also cost effective and fast. The intelligence of Nova Premier and what it can transfer to the other models like Nova Micro, Nova Lite, and Nova Pro unlocks multi-agent collaboration at a performance, price, and speed that will make it accessible to everyday customers.”

“Accelerating real-world AI deployments—not just prototypes—requires the ability to build models that are specialized for the unique needs of real world applications,” said Henry Ehrenberg, co-founder of Snorkel AI, a technology company that empowers data scientists and developers to quickly turn data into accurate and adaptable AI applications. “We’re excited to see AWS pushing efficient model customization forward with Amazon Bedrock Model Distillation and Amazon Nova Premier. These new model capabilities have the potential to accelerate our enterprise customers in building production AI applications, including Q&A applications with multimodal data and more.”

Things to know

Nova Premier is available in Amazon Bedrock in the US East (N. Virginia), US East (Ohio), and US West (Oregon) AWS Regions today via cross-Region inference. With Amazon Bedrock, you only pay for what you use. For more information, visit Amazon Bedrock pricing.

Customers in the US can also access Amazon Nova models at https://nova.amazon.com, a website to easily explore our FMs.

Nova Premier is our best teacher for distilling custom variants of Nova Pro, Micro, and Lite, which means you can capture the capabilities offered by Premier in smaller, faster models for production deployment.

Nova Premier includes built-in safety controls to promote responsible AI use, with content moderation capabilities that help maintain appropriate outputs across a wide range of applications.

To get started with Nova Premier, visit the Amazon Bedrock console today. For more information, see the Amazon Nova User Guide and send feedback to AWS re:Post for Amazon Bedrock. Explore the generative AI section of our community.aws site to see how our Builder communities are using Amazon Bedrock in their solutions.

Danilo


How is the News Blog doing? Take this 1 minute survey!

(This survey is hosted by an external company. AWS handles your information as described in the AWS Privacy Notice. AWS will own the data gathered via this survey and will not share the information collected with survey respondents.)

Llama 4 models from Meta now available in Amazon Bedrock serverless

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/llama-4-models-from-meta-now-available-in-amazon-bedrock-serverless/

The newest AI models from Meta, Llama 4 Scout 17B and Llama 4 Maverick 17B, are now available as a fully managed, serverless option in Amazon Bedrock. These new foundation models (FMs) deliver natively multimodal capabilities with early fusion technology that you can use for precise image grounding and extended context processing in your applications.

Llama 4 uses an innovative mixture-of-experts (MoE) architecture that provides enhanced performance across reasoning and image understanding tasks while optimizing for both cost and speed. This architectural approach enables Llama 4 to offer improved performance at lower cost compared to Llama 3, with expanded language support for global applications.

The models were already available on Amazon SageMaker JumpStart, and you can now use them in Amazon Bedrock to streamline building and scaling generative AI applications with enterprise-grade security and privacy.

Llama 4 Maverick 17B – A natively multimodal model featuring 128 experts and 400 billion total parameters. It excels in image and text understanding, making it suitable for versatile assistant and chat applications. The model supports a 1 million token context window, giving you the flexibility to process lengthy documents and complex inputs.

Llama 4 Scout 17B – A general-purpose multimodal model with 16 experts, 17 billion active parameters, and 109 billion total parameters that delivers superior performance compared to all previous Llama models. Amazon Bedrock currently supports a 3.5 million token context window for Llama 4 Scout, with plans to expand in the near future.

Use cases for Llama 4 models
You can use the advanced capabilities of Llama 4 models for a wide range of use cases across industries:

Enterprise applications – Build intelligent agents that can reason across tools and workflows, process multimodal inputs, and deliver high-quality responses for business applications.

Multilingual assistants – Create chat applications that understand images and provide high-quality responses across multiple languages, making them accessible to global audiences.

Code and document intelligence – Develop applications that can understand code, extract structured data from documents, and provide insightful analysis across large volumes of text and code.

Customer support – Enhance support systems with image analysis capabilities, enabling more effective problem resolution when customers share screenshots or photos.

Content creation – Generate creative content across multiple languages, with the ability to understand and respond to visual inputs.

Research – Build research applications that can integrate and analyze multimodal data, providing insights across text and images.

Using Llama 4 models in Amazon Bedrock
To use these new serverless models in Amazon Bedrock, I first need to request access. In the Amazon Bedrock console, I choose Model access from the navigation pane to toggle access to Llama 4 Maverick 17B and Llama 4 Scout 17B models.

Console screenshot.

The Llama 4 models can be easily integrated into your applications using the Amazon Bedrock Converse API, which provides a unified interface for conversational AI interactions.

Here’s an example of how to use the AWS SDK for Python (Boto3) with Llama 4 Maverick for a multimodal conversation:

import boto3
import json
import os

AWS_REGION = "us-west-2"
MODEL_ID = "us.meta.llama4-maverick-17b-instruct-v1:0"
IMAGE_PATH = "image.jpg"


def get_file_extension(filename: str) -> str:
    """Get the file extension."""
    extension = os.path.splitext(filename)[1].lower()[1:] or 'txt'
    if extension == 'jpg':
        extension = 'jpeg'
    return extension


def read_file(file_path: str) -> bytes:
    """Read a file in binary mode."""
    try:
        with open(file_path, 'rb') as file:
            return file.read()
    except Exception as e:
        raise Exception(f"Error reading file {file_path}: {str(e)}")

bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name=AWS_REGION
)

request_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": "What can you tell me about this image?"
                },
                {
                    "image": {
                        "format": get_file_extension(IMAGE_PATH),
                        "source": {"bytes": read_file(IMAGE_PATH)},
                    }
                },
            ],
        }
    ]
}

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=request_body["messages"]
)

print(response["output"]["message"]["content"][-1]["text"])

This example demonstrates how to send both text and image inputs to the model and receive a conversational response. The Converse API abstracts away the complexity of working with different model input formats, providing a consistent interface across models in Amazon Bedrock.

For more interactive use cases, you can also use the streaming capabilities of the Converse API:

response_stream = bedrock_runtime.converse_stream(
    modelId=MODEL_ID,
    messages=request_body['messages']
)

stream = response_stream.get('stream')
if stream:
    for event in stream:

        if 'messageStart' in event:
            print(f"\nRole: {event['messageStart']['role']}")

        if 'contentBlockDelta' in event:
            print(event['contentBlockDelta']['delta']['text'], end="")

        if 'messageStop' in event:
            print(f"\nStop reason: {event['messageStop']['stopReason']}")

        if 'metadata' in event:
            metadata = event['metadata']
            if 'usage' in metadata:
                print(f"Usage: {json.dumps(metadata['usage'], indent=4)}")
            if 'metrics' in metadata:
                print(f"Metrics: {json.dumps(metadata['metrics'], indent=4)}")

With streaming, your applications can provide a more responsive experience by displaying model outputs as they are generated.

Things to know
The Llama 4 models are available today with a fully managed, serverless experience in Amazon Bedrock in the US East (N. Virginia) and US West (Oregon) AWS Regions. You can also access Llama 4 in US East (Ohio) via cross-region inference.

As usual with Amazon Bedrock, you pay for what you use. For more information, see Amazon Bedrock pricing.

These models support 12 languages for text (English, French, German, Hindi, Italian, Portuguese, Spanish, Thai, Arabic, Indonesian, Tagalog, and Vietnamese) and English when processing images.

To start using these new models today, visit the Meta Llama models section in the Amazon Bedrock User Guide. You can also explore how our Builder communities are using Amazon Bedrock in their solutions in the generative AI section of our community.aws site.

Danilo


How is the News Blog doing? Take this 1 minute survey!

(This survey is hosted by an external company. AWS handles your information as described in the AWS Privacy Notice. AWS will own the data gathered via this survey and will not share the information collected with survey respondents.)

Writer Palmyra X5 and X4 foundation models are now available in Amazon Bedrock

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/writer-palmyra-x5-and-x4-foundation-models-are-now-available-in-amazon-bedrock/

One thing we’ve witnessed in recent months is the expansion of context windows in foundation models (FMs), with many now handling sequence lengths that would have been unimaginable just a year ago. However, building AI-powered applications that can process vast amounts of information while maintaining the reliability and security standards required for enterprise use remains challenging.

For these reasons, we’re excited to announce that Writer Palmyra X5 and X4 models are available today in Amazon Bedrock as a fully managed, serverless offering. AWS is the first major cloud provider to deliver fully managed models from Writer. Palmyra X5 is a new model launched today by Writer. Palmyra X4 was previously available in Amazon Bedrock Marketplace.

Writer Palmyra models offer robust reasoning capabilities that support complex agent-based workflows while maintaining enterprise security standards and reliability. Palmyra X5 features a one million token context window, and Palmyra X4 supports a 128K token context window. With these extensive context windows, these models remove some of the traditional constraints for app and agent development, enabling deeper analysis and more comprehensive task completion.

With this launch, Amazon Bedrock continues to bring access to the most advanced models and the tools you need to build generative AI applications with security, privacy, and responsible AI.

As a pioneer in FM development, Writer trains and fine-tunes its industry leading models on Amazon SageMaker HyperPod. With its optimized distributed training environment, Writer reduces training time and brings its models to market faster.

Palmyra X5 and X4 use cases
Writer Palmyra X5 and X4 are designed specifically for enterprise use cases, combining powerful capabilities with stringent security measures, including System and Organization Controls (SOC) 2, Payment Card Industry Data Security Standard (PCI DSS), and Health Insurance Portability and Accountability Act (HIPAA) compliance certifications.

Palmyra X5 and X4 models excel in various enterprise use cases across multiple industries:

Financial services – Palmyra models power solutions across investment banking and asset and wealth management, including deal transaction support, 10-Q, 10-K and earnings transcript highlights, fund and market research, and personalized client outreach at scale.

Healthcare and life science – Payors and providers use Palmyra models to build solutions for member acquisition and onboarding, appeals and grievances, case and utilization management, and employer request for proposal (RFP) response. Pharmaceutical companies use these models for commercial applications, medical affairs, R&D, and clinical trials.

Retail and consumer goods – Palmyra models enable AI solutions for product description creation and variation, performance analysis, SEO updates, brand and compliance reviews, automated campaign workflows, and RFP analysis and response.

Technology – Companies across the technology sector implement Palmyra models for personalized and account-based marketing, content creation, campaign workflow automation, account preparation and research, knowledge support, job briefs and candidate reports, and RFP responses.

Palmyra models support a comprehensive suite of enterprise-grade capabilities, including:

Adaptive thinking – Hybrid models combining advanced reasoning with enterprise-grade reliability, excelling at complex problem-solving and sophisticated decision-making processes.

Multistep tool-calling – Support for advanced tool-calling capabilities that can be used in complex multistep workflows and agentic actions, including interaction with enterprise systems to perform tasks like updating systems, executing transactions, sending emails, and triggering workflows.

Enterprise-grade reliability – Consistent, accurate results while maintaining strict quality standards required for enterprise use, with models specifically trained on business content to align outputs with professional standards.

Using Palmyra X5 and X4 in Amazon Bedrock
As for all new serverless models in Amazon Bedrock, I need to request access first. In the Amazon Bedrock console, I choose Model access from the navigation pane to enable access to Palmyra X5 and Palmyra X4 models.

Console screenshot

When I have access to the models, I can start building applications with any AWS SDKs using the Amazon Bedrock Converse API. The models use cross-Region inference with these inference profiles:

  • For Palmyra X5: us.writer.palmyra-x5-v1:0
  • For Palmyra X4: us.writer.palmyra-x4-v1:0

Here’s a sample implementation with the AWS SDK for Python (Boto3). In this scenario, there is a new version of an existing product. I need to prepare a detailed comparison of what’s new. I have the old and new product manuals. I use the large input context of Palmyra X5 to read and compare the two versions of the manual and prepare a first draft of the comparison document.

import sys
import os
import boto3
import re

AWS_REGION = "us-west-2"
MODEL_ID = "us.writer.palmyra-x5-v1:0"
DEFAULT_OUTPUT_FILE = "product_comparison.md"

def create_bedrock_runtime_client(region: str = AWS_REGION):
    """Create and return a Bedrock client."""
    return boto3.client('bedrock-runtime', region_name=region)

def get_file_extension(filename: str) -> str:
    """Get the file extension."""
    return os.path.splitext(filename)[1].lower()[1:] or 'txt'

def sanitize_document_name(filename: str) -> str:
    """Sanitize document name."""
    # Remove extension and get base name
    name = os.path.splitext(filename)[0]
    
    # Replace invalid characters with space
    name = re.sub(r'[^a-zA-Z0-9\s\-\(\)\[\]]', ' ', name)
    
    # Replace multiple spaces with single space
    name = re.sub(r'\s+', ' ', name)
    
    # Strip leading/trailing spaces
    return name.strip()

def read_file(file_path: str) -> bytes:
    """Read a file in binary mode."""
    try:
        with open(file_path, 'rb') as file:
            return file.read()
    except Exception as e:
        raise Exception(f"Error reading file {file_path}: {str(e)}")

def generate_comparison(client, document1: bytes, document2: bytes, filename1: str, filename2: str) -> str:
    """Generate a markdown comparison of two product manuals."""
    print(f"Generating comparison for {filename1} and {filename2}")
    try:
        response = client.converse(
            modelId=MODEL_ID,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "text": "Please compare these two product manuals and create a detailed comparison in markdown format. Focus on comparing key features, specifications, and highlight the main differences between the products."
                        },
                        {
                            "document": {
                                "format": get_file_extension(filename1),
                                "name": sanitize_document_name(filename1),
                                "source": {
                                    "bytes": document1
                                }
                            }
                        },
                        {
                            "document": {
                                "format": get_file_extension(filename2),
                                "name": sanitize_document_name(filename2),
                                "source": {
                                    "bytes": document2
                                }
                            }
                        }
                    ]
                }
            ]
        )
        return response['output']['message']['content'][0]['text']
    except Exception as e:
        raise Exception(f"Error generating comparison: {str(e)}")

def main():
    if len(sys.argv) < 3 or len(sys.argv) > 4:
        cmd = sys.argv[0]
        print(f"Usage: {cmd} <manual1_path> <manual2_path> [output_file]")
        sys.exit(1)

    manual1_path = sys.argv[1]
    manual2_path = sys.argv[2]
    output_file = sys.argv[3] if len(sys.argv) == 4 else DEFAULT_OUTPUT_FILE
    paths = [manual1_path, manual2_path]

    # Check each file's existence
    for path in paths:
        if not os.path.exists(path):
            print(f"Error: File does not exist: {path}")
            sys.exit(1)

    try:
        # Create Bedrock client
        bedrock_runtime = create_bedrock_runtime_client()

        # Read both manuals
        print("Reading documents...")
        manual1_content = read_file(manual1_path)
        manual2_content = read_file(manual2_path)

        # Generate comparison directly from the documents
        print("Generating comparison...")
        comparison = generate_comparison(
            bedrock_runtime,
            manual1_content,
            manual2_content,
            os.path.basename(manual1_path),
            os.path.basename(manual2_path)
        )

        # Save comparison to file
        with open(output_file, 'w') as f:
            f.write(comparison)

        print(f"Comparison generated successfully! Saved to {output_file}")

    except Exception as e:
        print(f"Error: {str(e)}")
        sys.exit(1)

if __name__ == "__main__":
    main()

To learn how to use Amazon Bedrock with AWS SDKs, browse the code samples in the Amazon Bedrock User Guide.

Things to know
Writer Palmyra X5 and X4 models are available in Amazon Bedrock today in the US West (Oregon) AWS Region with cross-Region inference. For the most up-to-date information on model support by Region, refer to the Amazon Bedrock documentation. For information on pricing, visit Amazon Bedrock pricing.

These models support English, Spanish, French, German, Chinese, and multiple other languages, making them suitable for global enterprise applications.

Using the expansive context capabilities of these models, developers can build more sophisticated applications and agents that can process extensive documents, perform complex multistep reasoning, and handle sophisticated agentic workflows.

To start using Writer Palmyra X5 and X4 models today, visit the Writer model section in the Amazon Bedrock User Guide. You can also explore how our Builder communities are using Amazon Bedrock in their solutions in the generative AI section of our community.aws site.

Let us know what you build with these powerful new capabilities!

Danilo


How is the News Blog doing? Take this 1 minute survey!

(This survey is hosted by an external company. AWS handles your information as described in the AWS Privacy Notice. AWS will own the data gathered via this survey and will not share the information collected with survey respondents.)

AWS Weekly Roundup: Amazon Q Developer, AWS Account Management updates, and more (April 28, 2025)

Post Syndicated from Matheus Guimaraes original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-q-developer-aws-account-management-updates-and-more-april-28-2025/

Summit season is in full throttle! If you haven’t been to an AWS Summit, I highly recommend you check one out that’s nearby. They are large-scale all-day events where you can attend talks, watch interesting demos and activities, connect with AWS and industry people, and more. Best of all, they are free—so all you need to do is register! You can find a list of them here in the AWS Events page. Incidentally, you can also discover other AWS events going in your area on that same page; just use the filters on the side to find something that interests you.

Speaking of AWS Summits, this week is the AWS Summit London (April 30). It’s local for me, and I have been heavily involved in the planning. You do not want to miss this! Make sure to check it out and hopefully I’ll be seeing you there.

Ready to find out some highlights from last week’s exciting AWS launches? Let’s go!

New features and capabilities highlights
Let’s start by looking at some of the enhancements launched last week.

  • Amazon Q Developer releases state of the art agent for feature development — AWS has announced an update to Amazon Q Developer’s software development agent, which achieves state-of-the-art performance on industry benchmarks and can generate multiple candidate solutions for coding problems. This new agent provides more reliable suggestions helping to reduce debugging time and enabling developers to focus on higher-level design and innovation.
  • Amazon Cognito now supports refresh token rotation — Amazon Cognito now supports OAuth 2.0 refresh token rotation, allowing user pool clients to automatically replace existing refresh tokens with new ones at regular intervals, enhancing security without requiring users to re-authenticate. This feature helps customers achieve both seamless user experience and improved security by automatically updating refresh tokens frequently, rather than having to choose between long-lived tokens for convenience, or short-lived tokens for security.
  • Amazon Bedrock Intelligent Prompt Routing is now generally available — Amazon Bedrock’s Intelligent Prompt Routing, now generally available, automatically routes prompts to different foundation models within a model family to optimize response quality and cost. The service now offers increased configurability across multiple model families including Claude (Anthropic), Llama (Meta), and Nova (Amazon), allowing users to choose any two models from a family and set custom routing criteria.
  • Upgrades to Amazon Q Business integrations for M365 Word and Outlook — Amazon Q Business integrations for Microsoft Word and Outlook now have the ability to search company knowledge bases, support image attachments, and handle larger context windows for more detailed prompts. These enhancements enable users to seamlessly access indexed company data and incorporate richer content while working on documents and emails, without needing to switch between different applications or contexts.

Security
There were a few new security improvements released last week, but these are the ones that caught my eye:

  • AWS Account Management now supports account name update via authorized IAM principals — AWS now allows IAM principals to update account names, removing the previous requirement for root user access. This applies to both standalone accounts and member accounts within AWS Organizations, where authorized IAM principals in management and delegated admin accounts can manage account names centrally.
  • AWS Resource Explorer now supports AWS PrivateLink — AWS Resource Explorer now supports AWS PrivateLink across all commercial Regions, enabling secure resource discovery and search capabilities across AWS Regions and accounts within your VPC, without requiring public internet access.
  • Amazon SageMaker Lakehouse now supports attribute based access control — Amazon SageMaker Lakehouse now supports attribute-based access control (ABAC), allowing administrators to manage data access permissions using dynamic attributes associated with IAM identities rather than creating individual policies. This simplifies access management by enabling permissions to be automatically granted to any IAM principal with matching tags, making it more efficient to handle access control as teams grow.

Networking
As you may be aware, there is a growing industry push to adopt IPv6 as the default protocol for new systems while migrating existing infrastructure where possible. This week, two more services have added their support to help customers towards that goal:

Capacity and costs
Customers using Amazon Kinesis Data Streams can enjoy higher default quotas, while Amazon Redshift Serverless customers get a new cost saving opportunity.

For a full list of AWS announcements, be sure to visit the What’s New with AWS? page.

Recommended Learning Resources
Everyone’s talking about MCP recently! Here are two great blog posts that I think will help you catch up and learn more about the possibilities of how to use MCP on AWS.

Our Weekly Roundup is published every Monday to help you keep up with AWS launches, so don’t forget to check it again next week for more exciting news!

Enjoy the rest of your day!


How is the News Blog doing? Take this 1 minute survey!

(This survey is hosted by an external company. AWS handles your information as described in the AWS Privacy Notice. AWS will own the data gathered via this survey and will not share the information collected with survey respondents.)

Announcing AWS Security Reference Architecture Code Examples for Generative AI

Post Syndicated from Ievgeniia Ieromenko original https://aws.amazon.com/blogs/security/announcing-aws-security-reference-architecture-code-examples-for-generative-ai/

Amazon Web Services (AWS) is pleased to announce the release of new Security Reference Architecture (SRA) code examples for securing generative AI workloads. The examples include two comprehensive capabilities focusing on secure model inference and RAG implementations, covering a wide range of security controls and best practices for AWS generative AI services.

These new code examples are available in the AWS SRA Examples Repository and include ready-to-deploy CloudFormation templates for implementing detective security controls such as network segmentation, identity management, encryption, prompt injection detection, and logging and monitoring. The solutions align with the AWS SRA Design Guidance page and demonstrate our commitment to helping customers secure their generative AI implementations.

Customers can get started with these examples by following the implementation instructions for each solution in the AWS SRA Examples Repository Solutions GenAI page. Additional documentation and implementation guidance is available in the AWS SRA Design Guidance Generative AI Architecture Deep Dive.

AWS strives to continuously provide security solutions that help customers meet their security architecture needs. Customers can reach out to the team by submitting an issue in the code repository.

If you have feedback about this post, submit comments in the Comments section below.

Ievgeniia Ieromenko

Ievgeniia Ieromenko

Ievgeniia a Security Engineer at AWS, focusing on cloud security architecture and best practices. She is a key contributor to the AWS Security Reference Architecture GitHub repository, helping customers implement secure cloud environments.

Liam Schneider

Liam Schneider

Liam is a Sr. Security Engineer with deep experience in cloud and application security, focused on reducing risk, improving system resilience, and aligning security with business needs. Liam has a strong background in compliance, team leadership, and building secure, scalable solutions across complex environments. He is known for practical, effective approaches to modern security challenges in both enterprise and cloud-first organizations.

Justin Kontny

Justin Kontny

Justin is a Sr. Security Engineer at AWS who combines his passion for software development with expertise in cloud security. He focuses on transforming security from a barrier to a business enabler through innovative AI-driven automation. When not pushing the boundaries of cloud security, Justin enjoys time with his children and being active outdoors.