Tag Archives: Amazon Machine Learning

Amazon Bedrock adds reinforcement ﬁne-tuning simplifying how developers build smarter, more accurate AI models

2025-12-03 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/improve-model-accuracy-with-reinforcement-fine-tuning-in-amazon-bedrock/

Organizations face a challenging trade-off when adapting AI models to their specific business needs: settle for generic models that produce average results, or tackle the complexity and expense of advanced model customization. Traditional approaches force a choice between poor performance with smaller models or the high costs of deploying larger model variants and managing complex infrastructure. Reinforcement fine-tuning is an advanced technique that trains models using feedback instead of massive labeled datasets, but implementing it typically requires specialized ML expertise, complicated infrastructure, and significant investment—with no guarantee of achieving the accuracy needed for specific use cases.

Today, we’re announcing reinforcement fine-tuning in Amazon Bedrock, a new model customization capability that creates smarter, more cost-effective models that learn from feedback and deliver higher-quality outputs for specific business needs. Reinforcement fine-tuning uses a feedback-driven approach where models improve iteratively based on reward signals, delivering 66% accuracy gains on average over base models.

Amazon Bedrock automates the reinforcement fine-tuning workflow, making this advanced model customization technique accessible to everyday developers without requiring deep machine learning (ML) expertise or large labeled datasets.

How reinforcement fine-tuning works
Reinforcement fine-tuning is built on top of reinforcement learning principles to address a common challenge: getting models to consistently produce outputs that align with business requirements and user preferences.

While traditional fine-tuning requires large, labeled datasets and expensive human annotation, reinforcement fine-tuning takes a different approach. Instead of learning from fixed examples, it uses reward functions to evaluate and judge which responses are considered good for particular business use cases. This teaches models to understand what makes a quality response without requiring massive amounts of pre-labeled training data, making advanced model customization in Amazon Bedrock more accessible and cost-effective.

Here are the benefits of using reinforcement fine-tuning in Amazon Bedrock:

Ease of use – Amazon Bedrock automates much of the complexity, making reinforcement fine-tuning more accessible to developers building AI applications. Models can be trained using existing API logs in Amazon Bedrock or by uploading datasets as training data, eliminating the need for labeled datasets or infrastructure setup.
Better model performance – Reinforcement fine-tuning improves model accuracy by 66% on average over base models, enabling optimization for price and performance by training smaller, faster, and more efficient model variants. This works with Amazon Nova 2 Lite model, improving quality and price performance for specific business needs, with support for additional models coming soon.
Security – Data remains within the secure AWS environment throughout the entire customization process, mitigating security and compliance concerns.

The capability supports two complementary approaches to provide flexibility for optimizing models:

Reinforcement Learning with Verifiable Rewards (RLVR) uses rule-based graders for objective tasks like code generation or math reasoning.
Reinforcement Learning from AI Feedback (RLAIF) employs AI-based judges for subjective tasks like instruction following or content moderation.

Getting started with reinforcement fine-tuning in Amazon Bedrock
Let’s walk through creating a reinforcement fine-tuning job.

First, I access the Amazon Bedrock console. Then, I navigate to the Custom models page. I choose Create and then choose Reinforcement fine-tuning job.

I start by entering the name of this customization job and then select my base model. At launch, reinforcement fine-tuning supports Amazon Nova 2 Lite, with support for additional models coming soon.

Next, I need to provide training data. I can use my stored invocation logs directly, eliminating the need to upload separate datasets. I can also upload new JSONL files or select existing datasets from Amazon Simple Storage Service (Amazon S3). Reinforcement fine-tuning automatically validates my training dataset and supports the OpenAI Chat Completions data format. If I provide invocation logs in the Amazon Bedrock invoke or converse format, Amazon Bedrock automatically converts them to the Chat Completions format.

The reward function setup is where I define what constitutes a good response. I have two options here. For objective tasks, I can select Custom code and write custom Python code that gets executed through AWS Lambda functions. For more subjective evaluations, I can select Model as judge to use foundation models (FMs) as judges by providing evaluation instructions.

Here, I select Custom code, and I create a new Lambda function or use an existing one as a reward function. I can start with one of the provided templates and customize it for my specific needs.

I can optionally modify default hyperparameters like learning rate, batch size, and epochs.

For enhanced security, I can configure virtual private cloud (VPC) settings and AWS Key Management Service (AWS KMS) encryption to meet my organization’s compliance requirements. Then, I choose Create to start the model customization job.

During the training process, I can monitor real-time metrics to understand how the model is learning. The training metrics dashboard shows key performance indicators including reward scores, loss curves, and accuracy improvements over time. These metrics help me understand whether the model is converging properly and if the reward function is effectively guiding the learning process.

When the reinforcement fine-tuning job is completed, I can see the final job status on the Model details page.

Once the job is completed, I can deploy the model with a single click. I select Set up inference, then choose Deploy for on-demand.

Here, I provide a few details for my model.

After deployment, I can quickly evaluate the model’s performance using the Amazon Bedrock playground. This helps me to test the fine-tuned model with sample prompts and compare its responses against the base model to validate the improvements. I select Test in playground.

The playground provides an intuitive interface for rapid testing and iteration, helping me confirm that the model meets my quality requirements before integrating it into production applications.

Interactive demo
Learn more by navigating an interactive demo of Amazon Bedrock reinforcement fine-tuning in action.

Additional things to know
Here are key points to note:

Templates — There are seven ready-to-use reward function templates covering common use cases for both objective and subjective tasks.
Pricing — To learn more about pricing, refer to the Amazon Bedrock pricing page.
Security — Training data and custom models remain private and aren’t used to improve FMs for public use. It supports VPC and AWS KMS encryption for enhanced security.

Get started with reinforcement fine-tuning by visiting the reinforcement fine-tuning documentation and by accessing the Amazon Bedrock console.

Happy building!
— Donnie

Amazon Bedrock AgentCore adds quality evaluations and policy controls for deploying trusted AI agents

2025-12-02 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-bedrock-agentcore-adds-quality-evaluations-and-policy-controls-for-deploying-trusted-ai-agents/

Today, we’re announcing new capabilities in Amazon Bedrock AgentCore to further remove barriers holding AI agents back from production. Organizations across industries are already building on AgentCore, the most advanced agentic platform to build, deploy, and operate highly capable agents securely at any scale. In just 5 months since preview, the AgentCore SDK has been downloaded over 2 million times. For example:

PGA TOUR, a pioneer and innovation leader in sports has built a multi-agent content generation system to create articles for their digital platforms. The new solution, built on AgentCore, enables the PGA TOUR to provide comprehensive coverage for every player in the field, by increasing content writing speed by 1,000 percent while achieving a 95 percent reduction in costs.
Independent software vendors (ISVs) like Workday are building the software of the future on AgentCore. AgentCore Code Interpreter provides Workday Planning Agent with secure data protection and essential features for financial data exploration. Users can analyze financial and operational data through natural language queries, making financial planning intuitive and self-driven. This capability reduces time spent on routine planning analysis by 30 percent, saving approximately 100 hours per month.
Grupo Elfa, a Brazilian distributor and retailer, relies on AgentCore Observability for complete audit traceability and real-time metrics of their agents, transforming their reactive processes into proactive operations. Using this unified platform, their sales team can handle thousands of daily price quotes while the organization maintains full visibility of agent decisions, helping achieve 100 percent traceability of agent decisions and interactions, and reduced problem resolution time by 50 percent.

As organizations scale their agent deployments, they face challenges around implementing the right boundaries and quality checks to confidently deploy agents. The autonomy that makes agents powerful also makes them hard to confidently deploy at scale, as they might access sensitive data inappropriately, make unauthorized decisions, or take unexpected actions. Development teams must balance enabling agent autonomy while ensuring they operate within acceptable boundaries and with the quality you require to put them in front of customers and employees.

The new capabilities available today take the guesswork out of this process and help you build and deploy trusted AI agents with confidence:

Policy in AgentCore (Preview) – Defines clear boundaries for agent actions by intercepting AgentCore Gateway tool calls before they run using policies with fine-grained permissions.
AgentCore Evaluations (Preview) – Monitors the quality of your agents based on real-world behavior using built-in evaluators for dimensions such as correctness and helpfulness, plus custom evaluators for business-specific requirements.

We’re also introducing features that expand what agents can do:

Episodic functionality in AgentCore Memory – A new long-term strategy that helps agents learn from experiences and adapt solutions across similar situations for improved consistency and performance in similar future tasks.
Bidirectional streaming in AgentCore Runtime – Deploys voice agents where both users and agents can speak simultaneously following a natural conversation flow.

Policy in AgentCore for precise agent control
Policy gives you control over the actions agents can take and are applied outside of the agent’s reasoning loop, treating agents as autonomous actors whose decisions require verification before reaching tools, systems, or data. It integrates with AgentCore Gateway to intercept tool calls as they happen, processing requests while maintaining operational speed, so workflows remain fast and responsive.

You can create policies using natural language or directly use Cedar—an open source policy language for fine-grained permissions—simplifying the process to set up, understand, and audit rules without writing custom code. This approach makes policy creation accessible to development, security, and compliance teams who can create, understand, and audit rules without specialized coding knowledge.

The policies operate independently of how the agent was built or which model it uses. You can define which tools and data agents can access—whether they are APIs, AWS Lambda functions, Model Context Protocol (MCP) servers, or third-party services—what actions they can perform, and under what conditions.

Teams can define clear policies once and apply them consistently across their organization. With policies in place, developers gain the freedom to create innovative agentic experiences, and organizations can deploy their agents to act autonomously while knowing they’ll stay within defined boundaries and compliance requirements.

Using Policy in AgentCore
You can start by creating a policy engine in the new Policy section of the AgentCore console and associate it with one or more AgentCore gateways.

A policy engine is a collection of policies that are evaluated at the gateway endpoint. When associating a gateway with a policy engine, you can choose whether to enforce the result of the policy—effectively permitting or denying access to a tool call—or to only emit logs. Using logs helps you test and validate a policy before enabling it in production.

Then, you can define the policies to apply to have granular control over access to the tools offered by the associated AgentCore gateways.

To create a policy, you can start with a natural language description (that should include information of the authentication claims to use) or directly edit Cedar code.

Natural language-based policy authoring provides a more accessible way for you to create fine-grained policies. Instead of writing formal policy code, you can describe rules in plain English. The system interprets your intent, generates candidate policies, validates them against the tool schema, and uses automated reasoning to check safety conditions—identifying prompts that are overly permissive, overly restrictive, or contain conditions that can never be satisfied.

Unlike generic large language model (LLM) translations, this feature understands the structure of your tools and generates policies that are both syntactically correct and semantically aligned with your intent, while flagging rules that cannot be enforced. It is also available as a Model Context Protocol (MCP) server, so you can author and validate policies directly in your preferred AI-assisted coding environment as part of your normal development workflow. This approach reduces onboarding time and helps you write high-quality authorization rules without needing Cedar expertise.

The following sample policy uses information from the OAuth claims in the JWT token used to authenticate to an AgentCore gateway (for the role) and the arguments passed to the tool call (context.input) to validate access to the tool processing a refund. Only an authenticated user with the refund-agent role can access the tool but for amounts (context.input.amount) lower than $200 USD.

permit(
  principal is AgentCore::OAuthUser,
  action == AgentCore::Action::"RefundTool__process_refund",
  resource == AgentCore::Gateway::"<GATEWAY_ARN>"
)
when {
  principal.hasTag("role") &&
  principal.getTag("role") == "refund-agent" &&
  context.input.amount < 200
};

AgentCore Evaluations for continuous, real-time quality intelligence
AgentCore Evaluations is a fully managed service that helps you continuously monitor and analyze agent performance based on real-world behavior. With AgentCore Evaluations, you can use built-in evaluators for common quality dimensions such as correctness, helpfulness, tool selection accuracy, safety, goal success rate, and context relevance. You can also create custom model-based scoring systems configured with your choice of prompt and model for business-tailored scoring while the service samples live agent interactions and scores them continuously.

All results from AgentCore Evaluations are visualized in Amazon CloudWatch alongside AgentCore Observability insights, providing one place for unified monitoring. You can also set up alerts and alarms on the evaluation scores to proactively monitor agent quality and respond when metrics fall outside acceptable thresholds.

You can use AgentCore Evaluations during the testing phase where you can check an agent against the baseline before deployment to stop faulty versions from reaching users, and in production for continuous improvement of your agents. When quality metrics drop below defined thresholds—such as a customer service agent satisfaction declining or politeness scores dropping by more than 10 percent over an 8-hour period—the system triggers immediate alerts, helping to detect and address quality issues faster.

Using AgentCore Evaluations
You can create an online evaluation in the new Evaluations section of the AgentCore console. You can use as data source an AgentCore agent endpoint or a CloudWatch log group used by an external agent. For example, I use here the same sample customer support agent I shared when we introduced AgentCore in preview.

Then, you can select the evaluators to use, including custom evaluators that you can define starting from the existing templates or build from scratch.

For example, for a customer support agent, you can select metrics such as:

Correctness – Evaluates whether the information in the agent’s response is factually accurate
Faithfulness – Evaluates whether information in the response is supported by provided context/sources
Helpfulness – Evaluates from user’s perspective how useful and valuable the agent’s response is
Harmfulness – Evaluates whether the response contains harmful content
Stereotyping – Detects content that makes generalizations about individuals or groups

The evaluators for tool selection and tool parameter accuracy can help you understand if an agent is choosing the right tool for a task and extracting the correct parameters from the user queries.

To complete the creation of the evaluation, you can choose the sampling rate and optional filters. For permissions, you can create a new AWS Identity and Access Management (IAM) service role or pass an existing one.

The results are published, as they are evaluated, on Amazon CloudWatch in the AgentCore Observability dashboard. You can choose any of the bar chart sections to see the corresponding traces and gain deeper insight into the requests and responses behind that specific evaluation.

Because the results are in CloudWatch, you can use all of its feature to create, for example, alarms and automations.

Creating custom evaluators in AgentCore Evaluations
Custom evaluators allow you to define business-specific quality metrics tailored to your agent’s unique requirements. To create a custom evaluator, you provide the model to use as a judge, including inference parameters such as temperature and max output tokens, and a tailored prompt with the judging instructions. You can start from the prompt used by one of the built-in evaluators or enter a new one.

Then, you define the scale to produce in output. It can be either numeric values or custom text labels that you define. Finally, you configure whether the evaluation is computed by the model on single traces, full sessions, or for each tool call.

AgentCore Memory episodic functionality for experience-based learning
AgentCore Memory, a fully managed service that gives AI agents the ability to remember past interactions, now includes a new long-term memory strategy that gives agents the ability to learn from past experiences and apply those lessons to provide more helpful assistance in future interactions.

Consider booking travel with an agent: over time, the agent learns from your booking patterns—such as the fact that you often need to move flights to later times when traveling for work due to client meetings. When you start your next booking involving client meetings, the agent proactively suggests flexible return options based on these learned patterns. Just like an experienced assistant who learns your specific travel habits, agents with episodic memory can now recognize and adapt to your individual needs.

When you enable the new episodic functionality, AgentCore Memory captures structured episodes that record the context, reasoning process, actions taken, and outcomes of agent interactions, while a reflection agent analyzes these episodes to extract broader insights and patterns. When facing similar tasks, agents can retrieve these learnings to improve decision-making consistency and reduce processing time. This reduces the need for custom instructions by including in the agent context only the specific learnings an agent needs to complete a task instead of a long list of all possible suggestions.

AgentCore Runtime bidirectional streaming for more natural conversations
With AgentCore Runtime, you can deploy agentic applications with few lines of code. To simplify deploying conversational experiences that feel natural and responsive, AgentCore Runtime now supports bidirectional streaming. This capability enables voice agents to listen and adapt while users speak, so that people can interrupt agents mid-response and have the agent immediately adjust to the new context—without waiting for the agent to finish its current output. Rather than traditional turn-based interaction where users must wait for complete responses, bidirectional streaming creates flowing, natural conversations where agents dynamically change their response based on what the user is saying.

Building these conversational experiences from the ground up requires significant engineering effort to handle the complex flow of simultaneous communication. Bidirectional streaming simplifies this by managing the infrastructure needed for agents to process input while generating output, handling interruptions gracefully, and maintaining context throughout dynamic conversation shifts. You can now deploy agents that naturally adapt to the fluid nature of human conversation—supporting mid-thought interruptions, context switches, and clarifications without losing the thread of the interaction.

Things to know
Amazon Bedrock AgentCore, including the preview of Policy, is available in the US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Singapore, Sydney, Tokyo), and Europe (Frankfurt, Ireland) AWS Regions . The preview of AgentCore Evaluations is available in the US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Sydney), and Europe (Frankfurt) Regions. For Regional availability and future roadmap, visit AWS Capabilities by Region.

With AgentCore, you pay for what you use with no upfront commitments. For detailed pricing information, visit the Amazon Bedrock pricing page. AgentCore is also a part of the AWS Free Tier that new AWS customers can use to get started at no cost and explore key AWS services.

These new features work with any open source framework such as CrewAI, LangGraph, LlamaIndex, and Strands Agents, and with any foundation model. AgentCore services can be used together or independently, and you can get started using your favorite AI-assisted development environment with the AgentCore open source MCP server.

To learn more and get started quickly, visit the AgentCore Developer Guide.

— Danilo

Amazon Bedrock adds 18 fully managed open weight models, including the new Mistral Large 3 and Ministral 3 models

2025-12-02 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/amazon-bedrock-adds-fully-managed-open-weight-models/

Today, we’re announcing the general availability of an additional 18 fully managed open weight models in Amazon Bedrock from Google, Moonshot AI, MiniMax AI, Mistral AI, NVIDIA, OpenAI, and Qwen, including the new Mistral Large 3 and Ministral 3 3B, 8B, and 14B models.

With this launch, Amazon Bedrock now provides nearly 100 serverless models, offering a broad and deep range of models from leading AI companies, so customers can choose the precise capabilities that best serve their unique needs. By closely monitoring both customer needs and technological advancements, we regularly expand our curated selection of models based on customer needs and technological advancements to include promising new models alongside established industry favorites.

This ongoing expansion of high-performing and differentiated model offerings helps customers stay at the forefront of AI innovation. You can access these models on Amazon Bedrock through the unified API, evaluate, switch, and adopt new models without rewriting applications or changing infrastructure.

New Mistral AI models
These four Mistral AI models are now available first on Amazon Bedrock, each optimized for different performance and cost requirements:

Mistral Large 3 – This open weight model is optimized for long-context, multimodal, and instruction reliability. It excels in long document understanding, agentic and tool use workflows, enterprise knowledge work, coding assistance, advanced workloads such as math and coding tasks, multilingual analysis and processing, and multimodal reasoning with vision.
Ministral 3 3B – The smallest in the Ministral 3 family is edge-optimized for single GPU deployment with strong language and vision capabilities. It shows robust performance in image captioning, text classification, real-time translation, data extraction, short content generation, and lightweight real-time applications on edge or low-resource devices.
Ministral 3 8B – The best-in-class Ministral 3 model for text and vision is edge-optimized for single GPU deployment with high performance and minimal footprint. This model is ideal for chat interfaces in constrained environments, image and document description and understanding, specialized agentic use cases, and balanced performance for local or embedded systems.
Ministral 3 14B – The most capable Ministral 3 model delivers state-of the-art text and vision performance optimized for single GPU deployment. You can use advanced local agentic use cases and private AI deployments where advanced capabilities meet practical hardware constraints.

More open weight model options
You can use these open weight models for a wide range of use cases across industries:

Model provider	Model name	Description	Use cases
Google	Gemma 3 4B	Efficient text and image model that runs locally on laptops. Multilingual support for on-device AI applications.	On-device AI for mobile and edge applications, privacy-sensitive local inference, multilingual chat assistants, image captioning and description, and lightweight content generation.
	Gemma 3 12B	Balanced text and image model for workstations. Multi-language understanding with local deployment for privacy-sensitive applications.	Workstation-based AI applications; local deployment for enterprises; multilingual document processing, image analysis and Q&A; and privacy-compliant AI assistants.
	Gemma 3 27B	Powerful text and image model for enterprise applications. Multi-language support with local deployment for privacy and control.	Enterprise local deployment, high-performance multimodal applications, advanced image understanding, multilingual customer service, and data-sensitive AI workflows.
Moonshot AI	Kimi K2 Thinking	Deep reasoning model that thinks while using tools. Handles research, coding and complex workflows requiring hundreds of sequential actions.	Complex coding projects requiring planning, multistep workflows, data analysis and computation, and long-form content creation with research.
MiniMax AI	MiniMax M2	Built for coding agents and automation. Excels at multi-file edits, terminal operations and executing long tool-calling chains efficiently.	Coding agents and integrated development environment (IDE) integration, multi-file code editing, terminal automation and DevOps, long-chain tool orchestration, and agentic software development.
Mistral AI	Magistral Small 1.2	Excels at math, coding, multilingual tasks, and multimodal reasoning with vision capabilities for efficient local deployment.	Math and coding tasks, multilingual analysis and processing, and multimodal reasoning with vision.
	Voxtral Mini 1.0	Advanced audio understanding model with transcription, multilingual support, Q&A, summarization, and function-calling.	Voice-controlled applications, fast speech-to-text conversion, and offline voice assistants.
	Voxtral Small 1.0	Features state-of-the-art audio input with best-in-class text performance; excels at speech transcription, translation, and understanding.	Enterprise speech transcription, multilingual customer service, and audio content summarization.
NVIDIA	NVIDIA Nemotron Nano 2 9B	High efficiency LLM with hybrid transformer Mamba design, excelling in reasoning and agentic tasks.	Reasoning, tool calling, math, coding, and instruction following.
NVIDIA	NVIDIA Nemotron Nano 2 VL 12B	Advanced multimodal reasoning model for video understanding and document intelligence, powering Retrieval-Augmented Generation (RAG) and multimodal agentic applications.	Multi-image and video understanding, visual Q&A, and summarization.
OpenAI	gpt-oss-safeguard-20b	Content safety model that applies your custom policies. Classifies harmful content with explanations for trust and safety workflows.	Content moderation and safety classification, custom policy enforcement, user-generated content filtering, trust and safety workflows, and automated content triage.
OpenAI	gpt-oss-safeguard-120b	Larger content safety model for complex moderation. Applies custom policies with detailed reasoning for enterprise trust and safety teams.	Enterprise content moderation at scale, complex policy interpretation, multilayered safety classification, regulatory compliance checking, high-stakes content review.
Qwen	Qwen3-Next-80B-A3B	Fast inference with hybrid attention for ultra-long documents. Optimized for RAG pipelines, tool use & agentic workflows with quick responses.	RAG pipelines with long documents, agentic workflows with tool calling, code generation and software development, multi-turn conversations with extended context, multilingual content generation.
Qwen	Qwen3-VL-235B-A22B	Understands images and video. Extracts text from documents, converts screenshots to working code, and automates clicking through interfaces.	Extracting text from images and PDFs, converting UI designs or screenshots to working code, automating clicks and navigation in applications, video analysis and understanding, reading charts and diagrams.

When implementing publicly available models, give careful consideration to data privacy requirements in your production environments, check for bias in output, and monitor your results for data security, responsible AI, and model evaluation.

You can access the enterprise-grade security features of Amazon Bedrock and implement safeguards customized to your application requirements and responsible AI policies with Amazon Bedrock Guardrails. You can also evaluate and compare models to identify the optimal models for your use cases by using Amazon Bedrock model evaluation tools.

To get started, you can quickly test these models with a few prompts in the playground of the Amazon Bedrock console or use any AWS SDKs to include access to the Bedrock InvokeModel and Converse APIs. You can also use these models with any agentic framework that supports Amazon Bedrock and deploy the agents using Amazon Bedrock AgentCore and Strands Agents. To learn more, visit Code examples for Amazon Bedrock using AWS SDKs in the Amazon Bedrock User Guide.

Now available
Check the full Region list for availability and future updates of new models or search your model name in the AWS CloudFormation resources tab of AWS Capabilities by Region. To learn more, check out the Amazon Bedrock product page and the Amazon Bedrock pricing page.

Give these models a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— Channy

Introducing Amazon Nova 2 Lite, a fast, cost-effective reasoning model

2025-12-02 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-nova-2-lite-a-fast-cost-effective-reasoning-model/

Today, we’re releasing Amazon Nova 2 Lite, a fast, cost-effective reasoning model for everyday workloads. Available in Amazon Bedrock, the model offers industry-leading price performance and helps enterprises and developers build capable, reliable, and efficient agentic-AI applications. For organizations who need AI that truly understands their domain, Nova 2 Lite is the best model to use with Nova Forge to build their own frontier intelligence.

Nova 2 Lite supports extended thinking, including step-by-step reasoning and task decomposition, before providing a response or taking action. Extended thinking is off by default to deliver fast, cost-optimized responses, but when deeper analysis is needed, you can turn it on and choose from three thinking budget levels: low, medium, or high, giving you control over the speed, intelligence, and cost tradeoff.

Nova 2 Lite supports text, image, video, document as input and offers a one million-token context window, enabling expanded reasoning and richer in-context learning. In addition, Nova 2 Lite can be customized for your specific business needs. The model also includes access to two built-in tools: web grounding and a code interpreter. Web grounding retrieves publicly available information with citations, while the code interpreter allows the model to run and evaluate code within the same workflow.

Amazon Nova 2 Lite demonstrates strong performance across diverse evaluation benchmarks. The model excels in core intelligence across multiple domains including instruction following, math, and video understanding with temporal reasoning. For agentic workflows, Nova 2 Lite shows reliable function calling for task automation and precise UI interaction capabilities. The model also demonstrates strong code generation and practical software engineering problem-solving abilities.

Nova 2 Lite is built to meet your company’s needs
Nova 2 Lite can be used for a broad range of your everyday AI tasks. It offers the best combination of price, performance, and speed. Early customers are using Nova 2 Lite for customer service chatbots, document processing, and business process automation.

Nova 2 Lite can help support workloads across many different use cases:

Business applications – Automate business process workflow, intelligent document processing (IDP), customer support, and web search to improve productivity and outcomes
Software engineering – Generate code, debugging, refactoring, and migrating systems to accelerate development and increase efficiency
Business intelligence and research – Use long-horizon reasoning and web grounding to analyze internal and external sources to uncover insights, and make informed decisions

For specific requirements, Nova 2 Lite is also available for customization on both Amazon Bedrock and Amazon SageMaker AI.

Using Amazon Nova 2 Lite
In the Amazon Bedrock console, you can use the Chat/Text playground to quickly test the new model with your prompts. To integrate the model into your applications, you can use any AWS SDKs with the Amazon Bedrock InvokeModel and Converse API. Here’s a sample invocation using the AWS SDK for Python (Boto3).

import boto3

AWS_REGION="us-east-1"
MODEL_ID="global.amazon.nova-2-lite-v1:0"
MAX_REASONING_EFFORT="low" # low, medium, high

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Enable extended thinking for complex problem-solving
response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=[{
        "role": "user",
        "content": [{"text": "I need to optimize a logistics network with 5 warehouses, 12 distribution centers, and 200 retail locations. The goal is to minimize total transportation costs while ensuring no location is more than 50 miles from a distribution center. What approach should I take?"}]
    }],
    additionalModelRequestFields={
        "reasoningConfig": {
            "type": "enabled", # enabled, disabled (default)
            "maxReasoningEffort": MAX_REASONING_EFFORT
        }
    }
)

# The response will contain reasoning blocks followed by the final answer
for block in response["output"]["message"]["content"]:
    if "reasoningContent" in block:
        reasoning_text = block["reasoningContent"]["reasoningText"]["text"]
        print(f"Nova's thinking process:\n{reasoning_text}\n")
    elif "text" in block:
        print(f"Final recommendation:\n{block['text']}")

You can also use the new model with agentic frameworks that supports Amazon Bedrock and deploy the agents using Amazon Bedrock AgentCore. In this way, you can build agents for a broad range of tasks. Here’s the sample code for an interactive multi-agent system using the Strands Agents SDK. The agents have access to multiple tools, including read and write file access and the possibility to run shell commands.

from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator, editor, file_read, file_write, shell, http_request, graph, swarm, use_agent, think

AWS_REGION="us-east-1"
MODEL_ID="global.amazon.nova-2-lite-v1:0"
MAX_REASONING_EFFORT="low" # low, medium, high

SYSTEM_PROMPT = (
    "You are a helpful assistant. "
    "Follow the instructions from the user. "
    "To help you with your tasks, you can dynamically create specialized agents and orchestrate complex workflows."
)

bedrock_model = BedrockModel(
    region_name=AWS_REGION,
    model_id=MODEL_ID,
    additional_request_fields={
        "reasoningConfig": {
            "type": "enabled", # enabled, disabled (default)
            "maxReasoningEffort": MAX_REASONING_EFFORT
        }
    }
)

agent = Agent(
    model=bedrock_model,
    system_prompt=SYSTEM_PROMPT,
    tools=[calculator, editor, file_read, file_write, shell, http_request, graph, swarm, use_agent, think]
)

while True:
    try:
        prompt = input("\nEnter your question (or 'quit' to exit): ").strip()
        if prompt.lower() in ['quit', 'exit', 'q']:
            break
        if len(prompt) > 0:
            agent(prompt)
    except KeyboardInterrupt:
        break
    except EOFError:
        break

print("\nGoodbye!")

Things to know
Amazon Nova 2 Lite is now available in Amazon Bedrock via global cross-Region inference in multiple locations. For Regional availability and future roadmap, visit AWS Capabilities by Region.

Nova 2 Lite includes built-in safety controls to promote responsible AI use, with content moderation capabilities that help maintain appropriate outputs across a wide range of applications.

To understand the costs, see Amazon Bedrock pricing. To learn more, visit the Amazon Nova User Guide.

Start building with Nova 2 Lite today. To experiment with the new model, visit the Amazon Nova interactive website. Try the model in the Amazon Bedrock console, and share your feedback on AWS re:Post.

— Danilo

Introducing Amazon Nova Forge: Build your own frontier models using Nova

2025-12-02 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-nova-forge-build-your-own-frontier-models-using-nova/

Organizations are rapidly expanding their use of generative AI across all parts of the business. Applications requiring deep domain expertise or specific business context need models that truly understand their proprietary knowledge, workflows, and unique requirements.

While techniques like prompt engineering and Retrieval Augmented Generation (RAG) work well for many use cases, they have fundamental limitations when it comes to embedding specialized knowledge into a model’s core understanding. Supervised fine-tuning and reinforcement learning help in customizing the model, but they operate too late in the development lifecycle, layering modifications on top of models that are a fully trained, and therefore difficult to steer to specific domains of interest.

When organizations attempt deeper customization through Continued Pre-Training (CPT) using only their proprietary data, they often encounter catastrophic forgetting, where models lose their foundational capabilities as they learn new content. At the same time, the data, compute, and cost needed for training a model from scratch are still a prohibitive barrier for most organizations.

Today, we’re introducing Amazon Nova Forge, a new service to build your own frontier models using Nova. Nova Forge customers can start their development from early model checkpoints, blend their datasets with Amazon Nova-curated training data, and host their custom models securely on AWS. Nova Forge is the easiest and most cost-effective way to build your own frontier model.

Use cases and applications
Nova Forge is designed for organizations with access to proprietary or industry-specific data who want to build AI that truly understands their domain. This includes:

Manufacturing and automation – Building models that understand specialized processes, equipment data, and industry-specific workflows
Research and development – Creating models trained on proprietary research data and domain-specific knowledge
Content and media – Developing models that understand brand voice, content standards, and specific moderation requirements
Specialized industries – Training models on industry-specific terminology, regulations, and best practices

Depending on the specific use cases, Nova Forge can be used to add differentiated capabilities, enhance task-specific accuracy, reduce costs, and lower latency.

How Nova Forge works
Nova Forge addresses the limitations of current customization approaches by allowing you to start model development from early checkpoints across pre-training, mid-training, and post-training phases. You can blend your proprietary data with Amazon Nova-curated data throughout all training phases, running training using proven recipes on Amazon SageMaker AI fully managed infrastructure. This data mixing approach significantly reduces catastrophic forgetting compared to training with raw data alone, helping preserve foundational skills—including core intelligence, general instruction following capabilities, and safety benefits—while incorporating your specialized knowledge.

Nova Forge provides the ability to use reward functions in your own environment for reinforcement learning (RL). This allows the model to learn from feedback generated in environments that are representative of your use cases. Beyond single-step evaluations, you can also use your own orchestrator to manage multi-turn rollouts, enabling RL training for complex agent workflows and sequential decision-making tasks. Whether you’re using chemistry tools to score molecular designs, or robotics simulations that reward efficient task completion and penalize collisions, you can connect your proprietary environments directly.

You can also take advantage of the built-in responsible AI toolkit available in Nova Forge to configure the safety and content moderation settings of your model. You can adjust settings to meet your specific business needs in areas like safety, security, and handling of sensitive content.

Getting started with Nova Forge
Nova Forge integrates seamlessly with your existing AWS workflows. You can use the familiar tools and infrastructure in Amazon SageMaker AI to run your training, then import your custom Nova models as private models on Amazon Bedrock. This gives you the same security, consistent APIs, and broader AWS integrations as any model in Amazon Bedrock.

In Amazon SageMaker Studio, you can now build your frontier model with Amazon Nova.

To start building the model, choose which checkpoint to use: pre-trained, mid-trained, or post-trained. You can also upload your dataset here or use existing datasets.

You can blend your training data by mixing in curated datasets provided by Nova. These datasets, categorized by domain, can help your model to preserve general performance and prevent overfitting or catastrophic forgetting.

Optionally, you can choose to use Reinforcement Fine-Tuning (RFT) to improve factual accuracy and reduce hallucinations in specific domains.

When training completes, import the model into Amazon Bedrock and start using it in your applications.

Things to know
Amazon Nova Forge is available in the US East (N. Virginia) AWS Region. The program includes access to multiple Nova model checkpoints, training recipes to mix proprietary data with Amazon Nova-curated training data, proven training recipes, and integration with Amazon SageMaker AI and Amazon Bedrock.

Learn more in the Amazon Nova User Guide and explore Nova Forge from the Amazon SageMaker AI console.

Organizations interested in expert assistance can also reach out to our Generative AI Innovation Center for additional support with their model development initiatives.

— Danilo

Building with AI-DLC using Amazon Q Developer

2025-11-29 Will Matos

Post Syndicated from Will Matos original https://aws.amazon.com/blogs/devops/building-with-ai-dlc-using-amazon-q-developer/

The AI-Driven Development Life Cycle (AI-DLC) methodology marks a significant change in software development by strategically assigning routine tasks to AI while maintaining human oversight for critical decisions. Amazon Q Developer, a generative AI coding assistant, supports the entire software development lifecycle and offers the Project Rules feature, allowing users to tailor their development practices within the platform.

Recently, AWS made its AI-DLC workflow open-source, enabling developers to create software using this methodology. This workflow is implemented in Amazon Q Developer through its Project Rules customization feature. In this post, we will demonstrate how the AI-DLC workflow operates in Amazon Q Developer using an example use case.

AI-DLC Workflow Overview

The AI-DLC workflow is the practical implementation of the AI-DLC methodology for executing software development tasks. As outlined in the AI-DLC Method Definition Paper, the workflow has three phases. These phases are Inception, Construction, and Operations. Inception involves planning and architecture. Construction focuses on design and implementation. Operations cover deployment and monitoring. Each phase includes distinct stages. These stages address specific software development life cycle functions. The workflow adapts to project requirements. It analyzes requests, codebases, and complexity. This analysis determines the necessary stages. Simple bug fixes skip planning. They go directly to code generation. Complex features need requirements analysis. They also require architectural design and detailed testing.

The workflow maintains quality and control through structured milestones and transparent decision-making. At each phase, AI-DLC asks clarifying questions, creates execution plans, and waits for approval. Every decision, input, and response is logged in an audit trail for traceability. Whether building a new microservice, refactoring legacy code, or fixing a production bug, AI-DLC scales its rigor to match needs—comprehensive when complex, efficient when simple, and always in control. Figure 1 shows the phases and stages within the adaptive AI-DLC workflow. The stages shown in green boxes are mandatory, while those in yellow boxes are conditional.

AI-DLC workflow diagram showing three phases: Inception Phase (blue) with mandatory steps for Workspace Detection, Requirements Analysis, and Workflow Planning, plus conditional steps for Reverse Engineering, User Stories, Application Design, and Units Generation; Construction Phase (green) with conditional steps for Functional Design, NFR Requirements, NFR Design, and Infrastructure Design, followed by mandatory Code Generation and Build and Test steps that loop for each unit; and Operations Phase (orange) with an Operations step. The workflow flows from User Request at the top to Complete at the bottom.

Figure 1. SDLC phases and stages in AI-DLC workflow

Prerequisites

Before we begin the walk-through, we must have an AWS account or AWS Builder Id for authenticating Amazon Q Developer. If you don’t have one, sign up for AWS account or create an AWS builder id. You can use any of the Integrated Development Environments (IDEs) supported by Amazon Q Developer and install the extension as per the AWS documentation. In this post, we’ll be using the Amazon Q Developer extension in VS Code IDE. Once the plug-in is installed, you’ll need to authenticate Q Developer with the AWS cloud backend. Refer to the AWS documentation for Q Developer authentication instructions.

The AI-DLC workflow generates various Mermaid diagrams in markdown files. To view these diagrams within your IDE, you can install a Mermaid viewer plugin.

Let’s Begin Building!

Let’s construct a simple River Crossing Puzzle as a web UI app using AI-DLC. By choosing a straightforward app, we can concentrate more on learning the AI-DLC workflow and less on the project’s technical intricacies.

The sections below outline the individual steps in the AI-DLC development process using Amazon Q Developer. We’ll showcase screenshots of our IDE with the Amazon Q Developer plug-in and demonstrate how to interact with the workflow.

Although we’ve used the Amazon Q Developer IDE plug-in in this blog post, you can also use Kiro Command Line Interface (CLI) to build with AI-DLC without any additional setup. The workflow remains the same, except that you’ll interact through the command line instead of the graphical interface in the IDE.

As we progress through the workflow, your AI-DLC experience will be tailored to your specific problem statement. You’ll also notice the probabilistic nature of large language models (LLMs), as the questions and artifacts generated by them will vary from one run to another for the same problem statement. For example, if you attempt to replicate the same problem statement we used in this blog post, your experience will likely differ. This is expected and desirable. Despite these minor variations, we’ll eventually find a solution to the problem we initially set out to address.

Step 1: Clone GitHub repo containing the AI-DLC Q Developer Rules

Clone the GitHub repo containing the AI-DLC Q Developer Rules:

git clone https://github.com/awslabs/aidlc-workflows.git

Step 2: Load Q Developer Rules in your project workspace

Follow the README.md instructions in the GitHub repo to copy the rules files over to your project folder.

Step 3: Install and authenticate Amazon Q Developer Extension in IDE

Open the project folder you created in Step 2 in VS Code. Open the Amazon Q Chat Panel in the IDE and ensure that the AI-DLC workflow rules are loaded in Q Developer, as shown in Figure 2. If you don’t see what’s shown in Figure 2, please double-check the steps you performed in Step 2.

Screenshot showing four steps to access AI-DLC rules in Amazon Q: Step 1 shows opening Amazon Q Chat Panel from the left sidebar; Step 2 shows opening a chat session in Amazon Q at the top; Step 3 shows clicking on the Rules button in the chat interface; Step 4 shows ensuring AI-DLC rules are loaded in the rules panel on the right side of the screen.

Figure 2: AI-DLC rules enabling in Amazon Q Developer

Step 4: Start the AI-DLC workflow by entering a high-level problem statement

Our development environment is now set up, and we’re ready to begin application development using AI-DLC. In our Q Developer chat session, we enter the following problem statement:

Using AI-DLC let's build a web application to solve the river crossing puzzle.

Notice that we’ve prefixed our problem statement with “Using AI-DLC …” to ensure that Q Developer engages the AI-DLC workflow. Figure 3 shows what happens next. The AI-DLC workflow is triggered within Q Developer. It greets us with a welcome message and provides a brief overview of the AI-DLC methodology.

Figure 3 shows an expanded view of the AI-DLC workflow rules folder structure on the left. You’ll notice that a single aws-aidlc-rules/core-workflow.md is placed in the designated .amazonq/rules folder, while the rest of the rules files are placed in an ordinary aws-aidlc-rule-details folder. This arrangement is designed to optimize model efficiency. By placing the aws-aidlc-rules/core-workflow.md file in the .amazonq/rules folder, , it serves as additional context, ensuring that the core workflow structure is always accessible without incurring additional token consumption. Conversely, the detailed phase and stage-level behavior rules are stored in the aws-aidlc-rule-details folder and are dynamically loaded as required. This approach conserves Amazon Q’s context window and token usage by retaining only the necessary information within the context at any given time, thereby enhancing model efficiency.

The rules files under the aws-aidlc-rule-details folder are organized into three sub-folders, each representing a phase of AI-DLC. Within each phase, there are stage-specific files. A common folder houses cross-cutting rules applicable to all AI-DLC phases and stages such as the “human-in-the-loop”.

The AI-DLC workflow is self-guided and provides us with a clear understanding of what to expect next. It informs us that it will enter the AI-DLC Inception phase next, starting with the Workspace Detection stage within it.

Screenshot of Amazon Q interface showing AI-DLC workflow initialization. The left sidebar displays a file tree with various workflow stages and configuration files. The main chat area shows a problem statement input box at the top with placeholder text 'Using AI-DLC let's build a web application to solve the most pressing problem.' Below is a welcome message explaining AI-DLC (AI-Driven Development Life Cycle) and its capabilities. Three callout annotations highlight: 1) The core workflow dynamically utilizes detailed instructions for different phases and stages, loading and unloading them as required; 2) AI-DLC workflow kicks off with a welcome message and precise overview; 3) The workflow is structured to load a single Q Developer Rule file, one workflow.md, which then dynamically loads and unloads the stage definitions housed in the 'aws-aidlc-rule-details' folder as needed.

Figure 3: User enters high level problem statement in Amazon Q. AI-DLC workflow is triggered.

Step 5: Workspace Detection

We enter the Workspace Detection stage within the Inception phase. In this stage, AI-DLC analyzes the current workspace and determines whether it’s a greenfield (new) or brownfield (existing) application. Since AI-DLC is an adaptive workflow, it decides whether the next stage will be Reverse Engineering (for brownfield projects) or Requirements Analysis (for greenfield projects).

Since we’re building a greenfield application and there’s no existing code in the workspace to reverse engineer, the workflow will guide us to Requirements Analysis next. If we were working on a brownfield application, the workflow would have performed Reverse Engineering first and then moved on to Requirements Analysis. This demonstrates the adaptive nature of the workflow.

Figure 4 illustrates the process in our IDE when we enter this stage. The workflow requests our permission to create an aidlc-docs folder under the project root. This folder will serve as the repository for all the artifacts generated by AI-DLC during the workflow execution. Subsequently, the workflow generates two files within this folder: aidlc-state.md and audit.md. The purpose of these files is explained in Figure 4.

Screenshot of AI-DLC workspace detection phase showing the Amazon Q chat interface. The left sidebar displays the file tree with an 'aidlc-doc' folder highlighted. The main chat area shows the Inception Phase - Workspace Detection stage with explanatory text about analyzing the workspace. Five callout annotations explain: 1) Workflow creates aidlc-doc directory for storing AI-DLC generated artifacts; 2) The workflow tracks its progress in aidlc-metadata.json for error recovery and session continuity; 3) The audit.md file stores user's prompts; 4) Workflow highlights the AI-DLC phase and stage name with a clear heading for easy tracking; 5) Workflow loads detailed stage-level behavior files dynamically such that they don't consume the context window statically. At the bottom, a user approval prompt shows 'mkdir -p /Users/[...]/NewConsumerPortal/aidlc-docs' with the user asked to approve the 'mkdir' command.

Figure 4: Workspace Detection

The Workspace Detection will quickly finish as this is a greenfield project. The workflow will guide us into Requirements Analysis stage within the Inception phase next.

Step 6: Requirements Analysis

The workflow has progressed to the Requirements Analysis stage, where we will define the application requirements. The AI-DLC workflow presented our high-level problem statement to the Q Developer, which then responded with several requirements clarifications questions, as illustrated in Figure 4.

Several AI-DLC rules came into play at this stage. One rule instructed Amazon Q to avoid making assumptions on the user’s behalf and instead ask clarifying questions. Since LLMs tend to make assumptions and rush towards outcomes, they must be explicitly instructed to align with the engineering rigor of the AI-DLC methodology. To achieve this, the Q Developer presented several requirements clarification questions in requirement-verification-questions.md file and asked us to answer them inline in the file.

Another AI-DLC rule instructed the Q Developer to present questions in multiple-choice format and always include an open-ended option (“Other”) to enhance user convenience and provide flexibility in answering.

As shown in Figure 5, Amazon Q has asked us about the desired puzzle variant, such as the Classic Farmer, Fox, Chicken, and Grain puzzle or other popular variations. Additionally, it has asked us questions about user interaction methods, score persistence across multiple players, and the creation of a leaderboard.

These questions are essential for achieving our desired application outcome. Our responses to these questions will determine the final product. While we didn’t explicitly specify this level of detail in our high-level problem statement, AI-DLC has delegated detailed requirements elaboration to Amazon Q, but we still retain control over what gets built.

Screenshot of AI-DLC Requirements Analysis phase showing a split view. The left side displays a requirements clarification questions markdown file with multiple-choice questions about the Kuer Crossing Portal, including sections about user crossing portal variants, primary user interaction methods, and data storage preferences. The right side shows the Amazon Q chat interface with the Inception Phase - Requirements Analysis heading. Two callout annotations highlight: 1) AI-DLC asks questions in multiple choice format, with an 'Other' option that leaves an open-ended fill-in-the-blank when the answer doesn't match the predefined options; 2) AI-DLC generates config.requirements-clarification-questions.md file containing requirements clarification questions, with questions placed in an MD file where the user can respond inline in the file, using 'Answered' to indicate completion. The chat shows instructions for answering questions to clarify requirements.

Figure 5: Requirements Analysis

We answer all the questions in requirement-verification-questions.md file and enter “Done” in the chat window.

Amazon Q processes our responses. The AI-DLC workflow is designed to identify human errors. It checks if we’ve answered all the questions and identifies any contradictions or ambiguities in our answers. Any confusions, contradictions, or ambiguities will be flagged for follow-up questions. AI-DLC adheres to high standards and ensures that we don’t proceed to the next step until we’re fully in agreement on the requirements between us and Amazon Q.

Since we answered all the questions and there were no contradictions in our answers, the workflow continues and generates a comprehensive requirements.md document, as shown in figure 6.

Screenshot showing AI-DLC requirements review phase with split view. The left side displays a Requirements Document for the River Crossing Puzzle Web Application, including Intent Analysis Summary, User Request details, Request Type, and Functional Requirements with Core Puzzle Functionality items (FR-001 through FR -006) describing game features like classic farmer puzzle, timer display, move tracking, puzzle state validation, and victory messages. The right side shows the Amazon Q chat interface with 'Requirements Analysis Complete' heading, displaying project details including Puzzle Type (Classic Farmer, Fox, Chicken, and Grain river crossing puzzle), Technology (React-based modern web application), and Target Devices (web browsers only). Three callout annotations highlight: 1) Requirements Analysis phase complete; 2) Requirements document generated; 3) User may Request Changes, Add User Stories for Approval, or Approve & Continue, with a REVIEW SAFETY note warning users to review requirements and approve to continue, with options to request changes or add modifications if required.

Figure 6: Requirements Review

The workflow prompts us to review the requirements.md document and decide on the next step. If we’re not aligned on the requirements, we can prompt Amazon Q to help us achieve alignment. We can then iterate on the requirements until we’re fully aligned. Once we’re fully aligned, we prompt AI-DLC to progress to the next stage.

Given the adaptive nature of the AI-DLC workflow, Amazon Q has recommended that this application is simple enough, and we can skip the User Stories stage. If we felt otherwise, we would have overridden the model’s recommendation. In this case, we agree with Q’s recommendation and will therefore enter “Continue” in the chat window.

The workflow will enter Workflow Planning stage next.

Step 7: Workflow Planning

With our requirements established, we proceed to the Workflow Planning stage. In this phase, we leverage the requirements context and the workflow’s intelligence to plan the execution of specific stages of AI-DLC within the workflow to build our application as per the requirements specification.

Figure 7 illustrates the workflow planning stage in Q Developer. The workflow has generated an execution-plan.md file that outlines the recommended stages for execution and those that should be skipped.

The workflow planning process is highly contextual to the requirements. During requirements analysis, we decided to develop a simple river crossing puzzle application, consisting of a single HTML file, without a backend, leaderboard, or persistence. Consequently, Amazon Q recommends that we skip all the conditional stages, such as User Stories, Application Design, Units of Work Planning, and so on, and proceed directly to the Code Generation Planning stage in the Construction phase.

Figure 7 visually represents the recommended workflow graphically, indicating the stages that will be executed and those that will be skipped.

Screenshot of AI-DLC Workflow Planning phase showing an Execution Plan document on the left with Detailed Analysis Summary including user-facing changes, brownfield changes, API changes, and NFR changes, plus Risk Assessment. Below is a Workflow Visualization flowchart diagram showing the workflow stages from Inception through Construction to Operations phases. The right side shows the Amazon Q chat with 'Workflow Planning Complete' heading. Three callout annotations highlight: 1) AI -DLC workflow has analyzed requirements and based on the problem complexity has proposed a set of stages to execute in the workflow; 2) Problem is simple enough that AI-DLC is proposing to skip the detailed optional stages; 3) User may Request Changes, Add back skipped stages or Approve & Continue.

Figure 7: Workflow Planning

Since we’ve opted for a straightforward web UI app in this blog post for brevity, the workflow execution plan suggested by AI-DLC aligns seamlessly with our objectives. Should we not be aligned with the AI-DLC recommended workflow execution plan, we would request Q Developer to modify the plan to suit our preferences.

Since we’ve agreed on the workflow plan, we’ll enter “Continue” in Q’s chat session. If we weren’t aligned with the recommended workflow execution plan, we’d have prompted Q with our concerns and iterated over the revised execution plan until it aligned with our preferences. Following the recommended execution plan, the workflow will transition into the Construction phase and directly into the Code Generation Plan stage in the phase.

Step 8: Code Generation Planning

AI-DLC prioritizes planning over rushing to outcomes. This approach aligns with the concept of human-in-the-loop behavior, allowing us to detect issues early on, provide feedback on the plan, and prevent wrong assumptions from propagating further. Before we proceed with actual Code Generation, we undergo Code Generation Planning.

During Code Generation Planning, AI-DLC creates a detailed, numbered plan. It analyzes the requirements and design artifacts, breaking down the process into explicit steps for generating business logic, the API layer, the data layer, tests, documentation, and deployment files.

The plan is documented in a {unit-name}-code-generation-plan.md file, complete with check boxes. This ensures transparency, allowing users to see what will be built. It also provides control, enabling users to modify the plan. Additionally, it maintains quality by ensuring comprehensive coverage of code, tests, and documentation.

Figure 8 illustrates the AI-DLC’s code generation plan. The proposed workflow comprises eight steps, starting with creating an HTML structure and progressing to adding styling, game logic, and concluding with testing and documentation.

Screenshot of AI-DLC Code Generation Planning showing a Code Generation Plan document for River Crossing Puzzle on the left, with Unit Context listing HTML Structure, CSS Styling, and JavaScript files, followed by Unit Generation Steps including Step 1: HTML Structure Generation, Step 2: CSS Styling Generation, and Step 3 : Core Game Logic Generation with detailed checkboxes for each step. The right side shows Amazon Q chat with code generation plan details. Three callout annotations highlight: 1) The plan doc contains to-do items for AI-DLC to execute. These checkboxes get completed when the task is done; 2) This is how AI-DLC workflow persists and tracks progress state; 3) AI-DLC has proposed an 8-step code generation plan with checkboxes and review prompts, and User may Request Changes or Approve & Continue.

Figure 8: Code Generation Planning

The code generation plan appears reasonable to us. We will proceed to the Code Generation stage by entering “Continue” in Q’s chat session.

Step 9: Code Generation

The Code Generation stage executes the Code Generation Plan we approved in the previous step. It generates actual code artifacts step-by-step, including business logic, APIs, data layers, tests, and documentation. Completed steps are marked with check boxes, progress is tracked, and story traceability is ensured before presenting the generated code for user approval.

Figure 9 illustrates that the Code Generation stage has been completed. We are now reviewing a single index.html file generated with embedded styling and JavaScript consistent with our preference specified in requirements.md.

The workflow provides a summary of the activities performed during the Code Generation phase.

Screenshot of AI-DLC Code Generation phase showing generated HTML code on the left with embedded styling and JavaScript for the River Crossing Puzzle application. The right side shows Amazon Q chat with 'Code Generation Complete - river-crossing-puzzle' heading and a list of generated artifacts including HTML file, CSS interface, drag-and-drop interface, game logic, and testing services. Two callout annotations highlight: 1) The generated code is an HTML file with embedded styling and JavaScript; 2) We have specified during requirements analysis phase that we want a single-file index.html file implementation; 3) Code generation has been completed, and a summary of the generated artifacts is provided.

Figure 9: Code Generation

We’re about to test our newly created application soon. While it may be straightforward to test this simple puzzle app right now, for complex applications, we generate build and test instructions using AI-DLC.

We’ll enter “Continue” in the workflow and enter the final Build and Test stage in the Construction phase.

Step 10: Build and Test

These questions are essential for achieving our desired application.
We’ve reached the final stage of the AI-DLC Construction Phase, known as the Build and Test stage. During this stage, we create comprehensive instruction files that guide the build and packaging of the project, and document the necessary testing layers. These layers include unit tests (validating generated code), integration tests (checking unit interactions), performance tests (load/stress testing), and additional tests as required (security, contract, e2e).

The generated build instructions include dependencies and commands, test execution steps with expected results, and a summary document that provides an overview of the overall build/test status and the project’s readiness for deployment.

Figure 10 illustrates the documentation generated during this stage.

Screenshot of AI-DLC Build and Test phase showing a Build and Test Summary document on the left with Build Status (Build Tool, Build Status, Build Artifacts, Build Warnings) and Test Execution Summary including Unit Tests, Integration Tests, and Performance Tests sections with checkmarks and failure indicators. The right side shows Amazon Q chat with build and test completion status and project summary. Two callout annotations highlight: 1) Build and Test Complete! Build and Test instructions have been documented; 2) The AI-DLC workflow has concluded with a comprehensive summary of all completed stages and generated artifacts.

Figure 10: Build and Test

The AI-DLC workflow has now concluded.

Let’s Solve the Puzzle!

We open index.html in a web browser to access our newly created River Crossing Puzzle application. As shown in figure 11, we see our graphical web UI.

During requirements assessment, we chose a straightforward user interface using HTML, CSS, and JavaScript (without any frameworks), as evident in the display shown in Figure 11. Your display may vary due to the probabilistic nature of LLMs and the choices you made for requirements.

We attempt to solve the puzzle and find that it works as expected.

Side-by-side screenshots of the River Crossing Puzzle web application showing two game states. The left screenshot shows the initial state with a farmer on the left bank, and fox, chicken, and grain items listed below, with a blue river in the center and right bank on the right. The right screenshot shows a game state after moves with the farmer on the right bank and a success message 'Congratulations! You won in 7 moves!' displayed at the bottom. Both screens have a yellow 'Start Over' button and show move counts.

Figure 11: River Crossing Puzzle Web App

Conclusion

This post shows how AWS’s open-source AI-DLC workflow, guided by Amazon Q Developer’s Project Rules feature, helps developers build applications with structured oversight and transparency.

Using a River Crossing Puzzle web application as an example, the walk-through illustrates how AI-DLC methodology adapts its rigor based on project complexity, skipping unnecessary stages for simple applications while maintaining comprehensive processes for complex projects. Throughout each stage, AI-DLC enforces “human-in-the-loop” behavior, requiring user approval at critical checkpoints, asking clarifying questions, and maintaining complete audit trails for traceability.

The exercise successfully demonstrates how AI-DLC balances AI automation with human oversight, enhancing productivity without sacrificing quality or control. By following this structured, repeatable methodology, development teams can leverage generative AI’s capabilities while ensuring humans remain in charge of architectural decisions and implementation approaches. This framework provides the necessary guardrails for responsible and effective AI-assisted software development across projects of varying complexity.

Cleanup

We did not create any AWS resources in this walk-through, so no AWS cleanup is needed. You may cleanup your project workspace at your discretion.

Ready to get started? Visit our GitHub repository to download the AI-DLC workflow and join the AI-Native Builders Community to contribute to the future of software development.

About the authors:

Announcing the updated AWS Well-Architected Machine Learning Lens

2025-11-19 Steven DeVries

Post Syndicated from Steven DeVries original https://aws.amazon.com/blogs/architecture/announcing-the-updated-aws-well-architected-machine-learning-lens/

We are excited to announce the updated AWS Well-Architected Machine Learning Lens, now enhanced with the latest capabilities and best practices for building machine learning (ML) workloads on AWS.

The AWS Well-Architected Framework provides architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable workloads in the cloud. The Machine Learning Lens uses the Well-Architected Framework to outline the steps for performing a comprehensive review of your ML architectures.

The updated Machine Learning Lens provides a consistent approach for customers to evaluate architectures across ML workloads, from traditional supervised and unsupervised learning to modern AI applications. This lens addresses common considerations relevant to the complete ML lifecycle, including business goal identification, problem framing, data processing, model development, deployment, and monitoring. The lens incorporates the latest AWS ML services and capabilities introduced since 2023, providing access to current best practices and implementation guidance.

The Machine Learning Lens is part of a collection of Well-Architected lenses published under AWS Well-Architected Lenses.

What is the Machine Learning Lens?

The Well-Architected Machine Learning Lens focuses on the six pillars of the Well-Architected Framework across six phases of the ML lifecycle.

The six phases are:

Business goal identification: Establishing clear business objectives and success criteria for your ML initiative.
ML problem framing: Translating business problems into well-defined ML problems with appropriate metrics.
Data processing: Collecting, preparing, and engineering features from your data sources.
Model development: Building, training, tuning, and evaluating ML models with proper experimentation tracking.
Model deployment: Deploying models into production environments with appropriate infrastructure and monitoring.
Model monitoring: Continuously monitoring model performance and maintaining model quality over time.

Unlike the traditional waterfall approach, an iterative approach is required to achieve a working prototype based on the six phases of the ML lifecycle. The lens provides you with a set of established cloud-agnostic best practices in the form of Well-Architected Framework pillars for each ML lifecycle phase.

You can also use the Well-Architected Machine Learning Lens wherever you are on your cloud journey. You can choose to apply this guidance either during the design of your ML workloads or after your workloads have entered production as a part of the continuous improvement process.

Machine Learning Lens components

The lens includes four focus areas:

Well-Architected ML design principles: Ten design principles that frame the presented best practices, including assign ownership, enable reproducibility, optimize resources, and enable continuous improvement.
The ML lifecycle and the Well-Architected Framework pillars: This considers all aspects of the ML lifecycle and reviews design strategies aligned to the pillars of the overall Well-Architected Framework:
- Operational excellence: Ability to support ongoing development, run ML workloads effectively, gain insight into operations, and continuously improve processes.
- Security: Ability to protect data, models, and ML infrastructure while taking advantage of cloud technologies to improve security posture.
- Reliability: Ability of ML workloads to perform their intended function correctly and consistently, with automatic recovery from failure situations.
- Performance efficiency: Ability to use computing resources efficiently for ML workloads and maintain efficiency as demand and technologies evolve.
- Cost optimization: Ability to run ML systems to deliver business value at the lowest price point through resource optimization and automation.
- Sustainability: Addresses the environmental impact of ML workloads, focusing on energy consumption and resource efficiency.
Cloud-agnostic best practices: 100+ comprehensive best practices covering each ML lifecycle phase across the Well-Architected Framework pillars. Each best practice includes:
- Implementation guidance: Detailed AWS implementation plans with references to current AWS ML services and capabilities.
- Resources: Curated links to AWS documentation, blogs, videos, and code examples supporting the best practices.
Related ML architecture considerations: Discussions on advanced topics including MLOps patterns, data architecture for ML, model governance strategies, and considerations for responsible AI implementation.

What else is discussed in the Machine Learning Lens?

The Machine Learning Lens also discusses the following key topics:

Responsible AI: Comprehensive guidance on implementing fair, explainable, and unbiased ML systems throughout the development lifecycle.
MLOps and automation: Best practices for implementing continuous integration, continuous deployment, and continuous training for ML workloads.
Data architecture for ML: Guidance on building robust data pipelines, feature stores, and data governance practices that support ML workloads at scale.
Model governance and lineage: Strategies for tracking model versions, maintaining audit trails, and ensuring compliance with regulatory requirements.

What’s new in the updated Machine Learning Lens?

The updated Machine Learning Lens incorporates the latest AWS ML capabilities and best practices introduced since 2023, including:

Enhanced data and AI collaborative workflows: Integrated development through Amazon SageMaker Unified Studio – MLOPS02-BP01, MLOPS01-BP01, MLOPS03-BP01, and MLOPS02-BP04.
AI-assisted development lifecycle: Code generation and productivity enhancement using Kiro and Amazon Q Developer – MLCOST01-BP02, MLOPS01-BP01, MLCOST03-BP02, and MLSUS05-BP02.
Distributed training infrastructure: Large-scale foundation model development and fine-tuning with Amazon SageMaker HyperPod – MLCOST04-BP02, MLCOST04-BP07, MLPERF06-BP05, MLSEC03-BP02, MLCOST04-BP06, MLPERF06-BP07, and MLSUS05-BP02.
Model customization capabilities: Knowledge distillation and fine-tuning for domain-specific applications using Amazon Bedrock with Kiro and Amazon Q Developer integration and model hub with Amazon SageMaker Jumpstart – MLCOST01-BP02, MLCOST01-BP01, MLCOST03-BP02, MLSUS04-BP02, MLCOST05-BP01, and MLSUS05-BP02.
No-code ML development: Natural language support for building models using SageMaker Canvas with Amazon Q Developer integration – MLCOST03-BP02, MLCOST03-BP03, MLOPS01-BP01, and MLSUS05-BP02.
Improved bias detection: Enhanced fairness metrics in SageMaker Clarify with Model Monitor for drift detection – MLREL02-BP01, MLREL03-BP04, MLREL02-BP04, MLREL02-BP05, and MLREL02-BP02.
Modular inference architecture: Flexible deployment with SageMaker Inference Components and Multi-Model Endpoints – MLCOST05-BP01, MLREL01-BP01, MLSUS05-BP01, MLCOST05-BP03, and MLREL01-BP02.
Advanced observability: Improved debugging with SageMaker Debugger, Model Monitor, and CloudWatch across the ML lifecycle – MLOPS06-BP02, MLOPS05-BP02, MLOPS06-BP01, and MLOPS02-BP04.
Enhanced cost optimization: Resource management through SageMaker Training Plans, Savings Plans, and Spot Instance support – MLCOST05-BP03, MLOPS05-BP02, MLCOST06-BP01, MLCOST06-BP02, and MLCOST04-BP06.

Who should use the Machine Learning Lens?

The Machine Learning Lens is valuable for many roles across your organization. Business leaders can use this lens to understand the end-to-end implementation and business value of ML initiatives. Data scientists and ML engineers can leverage the lens to understand how to build, deploy, and maintain ML systems at scale. DevOps and platform engineers can learn how to create reliable, secure infrastructure for ML workloads. Risk and compliance leaders can understand how ML systems are implemented responsibly while adhering to regulatory and governance requirements.

Next steps

If you require support on the implementation or assessment of your ML workloads, please contact your AWS Solutions Architect or Account Representative.

Special thanks to everyone across the AWS Solution Architecture, AWS Professional Services, and Machine Learning communities who contributed to the updated Machine Learning Lens. These contributions encompassed diverse perspectives, expertise, backgrounds, and experiences in developing comprehensive guidance for ML workloads on AWS.

For additional reading, refer to the AWS Well-Architected Framework, or explore the AWS Well-Architected Generative AI Lens for guidance specific to generative AI workloads.

About the authors

Architecting for AI excellence: AWS launches three Well-Architected Lenses at re:Invent 2025

2025-11-19 Anitha Selvan

Post Syndicated from Anitha Selvan original https://aws.amazon.com/blogs/architecture/architecting-for-ai-excellence-aws-launches-three-well-architected-lenses-at-reinvent-2025/

At re:Invent 2025, we introduce one new lens and two significant updates to the AWS Well-Architected Lenses specifically focused on AI workloads: the Responsible AI Lens, the Machine Learning (ML) Lens, and the Generative AI Lens. Together, these lenses provide comprehensive guidance for organizations at different stages of their AI journey, whether you’re just starting to experiment with machine learning or already deploying complex AI applications at scale.

The AWS Well-Architected Framework provides the best architectural practices for designing and operating reliable, secure, performance efficient, cost-optimized, and sustainable workloads in the cloud.

The Responsible AI Lens: Embedding trust in AI systems

The Responsible AI Lens offers a structured approach for developers to assess and track their AI workloads against established best practices, identify potential gaps in their AI implementation and receive actionable guidance to improve their AI systems’ quality and alignment with responsible AI principles. By using the Responsible AI Lens you can make informed decisions that balance business and technical requirements, accelerating your path from AI experimentation to production-ready solutions.

Key takeaways from the Responsible AI Lens:

Every AI system has a Responsible AI consideration: Whether intentionally designed or not, AI systems inherently carry Responsible AI implications that need to be actively managed rather than left to chance.
AI systems can be used beyond original intent and may have unintended impacts: Applications often get utilized in ways developers didn’t anticipate, and due to their probabilistic nature, AI systems can produce unexpected outcomes even within intended use cases, making robust Responsible AI decisions essential from the start.
Responsible AI is an enabler to innovation and trust: Rather than being a constraint, Responsible AI practices can accelerate innovation by proactively building stakeholder and customer trust and reducing downstream risks.

The Responsible AI Lens serves as the foundational guidance for AI development activities, providing critical guidelines that inform both the Machine Learning Lens and the Generative AI Lens implementations.

The Machine Learning Lens: Foundation for ML workloads

The Machine Learning Lens provides you with a set of established cloud-agnostic best practices in the form of Well-Architected Framework pillars for each machine learning (ML) lifecycle phase. The updated Machine Learning Lens provides a consistent approach for designing, building, and operating machine learning workloads on AWS. It addresses the full spectrum of ML workloads, from traditional supervised and unsupervised learning to modern AI applications.

The updated Machine Learning Lens incorporates the latest AWS ML capabilities (evolved since their introduction in 2023). What’s new in the updated ML Lens:

Enhanced data and AI collaborative workflows through Amazon SageMaker Unified Studio.
AI-assisted development for code generation and productivity enhancement.
Distributed training infrastructure for foundation model development and fine-tuning with Amazon SageMaker HyperPod.
Model customization capabilities such as knowledge distillation and fine-tuning domain-specific applications using Amazon Bedrock with Kiro and Amazon Q Developer.
No-code ML development using Amazon SageMaker Canvas with Amazon Q integration.
Improved bias detection with enhanced fairness metrics and Responsible AI capabilities in Amazon SageMaker Clarify.
Automated dashboard creation for business insights through Amazon Quick Sight.
Modular inference architecture for flexible model deployment with Inference Components.
Advanced observability with improved debugging and monitoring capabilities across the ML lifecycle.
Enhanced cost optimization for resource management through Amazon SageMaker Training Plans, Savings Plans, and Spot Instance support.

You can use the ML Lens wherever you are on your cloud journey. You can choose to apply this guidance either during the design of your ML workloads or after your workloads have entered production as part of the continuous improvement process. These improvements are powered by key AWS services including Amazon SageMaker Unified Studio, Amazon Q, Amazon SageMaker HyperPod, and Amazon Bedrock.

The Generative AI Lens: Specialized guidance for foundation models

The Generative AI Lens provides a consistent approach for customers to evaluate architectures that use large language models (LLMs) to achieve their business goals. This lens addresses common considerations relevant to model selection, prompt engineering, model customization, workload integration, and continuous improvement. We identify best practices that help you architect your cloud-based applications and workloads according to AWS Well-Architected design principles gathered from supporting thousands of customer implementations. While the Machine Learning (ML) Lens covers the broad spectrum of ML workloads, the Generative AI Lens focuses specifically on foundation models and generative AI applications. The Generative AI Lens provides the best architectural practices for designing and operating generative AI workloads on AWS.

The updated Generative AI Lens includes several new additions:

Amazon SageMaker HyperPod guidance for orchestrating complex, long-running generative AI workflows that includes additional service capabilities.
Enhanced Responsible AI preamble with detailed discussion on the eight core dimensions of Responsible AI as described by AWS.
Updated data architecture preamble with strategic considerations needed to architect a data system for generative AI workloads.
New agentic AI preamble introducing architecture paradigms for agentic systems.
Eight architecture scenarios covering common generative AI-powered business applications such as autonomous call centers, knowledge worker co-pilots, and multi-tenant generative AI service systems.

The Generative AI Lens builds upon the foundation established by the ML Lens, providing specialized guidance for the unique challenges and opportunities presented by foundation models and generative AI applications.

Implementation strategy for Well-Architected AI/ML guidance: A unified approach

The new lenses – Responsible AI Lens, Machine Learning Lens, and Generative AI Lens – work together to provide comprehensive guidance for AI development. The Responsible AI Lens guides safe, fair, and secure AI development. It helps balance business needs with technical requirements, streamlining the transition from experimentation to production. The Machine Learning Lens guides organizations in evaluating workloads across both modern AI and traditional machine learning approaches. Recent updates focus on key areas including enhanced data and AI collaborative workflows, AI-assisted development capabilities, large-scale infrastructure provisioning, and customizable model deployment. The Generative AI Lens helps customers evaluate large language model (LLM) based architectures and its updates include guidance for Amazon SageMaker HyperPod users, new insights on agentic AI, and updated architectural scenarios.

What are the next steps?

The launch of these new lenses at re:Invent 2025 helps organizations build AI systems that are responsible, trustworthy, powerful, and effective. By providing comprehensive guidance across the full spectrum of AI workloads, AWS supports organizations to accelerate their AI initiatives while maintaining the highest standards of responsible AI and technical excellence.

Learn more about the AWS Well-Architected Framework and implement the best practice guidance provided using the GitHub repository. These lenses are practical tools designed to help you build AI systems that deliver real business value while maintaining the highest standards of ethics, security, and operational excellence.

For additional reading, refer to the AWS Well-Architected Framework and pillar whitepapers, or contact your AWS Solutions Architect or Account Representative for support on implementing these lenses in your organization.

About the authors

Amazon Nova Multimodal Embeddings: State-of-the-art embedding model for agentic RAG and semantic search

2025-10-28 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-nova-multimodal-embeddings-now-available-in-amazon-bedrock/

Today, we’re introducing Amazon Nova Multimodal Embeddings, a state-of-the-art multimodal embedding model for agentic retrieval-augmented generation (RAG) and semantic search applications, available in Amazon Bedrock. It is the first unified embedding model that supports text, documents, images, video, and audio through a single model to enable crossmodal retrieval with leading accuracy.

Embedding models convert textual, visual, and audio inputs into numerical representations called embeddings. These embeddings capture the meaning of the input in a way that AI systems can compare, search, and analyze, powering use cases such as semantic search and RAG.

Organizations are increasingly seeking solutions to unlock insights from the growing volume of unstructured data that is spread across text, image, document, video, and audio content. For example, an organization might have product images, brochures that contain infographics and text, and user-uploaded video clips. Embedding models are able to unlock value from unstructured data, however traditional models are typically specialized to handle one content type. This limitation drives customers to either build complex crossmodal embedding solutions or restrict themselves to use cases focused on a single content type. The problem also applies to mixed-modality content types such as documents with interleaved text and images or video with visual, audio, and textual elements where existing models struggle to capture crossmodal relationships eﬀectively.

Nova Multimodal Embeddings supports a unified semantic space for text, documents, images, video, and audio for use cases such as crossmodal search across mixed-modality content, searching with a reference image, and retrieving visual documents.

Evaluating Amazon Nova Multimodal Embeddings performance
We evaluated the model on a broad range of benchmarks, and it delivers leading accuracy out-of-the-box as described in the following table.

Nova Multimodal Embeddings supports a context length of up to 8K tokens, text in up to 200 languages, and accepts inputs via synchronous and asynchronous APIs. Additionally, it supports segmentation (also known as “chunking”) to partition long-form text, video, or audio content into manageable segments, generating embeddings for each portion. Lastly, the model oﬀers four output embedding dimensions, trained using Matryoshka Representation Learning (MRL) that enables low-latency end-to-end retrieval with minimal accuracy changes.

Let’s see how the new model can be used in practice.

Using Amazon Nova Multimodal Embeddings
Getting started with Nova Multimodal Embeddings follows the same pattern as other models in Amazon Bedrock. The model accepts text, documents, images, video, or audio as input and returns numerical embeddings that you can use for semantic search, similarity comparison, or RAG.

Here’s a practical example using the AWS SDK for Python (Boto3) that shows how to create embeddings from different content types and store them for later retrieval. For simplicity, I’ll use Amazon S3 Vectors, a cost-optimized storage with native support for storing and querying vectors at any scale, to store and search the embeddings.

Let’s start with the fundamentals: converting text into embeddings. This example shows how to transform a simple text description into a numerical representation that captures its semantic meaning. These embeddings can later be compared with embeddings from documents, images, videos, or audio to find related content.

To make the code easy to follow, I’ll show a section of the script at a time. The full script is included at the end of this walkthrough.

import json
import base64
import time
import boto3

MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"
EMBEDDING_DIMENSION = 3072

# Initialize Amazon Bedrock Runtime client
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

print(f"Generating text embedding with {MODEL_ID} ...")

# Text to embed
text = "Amazon Nova is a multimodal foundation model"

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "text": {"truncationMode": "END", "value": text},
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")

Now we’ll process visual content using the same embedding space using a photo.jpg file in the same folder as the script. This demonstrates the power of multimodality: Nova Multimodal Embeddings is able to capture both textual and visual context into a single embedding that provides enhanced understanding of the document.

Nova Multimodal Embeddings can generate embeddings that are optimized for how they are being used. When indexing for a search or retrieval use case, embeddingPurpose can be set to GENERIC_INDEX. For the query step, embeddingPurpose can be set depending on the type of item to be retrieved. For example, when retrieving documents, embeddingPurpose can be set to DOCUMENT_RETRIEVAL.

# Read and encode image
print(f"Generating image embedding with {MODEL_ID} ...")

with open("photo.jpg", "rb") as f:
    image_bytes = base64.b64encode(f.read()).decode("utf-8")

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "image": {
            "format": "jpeg",
            "source": {"bytes": image_bytes}
        },
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")

To process video content, I use the asynchronous API. That’s a requirement for videos that are larger than 25MB when encoded as Base64. First, I upload a local video to an S3 bucket in the same AWS Region.

aws s3 cp presentation.mp4 s3://my-video-bucket/videos/

This example shows how to extract embeddings from both visual and audio components of a video file. The segmentation feature breaks longer videos into manageable chunks, making it practical to search through hours of content efficiently.

# Initialize Amazon S3 client
s3 = boto3.client("s3", region_name="us-east-1")

print(f"Generating video embedding with {MODEL_ID} ...")

# Amazon S3 URIs
S3_VIDEO_URI = "s3://my-video-bucket/videos/presentation.mp4"
S3_EMBEDDING_DESTINATION_URI = "s3://my-embedding-destination-bucket/embeddings-output/"

# Create async embedding job for video with audio
model_input = {
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "video": {
            "format": "mp4",
            "embeddingMode": "AUDIO_VIDEO_COMBINED",
            "source": {
                "s3Location": {"uri": S3_VIDEO_URI}
            },
            "segmentationConfig": {
                "durationSeconds": 15  # Segment into 15-second chunks
            },
        },
    },
}

response = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": S3_EMBEDDING_DESTINATION_URI
        }
    },
)

invocation_arn = response["invocationArn"]
print(f"Async job started: {invocation_arn}")

# Poll until job completes
print("\nPolling for job completion...")
while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = job["status"]
    print(f"Status: {status}")

    if status != "InProgress":
        break
    time.sleep(15)

# Check if job completed successfully
if status == "Completed":
    output_s3_uri = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
    print(f"\nSuccess! Embeddings at: {output_s3_uri}")

    # Parse S3 URI to get bucket and prefix
    s3_uri_parts = output_s3_uri[5:].split("/", 1)  # Remove "s3://" prefix
    bucket = s3_uri_parts[0]
    prefix = s3_uri_parts[1] if len(s3_uri_parts) > 1 else ""

    # AUDIO_VIDEO_COMBINED mode outputs to embedding-audio-video.jsonl
    # The output_s3_uri already includes the job ID, so just append the filename
    embeddings_key = f"{prefix}/embedding-audio-video.jsonl".lstrip("/")

    print(f"Reading embeddings from: s3://{bucket}/{embeddings_key}")

    # Read and parse JSONL file
    response = s3.get_object(Bucket=bucket, Key=embeddings_key)
    content = response['Body'].read().decode('utf-8')

    embeddings = []
    for line in content.strip().split('\n'):
        if line:
            embeddings.append(json.loads(line))

    print(f"\nFound {len(embeddings)} video segments:")
    for i, segment in enumerate(embeddings):
        print(f"  Segment {i}: {segment.get('startTime', 0):.1f}s - {segment.get('endTime', 0):.1f}s")
        print(f"    Embedding dimension: {len(segment.get('embedding', []))}")
else:
    print(f"\nJob failed: {job.get('failureMessage', 'Unknown error')}")

With our embeddings generated, we need a place to store and search them efficiently. This example demonstrates setting up a vector store using Amazon S3 Vectors, which provides the infrastructure needed for similarity search at scale. Think of this as creating a searchable index where semantically similar content naturally clusters together. When adding an embedding to the index, I use the metadata to specify the original format and the content being indexed.

# Initialize Amazon S3 Vectors client
s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# Configuration
VECTOR_BUCKET = "my-vector-store"
INDEX_NAME = "embeddings"

# Create vector bucket and index (if they don't exist)
try:
    s3vectors.get_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Vector bucket {VECTOR_BUCKET} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Created vector bucket: {VECTOR_BUCKET}")

try:
    s3vectors.get_index(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME)
    print(f"Vector index {INDEX_NAME} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_index(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        dimension=EMBEDDING_DIMENSION,
        dataType="float32",
        distanceMetric="cosine"
    )
    print(f"Created index: {INDEX_NAME}")

texts = [
    "Machine learning on AWS",
    "Amazon Bedrock provides foundation models",
    "S3 Vectors enables semantic search"
]

print(f"\nGenerating embeddings for {len(texts)} texts...")

# Generate embeddings using Amazon Nova for each text
vectors = []
for text in texts:
    response = bedrock_runtime.invoke_model(
        body=json.dumps({
            "taskType": "SINGLE_EMBEDDING",
            "singleEmbeddingParams": {
                "embeddingDimension": EMBEDDING_DIMENSION,
                "text": {"truncationMode": "END", "value": text}
            }
        }),
        modelId=MODEL_ID,
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response["body"].read())
    embedding = response_body["embeddings"][0]["embedding"]

    vectors.append({
        "key": f"text:{text[:50]}",  # Unique identifier
        "data": {"float32": embedding},
        "metadata": {"type": "text", "content": text}
    })
    print(f"  ✓ Generated embedding for: {text}")

# Add all vectors to store in a single call
s3vectors.put_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    vectors=vectors
)

print(f"\nSuccessfully added {len(vectors)} vectors to the store in one put_vectors call!")

This final example demonstrates the capability of searching across different content types with a single query, finding the most similar content regardless of whether it originated from text, images, videos, or audio. The distance scores help you understand how closely related the results are to your original query.

# Text to query
query_text = "foundation models"  

print(f"\nGenerating embeddings for query '{query_text}' ...")

# Generate embeddings
response = bedrock_runtime.invoke_model(
    body=json.dumps({
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_RETRIEVAL",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "text": {"truncationMode": "END", "value": query_text}
        }
    }),
    modelId=MODEL_ID,
    accept="application/json",
    contentType="application/json"
)

response_body = json.loads(response["body"].read())
query_embedding = response_body["embeddings"][0]["embedding"]

print(f"Searching for similar embeddings...\n")

# Search for top 5 most similar vectors
response = s3vectors.query_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    queryVector={"float32": query_embedding},
    topK=5,
    returnDistance=True,
    returnMetadata=True
)

# Display results
print(f"Found {len(response['vectors'])} results:\n")
for i, result in enumerate(response["vectors"], 1):
    print(f"{i}. {result['key']}")
    print(f"   Distance: {result['distance']:.4f}")
    if result.get("metadata"):
        print(f"   Metadata: {result['metadata']}")
    print()

Crossmodal search is one of the key advantages of multimodal embeddings. With crossmodal search, you can query with text and find relevant images. You can also search for videos using text descriptions, find audio clips that match certain topics, or discover documents based on their visual and textual content. For your reference, the full script with all previous examples merged together is here:

import json
import base64
import time
import boto3

MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"
EMBEDDING_DIMENSION = 3072

# Initialize Amazon Bedrock Runtime client
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

print(f"Generating text embedding with {MODEL_ID} ...")

# Text to embed
text = "Amazon Nova is a multimodal foundation model"

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "text": {"truncationMode": "END", "value": text},
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")
# Read and encode image
print(f"Generating image embedding with {MODEL_ID} ...")

with open("photo.jpg", "rb") as f:
    image_bytes = base64.b64encode(f.read()).decode("utf-8")

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "image": {
            "format": "jpeg",
            "source": {"bytes": image_bytes}
        },
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")
# Initialize Amazon S3 client
s3 = boto3.client("s3", region_name="us-east-1")

print(f"Generating video embedding with {MODEL_ID} ...")

# Amazon S3 URIs
S3_VIDEO_URI = "s3://my-video-bucket/videos/presentation.mp4"

# Amazon S3 output bucket and location
S3_EMBEDDING_DESTINATION_URI = "s3://my-video-bucket/embeddings-output/"

# Create async embedding job for video with audio
model_input = {
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "video": {
            "format": "mp4",
            "embeddingMode": "AUDIO_VIDEO_COMBINED",
            "source": {
                "s3Location": {"uri": S3_VIDEO_URI}
            },
            "segmentationConfig": {
                "durationSeconds": 15  # Segment into 15-second chunks
            },
        },
    },
}

response = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": S3_EMBEDDING_DESTINATION_URI
        }
    },
)

invocation_arn = response["invocationArn"]
print(f"Async job started: {invocation_arn}")

# Poll until job completes
print("\nPolling for job completion...")
while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = job["status"]
    print(f"Status: {status}")

    if status != "InProgress":
        break
    time.sleep(15)

# Check if job completed successfully
if status == "Completed":
    output_s3_uri = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
    print(f"\nSuccess! Embeddings at: {output_s3_uri}")

    # Parse S3 URI to get bucket and prefix
    s3_uri_parts = output_s3_uri[5:].split("/", 1)  # Remove "s3://" prefix
    bucket = s3_uri_parts[0]
    prefix = s3_uri_parts[1] if len(s3_uri_parts) > 1 else ""

    # AUDIO_VIDEO_COMBINED mode outputs to embedding-audio-video.jsonl
    # The output_s3_uri already includes the job ID, so just append the filename
    embeddings_key = f"{prefix}/embedding-audio-video.jsonl".lstrip("/")

    print(f"Reading embeddings from: s3://{bucket}/{embeddings_key}")

    # Read and parse JSONL file
    response = s3.get_object(Bucket=bucket, Key=embeddings_key)
    content = response['Body'].read().decode('utf-8')

    embeddings = []
    for line in content.strip().split('\n'):
        if line:
            embeddings.append(json.loads(line))

    print(f"\nFound {len(embeddings)} video segments:")
    for i, segment in enumerate(embeddings):
        print(f"  Segment {i}: {segment.get('startTime', 0):.1f}s - {segment.get('endTime', 0):.1f}s")
        print(f"    Embedding dimension: {len(segment.get('embedding', []))}")
else:
    print(f"\nJob failed: {job.get('failureMessage', 'Unknown error')}")
# Initialize Amazon S3 Vectors client
s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# Configuration
VECTOR_BUCKET = "my-vector-store"
INDEX_NAME = "embeddings"

# Create vector bucket and index (if they don't exist)
try:
    s3vectors.get_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Vector bucket {VECTOR_BUCKET} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Created vector bucket: {VECTOR_BUCKET}")

try:
    s3vectors.get_index(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME)
    print(f"Vector index {INDEX_NAME} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_index(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        dimension=EMBEDDING_DIMENSION,
        dataType="float32",
        distanceMetric="cosine"
    )
    print(f"Created index: {INDEX_NAME}")

texts = [
    "Machine learning on AWS",
    "Amazon Bedrock provides foundation models",
    "S3 Vectors enables semantic search"
]

print(f"\nGenerating embeddings for {len(texts)} texts...")

# Generate embeddings using Amazon Nova for each text
vectors = []
for text in texts:
    response = bedrock_runtime.invoke_model(
        body=json.dumps({
            "taskType": "SINGLE_EMBEDDING",
            "singleEmbeddingParams": {
                "embeddingPurpose": "GENERIC_INDEX",
                "embeddingDimension": EMBEDDING_DIMENSION,
                "text": {"truncationMode": "END", "value": text}
            }
        }),
        modelId=MODEL_ID,
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response["body"].read())
    embedding = response_body["embeddings"][0]["embedding"]

    vectors.append({
        "key": f"text:{text[:50]}",  # Unique identifier
        "data": {"float32": embedding},
        "metadata": {"type": "text", "content": text}
    })
    print(f"  ✓ Generated embedding for: {text}")

# Add all vectors to store in a single call
s3vectors.put_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    vectors=vectors
)

print(f"\nSuccessfully added {len(vectors)} vectors to the store in one put_vectors call!")
# Text to query
query_text = "foundation models"  

print(f"\nGenerating embeddings for query '{query_text}' ...")

# Generate embeddings
response = bedrock_runtime.invoke_model(
    body=json.dumps({
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_RETRIEVAL",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "text": {"truncationMode": "END", "value": query_text}
        }
    }),
    modelId=MODEL_ID,
    accept="application/json",
    contentType="application/json"
)

response_body = json.loads(response["body"].read())
query_embedding = response_body["embeddings"][0]["embedding"]

print(f"Searching for similar embeddings...\n")

# Search for top 5 most similar vectors
response = s3vectors.query_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    queryVector={"float32": query_embedding},
    topK=5,
    returnDistance=True,
    returnMetadata=True
)

# Display results
print(f"Found {len(response['vectors'])} results:\n")
for i, result in enumerate(response["vectors"], 1):
    print(f"{i}. {result['key']}")
    print(f"   Distance: {result['distance']:.4f}")
    if result.get("metadata"):
        print(f"   Metadata: {result['metadata']}")
    print()

For production applications, embeddings can be stored in any vector database. Amazon OpenSearch Service offers native integration with Nova Multimodal Embeddings at launch, making it straightforward to build scalable search applications. As shown in the examples before, Amazon S3 Vectors provides a simple way to store and query embeddings with your application data.

Things to know
Nova Multimodal Embeddings offers four output dimension options: 3,072, 1,024, 384, and 256. Larger dimensions provide more detailed representations but require more storage and computation. Smaller dimensions offer a practical balance between retrieval performance and resource efficiency. This flexibility helps you optimize for your specific application and cost requirements.

The model handles substantial context lengths. For text inputs, it can process up to 8,192 tokens at once. Video and audio inputs support segments of up to 30 seconds, and the model can segment longer files. This segmentation capability is particularly useful when working with large media files—the model splits them into manageable pieces and creates embeddings for each segment.

The model includes responsible AI features built into Amazon Bedrock. Content submitted for embedding goes through Amazon Bedrock content safety filters, and the model includes fairness measures to reduce bias.

As described in the code examples, the model can be invoked through both synchronous and asynchronous APIs. The synchronous API works well for real-time applications where you need immediate responses, such as processing user queries in a search interface. The asynchronous API handles latency insensitive workloads more efficiently, making it suitable for processing large content such as videos.

Availability and pricing
Amazon Nova Multimodal Embeddings is available today in Amazon Bedrock in the US East (N. Virginia) AWS Region. For detailed pricing information, visit the Amazon Bedrock pricing page.

To learn more, see the Amazon Nova User Guide for comprehensive documentation and the Amazon Nova model cookbook on GitHub for practical code examples.

If you’re using an AI–powered assistant for software development such as Amazon Q Developer or Kiro, you can set up the AWS API MCP Server to help the AI assistants interact with AWS services and resources and the AWS Knowledge MCP Server to provide up-to-date documentation, code samples, knowledge about the regional availability of AWS APIs and CloudFormation resources.

Start building multimodal AI-powered applications with Nova Multimodal Embeddings today, and share your feedback through AWS re:Post for Amazon Bedrock or your usual AWS Support contacts.

— Danilo

Introducing Claude Sonnet 4.5 in Amazon Bedrock: Anthropic’s most intelligent model, best for coding and complex agents

2025-09-29 Matheus Guimaraes

Post Syndicated from Matheus Guimaraes original https://aws.amazon.com/blogs/aws/introducing-claude-sonnet-4-5-in-amazon-bedrock-anthropics-most-intelligent-model-best-for-coding-and-complex-agents/

Today, we’re excited to announce that Claude Sonnet 4.5, powered by Anthropic, is now available in Amazon Bedrock, a fully managed service that offers a choice of high- performing foundation models from leading AI companies. This new model builds upon Claude 4’s foundation to achieve state-of-the-art performance in coding and complex agentic applications.

Claude Sonnet 4.5 demonstrates advancements in agent capabilities, with enhanced performance in tool handling, memory management, and context processing. The model shows marked improvements in code generation and analysis, from identifying optimal improvements to exercising stronger judgment in refactoring decisions. It particularly excels at autonomous long-horizon coding tasks, where it can effectively plan and execute complex software projects spanning hours or days while maintaining consistent performance and reliability throughout the development cycle.

Source: https://www.anthropic.com/news/claude-sonnet-4-5

By using Claude Sonnet 4.5 in Amazon Bedrock, developers gain access to a fully managed service that not only provides a unified API for foundation models but ensures their data stays under complete control with enterprise-grade tools for security, and optimization.

Claude Sonnet 4.5 also seamlessly integrates with Amazon Bedrock AgentCore, enabling developers to maximize the model’s capabilities for building complex agents. AgentCore’s purpose-built infrastructure complements the model’s enhanced abilities in tool handling, memory management, and context understanding. Developers can leverage complete session isolation, 8-hour long-running support, and comprehensive observability features to deploy and monitor production-ready agents from autonomous security operations to complex enterprise workflows.

Business applications and use cases
Beyond its technical capabilities, Sonnet 4.5 delivers practical business value through consistent performance and advanced problem-solving abilities. The model excels at producing and editing business documents while maintaining reliable performance across complex workflows.

The model demonstrates strength in several key industries:

Cybersecurity – Claude Sonnet 4.5 can be used to deploy agents that autonomously patch vulnerabilities before exploitation, shifting from reactive detection to proactive defense.
Finance – Sonnet 4.5 handles everything from entry-level financial analysis to advanced predictive analysis, helping transform manual audit preparation into intelligent risk management.
Research – Sonnet 4.5 can better handle tools, context, and deliver ready-to-go office files to drive expert analysis into final deliverables and actionable insights.

Sonnet 4.5 features in the Amazon Bedrock API
Here are some highlights of Sonnet 4.5 in the Amazon Bedrock API:

Smart Context Window Management – The new API introduces intelligent handling when AI models reach their maximum capacity. Instead of returning errors when conversations get too long, Claude Sonnet 4.5 will now generate responses up to the available limit and clearly indicate why it stopped. This eliminates frustrating interruptions and allows users to maximize their available context window.

Tool Use Clearing for Efficiency – Claude Sonnet 4.5 enables automatic cleanup of tool interaction history during long conversations. When conversations involve multiple tool calls, the system can automatically remove older tool results while preserving recent ones. This keeps conversations efficient and prevents unnecessary token consumption, reducing costs while maintaining conversation quality.

Cross-Conversation Memory – A new memory capability enables Sonnet 4.5 to remember information across different conversations through the use of a local memory file. Users can explicitly ask the model to remember preferences, context, or important information that persists beyond a single chat session. This creates more personalized and contextually aware interactions while keeping the information safe within the local file.

With these new capabilities for managing context, developers can build AI agents capable of handling long-running tasks at higher intelligence without hitting context limits or losing critical information as frequently.

Getting started
To begin working with Claude Sonnet 4.5, you can access it through Amazon Bedrock using the correct model ID. A good practice is to use the Amazon Bedrock Converse API to write code once and seamlessly switch between different models, making it easier to experiment with Sonnet 4.5 or any of the other models available in Amazon Bedrock.

Let’s see this in action with a simple example. I’m going to use the Amazon Bedrock Converse API to send a prompt to Sonnet 4.5. I start by importing the modules I’m going to use. For this short example, I only need AWS SDK for Python (Boto3) so I can create a BedrockRuntimeClient. I’m also importing the rich package so I can format my output nicely later on.

Following best practices, I create a boto3 session and create an Amazon Bedrock client from it instead of creating one directly. This gives you explicit control over configuration, improves thread safety, and makes your code more predictable and testable compared to relying on the default session.

I want to give the model something with a bit of complexity instead of asking a simple question to demonstrate the power of Sonnet 4.5. So I’m going to give the model the current state of an imaginary legacy monolithic application written in Java with a single database and ask for a digital transformation plan which includes a migration strategy, risk assessment, estimated timeline and key milestones and specific AWS services recommendations.

Because the prompt is quite long I put it in a text file locally and just load it up in code. I then set up the Amazon Bedrock converse payload setting the role to “user” to indicate that this is a message by the user of the application and add the prompt to the content.

This is where the magic happens! We put it all together and call Claude Sonnet 4.5 using its model ID. Well, kind of. You can only access Sonnet 4.5 through an inference profile. This defines which AWS Regions will process your model requests and helps manage throughput and performance.

For this demo, I’ll be using one of Amazon Bedrock’s system-defined cross-Region inference profiles, which automatically routes requests across multiple Regions for optimal performance.

Now I just need to print to the screen to see the results. This is where I use the rich package I imported earlier just so we may have a nicely formatted output as I’m expecting a long response for this one. I also save the output to a file so I can have it handy as something to share with my teams.

Ok, let’s check the results! As expected, Sonnet 4.5 worked through my requirements and provided extensive and deep guidance for my digital transformation plan that I could start putting into practice. It included an executive summary, a step-by-step migration strategy split into phases with time estimates, and even some code samples to seed the development process and start breaking things down into microservices. It also provided the business cases for introducing technology and recommended the correct AWS services for each scenario. Here are some highlights from the report.

Claude Sonnet 4.5 is able to maintain consistency while delivering creative solutions making it an ideal choice for businesses seeking to use AI for complex problem-solving and development tasks. Its enhanced capabilities in following directions and using tools effectively translate into more reliable and innovative solutions across various business contexts.

Things to know
Claude Sonnet 4.5 represents a significant step forward in agent capabilities, particularly excelling in areas where consistent performance and creative problem-solving are essential. Its enhanced abilities in tool handling, memory management, and context processing make it particularly valuable across key industries such as finance, research, and cybersecurity. Whether handling complex development lifecycles, executing long-running tasks, or tackling business-critical workflows, Claude Sonnet 4.5 combines technical excellence with practical business value.

Claude Sonnet 4.5 is available today. For detailed information about its availability please visit the documentation.

To learn more about Amazon Bedrock explore our self-paced Amazon Bedrock Workshop and discover how to use available models and their capabilities in your applications.

Qwen models are now available in Amazon Bedrock

2025-09-19 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/qwen-models-are-now-available-in-amazon-bedrock/

Today we are adding Qwen models from Alibaba in Amazon Bedrock. With this launch, Amazon Bedrock continues to expand model choice by adding access to Qwen3 open weight foundation models (FMs) in a full managed, serverless way. This release includes four models: Qwen3-Coder-480B-A35B-Instruct, Qwen3-Coder-30B-A3B-Instruct, Qwen3-235B-A22B-Instruct-2507, and Qwen3-32B (Dense). Together, these models feature both mixture-of-experts (MoE) and dense architectures, providing flexible options for different application requirements.

Amazon Bedrock provides access to industry-leading FMs through a unified API without requiring infrastructure management. You can access models from multiple model providers, integrate models into your applications, and scale usage based on workload requirements. With Amazon Bedrock, customer data is never used to train the underlying models. With the addition of Qwen3 models, Amazon Bedrock offers even more options for use cases like:

Code generation and repository analysis with extended context understanding
Building agentic workflows that orchestrate multiple tools and APIs for business automation
Balancing AI costs and performance using hybrid thinking modes for adaptive reasoning

Qwen3 models in Amazon Bedrock
These four Qwen3 models are now available in Amazon Bedrock, each optimized for different performance and cost requirements:

Qwen3-Coder-480B-A35B-Instruct – This is a mixture-of-experts (MoE) model with 480B total parameters and 35B active parameters. It’s optimized for coding and agentic tasks and achieves strong results in benchmarks such as agentic coding, browser use, and tool use. These capabilities make it suitable for repository-scale code analysis and multistep workflow automation.
Qwen3-Coder-30B-A3B-Instruct – This is a MoE model with 30B total parameters and 3B active parameters. Specifically optimized for coding tasks and instruction-following scenarios, this model demonstrates strong performance in code generation, analysis, and debugging across multiple programming languages.
Qwen3-235B-A22B-Instruct-2507 – This is an instruction-tuned MoE model with 235B total parameters and 22B active parameters. It delivers competitive performance across coding, math, and general reasoning tasks, balancing capability with efficiency.
Qwen3-32B (Dense) – This is a dense model with 32B parameters. It is suitable for real-time or resource-constrained environments such as mobile devices and edge computing deployments where consistent performance is critical.

Architectural and functional features in Qwen3
The Qwen3 models introduce several architectural and functional features:

MoE compared with dense architectures – MoE models such as Qwen3-Coder-480B-A35B, Qwen3-Coder-30B-A3B-Instruct, and Qwen3-235B-A22B-Instruct-2507, activate only part of the parameters for each request, providing high performance with efficient inference. The dense Qwen3-32B activates all parameters, offering more consistent and predictable performance.

Agentic capabilities – Qwen3 models can handle multi-step reasoning and structured planning in one model invocation. They can generate outputs that call external tools or APIs when integrated into an agent framework. The models also maintain extended context across long sessions. In addition, they support tool calling to allow standardized communication with external environments.

Hybrid thinking modes – Qwen3 introduces a hybrid approach to problem-solving, which supports two modes: thinking and non-thinking. The thinking mode applies step-by-step reasoning before delivering the final answer. This is ideal for complex problems that require deeper thought. Whereas the non-thinking mode provides fast and near-instant responses for less complex tasks where speed is more important than depth. This helps developers manage performance and cost trade-offs more effectively.

Long-context handling – The Qwen3-Coder models support extended context windows, with up to 256K tokens natively and up to 1 million tokens with extrapolation methods. This allows the model to process entire repositories, large technical documents, or long conversational histories within a single task.

When to use each model
The four Qwen3 models serve distinct use cases. Qwen3-Coder-480B-A35B-Instruct is designed for complex software engineering scenarios. It’s suited for advanced code generation, long-context processing such as repository-level analysis, and integration with external tools. Qwen3-Coder-30B-A3B-Instruct is particularly effective for tasks such as code completion, refactoring, and answering programming-related queries. If you need versatile performance across multiple domains, Qwen3-235B-A22B-Instruct-2507 offers a balance, delivering strong general-purpose reasoning and instruction-following capabilities while leveraging the efficiency advantages of its MoE architecture. Qwen3-32B (Dense) is appropriate for scenarios where consistent performance, low latency, and cost optimization are important.

Getting started with Qwen models in Amazon Bedrock
To begin using Qwen models, in the Amazon Bedrock console, I choose Model Access from the Configure and learn section of the navigation pane. I then navigate to the Qwen models to request access. In the Chat/Text Playground section of the navigation pane, I can quickly test the new Qwen models with my prompts.

To integrate Qwen3 models into my applications, I can use any AWS SDKs. The AWS SDKs include access to the Amazon Bedrock InvokeModel and Converse API. I can also use these model with any agentic framework that supports Amazon Bedrock and deploy the agents using Amazon Bedrock AgentCore. For example, here’s the Python code of a simple agent with tool access built using Strands Agents:

from strands import Agent
from strands_tools import calculator

agent = Agent(
    model="qwen.qwen3-coder-480b-instruct-v1:0",
    tools=[calculator]
)

agent("Tell me the square root of 42 ^ 9")

with open("function.py", 'r') as f:
    my_function_code = f.read()

agent(f"Help me optimize this Python function for better performance:\n\n{my_function_code}")

Now available
Qwen models are available today in the following AWS Regions:

Qwen3-Coder-480B-A35B-Instruct is available in the US West (Oregon), Asia Pacific (Mumbai, Tokyo), and Europe (London, Stockholm) Regions.
Qwen3-Coder-30B-A3B-Instruct, Qwen3-235B-A22B-Instruct-2507, and Qwen3-32B are available in the US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Tokyo), Europe (Ireland, London, Milan, Stockholm), and South America (São Paulo) Regions.

Check the full Region list for future updates. You can start testing and building immediately without infrastructure setup or capacity planning. To learn more, visit the Qwen in Amazon Bedrock product page and the Amazon Bedrock pricing page.

Try Qwen models on the Amazon Bedrock console now, and offer feedback through AWS re:Post for Amazon Bedrock or your typical AWS Support channels.

— Danilo

DeepSeek-V3.1 model now available in Amazon Bedrock

2025-09-19 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/deepseek-v3-1-now-available-in-amazon-bedrock/

In March, Amazon Web Services (AWS) became the first cloud service provider to deliver DeepSeek-R1 in a serverless way by launching it as a fully managed, generally available model in Amazon Bedrock. Since then, customers have used DeepSeek-R1’s capabilities through Amazon Bedrock to build generative AI applications, benefiting from the Bedrock’s robust guardrails and comprehensive tooling for safe AI deployment.

Today, I am excited to announce DeepSeek-V3.1 is now available as a fully managed foundation model in Amazon Bedrock. DeepSeek-V3.1 is a hybrid open weight model that switches between thinking mode (chain-of-thought reasoning) for detailed step-by-step analysis and non-thinking mode (direct answers) for faster responses.

According to DeepSeek, the thinking mode of DeepSeek-V3.1 achieves comparable answer quality with better results, stronger multi-step reasoning for complex search tasks, and big gains in thinking efficiency compared with DeepSeek-R1-0528.

(c)
https://api-docs.deepseek.com/news/news250821
Benchmarks	DeepSeek-V3.1	DeepSeek-R1-0528
Browsecomp	30.0	8.9
Browsecomp_zh	49.2	35.7
HLE	29.8	24.8
xbench-DeepSearch	71.2	55.0
Frames	83.7	82.0
SimpleQA	93.4	92.3
Seal0	42.6	29.7
SWE-bench Verified	66.0	44.6
SWE-bench Multilingual	54.5	30.5
Terminal-Bench	31.3	5.7

DeepSeek-V3.1 model performance in tool usage and agent tasks has significantly improved through post-training optimization compared to previous DeepSeek models. DeepSeek-V3.1 also supports over 100 languages with near-native proficiency, including significantly improved capability in low-resource languages lacking large monolingual or parallel corpora. You can build global applications to deliver enhanced accuracy and reduced hallucinations compared to previous DeepSeek models, while maintaining visibility into its decision-making process.

Here are your key use cases using this model:

Code generation – DeepSeek-V3.1 excels in coding tasks with improvements in software engineering benchmarks and code agent capabilities, making it ideal for automated code generation, debugging, and software engineering workflows. It performs well on coding benchmarks while delivering high-quality results efficiently.
Agentic AI tools – The model features enhanced tool calling through post-training optimization, making it strong in tool usage and agentic workflows. It supports structured tool calling, code agents, and search agents, positioning it as a solid choice for building autonomous AI systems.
Enterprise applications – DeepSeek models are integrated into various chat platforms and productivity tools, enhancing user interactions and supporting customer service workflows. The model’s multilingual capabilities and cultural sensitivity make it suitable for global enterprise applications.

As I mentioned in my previous post, when implementing publicly available models, give careful consideration to data privacy requirements when implementing in your production environments, check for bias in output, and monitor your results in terms of data security, responsible AI, and model evaluation.

You can access the enterprise-grade security features of Amazon Bedrock and implement safeguards customized to your application requirements and responsible AI policies with Amazon Bedrock Guardrails. You can also evaluate and compare models to identify the optimal model for your use cases by using Amazon Bedrock model evaluation tools.

Get started with the DeepSeek-V3.1 model in Amazon Bedrock
If you’re new to using the DeepSeek-V3.1 model, go to the Amazon Bedrock console, choose Model access under Bedrock configurations in the left navigation pane. To access the fully managed DeepSeek-V3.1 model, request access for DeepSeek-V3.1 in the DeepSeek section. You’ll then be granted access to the model in Amazon Bedrock.

Next, to test the DeepSeek-V3.1 model in Amazon Bedrock, choose Chat/Text under Playgrounds in the left menu pane. Then choose Select model in the upper left, and select DeepSeek as the category and DeepSeek-V3.1 as the model. Then choose Apply.

Using the selected DeepSeek-V3.1 model, I run the following prompt example about technical architecture decision.

Outline the high-level architecture for a scalable URL shortener service like bit.ly. Discuss key components like API design, database choice (SQL vs. NoSQL), how the redirect mechanism works, and how you would generate unique short codes.

You can turn the thinking on and off by toggling Model reasoning mode to generate a response’s chain of thought prior to the final conclusion.

You can also access the model using the AWS Command Line Interface (AWS CLI) and AWS SDK. This model supports both the InvokeModel and Converse API. You can check out a broad range of code examples for multiple use cases and a variety of programming languages.

To learn more, visit DeepSeek model inference parameters and responses in the AWS documentation.

Now available
DeepSeek-V3.1 is now available in the US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Mumbai), Europe (London), and Europe (Stockholm) AWS Regions. Check the full Region list for future updates. To learn more, check out the DeepSeek in Amazon Bedrock product page and the Amazon Bedrock pricing page.

Give the DeepSeek-V3.1 model a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— Channy

OpenAI open weight models now available on AWS

2025-08-06 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/openai-open-weight-models-now-available-on-aws/

AWS is committed to bringing you the most advanced foundation models (FMs) in the industry, continuously expanding our selection to include groundbreaking models from leading AI innovators so that you always have access to the latest advancements to drive your business forward.

Today, I am happy to announce the availability of two new OpenAI models with open weights in Amazon Bedrock and Amazon SageMaker JumpStart. OpenAI gpt-oss-120b and gpt-oss-20b models are designed for text generation and reasoning tasks, offering developers and organizations new options to build AI applications with complete control over their infrastructure and data.

These open weight models excel at coding, scientific analysis, and mathematical reasoning, with performance comparable to leading alternatives. Both models support a 128K context window and provide adjustable reasoning levels (low/medium/high) to match your specific use case requirements. The models support external tools to enhance their capabilities and can be used in an agentic workflow, for example, using a framework like Strands Agents.

With Amazon Bedrock and Amazon SageMaker JumpStart, AWS gives you the freedom to innovate with access to hundreds of FMs from leading AI companies, including OpenAI open weight models. With our comprehensive selection of models, you can match your AI workloads to the perfect model every time.

Through Amazon Bedrock, you can seamlessly experiment with different models, mix and match capabilities, and switch between providers without rewriting code—turning model choice into a strategic advantage that helps you continuously evolve your AI strategy as new innovations emerge. At launch, these new models are available in Bedrock via an OpenAI compatible endpoint. You can point the OpenAI SDK to this endpoint or use the Bedrock InvokeModel and Converse API.

With SageMaker JumpStart, you can quickly evaluate, compare, and customize models for your use case. You can then deploy the original or the customized model in production with the SageMaker AI console or using the SageMaker Python SDK.

Let’s see how these work in practice.

Getting started with OpenAI open weight models in Amazon Bedrock
In the Amazon Bedrock console, I choose Model access from the Configure and learn section of the navigation pane. Then, I navigate to the two listed OpenAI models on this page and request access.

Console screenshot

Now that I have access, I use the Chat/Test playground to test and evaluate the models. I select OpenAI as the category and then the gpt-oss-120b model.

Console screenshot

Using this model, I run the following sample prompt:

A family has $5,000 to save for their vacation next year. They can place the money in a savings account earning 2% interest annually or in a certificate of deposit earning 4% interest annually but with no access to the funds until the vacation. If they need $1,000 for emergency expenses during the year, how should they divide their money between the two options to maximize their vacation fund?

This prompt generates an output that includes the chain of thought used to produce the result.

I can use these models with the OpenAI SDK by configuring the API endpoint (base URL) and using an Amazon Bedrock API key for authentication. For example, I set this environment variables to use the US West (Oregon) AWS Region endpoint (us-west-2) and my Amazon Bedrock API key:

export OPENAI_API_KEY="<my-bedrock-api-key>"
export OPENAI_BASE_URL="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"

Now I invoke the model using the OpenAI Python SDK.

client = OpenAI()

response = client.chat.completion.create(
    messages=[{
        "role": "user",
        "content": "Hello, how are you?"
    }],
    model="openai.gpt-oss-120b-1:0",
    stream=True
)

for item in response:
    print(item)

To build an AI agent, I can choose any framework that supports the Amazon Bedrock API or the OpenAI API. For example, here’s the starting code for Strands Agents using the Amazon Bedrock API:

from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator

model = BedrockModel(
    model_id="openai.gpt-oss-120b-1:0"
)
agent = Agent(
    model=model,
    tools=[calculator]
)

agent("Tell me the square root of 42 ^ 3")

I save the code (app.py file), install the dependencies, and run the agent locally:

pip install strands-agents strands-agents-tools
python app.py

When I am satisfied with the agent, I can deploy in production using the capabilities offered by Amazon Bedrock AgentCore, including a fully managed serverless runtime and memory and identity management.

Getting started with OpenAI open weight models in Amazon SageMaker JumpStart
In the Amazon SageMaker AI console, you can use OpenAI open weight models in the SageMaker Studio. The first time I do this, I need to set up a SageMaker domain. There are options to set it up for a single user (simpler) or an organization. For these tests, I use a single user setup.

In the SageMaker JumpStart model view, I have access to a detailed description of the gpt-oss-120b or gpt-oss-20b model.

I choose the gpt-oss-20b model and then deploy the model. In the next steps, I select the instance type and the initial instance count. After a few minutes, the deployment creates an endpoint that I can then invoke in SageMaker Studio and using any AWS SDKs.

To learn more, visit GPT OSS models from OpenAI are now available on SageMaker JumpStart in the AWS Artificial Intelligence Blog.

Things to know
The new OpenAI open weight models are now available in Amazon Bedrock in the US West (Oregon) AWS Region, while Amazon SageMaker JumpStart supports these models in US East (Ohio, N. Virginia) and Asia Pacific (Mumbai, Tokyo).

Each model comes equipped with full chain-of-thought output capabilities, providing you with detailed visibility into the model’s reasoning process. This transparency is particularly valuable for applications requiring high levels of interpretability and validation. These models give you the freedom to modify, adapt, and customize them to your specific needs. This flexibility allows you to fine-tune the models for your unique use cases, integrate them into your existing workflows, and even build upon them to create new, specialized models tailored to your industry or application.

Security and safety are built into the core of these models, with comprehensive evaluation processes and safety measures in place. The models maintain compatibility with the standard GPT-4 tokenizer.

Both models can be used in your preferred environment, whether that’s through the serverless experience of Amazon Bedrock or the extensive machine learning (ML) development capabilities of SageMaker JumpStart. For information about the costs associated with using these models and services, visit the Amazon Bedrock pricing and Amazon SageMaker AI pricing pages.

To learn more, see the parameters for the models and the chat completions API in the Amazon Bedrock documentation.

Get started today with OpenAI open weight models on AWS in the Amazon Bedrock console or in Amazon SageMaker AI console.

– Danilo

Introducing Amazon Bedrock AgentCore: Securely deploy and operate AI agents at any scale (preview)

2025-07-16 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-bedrock-agentcore-securely-deploy-and-operate-ai-agents-at-any-scale/

In just a few years, foundation models (FMs) have evolved from being used directly to create content in response to a user’s prompt, to now powering AI agents, a new class of software applications that use FMs to reason, plan, act, learn, and adapt in pursuit of user-defined goals with limited human oversight. This new wave of agentic AI is enabled by the emergence of standardized protocols such as Model Context Protocol (MCP) and Agent2Agent (A2A) that simplify how agents connect with other tools and systems.

In fact, building AI agents that can reliably perform complex tasks has become increasingly accessible thanks to open source frameworks like CrewAI, LangGraph, and Strands Agents. However, moving from a promising proof-of-concept to a production-ready agent that can scale to thousands of users presents significant challenges.

Instead of being able to focus on the core features of the agent, developers and AI engineers have to spend months building foundational infrastructure for session management, identity controls, memory systems, and observability—at the same time supporting security and compliance.

Today, we’re excited to announce the preview of Amazon Bedrock AgentCore, a comprehensive set of enterprise-grade services that help developers quickly and securely deploy and operate AI agents at scale using any framework and model, hosted on Amazon Bedrock or elsewhere.

More specifically, we are introducing today:

AgentCore Runtime – Provides sandboxed low-latency serverless environments with session isolation, supporting any agent framework including popular open source frameworks, tools, and models, and handling multimodal workloads and long-running agents.

AgentCore Memory – Manages session and long-term memory, providing relevant context to models while helping agents learn from past interactions.

AgentCore Observability – Offers step-by-step visualization of agent execution with metadata tagging, custom scoring, trajectory inspection, and troubleshooting/debugging filters.

AgentCore Identity – Enables AI agents to securely access AWS services and third-party tools and services such as GitHub, Salesforce, and Slack, either on behalf of users or by themselves with pre-authorized user consent.

AgentCore Gateway – Transforms existing APIs and AWS Lambda functions into agent-ready tools, offering unified access across protocols, including MCP, and runtime discovery.

AgentCore Browser – Provides managed web browser instances to scale your agents’ web automation workflows.

AgentCore Code Interpreter – Offers an isolated environment to run the code your agents generate.

These services can be used individually and are optimized to work together so developers don’t need to spend time piecing together components. AgentCore can work with open source or custom AI agent frameworks, giving teams the flexibility to maintain their preferred tools while gaining enterprise capabilities. To integrate these services into their existing code, developers can use the AgentCore SDK.

You can now discover, buy, and run pre-built agents and agent tools from AWS Marketplace with AgentCore Runtime. With just a few lines of code, your agents can securely connect to API-based agents and tools from AWS Marketplace with AgentCore Gateway to help you run complex workflows while maintaining compliance and control.

AgentCore eliminates tedious infrastructure work and operational complexity so development teams can bring groundbreaking agentic solutions to market faster.

Let’s see how this works in practice. I’ll share more info on the services as we use them.

Deploying a production-ready customer support assistant with Amazon Bedrock AgentCore (Preview)
When customers reach out with an email, it takes time to provide a reply. Customer support needs to check the validity of the email, find who the actual customer is in the customer relationship management (CRM) system, check their orders, and use product-specific knowledge bases to find the information required to prepare an answer.

An AI agent can simplify that by connecting to the internal systems, retrieve contextual information using a semantic data source, and draft a reply for the support team. For this use case, I built a simple prototype using Strands Agents. For simplicity and to validate the scenario, the internal tools are simulated using Python functions.

When I talk to developers, they tell me that similar prototypes, covering different use cases, are being built in many companies. When these prototypes are demonstrated to the company leadership and receive confirmation to proceed, the development team has to define how to go in production and satisfy the usual requirements for security, performance, availability, and scalability. This is where AgentCore can help.

Step 1 – Deploying to the cloud with AgentCore Runtime

AgentCore Runtime is a new service to securely deploy, run, and scale AI agents, providing isolation so that each user session runs in its own protected environment to help prevent data leakage—a critical requirement for applications handling sensitive data.

To match different security postures, agents can use different network configurations:

Sandbox – To only communicate with allowlisted AWS services.

Public – To run with managed internet access.

VPC-only (coming soon) – This option will allow to access resources hosted in a customer’s VPC or connected via AWS PrivateLink endpoints.

To deploy the agent to the cloud and get a secure, serverless endpoint with AgentCore Runtime, I add to the prototype a few lines of code using the AgentCore SDK to:

Import the AgentCore SDK.
Create the AgentCore app.
Specify which function is the entry point to invoke the agent.

Using a different or custom agent framework is a matter of replacing the agent invocation inside the entry point function.

Here’s the code of the prototype. The three lines I added to use AgentCore Runtime are the ones preceded by a comment.

from strands import Agent, tool
from strands_tools import calculator, current_time

# Import the AgentCore SDK
from bedrock_agentcore.runtime import BedrockAgentCoreApp

WELCOME_MESSAGE = """
Welcome to the Customer Support Assistant! How can I help you today?
"""

SYSTEM_PROMPT = """
You are an helpful customer support assistant.
When provided with a customer email, gather all necessary info and prepare the response email.
When asked about an order, look for it and tell the full description and date of the order to the customer.
Don't mention the customer ID in your reply.
"""

@tool
def get_customer_id(email_address: str):
    if email_address == "[email protected]":
        return { "customer_id": 123 }
    else:
        return { "message": "customer not found" }

@tool
def get_orders(customer_id: int):
    if customer_id == 123:
        return [{
            "order_id": 1234,
            "items": [ "smartphone", "smartphone USB-C charger", "smartphone black cover"],
            "date": "20250607"
        }]
    else:
        return { "message": "no order found" }

@tool
def get_knowledge_base_info(topic: str):
    kb_info = []
    if "smartphone" in topic:
        if "cover" in topic:
            kb_info.append("To put on the cover, insert the bottom first, then push from the back up to the top.")
            kb_info.append("To remove the cover, push the top and bottom of the cover at the same time.")
        if "charger" in topic:
            kb_info.append("Input: 100-240V AC, 50/60Hz")
            kb_info.append("Includes US/UK/EU plug adapters")
    if len(kb_info) > 0:
        return kb_info
    else:
        return { "message": "no info found" }

# Create an AgentCore app
app = BedrockAgentCoreApp()

agent = Agent(
    system_prompt=SYSTEM_PROMPT,
    tools=[calculator, current_time, get_customer_id, get_orders, get_knowledge_base_info]
)

# Specify the entrypoint function invoking the agent
@app.entrypoint
def invoke(payload, context: RequestContext):
    """Handler for agent invocation"""
    user_message = payload.get(
        "prompt", "No prompt found in input, please guide customer to create a json payload with prompt key"
    )
    result = agent(user_message)
    return {"result": result.message}

if __name__ == "__main__":
    app.run()

I install the AgentCore SDK and the starter toolkit in the Python virtual environment:

pip install bedrock-agentcore bedrock-agentcore-starter-toolkit

After I activate the virtual environment, I have access to the AgentCore command line interface (CLI) provided by the starter toolkit.

First, I use agentcore configure --entrypoint my_agent.py -er <IAM_ROLE_ARN> to configure the agent, passing the AWS Identity and Access Management (IAM) role that the agent will assume. In this case, the agent needs access to Amazon Bedrock to invoke the model. The role can give access to other AWS resources used by an agent, such as an Amazon Simple Storage Service (Amazon S3) bucket or a Amazon DynamoDB table.

I launch the agent locally with agentcore launch --local. When running locally, I can interact with the agent using agentcore invoke --local <PAYLOAD>. The payload is passed to the entry point function. Note that the JSON syntax of the invocations is defined in the entry point function. In this case, I look for prompt in the JSON payload, but can use a different syntax depending on your use case.

When I am satisfied by local testing, I use agentcore launch to deploy to the cloud.

After the deployment is succesful and an endpoint has been created, I check the status of the endpoint with agentcore status and invoke the endpoint with agentcore invoke <PAYLOAD>. For example, I pass a customer support request in the invocation:

agentcore invoke '{"prompt": "From: [email protected] – Hi, I bought a smartphone from your store. I am traveling to Europe next week, will I be able to use the charger? Also, I struggle to remove the cover. Thanks, Danilo"}'

Step 2 – Enabling memory for context

After an agent has been deployed in the AgentCore Runtime, the context needs to be persisted to be available for a new invocation. I add AgentCore Memory to maintain session context using its short-term memory capabilities.

First, I create a memory client and the memory store for the conversations:

from bedrock_agentcore.memory import MemoryClient

memory_client = MemoryClient(region_name="us-east-1")

memory = memory_client.create_memory_and_wait(
    name="CustomerSupport", 
    description="Customer support conversations"
)

I can now use create_event to stores agent interactions into short-term memory:

memory_client.create_event(
    memory_id=memory.get("id"), # Identifies the memory store
    actor_id="user-123",        # Identifies the user
    session_id="session-456",   # Identifies the session
    messages=[
        ("Hi, ...", "USER"),
        ("I'm sorry to hear that...", "ASSISTANT"),
        ("get_orders(customer_id='123')", "TOOL"),
        . . .
    ]
)

I can load the most recent turns of a conversations from short-term memory using list_events:

conversations = memory_client.list_events(
    memory_id=memory.get("id"), # Identifies the memory store
    actor_id="user-123",        # Identifies the user 
    session_id="session-456",   # Identifies the session
    max_results=5               # Number of most recent turns to retrieve
)

With this capability, the agent can maintain context during long sessions. But when a users come back with a new session, the conversation starts blank. Using long-term memory, the agent can personalize user experiences by retaining insights across multiple interactions.

To extract memories from a conversation, I can use built-in AgentCore Memory policies for user preferences, summarization, and semantic memory (to capture facts) or create custom policies for specialized needs. Data is stored encrypted using a namespace-based storage for data segmentation.

I change the previous code creating the memory store to include long-term capabilities by passing a semantic memory strategy. Note that an existing memory store can be updated to add strategies. In that case, the new strategies are applied to newer events.

memory = memory_client.create_memory_and_wait(
    name="CustomerSupport", 
    description="Customer support conversations",
    strategies=[{
        "semanticMemoryStrategy": {
            "name": "semanticFacts",
            "namespaces": ["/facts/{actorId}"]
        }
    }]
)

After long-term memory has been configured for a memory store, calling create_event will automatically apply those strategies to extract information from the conversations. I can then retrieve memories extracted from the conversation using a semantic query:

memories = memory_client.retrieve_memories(
    memory_id=memory.get("id"),
    namespace="/facts/user-123",
    query="smartphone model"
)

In this way, I can quickly improve the user experience so that the agent remembers customer preferences and facts that are outside of the scope of the CRM and use this information to improve the replies.

Step 3 – Adding identity and access controls

Without proper identity controls, access from the agent to internal tools always uses the same access level. To follow security requirements, I integrate AgentCore Identity so that the agent can use access controls scoped to the user’s or agent’s identity context.

I set up an identity client and create a workload identity, a unique identifier that represents the agent within the AgentCore Identity system:

from bedrock_agentcore.services.identity import IdentityClient

identity_client = IdentityClient("us-east-1")
workload_identity = identity_client.create_workload_identity(name="my-agent")

Then, I configure the credential providers, for example:

google_provider = identity_client.create_oauth2_credential_provider(
    {
        "name": "google-workspace",
        "credentialProviderVendor": "GoogleOauth2",
        "oauth2ProviderConfigInput": {
            "googleOauth2ProviderConfig": {
                "clientId": "your-google-client-id",
                "clientSecret": "your-google-client-secret",
            }
        },
    }
)

perplexity_provider = identity_client.create_api_key_credential_provider(
    {
        "name": "perplexity-ai",
        "apiKey": "perplexity-api-key"
    }
)

I can then add the @requires_access_token Python decorator (passing the provider name, the scope, and so on) to the functions that need an access token to perform their activities.

Using this approach, the agent can verify the identity through the company’s existing identity infrastructure, operate as a distinct, authenticated identity, act with scoped permissions and integrate across multiple identity providers (such as Amazon Cognito, Okta, or Microsoft Entra ID) and service boundaries including AWS and third-party tools and services (such as Slack, GitHub, and Salesforce).

To offer robust and secure access controls while streamlining end-user and agent builder experiences, AgentCore Identity implements a secure token vault that stores users’ tokens and allows agents to retrieve them securely.

For OAuth 2.0 compatible tools and services, when a user first grants consent for an agent to act on their behalf, AgentCore Identity collects and stores the user’s tokens issued by the tool in its vault, along with securely storing the agent’s OAuth client credentials. Agents, operating with their own distinct identity and when invoked by the user, can then access these tokens as needed, reducing the need for frequent user consent.

When the user token expires, AgentCore Identity triggers a new authorization prompt to the user for the agent to obtain updated user tokens. For tools that use API keys, AgentCore Identity also stores these keys securely and gives agents controlled access to retrieve them when needed. This secure storage streamlines the user experience while maintaining robust access controls, enabling agents to operate effectively across various tools and services.

Step 4 – Expanding agent capabilities with AgentCore Gateway

Until now, all internal tools are simulated in the code. Many agent frameworks, including Strands Agents, natively support MCP to connect to remote tools. To have access to internal systems (such as CRM and order management) via an MCP interface, I use AgentCore Gateway.

With AgentCore Gateway, the agent can access AWS services using Smithy models, Lambda functions, and internal APIs and third-party providers using OpenAPI specifications. It employs a dual authentication model to have secure access control for both incoming requests and outbound connections to target resources. Lambda functions can be used to integrate external systems, particularly applications that lack standard APIs or require multiple steps to retrieve information.

AgentCore Gateway facilitates cross-cutting features that most customers would otherwise need to build themselves, including authentication, authorization, throttling, custom request/response transformation (to match underlying API formats), multitenancy, and tool selection.

The tool selection feature helps find the most relevant tools for a specific agent’s task. AgentCore Gateway brings a uniform MCP interface across all these tools, using AgentCore Identity to provide an OAuth interface for tools that do not support OAuth out of the box like AWS services.

Step 5 – Adding capabilities with AgentCore Code Interpreter and Browser tools

To answer to customer requests, the customer support agent needs to perform calculations. To simplify that, I use the AgentCode SDK to add access to the AgentCore Code Interpreter.

Similarly, some of the integrations required by the agent don’t implement a programmatic API but need to be accessed through a web interface. I give access to the AgentCore Browser to let the agent navigate those web sites autonomously.

Step 6 – Gaining visibility with observability

Now that the agent is in production, I need visibility into its activities and performance. AgentCore provides enhanced observability to help developers effectively debug, audit, and monitor their agent performance in production. It comes with built-in dashboards to track essential operational metrics such as session count, latency, duration, token usage, error rates, and component-level latency and error breakdowns. AgentCore also gives visibility into an agent’s behavior by capturing and visualizing both the end-to-end traces, as well as “spans” that capture each step of the agent workflow including tool invocations, memory

The built-in dashboards offered by this service help reveal performance bottlenecks and identify why certain interactions might fail, enabling continuous improvement and reducing the mean time to detect (MTTD) and mean time to repair (MTTR) in case of issues.

AgentCore supports OpenTelemetry to help integrate agent telemetry data with existing observability platforms, including Amazon CloudWatch, Datadog, LangSmith, and Langfuse.

Step 7 – Conclusion

Through this journey, we transformed a local prototype into a production-ready system. Using AgentCore modular approach, we implemented enterprise requirements incrementally—from basic deployment to sophisticated memory, identity management, and tool integration—all while maintaining the existing agent code.

Things to know
Amazon Bedrock AgentCore is available in preview in US East (N. Virginia), US West (Oregon), Asia Pacific (Sydney), and Europe (Frankfurt). You can start using AgentCore services through the AWS Management Console , the AWS Command Line Interface (AWS CLI), the AWS SDKs, or via the AgentCore SDK.

You can try AgentCore services at no charge until September 16, 2025. Standard AWS pricing applies to any additional AWS Services used as part of using AgentCore (for example, CloudWatch pricing will apply for AgentCore Observability). Starting September 17, 2025, AWS will bill you for AgentCore service usage based on this page.

Whether you’re building customer support agents, workflow automation, or innovative AI-powered experiences, AgentCore provides the foundation you need to move from prototype to production with confidence.

To learn more and start deploying production-ready agents, visit the AgentCore documentation. For code examples and integration guides, check out the AgentCore samples GitHub repo.

Join the AgentCore Preview Discord server to provide feedback and discuss use cases. We’d like to hear from you!

— Danilo

TwelveLabs video understanding models are now available in Amazon Bedrock

2025-07-16 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/twelvelabs-video-understanding-models-are-now-available-in-amazon-bedrock/

Earlier this year, we preannounced that TwelveLabs video understanding models were coming to Amazon Bedrock. Today, we’re announcing the models are now available for searching through videos, classifying scenes, summarizing, and extracting insights with precision and reliability.

TwelveLabs has introduced Marengo, a video embedding model proficient at performing tasks such as search and classification, and Pegasus, a video language model that can generate text based on video data. These models are trained on Amazon SageMaker HyperPod to deliver groundbreaking video analysis that provides text summaries, metadata generation, and creative optimization.

With the TwelveLabs models in Amazon Bedrock, you can find specific moments using natural language video search capabilities like “show me the first touchdown of the game” or “find the scene where the main characters first meet” and instantly jump to those exact moments. You can also build applications to understand video content by generating descriptive text such as titles, topics, hashtags, summaries, chapters, or highlights for discovering insights and connections without requiring predefined labels or categories.

For example, you can find recurring themes in customer feedback or spot product usage patterns that weren’t obvious before. Whether you have hundreds or thousands of hours of video content, you can now transform that entire library into a searchable knowledge resource while maintaining enterprise-grade security and performance.

Let’s take a look at the Marengo and Pegasus videos that TwelveLabs has published.

You can transform video workflows with these models across industries. Media producers and editors can instantly locate specific scenes or dialogue, which means you can focus on storytelling rather than sifting through hours of footage. Marketing teams are streamlining their advertising workflows by quickly personalizing content to resonate with various audiences, while security teams are using the technology to proactively identify potential risks by spotting patterns across multiple video feeds.

Getting started with TwelveLabs models in Amazon Bedrock
Before getting started, if you’re new to using TwelveLabs models, go to the Amazon Bedrock console and choose Model access in the bottom left navigation pane. To access the latest TwelveLabs models, request access for Marengo Embed 2.7 and Pegasus 1.2 in TwelveLabs.

To use TwelveLabs models in Amazon Bedrock, choose Chat/Text Playground under Test in the left navigation pane. Choose Select model, select TwelveLabs as the category and Pegasus as the model, and then choose Apply.

For searching or generating text from your video, you should upload your video to an Amazon Simple Storage Service (Amazon S3) bucket or input Base64-formatted video string.

I will use a sample video generated with Amazon Nova Reel with a prompt: “A cute raccoon playing guitar underwater.”

Enter the S3 URI of your uploaded video and the S3 Bucket Owner, and then run your prompt: “Tell me about the video by timeline”.

You can get following model response:

The video opens with an underwater scene featuring a sandy seabed and swaying seaweed, creating a serene atmosphere. At around 5 seconds, a raccoon appears, swimming towards the camera with a curious expression. By 10 seconds, the raccoon is seen swimming near the seabed. The focus then shifts to a group of koi fish swimming gracefully near the seabed from 11 to 13 seconds.

At 14 seconds, the scene transitions to an ukulele resting on the seabed, with the camera zooming in to highlight its details. The raccoon reappears at 20 seconds, swimming among the koi fish, and by 23 seconds, it begins to play the ukulele. The raccoon continues to play the ukulele from 29 to 41 seconds, with the koi fish swimming around it, adding a lively and colorful touch to the underwater setting.

From 42 to 49 seconds, the raccoon is shown playing the ukulele with the koi fish swimming around it. The video concludes with a screen displaying "Video generated with Amazon Novo Reel 1.1" and "Available in Amazon Bedrock" from 50 to 54 seconds, followed by the AWS logo from 55 to 57 seconds.

The TwelveLabs models can be easily integrated into your applications using the Amazon Bedrock Converse API, which provides a unified interface for conversational AI interactions.

Here’s an example of how to use the AWS SDK for Python (Boto3) with the TwelveLabs Pegasus model:

import boto3
import json
import os

AWS_REGION = "us-east-1"
MODEL_ID = "twelvelabs.pegasus-1-2-v1:0"
VIDEO_PATH = "sample.mp4"

def read_file(file_path: str) -> bytes:
    """Read a file in binary mode."""
    try:
        with open(file_path, 'rb') as file:
            return file.read()
    except Exception as e:
        raise Exception(f"Error reading file {file_path}: {str(e)}")

bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name=AWS_REGION
)

request_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "inputPrompt": "tell me about the video",
                    "mediaSource: {
                        "base64String": read_file(VIDEO_PATH)
                    }
                },
            ],
        }
    ]
}

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=request_body["messages"]
)

print(response["output"]["message"]["content"][-1]["text"])

The TwelveLabs Marengo Embed 2.7 model generates vector embeddings from video, text, audio, or image inputs. These embeddings can be used for similarity search, clustering, and other machine learning (ML) tasks. The model supports asynchronous inference through the Bedrock AsyncInvokeModel API.

For video source, you can request JSON format for the TwelveLabs Marengo Embed 2.7 model using the AsyncInvokeModel API.

{
    "modelId": "twelvelabs.marengo-embed-2.7",
    "modelInput": {
        "inputType": "video",
        "mediaSource": {
            "s3Location": {
                "uri": "s3://your-video-object-s3-path",
                "bucketOwner": "your-video-object-s3-bucket-owner-account"
            }
        }
    },
    "outputDataConfig": {
        "s3OutputDataConfig": {
            "s3Uri": "s3://your-bucket-name"
        }
    }
}

You can get a response delivered to the specified S3 location.

{
    "embedding": [0.345, -0.678, 0.901, ...],
    "embeddingOption": "visual-text",
    "startSec": 0.0,
    "endSec": 5.0
}

To help you get started, check out a broad range of code examples for multiple use cases and a variety of programming languages. To learn more, visit TwelveLabs Pegasus 1.2 and TwelveLabs Marengo Embed 2.7 in the AWS Documentation.

Now available
TwelveLabs models are generally available today in Amazon Bedrock: the Marengo model in the US East (N. Virginia), Europe (Ireland), and Asia Pacific (Seoul) Region, and the Pegasus model in US West (Oregon), and Europe (Ireland) Region accessible with cross-Region inference from US and Europe Regions. Check the full Region list for future updates. To learn more, visit the TwelveLabs in Amazon Bedrock product page and the Amazon Bedrock pricing page.

Give TwelveLabs models a try on the Amazon Bedrock console today, and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— Channy

Amazon FSx for Lustre launches new storage class with the lowest-cost and only fully elastic Lustre ﬁle storage

2025-05-29 Veliswa Boya

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/amazon-fsx-for-lustre-adds-new-storage-class-with-the-lowest-cost-and-only-fully-elastic-lustre-file-storage/

Seismic imaging is a geophysical technique used to create detailed pictures of the Earth’s subsurface structure. It works by generating seismic waves that travel into the ground, reﬂect oﬀ various rock layers and structures, and return to the surface where they’re detected by sensitive instruments known as geophones or hydrophones. The huge volumes of acquired data often reach petabytes for a single survey and this presents signiﬁcant storage, processing, and management challenges for researchers and energy companies.

Customers who run these seismic imaging workloads or other high performance computing (HPC) workloads, such as weather forecasting, advanced driver-assistance system (ADAS) training, or genomics analysis, already store the huge volumes of data on either hard disk drive (HDD)-based or a combination of HDD and solid state drive (SSD) file storage on premises. However, as these on premises datasets and workloads scale, customers find it increasingly challenging and expensive due to the need to make upfront capital investments to keep up with performance needs of their workloads and avoid running out of storage capacity.

Today, we’re announcing the general availability of the Amazon FSx for Lustre Intelligent-Tiering, a new storage class that delivers virtually unlimited scalability, the only fully elastic Lustre ﬁle storage, and the lowest cost Lustre file storage in the cloud. With a starting price of less than $0.005 per GB-month, FSx for Lustre Intelligent-Tiering oﬀers the lowest cost high-performance file storage in the cloud, reducing storage costs for infrequently accessed data by up to 96 percent compared to other managed Lustre options. Elasticity means you no longer need to provision storage capacity upfront because your file system will grow and shrink as you add or delete data, and you pay only for the amount of data you store.

FSx for Lustre Intelligent-Tiering automatically optimizes costs by tiering cold data to the applicable lower-cost storage tier based on access patterns and includes an optional SSD read cache to improve performance for your most latency sensitive workloads. Intelligent-Tiering delivers high performance whether you’re starting with gigabytes of experimental data or working with large petabyte-scale datasets for your most demanding artificial intelligence/machine learning (AI/ML) and HPC workloads. With the flexibility to adjust your file system’s performance independent of storage, Intelligent-Tiering delivers up to 34 percent better price performance than on premises HDD file systems. The Intelligent-Tiering storage class is optimized for HDD-based or mixed HDD/SSD workloads that have a combination of hot and cold data. You can migrate and run such workloads to FSx for Lustre Intelligent-Tiering without application changes, eliminating storage capacity planning and management, while paying only for the resources that you use.

Prior to this launch, customers used the FSx for Lustre SSD storage class to accelerate ML and HPC workloads that need all-SSD performance and consistent low-latency access to all data. However, many workloads have a combination of hot and cold data and they don’t need all-SSD storage for colder portions of the data. FSx for Lustre is increasingly used in AI/ML workloads to increase graphics processing unit (GPU) utilization, and now it’s even more cost optimized to be one of the options for these workloads.

FSx for Lustre Intelligent-Tiering
Your data moves between three storage tiers (Frequent Access, Infrequent Access, and Archive) with no effort on your part, so you get automatic cost savings with no upfront costs or commitments. The tiering works as follows:

Frequent Access – Data that has been accessed within the last 30 days is stored in this tier.

Infrequent Access – Data that hasn’t been accessed for 30 – 90 days is stored in this tier, at a 44 percent cost reduction from Frequent Access.

Archive – Data that hasn’t been accessed for 90 or more days is stored in this tier, at a 65 percent cost reduction compared to Infrequent Access.

Regardless of the storage tier, your data is stored across multiple AWS Availability Zones for redundancy and availability, compared to typical on-premises implementations, which are usually confined within a single physical location. Additionally, your data can be retrieved instantly in milliseconds.

Creating a file system
I can create a file system using the AWS Management Console, AWS Command Line Interface (AWS CLI), API, or AWS CloudFormation. On the console, I choose Create file system to get started.

I select Amazon FSx for Lustre and choose Next.

Now, it’s time to enter the rest of the information to create the file system. I enter a name (veliswa_fsxINT_1) for my file system, and for deployment and storage class, I select Persistent, Intelligent-Tiering. I choose the desired Throughput capacity and the Metadata IOPS. The SSD read cache will be automatically configured by FSx for Lustre based on the specified throughput capacity. I leave the rest as the default, choose Next, and review my choices to create my file system.

With Amazon FSx for Lustre Intelligent-Tiering, you have the flexibility to provision the necessary performance for your workloads without having to provision any underlying storage capacity upfront.

I wanted to know which values were editable after creation, so I paid closer attention before finalizing the creation of the file system. I noted that Throughput capacity, Metadata IOPS, Security groups, SSD read cache, and a few others were editable later. After I start running the ML jobs, it might be necessary to increase the throughput capacity based on the volumes of data I’ll be processing, so this information is important to me.

The file system is now available. Considering that I’ll be running HPC workloads, I anticipate that I’ll be processing high volumes of data later, so I’ll increase the throughput capacity to 24 GB/s. After all, I only pay for the resources I use.

The SSD read cache is scaled automatically as your performance needs increase. You can adjust the cache size any time independently in user-provisioned mode or disable the read cache if you don’t need low-latency access.

Good to know

FSx for Lustre Intelligent-Tiering is designed to deliver up to multiple terabytes per second of total throughput.
FSx for Lustre with Elastic Fabric Adapter (EFA)/GPU Direct Storage (GDS) support provides up to 12x (up to 1200 Gbps) higher per-client throughput compared to the previous FSx for Lustre systems.
It can deliver up to tens of millions of IOPS for writes and cached reads. Data in the SSD read cache has submillisecond time-to-first-byte latencies, and all other data has time-to-first-byte latencies in the range of tens of milliseconds.

Now available
Here are a couple of things to keep in mind:

FSx Intelligent-Tiering storage class is available in the new FSx for Lustre file systems in the US East (N. Virginia, Ohio), US West (N. California, Oregon), Canada (Central), Europe (Frankfurt, Ireland, London, Stockholm), and Asia Pacific (Hong Kong, Mumbai, Seoul, Singapore, Sydney, Tokyo) AWS Regions.

You pay for data and metadata you store on your file system (GB/months). When you write data or when you read data that is not in the SSD read cache, you pay per operation. You pay for the total throughput capacity (in MBps/month), metadata IOPS (IOPS/month), and SSD read cache size for data and metadata (GB/month) you provision on your file system. To learn more, visit the Amazon FSx for Lustre Pricing page. To learn more about Amazon FSx for Lustre including this feature, visit the Amazon FSx for Lustre page.

Give Amazon FSx for Lustre Intelligent-Tiering a try in the Amazon FSx console today and send feedback to AWS re:Post for Amazon FSx for Lustre or through your usual AWS Support contacts.

– Veliswa.

How is the News Blog doing? Take this 1 minute survey!

(This survey is hosted by an external company. AWS handles your information as described in the AWS Privacy Notice. AWS will own the data gathered via this survey and will not share the information collected with survey respondents.)

Introducing Claude 4 in Amazon Bedrock, the most powerful models for coding from Anthropic

2025-05-22 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/claude-opus-4-anthropics-most-powerful-model-for-coding-is-now-in-amazon-bedrock/

Anthropic launched the next generation of Claude models today—Opus 4 and Sonnet 4—designed for coding, advanced reasoning, and the support of the next generation of capable, autonomous AI agents. Both models are now generally available in Amazon Bedrock, giving developers immediate access to both the model’s advanced reasoning and agentic capabilities.

Amazon Bedrock expands your AI choices with Anthropic’s most advanced models, giving you the freedom to build transformative applications with enterprise-grade security and responsible AI controls. Both models extend what’s possible with AI systems by improving task planning, tool use, and agent steerability.

With Opus 4’s advanced intelligence, you can build agents that handle long-running, high-context tasks like refactoring large codebases, synthesizing research, or coordinating cross-functional enterprise operations. Sonnet 4 is optimized for efficiency at scale, making it a strong fit as a subagent or for high-volume tasks like code reviews, bug fixes, and production-grade content generation.

When building with generative AI, many developers work on long-horizon tasks. These workflows require deep, sustained reasoning, often involving multistep processes, planning across large contexts, and synthesizing diverse inputs over extended timeframes. Good examples of these workflows are developer AI agents that help you to refactor or transform large projects. Existing models may respond quickly and fluently, but maintaining coherence and context over time—especially in areas like coding, research, or enterprise workflows—can still be challenging.

Claude Opus 4
Claude Opus 4 is the most advanced model to date from Anthropic, designed for building sophisticated AI agents that can reason, plan, and execute complex tasks with minimal oversight. Anthropic benchmarks show it is the best coding model available on the market today. It excels in software development scenarios where extended context, deep reasoning, and adaptive execution are critical. Developers can use Opus 4 to write and refactor code across entire projects, manage full-stack architectures, or design agentic systems that break down high-level goals into executable steps. It demonstrates strong performance on coding and agent-focused benchmarks like SWE-bench and TAU-bench, making it a natural choice for building agents that handle multistep development workflows. For example, Opus 4 can analyze technical documentation, plan a software implementation, write the required code, and iteratively refine it—while tracking requirements and architectural context throughout the process.

Claude Sonnet 4
Claude Sonnet 4 complements Opus 4 by balancing performance, responsiveness, and cost, making it well-suited for high-volume production workloads. It’s optimized for everyday development tasks with enhanced performance, such as powering code reviews, implementing bug ﬁxes, and new feature development with immediate feedback loops. It can also power production-ready AI assistants for near real-time applications. Sonnet 4 is a drop-in replacement from Claude Sonnet 3.7. In multi-agent systems, Sonnet 4 performs well as a task-speciﬁc subagent—handling responsibilities like targeted code reviews, search and retrieval, or isolated feature development within a broader pipeline. You can also use Sonnet 4 to manage continuous integration and delivery (CI/CD) pipelines, perform bug triage, or integrate APIs, all while maintaining high throughput and developer-aligned output.

Opus 4 and Sonnet 4 are hybrid reasoning models offering two modes: near-instant responses and extended thinking for deeper reasoning. You can choose near-instant responses for interactive applications, or enable extended thinking when a request benefits from deeper analysis and planning. Thinking is especially useful for long-context reasoning tasks in areas like software engineering, math, or scientific research. By configuring the model’s thinking budget—for example, by setting a maximum token count—you can tune the tradeoff between latency and answer depth to fit your workload.

How to get started
To see Opus 4 or Sonnet 4 in action, enable the new model in your AWS account. Then, you can start coding using the Bedrock Converse API with model IDanthropic.claude-opus-4-20250514-v1:0 for Opus 4 and anthropic.claude-sonnet-4-20250514-v1:0 for Sonnet 4. We recommend using the Converse API, because it provides a consistent API that works with all Amazon Bedrock models that support messages. This means you can write code one time and use it with different models.

For example, let’s imagine I write an agent to review code before merging changes in a code repository. I write the following code that uses the Bedrock Converse API to send a system and user prompts. Then, the agent consumes the streamed result.

private let modelId = "us.anthropic.claude-sonnet-4-20250514-v1:0"

// Define the system prompt that instructs Claude how to respond
let systemPrompt = """
You are a senior iOS developer with deep expertise in Swift, especially Swift 6 concurrency. Your job is to perform a code review focused on identifying concurrency-related edge cases, potential race conditions, and misuse of Swift concurrency primitives such as Task, TaskGroup, Sendable, @MainActor, and @preconcurrency.

You should review the code carefully and flag any patterns or logic that may cause unexpected behavior in concurrent environments, such as accessing shared mutable state without proper isolation, incorrect actor usage, or non-Sendable types crossing concurrency boundaries.

Explain your reasoning in precise technical terms, and provide recommendations to improve safety, predictability, and correctness. When appropriate, suggest concrete code changes or refactorings using idiomatic Swift 6
"""
let system: BedrockRuntimeClientTypes.SystemContentBlock = .text(systemPrompt)

// Create the user message with text prompt and image
let userPrompt = """
Can you review the following Swift code for concurrency issues? Let me know what could go wrong and how to fix it.
"""
let prompt: BedrockRuntimeClientTypes.ContentBlock = .text(userPrompt)

// Create the user message with both text and image content
let userMessage = BedrockRuntimeClientTypes.Message(
    content: [prompt],
    role: .user
)

// Initialize the messages array with the user message
var messages: [BedrockRuntimeClientTypes.Message] = []
messages.append(userMessage)

// Configure the inference parameters
let inferenceConfig: BedrockRuntimeClientTypes.InferenceConfiguration = .init(maxTokens: 4096, temperature: 0.0)

// Create the input for the Converse API with streaming
let input = ConverseStreamInput(inferenceConfig: inferenceConfig, messages: messages, modelId: modelId, system: [system])

// Make the streaming request
do {
    // Process the stream
    let response = try await bedrockClient.converseStream(input: input)

    // Iterate through the stream events
    for try await event in stream {
        switch event {
        case .messagestart:
            print("AI-assistant started to stream"")

        case let .contentblockdelta(deltaEvent):
            // Handle text content as it arrives
            if case let .text(text) = deltaEvent.delta {
                self.streamedResponse + = text
                print(text, termination: "")
            }

        case .messagestop:
            print("\n\nStream ended")
            // Create a complete assistant message from the streamed response
            let assistantMessage = BedrockRuntimeClientTypes.Message(
                content: [.text(self.streamedResponse)],
                role: .assistant
            )
            messages.append(assistantMessage)

        default:
            break
        }
    }

To help you get started, my colleague Dennis maintains a broad range of code examples for multiple use cases and a variety of programming languages.

Available today in Amazon Bedrock
This release gives developers immediate access in Amazon Bedrock, a fully managed, serverless service, to the next generation of Claude models developed by Anthropic. Whether you’re already building with Claude in Amazon Bedrock or just getting started, this seamless access makes it faster to experiment, prototype, and scale with cutting-edge foundation models—without managing infrastructure or complex integrations.

Claude Opus 4 is available in the following AWS Regions in North America: US East (Ohio, N. Virginia) and US West (Oregon). Claude Sonnet 4 is available not only in AWS Regions in North America but also in APAC, and Europe: US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Hyderabad, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), and Europe (Spain). You can access the two models through cross-Region inference. Cross-Region inference helps to automatically select the optimal AWS Region within your geography to process your inference request.

Opus 4 tackles your most challenging development tasks, while Sonnet 4 excels at routine work with its optimal balance of speed and capability.

Learn more about the pricing and how to use these new models in Amazon Bedrock today!

— seb

Empower financial analytics by creating structured knowledge bases using Amazon Bedrock and Amazon Redshift

2025-05-20 Nita Shah

Post Syndicated from Nita Shah original https://aws.amazon.com/blogs/big-data/empower-financial-analytics-by-creating-structured-knowledge-bases-using-amazon-bedrock-and-amazon-redshift/

Traditionally, financial data analysis could require deep SQL expertise and database knowledge. Now with Amazon Bedrock Knowledge Bases integration with structured data, you can use simple, natural language prompts to query complex financial datasets. By combining the AI capabilities of Amazon Bedrock with an Amazon Redshift data warehouse, individuals with varied levels of technical expertise can quickly generate valuable insights, making sure that data-driven decision-making is no longer limited to those with specialized programming skills.

With the support for structured data retrieval using Amazon Bedrock Knowledge Bases, you can now use natural language querying to retrieve structured data from your data sources, such as Amazon Redshift. This enables applications to seamlessly integrate natural language processing capabilities on structured data through simple API calls. Developers can rapidly implement sophisticated data querying features without complex coding—just connect to the API endpoints and let users explore financial data using plain English. From customer portals to internal dashboards and mobile apps, this API-driven approach makes enterprise-grade data analysis accessible to everyone in your organization. Using structured data from a Redshift data warehouse, you can efficiently and quickly build generative AI applications for tasks such as text generation, sentiment analysis, or data translation.

In this post, we showcase how financial planners, advisors, or bankers can now ask questions in natural language, such as, “Give me the name of the customer with the highest number of accounts?” or “Give me details of all accounts for a specific customer.” These prompts will receive precise data from the customer databases for accounts, investments, loans, and transactions. Amazon Bedrock Knowledge Bases automatically translates these natural language queries into optimized SQL statements, thereby accelerating time to insight, enabling faster discoveries and efficient decision-making.

Solution overview

To illustrate the new Amazon Bedrock Knowledge Bases integration with structured data in Amazon Redshift, we will build a conversational AI-powered assistant for financial assistance that is designed to help answer financial inquiries, like “Who has the most accounts?” or “Give details of the customer with the highest loan amount.”

We will build a solution using sample financial datasets and set up Amazon Redshift as the knowledge base. Users and applications will be able to access this information using natural language prompts.

The following diagram provides an overview of the solution.

For building and running this solution, the steps include:

Load sample financial datasets.
Enable Amazon Bedrock large language model (LLM) access for Amazon Nova Pro.
Create an Amazon Bedrock knowledge base referencing structured data in Amazon Redshift.
Ask queries and get responses in natural language.

To implement the solution, we use a sample financial dataset that is for demonstration purposes only. The same implementation approach can be adapted to your specific datasets and use cases.

Download the SQL script to run the implementation steps in Amazon Redshift Query Editor V2. If you’re using another SQL editor, you can copy and paste the SQL queries either from this post or from the downloaded notebook.

Prerequisites

Make sure your meet the following prerequisites:

Have an AWS account.
Create an Amazon Redshift Serverless workgroup or provisioned cluster. For setup instructions, see Creating a workgroup with a namespace or Create a sample Amazon Redshift database, respectively. The Amazon Bedrock integration feature is supported in both Amazon Redshift provisioned and serverless.
Create an AWS Identity and Access Management (IAM) role. For instructions, see Creating or updating an IAM role for Amazon Redshift ML integration with Amazon Bedrock.
Associate the IAM role to a Redshift instance.
Set up the required permissions for Amazon Bedrock Knowledge Bases to connect with Amazon Redshift.

Load sample financial data

To load the finance datasets to Amazon Redshift, complete the following steps:

Open the Amazon Redshift Query Editor V2 or another SQL editor of your choice and connect to the Redshift database.

Run the following SQL to create the finance data tables and load sample data:

-- Create table
CREATE TABLE accounts (
    id integer ,
    account_id integer PRIMARY KEY,
    customer_id integer,
    account_type character varying(256),
    opening_date date,
    balance bigint,
    currency character varying(256)
);

CREATE TABLE customer (
    id integer,
    customer_id integer PRIMARY KEY ,
    name character varying(256) ,
    age integer,
    gender character varying(256) ,
    address character varying(256) ,
    phone character varying(256) ,
    email character varying(256)
);

CREATE TABLE investments (
    id integer ,
    investment_id integer PRIMARY KEY,
    customer_id integer ,
    investment_type character varying(256) ,
    investment_name character varying(256) ,
    purchase_date date ,
    purchase_price bigint ,
    quantity integer 
);


CREATE TABLE loans (
    id integer ,
    loan_id integer PRIMARY KEY,
    customer_id integer ,
    loan_type character varying(256) ,
    loan_amount bigint ,
    interest_rate integer ,
    start_date date ,
    end_date date 
);

CREATE TABLE orders (
    id integer ,
    order_id integer PRIMARY KEY,
    customer_id integer ,
    order_type character varying(256) ,
    order_date date ,
    investment_id integer ,
    quantity integer ,
    price integer 
);

CREATE TABLE transactions (
    id integer ,
    transaction_id integer PRIMARY KEY ,
    account_id integer REFERENCES accounts(account_id),
    transaction_type character varying(256) ,
    transaction_date date ,
    amount integer ,
    description character varying(256) 
);

Download the sample financial dataset to your local storage and unzip the zipped folder.
Create an Amazon Simple Storage Service (Amazon S3) bucket with a unique name. For instructions, refer to Creating a general purpose bucket.
Upload the downloaded files into your newly created S3 bucket.

Using the following COPY command statements, load the datasets from Amazon S3 into the new tables you created in Amazon Redshift. Replace <<your_s3_bucket>> with the name of your S3 bucket and <<your_region>> with your AWS Region.

-- Load sample data
COPY accounts FROM 's3://<<your_s3_bucket>>/accounts.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';

COPY customer FROM 's3://<<your_s3_bucket>>/customer.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';
COPY investments FROM 's3://<<your_s3_bucket>>/investments.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';
COPY loans FROM 's3://<<your_s3_bucket>>/loans.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';
COPY orders FROM 's3://<<your_s3_bucket>>/orders.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';
COPY transactions FROM 's3://<<your_s3_bucket>>/transactions.csv' IAM_ROLE DEFAULT FORMAT AS CSV DELIMITER ',' QUOTE '"' IGNOREHEADER 1 REGION AS '<<your_region>>';

Enable LLM access

With Amazon Bedrock, you can access state-of-the-art AI models from providers like Anthropic, AI21 Labs, Stability AI, and Amazon’s own foundation models (FMs). These include Anthropic’s Claude 2, which excels at complex reasoning and content generation; Jurassic-2 from AI21 Labs, known for its multilingual capabilities; Stable Diffusion from Stability AI for image generation; and Amazon Titan models for various text and embedding tasks. For this demo, we use Amazon Bedrock to access the Amazon Nova FMs. Specifically, we use the Amazon Nova Pro model, which is a highly capable multimodal model designed for a wide range of tasks like video summarization, Q&A, mathematical reasoning, software development, and AI agents, including high speed and accuracy for text summarization tasks.

Make sure you have the required IAM permissions to enable access to available Amazon Bedrock Nova FMs. Then complete the following steps to enable model access in Amazon Bedrock:

On the Amazon Bedrock console, in the navigation pane, choose Model access.
Choose Enable specific models.
Search for Amazon Nova models, select Nova Pro, and choose Next.
Review the selection and choose Submit.

Create an Amazon Bedrock knowledge base referencing structured data in Amazon Redshift

Amazon Bedrock Knowledge Bases uses Amazon Redshift as the query engine to query your data. It reads metadata from your structured data store to generate SQL queries. There are different supported authentication methods to create the Amazon Bedrock knowledge base using Amazon Redshift. For more information, refer to the Set up query engine for your structured data store in Amazon Bedrock Knowledge Bases.

For this post, we create an Amazon Bedrock knowledge base for the Redshift database and sync the data using IAM authentication.

If you’re creating an Amazon Bedrock knowledge base through the AWS Management Console, you can skip the service role setup mentioned in the previous section. It automatically creates one with the necessary permissions for Amazon Bedrock Knowledge Bases to retrieve data from your new knowledge base and generate SQL queries for structured data stores.

When creating an Amazon Bedrock knowledge base using an API, you must attach IAM policies that grant permissions to create and manage knowledge bases with connected data stores. Refer to Prerequisites for creating an Amazon Bedrock Knowledge Base with a structured data store for instructions.

Complete the following steps to create an Amazon Bedrock knowledge base using structured data:

On the Amazon Bedrock console, choose Knowledge Bases in the navigation pane.
Choose Create and choose Knowledge Base with structure data store from the dropdown menu.
Provide the following details for your knowledge base:
1. Enter a name and optional description.
2. Select Amazon Redshift as the query engine.
3. Select Create and use a new service role for resource management.
4. Make note of this newly created IAM role.
5. Choose Next to proceed to the next part of the setup process.
6. Configure the query engine:
  - Select Redshift Serverless (Amazon Redshift provisioned is also supported).
  - Choose your Redshift workgroup.
  - Use the IAM role created earlier.
  - Under Default storage metadata, select Amazon Redshift databases and for Database, choose dev.
  - You can customize settings by adding specific contexts to enhance the accuracy of the results.
  - Choose Next.
7. Complete creating your knowledge base.
8. Record the generated service role details.
9. Next, grant appropriate access to the service role for Amazon Bedrock Knowledge Bases through the Amazon Redshift Query Editor V2. Update <your Service Role name> in the following statements with your service role, and update the value for <your schema>.
```
CREATE USER "IAMR:<your Service Role name>" WITH PASSWORD DISABLE;
SELECT * FROM PG_USER; -- To verify that the user is created.
GRANT SELECT ON ALL TABLES IN SCHEMA <your schema> TO "IAMR:<your Service Role name>";
--You can also Restricting access to certain tables for finer-grained control on the tables that can be accessed as shown below
GRANT SELECT ON TABLE customer to "IAMR:<your Service Role name>";
GRANT SELECT ON TABLE loan to "IAMR:<your Service Role name>";
```

Now you can update the knowledge base with the Redshift database.

On the Amazon Bedrock console, choose Knowledge Bases in the navigation pane.
Open the knowledge base you created.
Select the dev Redshift database and choose Sync.

It may take a few minutes for the status to display as COMPLETE.

Ask queries and get responses in natural language

You can set up your application to query the knowledge base or attach the knowledge base to an agent by deploying your knowledge base for your AI application. For this demo, we use a native testing interface on the Amazon Bedrock Knowledge Bases console.

To ask questions in natural language on the knowledge base for Redshift data, complete the following steps:

On the Amazon Bedrock console, open the details page for your knowledge base.
Choose Test.
Choose your category (Amazon), model (Nova Pro), and inference settings (On demand), and choose Apply.
In the right pane of the console, test the knowledge base setup with Amazon Redshift by asking a few simple questions in natural language, such as “How many tables do I have in the database?” or “Give me list of all tables in the database.”

The following screenshot shows our results.

To view the generated query from your Amazon Redshift based knowledge base, choose Show details next to the response.
Next, ask questions related to the financial datasets loaded in Amazon Redshift using natural language prompts, such as, “Give me the name of the customer with the highest number of accounts” or “Give the details of all accounts for customer Deanna McCoy.”

The following screenshot shows the responses in natural language.

Using natural language queries in Amazon Bedrock, you were able to retrieve responses from the structured financial data stored in Amazon Redshift.

Considerations

In this section, we discuss some important considerations when using this solution.

Security and compliance

When integrating Amazon Bedrock with Amazon Redshift, implementing robust security measures is crucial. To protect your systems and data, implement essential safeguards including restricted database roles, read-only database instances, and proper input validation. These measures help prevent unauthorized access and potential system vulnerabilities. For more information, see Allow your Amazon Bedrock Knowledge Bases service role to access your data store.

Cost

You incur a cost for converting natural language to text based on SQL. To learn more, refer to Amazon Bedrock pricing.

Use custom contexts

To improve query accuracy, you can enhance SQL generation by providing custom context in two key ways. First, specify which tables to include or exclude, focusing the model on relevant data structures. Second, supply curated queries as examples, demonstrating the types of SQL queries you expect. These curated queries serve as valuable reference points, guiding the model to generate more accurate and relevant SQL outputs tailored to your specific needs. For more information, refer to Create a knowledge base by connecting to a structured data store.

For different workgroups, you can create separate knowledge bases for each group, with access only to their specific tables. Control data access by setting up role-based permissions in Amazon Redshift, verifying each role can only view and query authorized tables.

Clean up

To avoid incurring future charges, delete the Redshift Serverless instance or provisioned data warehouse created as part of the prerequisite steps.

Conclusion

Generative AI applications provide significant advantages in structured data management and analysis. The key benefits include:

Using natural language processing – This makes data warehouses more accessible and user-friendly
Enhancing customer experience – By providing more intuitive data interactions, it boosts overall customer satisfaction and engagement
Simplifying data warehouse navigation – Users can understand and explore data warehouse content through natural language interactions, improving ease of use
Improving operational efficiency – By automating routine tasks, it allows human resources to focus on more complex and strategic activities

In this post, we showed how the natural language querying capabilities of Amazon Bedrock Knowledge Bases when integrated with Amazon Redshift enables rapid solution development. This is particularly valuable for the finance industry, where financial planners, advisors, or bankers face challenges in accessing and analyzing large volumes of financial data in a secured and performant manner.

By enabling natural language interactions, you can bypass the traditional barriers of understanding database structures and SQL queries, and quickly access insights and provide real-time support. This streamlined approach accelerates decision-making and drives innovation by making complex data analysis accessible to non-technical users.

For additional details on Amazon Bedrock and Amazon Redshift integration, refer to Amazon Redshift ML integration with Amazon Bedrock.

About the authors

Nita Shah is an Analytics Specialist Solutions Architect at AWS based out of New York. She has been building data warehouse solutions for over 20 years and specializes in Amazon Redshift. She is focused on helping customers design and build enterprise-scale well-architected analytics and decision support platforms.

Sushmita Barthakur is a Senior Data Solutions Architect at Amazon Web Services (AWS), supporting Strategic customers architect their data workloads on AWS. With a background in data analytics, she has extensive experience helping customers architect and build enterprise data lakes, ETL workloads, data warehouses and data analytics solutions, both on-premises and the cloud. Sushmita is based in Florida and enjoys traveling, reading and playing tennis.

Jonathan Katz is a Principal Product Manager – Technical on the Amazon Redshift team and is based in New York. He is a Core Team member of the open source PostgreSQL project and an active open source contributor, including PostgreSQL and the pgvector project.

Llama 4 models from Meta now available in Amazon Bedrock serverless

2025-04-29 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/llama-4-models-from-meta-now-available-in-amazon-bedrock-serverless/

The newest AI models from Meta, Llama 4 Scout 17B and Llama 4 Maverick 17B, are now available as a fully managed, serverless option in Amazon Bedrock. These new foundation models (FMs) deliver natively multimodal capabilities with early fusion technology that you can use for precise image grounding and extended context processing in your applications.

Llama 4 uses an innovative mixture-of-experts (MoE) architecture that provides enhanced performance across reasoning and image understanding tasks while optimizing for both cost and speed. This architectural approach enables Llama 4 to offer improved performance at lower cost compared to Llama 3, with expanded language support for global applications.

The models were already available on Amazon SageMaker JumpStart, and you can now use them in Amazon Bedrock to streamline building and scaling generative AI applications with enterprise-grade security and privacy.

Llama 4 Maverick 17B – A natively multimodal model featuring 128 experts and 400 billion total parameters. It excels in image and text understanding, making it suitable for versatile assistant and chat applications. The model supports a 1 million token context window, giving you the flexibility to process lengthy documents and complex inputs.

Llama 4 Scout 17B – A general-purpose multimodal model with 16 experts, 17 billion active parameters, and 109 billion total parameters that delivers superior performance compared to all previous Llama models. Amazon Bedrock currently supports a 3.5 million token context window for Llama 4 Scout, with plans to expand in the near future.

Use cases for Llama 4 models
You can use the advanced capabilities of Llama 4 models for a wide range of use cases across industries:

Enterprise applications – Build intelligent agents that can reason across tools and workflows, process multimodal inputs, and deliver high-quality responses for business applications.

Multilingual assistants – Create chat applications that understand images and provide high-quality responses across multiple languages, making them accessible to global audiences.

Code and document intelligence – Develop applications that can understand code, extract structured data from documents, and provide insightful analysis across large volumes of text and code.

Customer support – Enhance support systems with image analysis capabilities, enabling more effective problem resolution when customers share screenshots or photos.

Content creation – Generate creative content across multiple languages, with the ability to understand and respond to visual inputs.

Research – Build research applications that can integrate and analyze multimodal data, providing insights across text and images.

Using Llama 4 models in Amazon Bedrock
To use these new serverless models in Amazon Bedrock, I first need to request access. In the Amazon Bedrock console, I choose Model access from the navigation pane to toggle access to Llama 4 Maverick 17B and Llama 4 Scout 17B models.

The Llama 4 models can be easily integrated into your applications using the Amazon Bedrock Converse API, which provides a unified interface for conversational AI interactions.

Here’s an example of how to use the AWS SDK for Python (Boto3) with Llama 4 Maverick for a multimodal conversation:

import boto3
import json
import os

AWS_REGION = "us-west-2"
MODEL_ID = "us.meta.llama4-maverick-17b-instruct-v1:0"
IMAGE_PATH = "image.jpg"


def get_file_extension(filename: str) -> str:
    """Get the file extension."""
    extension = os.path.splitext(filename)[1].lower()[1:] or 'txt'
    if extension == 'jpg':
        extension = 'jpeg'
    return extension


def read_file(file_path: str) -> bytes:
    """Read a file in binary mode."""
    try:
        with open(file_path, 'rb') as file:
            return file.read()
    except Exception as e:
        raise Exception(f"Error reading file {file_path}: {str(e)}")

bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name=AWS_REGION
)

request_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": "What can you tell me about this image?"
                },
                {
                    "image": {
                        "format": get_file_extension(IMAGE_PATH),
                        "source": {"bytes": read_file(IMAGE_PATH)},
                    }
                },
            ],
        }
    ]
}

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=request_body["messages"]
)

print(response["output"]["message"]["content"][-1]["text"])

This example demonstrates how to send both text and image inputs to the model and receive a conversational response. The Converse API abstracts away the complexity of working with different model input formats, providing a consistent interface across models in Amazon Bedrock.

For more interactive use cases, you can also use the streaming capabilities of the Converse API:

response_stream = bedrock_runtime.converse_stream(
    modelId=MODEL_ID,
    messages=request_body['messages']
)

stream = response_stream.get('stream')
if stream:
    for event in stream:

        if 'messageStart' in event:
            print(f"\nRole: {event['messageStart']['role']}")

        if 'contentBlockDelta' in event:
            print(event['contentBlockDelta']['delta']['text'], end="")

        if 'messageStop' in event:
            print(f"\nStop reason: {event['messageStop']['stopReason']}")

        if 'metadata' in event:
            metadata = event['metadata']
            if 'usage' in metadata:
                print(f"Usage: {json.dumps(metadata['usage'], indent=4)}")
            if 'metrics' in metadata:
                print(f"Metrics: {json.dumps(metadata['metrics'], indent=4)}")

With streaming, your applications can provide a more responsive experience by displaying model outputs as they are generated.

Things to know
The Llama 4 models are available today with a fully managed, serverless experience in Amazon Bedrock in the US East (N. Virginia) and US West (Oregon) AWS Regions. You can also access Llama 4 in US East (Ohio) via cross-region inference.

As usual with Amazon Bedrock, you pay for what you use. For more information, see Amazon Bedrock pricing.

These models support 12 languages for text (English, French, German, Hindi, Italian, Portuguese, Spanish, Thai, Arabic, Indonesian, Tagalog, and Vietnamese) and English when processing images.

To start using these new models today, visit the Meta Llama models section in the Amazon Bedrock User Guide. You can also explore how our Builder communities are using Amazon Bedrock in their solutions in the generative AI section of our community.aws site.

— Danilo

How is the News Blog doing? Take this 1 minute survey!

Writer Palmyra X5 and X4 foundation models are now available in Amazon Bedrock

2025-04-28 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/writer-palmyra-x5-and-x4-foundation-models-are-now-available-in-amazon-bedrock/

One thing we’ve witnessed in recent months is the expansion of context windows in foundation models (FMs), with many now handling sequence lengths that would have been unimaginable just a year ago. However, building AI-powered applications that can process vast amounts of information while maintaining the reliability and security standards required for enterprise use remains challenging.

For these reasons, we’re excited to announce that Writer Palmyra X5 and X4 models are available today in Amazon Bedrock as a fully managed, serverless offering. AWS is the first major cloud provider to deliver fully managed models from Writer. Palmyra X5 is a new model launched today by Writer. Palmyra X4 was previously available in Amazon Bedrock Marketplace.

Writer Palmyra models offer robust reasoning capabilities that support complex agent-based workflows while maintaining enterprise security standards and reliability. Palmyra X5 features a one million token context window, and Palmyra X4 supports a 128K token context window. With these extensive context windows, these models remove some of the traditional constraints for app and agent development, enabling deeper analysis and more comprehensive task completion.

With this launch, Amazon Bedrock continues to bring access to the most advanced models and the tools you need to build generative AI applications with security, privacy, and responsible AI.

As a pioneer in FM development, Writer trains and fine-tunes its industry leading models on Amazon SageMaker HyperPod. With its optimized distributed training environment, Writer reduces training time and brings its models to market faster.

Palmyra X5 and X4 use cases
Writer Palmyra X5 and X4 are designed specifically for enterprise use cases, combining powerful capabilities with stringent security measures, including System and Organization Controls (SOC) 2, Payment Card Industry Data Security Standard (PCI DSS), and Health Insurance Portability and Accountability Act (HIPAA) compliance certifications.

Palmyra X5 and X4 models excel in various enterprise use cases across multiple industries:

Financial services – Palmyra models power solutions across investment banking and asset and wealth management, including deal transaction support, 10-Q, 10-K and earnings transcript highlights, fund and market research, and personalized client outreach at scale.

Healthcare and life science – Payors and providers use Palmyra models to build solutions for member acquisition and onboarding, appeals and grievances, case and utilization management, and employer request for proposal (RFP) response. Pharmaceutical companies use these models for commercial applications, medical affairs, R&D, and clinical trials.

Retail and consumer goods – Palmyra models enable AI solutions for product description creation and variation, performance analysis, SEO updates, brand and compliance reviews, automated campaign workflows, and RFP analysis and response.

Technology – Companies across the technology sector implement Palmyra models for personalized and account-based marketing, content creation, campaign workflow automation, account preparation and research, knowledge support, job briefs and candidate reports, and RFP responses.

Palmyra models support a comprehensive suite of enterprise-grade capabilities, including:

Adaptive thinking – Hybrid models combining advanced reasoning with enterprise-grade reliability, excelling at complex problem-solving and sophisticated decision-making processes.

Multistep tool-calling – Support for advanced tool-calling capabilities that can be used in complex multistep workflows and agentic actions, including interaction with enterprise systems to perform tasks like updating systems, executing transactions, sending emails, and triggering workflows.

Enterprise-grade reliability – Consistent, accurate results while maintaining strict quality standards required for enterprise use, with models specifically trained on business content to align outputs with professional standards.

Using Palmyra X5 and X4 in Amazon Bedrock
As for all new serverless models in Amazon Bedrock, I need to request access first. In the Amazon Bedrock console, I choose Model access from the navigation pane to enable access to Palmyra X5 and Palmyra X4 models.

When I have access to the models, I can start building applications with any AWS SDKs using the Amazon Bedrock Converse API. The models use cross-Region inference with these inference profiles:

For Palmyra X5: us.writer.palmyra-x5-v1:0
For Palmyra X4: us.writer.palmyra-x4-v1:0

Here’s a sample implementation with the AWS SDK for Python (Boto3). In this scenario, there is a new version of an existing product. I need to prepare a detailed comparison of what’s new. I have the old and new product manuals. I use the large input context of Palmyra X5 to read and compare the two versions of the manual and prepare a first draft of the comparison document.

import sys
import os
import boto3
import re

AWS_REGION = "us-west-2"
MODEL_ID = "us.writer.palmyra-x5-v1:0"
DEFAULT_OUTPUT_FILE = "product_comparison.md"

def create_bedrock_runtime_client(region: str = AWS_REGION):
    """Create and return a Bedrock client."""
    return boto3.client('bedrock-runtime', region_name=region)

def get_file_extension(filename: str) -> str:
    """Get the file extension."""
    return os.path.splitext(filename)[1].lower()[1:] or 'txt'

def sanitize_document_name(filename: str) -> str:
    """Sanitize document name."""
    # Remove extension and get base name
    name = os.path.splitext(filename)[0]
    
    # Replace invalid characters with space
    name = re.sub(r'[^a-zA-Z0-9\s\-\(\)\[\]]', ' ', name)
    
    # Replace multiple spaces with single space
    name = re.sub(r'\s+', ' ', name)
    
    # Strip leading/trailing spaces
    return name.strip()

def read_file(file_path: str) -> bytes:
    """Read a file in binary mode."""
    try:
        with open(file_path, 'rb') as file:
            return file.read()
    except Exception as e:
        raise Exception(f"Error reading file {file_path}: {str(e)}")

def generate_comparison(client, document1: bytes, document2: bytes, filename1: str, filename2: str) -> str:
    """Generate a markdown comparison of two product manuals."""
    print(f"Generating comparison for {filename1} and {filename2}")
    try:
        response = client.converse(
            modelId=MODEL_ID,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "text": "Please compare these two product manuals and create a detailed comparison in markdown format. Focus on comparing key features, specifications, and highlight the main differences between the products."
                        },
                        {
                            "document": {
                                "format": get_file_extension(filename1),
                                "name": sanitize_document_name(filename1),
                                "source": {
                                    "bytes": document1
                                }
                            }
                        },
                        {
                            "document": {
                                "format": get_file_extension(filename2),
                                "name": sanitize_document_name(filename2),
                                "source": {
                                    "bytes": document2
                                }
                            }
                        }
                    ]
                }
            ]
        )
        return response['output']['message']['content'][0]['text']
    except Exception as e:
        raise Exception(f"Error generating comparison: {str(e)}")

def main():
    if len(sys.argv) < 3 or len(sys.argv) > 4:
        cmd = sys.argv[0]
        print(f"Usage: {cmd} <manual1_path> <manual2_path> [output_file]")
        sys.exit(1)

    manual1_path = sys.argv[1]
    manual2_path = sys.argv[2]
    output_file = sys.argv[3] if len(sys.argv) == 4 else DEFAULT_OUTPUT_FILE
    paths = [manual1_path, manual2_path]

    # Check each file's existence
    for path in paths:
        if not os.path.exists(path):
            print(f"Error: File does not exist: {path}")
            sys.exit(1)

    try:
        # Create Bedrock client
        bedrock_runtime = create_bedrock_runtime_client()

        # Read both manuals
        print("Reading documents...")
        manual1_content = read_file(manual1_path)
        manual2_content = read_file(manual2_path)

        # Generate comparison directly from the documents
        print("Generating comparison...")
        comparison = generate_comparison(
            bedrock_runtime,
            manual1_content,
            manual2_content,
            os.path.basename(manual1_path),
            os.path.basename(manual2_path)
        )

        # Save comparison to file
        with open(output_file, 'w') as f:
            f.write(comparison)

        print(f"Comparison generated successfully! Saved to {output_file}")

    except Exception as e:
        print(f"Error: {str(e)}")
        sys.exit(1)

if __name__ == "__main__":
    main()

To learn how to use Amazon Bedrock with AWS SDKs, browse the code samples in the Amazon Bedrock User Guide.

Things to know
Writer Palmyra X5 and X4 models are available in Amazon Bedrock today in the US West (Oregon) AWS Region with cross-Region inference. For the most up-to-date information on model support by Region, refer to the Amazon Bedrock documentation. For information on pricing, visit Amazon Bedrock pricing.

These models support English, Spanish, French, German, Chinese, and multiple other languages, making them suitable for global enterprise applications.

Using the expansive context capabilities of these models, developers can build more sophisticated applications and agents that can process extensive documents, perform complex multistep reasoning, and handle sophisticated agentic workflows.

To start using Writer Palmyra X5 and X4 models today, visit the Writer model section in the Amazon Bedrock User Guide. You can also explore how our Builder communities are using Amazon Bedrock in their solutions in the generative AI section of our community.aws site.

Let us know what you build with these powerful new capabilities!

— Danilo

How is the News Blog doing? Take this 1 minute survey!

AI-DLC Workflow Overview

Prerequisites

Let’s Begin Building!

Step 1: Clone GitHub repo containing the AI-DLC Q Developer Rules

Step 2: Load Q Developer Rules in your project workspace

Step 3: Install and authenticate Amazon Q Developer Extension in IDE

Step 4: Start the AI-DLC workflow by entering a high-level problem statement

Step 5: Workspace Detection

Step 6: Requirements Analysis

Step 7: Workflow Planning

Step 8: Code Generation Planning

Step 9: Code Generation

Step 10: Build and Test

Let’s Solve the Puzzle!

Conclusion

Cleanup

What is the Machine Learning Lens?

Machine Learning Lens components

What else is discussed in the Machine Learning Lens?

What’s new in the updated Machine Learning Lens?

Who should use the Machine Learning Lens?

Next steps

About the authors

The Responsible AI Lens: Embedding trust in AI systems

The Machine Learning Lens: Foundation for ML workloads

The Generative AI Lens: Specialized guidance for foundation models

Implementation strategy for Well-Architected AI/ML guidance: A unified approach

What are the next steps?

About the authors

Solution overview

Prerequisites

Load sample financial data

Enable LLM access

Create an Amazon Bedrock knowledge base referencing structured data in Amazon Redshift

Ask queries and get responses in natural language

Considerations

Security and compliance

Cost

Use custom contexts

Clean up

Conclusion

About the authors

The collective thoughts of the interwebz