All posts by Danilo Poccia

Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and prompt caching (preview)

2024-12-04 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/reduce-costs-and-latency-with-amazon-bedrock-intelligent-prompt-routing-and-prompt-caching-preview/

Today, Amazon Bedrock has introduced in preview two capabilities that help reduce costs and latency for generative AI applications:

Amazon Bedrock Intelligent Prompt Routing – When invoking a model, you can now use a combination of foundation models (FMs) from the same model family to help optimize for quality and cost. For example, with the Anthropic’s Claude model family, Amazon Bedrock can intelligently route requests between Claude 3.5 Sonnet and Claude 3 Haiku depending on the complexity of the prompt. Similarly, Amazon Bedrock can route requests between Meta Llama 3.1 70B and 8B. The prompt router predicts which model will provide the best performance for each request while optimizing the quality of response and cost. This is particularly useful for applications such as customer service assistants, where uncomplicated queries can be handled by smaller, faster, and more cost-effective models, and complex queries are routed to more capable models. Intelligent Prompt Routing can reduce costs by up to 30 percent without compromising on accuracy.

Amazon Bedrock now supports prompt caching – You can now cache frequently used context in prompts across multiple model invocations. This is especially valuable for applications that repeatedly use the same context, such as document Q&A systems where users ask multiple questions about the same document or coding assistants that need to maintain context about code files. The cached context remains available for up to 5 minutes after each access. Prompt caching in Amazon Bedrock can reduce costs by up to 90% and latency by up to 85% for supported models.

These features make it easier to reduce latency and balance performance with cost efficiency. Let’s look at how you can use them in your applications.

Using Amazon Bedrock Intelligent Prompt Routing in the console
Amazon Bedrock Intelligent Prompt Routing uses advanced prompt matching and model understanding techniques to predict the performance of each model for every request, optimizing for quality of responses and cost. During the preview, you can use the default prompt routers for Anthropic’s Claude and Meta Llama model families.

Intelligent prompt routing can be accessed through the AWS Management Console, the AWS Command Line Interface (AWS CLI), and the AWS SDKs. In the Amazon Bedrock console, I choose Prompt routers in the Foundation models section of the navigation pane.

I choose the Anthropic Prompt Router default router to get more information.

From the configuration of the prompt router, I see that it’s routing requests between Claude 3.5 Sonnet and Claude 3 Haiku using cross-Region inference profiles. The routing criteria defines the quality difference between the response of the largest model and the smallest model for each prompt as predicted by the router internal model at runtime. The fallback model, used when none of the chosen models meet the desired performance criteria, is Anthropic’s Claude 3.5 Sonnet.

I choose Open in Playground to chat using the prompt router and enter this prompt:

Alice has N brothers and she also has M sisters. How many sisters does Alice’s brothers have?

The result is quickly provided. I choose the new Router metrics icon on the right to see which model was selected by the prompt router. In this case, because the question is rather complex, Anthropic’s Claude 3.5 Sonnet was used.

Now I ask a straightforward question to the same prompt router:

Describe the purpose of a 'hello world' program in one line.

This time, Anthropic’s Claude 3 Haiku has been selected by the prompt router.

I select the Meta Prompt Router to check its configuration. It’s using the cross-Region inference profiles for Llama 3.1 70B and 8B with the 70B model as fallback.

Prompt routers are integrated with other Amazon Bedrock capabilities, such as Amazon Bedrock Knowledge Bases and Amazon Bedrock Agents, or when performing evaluations. For example, here I create a model evaluation to help me compare, for my use case, a prompt router to another model or prompt router.

To use a prompt router in an application, I need to set the prompt router Amazon Resource Name (ARN) as model ID in the Amazon Bedrock API. Let’s see how this works with the AWS CLI and an AWS SDK.

Using Amazon Bedrock Intelligent Prompt Routing with the AWS CLI
The Amazon Bedrock API has been extended to handle prompt routers. For example, I can list the existing prompt routes in an AWS Region using ListPromptRouters:

aws bedrock list-prompt-routers

In output, I receive a summary of the existing prompt routers, similar to what I saw in the console.

Here’s the full output of the previous command:

{
    "promptRouterSummaries": [
        {
            "promptRouterName": "Anthropic Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.26
            },
            "description": "Routes requests among models in the Claude family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/anthropic.claude:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        },
        {
            "promptRouterName": "Meta Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.0
            },
            "description": "Routes requests among models in the LLaMA family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        }
    ]
}

I can get information about a specific prompt router using GetPromptRouter with a prompt router ARN. For example, for the Meta Llama model family:

aws bedrock get-prompt-router --prompt-router-arn arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1

{
    "promptRouterName": "Meta Prompt Router",
    "routingCriteria": {
        "responseQualityDifference": 0.0
    },
    "description": "Routes requests among models in the LLaMA family",
    "createdAt": "2024-11-20T00:00:00+00:00",
    "updatedAt": "2024-11-20T00:00:00+00:00",
    "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1",
    "models": [
        {
            "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
        },
        {
            "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
        }
    ],
    "fallbackModel": {
        "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
    },
    "status": "AVAILABLE",
    "type": "default"
}

To use a prompt router with Amazon Bedrock, I set the prompt router ARN as model ID when making API calls. For example, here I use the Anthropic Prompt Router with the AWS CLI and the Amazon Bedrock Converse API:

aws bedrock-runtime converse \
    --model-id arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/anthropic.claude:1 \
    --messages '[{ "role": "user", "content": [ { "text": "Alice has N brothers and she also has M sisters. How many sisters does Alice’s brothers have?" } ] }]' \

In output, invocations using a prompt router include a new trace section that tells which model was actually used. In this case, it’s Anthropic’s Claude 3.5 Sonnet:

{
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "text": "To solve this problem, let's think it through step-by-step:\n\n1) First, we need to understand the relationships:\n   - Alice has N brothers\n   - Alice has M sisters\n\n2) Now, we need to consider who Alice's brothers' sisters are:\n   - Alice herself is a sister to all her brothers\n   - All of Alice's sisters are also sisters to Alice's brothers\n\n3) So, the total number of sisters that Alice's brothers have is:\n   - The number of Alice's sisters (M)\n   - Plus Alice herself (+1)\n\n4) Therefore, the answer can be expressed as: M + 1\n\nThus, Alice's brothers have M + 1 sisters."
                }
            ]
        }
    },
    . . .
    "trace": {
        "promptRouter": {
            "invokedModelId": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
        }
    }
}

Using Amazon Bedrock Intelligent Prompt Routing with an AWS SDK
Using an AWS SDK with a prompt router is similar to the previous command line experience. When invoking a model, I set the model ID to the prompt model ARN. For example, in this Python code I’m using the Meta Llama router with the ConverseStream API:

import json
import boto3

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

MODEL_ID = "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1"

user_message = "Describe the purpose of a 'hello world' program in one line."
messages = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

streaming_response = bedrock_runtime.converse_stream(
    modelId=MODEL_ID,
    messages=messages,
)

for chunk in streaming_response["stream"]:
    if "contentBlockDelta" in chunk:
        text = chunk["contentBlockDelta"]["delta"]["text"]
        print(text, end="")
    if "messageStop" in chunk:
        print()
    if "metadata" in chunk:
        if "trace" in chunk["metadata"]:
            print(json.dumps(chunk['metadata']['trace'], indent=2))

This script prints the response text and the content of the trace in response metadata. For this uncomplicated request, the faster and more affordable model has been selected by the prompt router:

A "Hello World" program is a simple, introductory program that serves as a basic example to demonstrate the fundamental syntax and functionality of a programming language, typically used to verify that a development environment is set up correctly.
{
  "promptRouter": {
    "invokedModelId": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
  }
}

Using prompt caching with an AWS SDK
You can use prompt caching with the Amazon Bedrock Converse API. When you tag content for caching and send it to the model for the first time, the model processes the input and saves the intermediate results in a cache. For subsequent requests containing the same content, the model loads the preprocessed results from the cache, significantly reducing both costs and latency.

You can implement prompt caching in your applications with a few steps:

Identify the portions of your prompts that are frequently reused.
Tag these sections for caching in the list of messages using the new cachePoint block.
Monitor cache usage and latency improvements in the response metadata usage section.

Here’s an example of implementing prompt caching when working with documents.

First, I download three decision guides in PDF format from the AWS website. These guides help choose the AWS services that fit your use case.

Then, I use a Python script to ask three questions about the documents. In the code, I create a converse() function to handle the conversation with the model. The first time I call the function, I include a list of documents and a flag to add a cachePoint block.

import json

import boto3

MODEL_ID = "us.anthropic.claude-3-5-sonnet-20241022-v2:0"
AWS_REGION = "us-west-2"

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name=AWS_REGION,
)

DOCS = [
    "bedrock-or-sagemaker.pdf",
    "generative-ai-on-aws-how-to-choose.pdf",
    "machine-learning-on-aws-how-to-choose.pdf",
]

messages = []


def converse(new_message, docs=[], cache=False):

    if len(messages) == 0 or messages[-1]["role"] != "user":
        messages.append({"role": "user", "content": []})

    for doc in docs:
        print(f"Adding document: {doc}")
        name, format = doc.rsplit('.', maxsplit=1)
        with open(doc, "rb") as f:
            bytes = f.read()
        messages[-1]["content"].append({
            "document": {
                "name": name,
                "format": format,
                "source": {"bytes": bytes},
            }
        })

    messages[-1]["content"].append({"text": new_message})

    if cache:
        messages[-1]["content"].append({"cachePoint": {"type": "default"}})

    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=messages,
    )

    output_message = response["output"]["message"]
    response_text = output_message["content"][0]["text"]

    print("Response text:")
    print(response_text)

    print("Usage:")
    print(json.dumps(response["usage"], indent=2))

    messages.append(output_message)


converse("Compare AWS Trainium and AWS Inferentia in 20 words or less.", docs=DOCS, cache=True)
converse("Compare Amazon Textract and Amazon Transcribe in 20 words or less.")
converse("Compare Amazon Q Business and Amazon Q Developer in 20 words or less.")

For each invocation, the script prints the response and the usage counters.

Adding document: bedrock-or-sagemaker.pdf
Adding document: generative-ai-on-aws-how-to-choose.pdf
Adding document: machine-learning-on-aws-how-to-choose.pdf
Response text:
AWS Trainium is optimized for machine learning training, while AWS Inferentia is designed for low-cost, high-performance machine learning inference.
Usage:
{
  "inputTokens": 4,
  "outputTokens": 34,
  "totalTokens": 29879,
  "cacheReadInputTokenCount": 0,
  "cacheWriteInputTokenCount": 29841
}
Response text:
Amazon Textract extracts text and data from documents, while Amazon Transcribe converts speech to text from audio or video files.
Usage:
{
  "inputTokens": 59,
  "outputTokens": 30,
  "totalTokens": 29930,
  "cacheReadInputTokenCount": 29841,
  "cacheWriteInputTokenCount": 0
}
Response text:
Amazon Q Business answers questions using enterprise data, while Amazon Q Developer assists with building and operating AWS applications and services.
Usage:
{
  "inputTokens": 108,
  "outputTokens": 26,
  "totalTokens": 29975,
  "cacheReadInputTokenCount": 29841,
  "cacheWriteInputTokenCount": 0
}

The usage section of the response contains two new counters: cacheReadInputTokenCount and cacheWriteInputTokenCount. The total number of tokens for an invocation is the sum of the input and output tokens plus the tokens read and written into the cache.

Each invocation processes a list of messages. The messages in the first invocation contain the documents, the first question, and the cache point. Because the messages preceding the cache point aren’t currently in the cache, they’re written to cache. According to the usage counters, 29,841 tokens have been written into the cache.

"cacheWriteInputTokenCount": 29841

For the next invocations, the previous response and the new question are appended to the list of messages. The messages before the cachePoint are not changed and found in the cache.

As expected, we can tell from the usage counters that the same number of tokens previously written is now read from the cache.

"cacheReadInputTokenCount": 29841

In my tests, the next invocations take 55 percent less time to complete compared to the first one. Depending on your use case (for example, with more cached content), prompt caching can improve latency up to 85 percent.

Depending on the model, you can set more than one cache point in a list of messages. To find the right cache points for your use case, try different configurations and look at the effect on the reported usage.

Things to know
Amazon Bedrock Intelligent Prompt Routing is available in preview today in US East (N. Virginia) and US West (Oregon) AWS Regions. During the preview, you can use the default prompt routers, and there is no additional cost for using a prompt router. You pay the cost of the selected model. You can use prompt routers with other Amazon Bedrock capabilities such as performing evaluations, using knowledge bases, and configuring agents.

Because the internal model used by the prompt routers needs to understand the complexity of a prompt, intelligent prompt routing currently only supports English language prompts.

Amazon Bedrock support for prompt caching is available in preview in US West (Oregon) for Anthropic’s Claude 3.5 Sonnet V2 and Claude 3.5 Haiku. Prompt caching is also available in US East (N. Virginia) for Amazon Nova Micro, Amazon Nova Lite, and Amazon Nova Pro.

With prompt caching, cache reads receive a 90 percent discount compared to noncached input tokens. There are no additional infrastructure charges for cache storage. When using Anthropic models, you pay an additional cost for tokens written in the cache. There are no additional costs for cache writes with Amazon Nova models. For more information, see Amazon Bedrock pricing.

When using prompt caching, content is cached for up to 5 minutes, with each cache hit resetting this countdown. Prompt caching has been implemented to transparently support cross-Region inference. In this way, your applications can get the cost optimization and latency benefit of prompt caching with the flexibility of cross-Region inference.

These new capabilities make it easier to build cost-effective and high-performing generative AI applications. By intelligently routing requests and caching frequently used content, you can significantly reduce your costs while maintaining and even improving application performance.

To learn more and start using these new capabilities today, visit the Amazon Bedrock documentation and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws.

— Danilo

Amazon Bedrock Marketplace: Access over 100 foundation models in one place

2024-12-04 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-bedrock-marketplace-access-over-100-foundation-models-in-one-place/

Today, we’re introducing Amazon Bedrock Marketplace, a new capability that gives you access to over 100 popular, emerging, and specialized foundation models (FMs) through Amazon Bedrock. With this launch, you can now discover, test, and deploy new models from enterprise providers such as IBM and Nvidia, specialized models such as Upstages’ Solar Pro for Korean language processing, and Evolutionary Scale’s ESM3 for protein research, alongside Amazon Bedrock general-purpose FMs from providers such as Anthropic and Meta.

Models deployed with Amazon Bedrock Marketplace can be accessed through the same standard APIs as the serverless models and, for models which are compatible with Converse API, be used with tools such as Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases.

As generative AI continues to reshape how organizations work, the need for specialized models optimized for specific domains, languages, or tasks is growing. However, finding and evaluating these models can be challenging and costly. You need to discover them across different services, build abstractions to use them in your applications, and create complex security and governance layers. Amazon Bedrock Marketplace addresses these challenges by providing a single interface to access both specialized and general-purpose FMs.

Using Amazon Bedrock Marketplace
To get started, in the Amazon Bedrock console, I choose Model catalog in the Foundation models section of the navigation pane. Here, I can search for models that help me with a specific use case or language. The results of the search include both serverless models and models available in Amazon Bedrock Marketplace. I can filter results by provider, modality (such as text, image, or audio), or task (such as classification or text summarization).

In the catalog, there are models from organizations like Arcee AI, which builds context-adapted small language models (SLMs), and Widn.AI, which provides multilingual models.

For example, I am interested in the IBM Granite models and search for models from IBM Data and AI.

I select Granite 3.0 2B Instruct, a language model designed for enterprise applications. Choosing the model opens the model detail page where I can see more information from the model provider such as highlights about the model, pricing, and usage including sample API calls.

This specific model requires a subscription, and I choose View subscription options.

From the subscription dialog, I review pricing and legal notes. In Pricing details, I see the software price set by the provider. For this model, there are no additional costs on top of the deployed infrastructure. The Amazon SageMaker infrastructure cost is charged separately and can be seen in Amazon SageMaker pricing.

To proceed with this model, I choose Subscribe.

After the subscription has been completed, which usually takes a few minutes, I can deploy the model. For Deployment details, I use the default settings and the recommended instance type.

I expand the optional Advanced settings. Here, I can choose to deploy in a virtual private cloud (VPC) or specify the AWS Identity and Access Management (IAM) service role used by the deployment. Amazon Bedrock Marketplace automatically creates a service role to access Amazon Simple Storage Service (Amazon S3) buckets where the model weights are stored, but I can choose to use an existing role.

I keep the default values and complete the deployment.

After a few minutes, the deployment is In Service and can be reviewed in the Marketplace deployments page from the navigation pane.

There, I can choose an endpoint to view details and edit the configuration such as the number of instances. To test the deployment, I choose Open in playground and ask for some poetry.

I can also select the model from the Chat/text page of the Playground using the new Marketplace category where the deployed endpoints are listed.

In a similar way, I can use the model with other tools such as Amazon Bedrock Agents, Amazon Bedrock Knowledge Bases, Amazon Bedrock Prompt Management, Amazon Bedrock Guardrails, and model evaluations, by choosing Select Model and selecting the Marketplace model endpoint.

The model I used here is text-to-text, but I can use Amazon Bedrock Marketplace to deploy models with different modalities. For example, after I deploy Stability AI Stable Diffusion 3.5 Large, I can run a quick test in the Amazon Bedrock Image playground.

The models I deployed are now available through the Amazon Bedrock InvokeModel API. When a model is deployed, I can use it with the AWS Command Line Interface (AWS CLI) and any AWS SDKs using the endpoint Amazon Resource Name (ARN) as model ID.

For chat-tuned text-to-text models, I can also use the Amazon Bedrock Converse API, which abstracts model differences and enables model switching with a single parameter change.

Things to know
Amazon Bedrock Marketplace is available in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), and South America (São Paulo).

With Amazon Bedrock Marketplace, you pay a software fee to the third-party model provider (which can be zero, as in the previous example) and a hosting fee based on the type and number of instances you choose for your model endpoints.

Start browsing the new models using the Model catalog in the Amazon Bedrock console, visit the Amazon Bedrock Marketplace documentation, and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws.

— Danilo

Introducing Amazon Nova: Frontier intelligence and industry leading price performance

2024-12-03 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/

Today, we’re thrilled to announce Amazon Nova, a new generation of state-of-the-art foundation models (FMs) that deliver frontier intelligence and industry leading price performance, available exclusively in Amazon Bedrock.

You can use Amazon Nova to lower costs and latency for almost any generative AI task. You can build on Amazon Nova to analyze complex documents and videos, understand charts and diagrams, generate engaging video content, and build sophisticated AI agents, from across a range of intelligence classes optimized for enterprise workloads.

Whether you’re developing document processing applications that need to process images and text, creating marketing content at scale, or building AI assistants that can understand and act on visual information, Amazon Nova provides the intelligence and flexibility you need with two categories of models: understanding and creative content generation.

Amazon Nova understanding models accept text, image, or video inputs to generate text output. Amazon creative content generation models accept text and image inputs to generate image or video output.

Understanding models: Text and visual intelligence
The Amazon Nova models include three understanding models (with a fourth one coming soon) designed to meet different needs:

Amazon Nova Micro – A text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat and brainstorming, and simple mathematical reasoning and coding. Amazon Nova Micro also supports customization on proprietary data using fine-tuning and model distillation to boost accuracy.

Amazon Nova Lite – A very low-cost multimodal model that is lightning fast for processing image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy. The model processes inputs up to 300K tokens in length and can analyze multiple images or up to 30 minutes of video in a single request. Amazon Nova Lite also supports text and multimodal fine-tuning and can be optimized to deliver the best quality and costs for your use case with techniques such as model distillation.

Amazon Nova Pro – A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Pro is capable of processing up to 300K input tokens and sets new standards in multimodal intelligence and agentic workflows that require calling APIs and tools to complete complex workflows. It achieves state-of-the-art performance on key benchmarks including visual question answering (TextVQA) and video understanding (VATEX). Amazon Nova Pro demonstrates strong capabilities in processing both visual and textual information and excels at analyzing financial documents. With an input context of 300K tokens, it can process code bases with over fifteen thousand lines of code. Amazon Nova Pro also serves as a teacher model to distill custom variants of Amazon Nova Micro and Lite.

Amazon Nova Premier – Our most capable multimodal model for complex reasoning tasks and for use as the best teacher for distilling custom models. Amazon Nova Premier is still in training. We’re targeting availability in early 2025.

Amazon Nova understanding models excel in Retrieval-Augmented Generation (RAG), function calling, and agentic applications. This is reflected in Amazon Nova model scores in the Comprehensive RAG Benchmark (CRAG) evaluation, Berkeley Function Calling Leaderboard (BFCL), VisualWebBench, and Mind2Web.

What makes Amazon Nova particularly powerful for enterprises is its customization capabilities. Think of it as tailoring a suit: you start with a high-quality foundation and adjust it to fit your exact needs. You can fine-tune the models with text, image, and video to understand your industry’s terminology, align with your brand voice, and optimize for your specific use cases. For instance, a legal firm might customize Amazon Nova to better understand legal terminology and document structures.

You can see the latest benchmark scores for these models on the Amazon Nova product page.

Creative content generation: Bringing concepts to life
The Amazon Nova models also include two creative content generation models:

Amazon Nova Canvas – A state-of-the-art image generation model producing studio-quality images with precise control over style and content, including rich editing features such as inpainting, outpainting, and background removal. Amazon Nova Canvas excels on human evaluations and key benchmarks such as text-to-image faithfulness evaluation with question answering (TIFA) and ImageReward.

Amazon Nova Reel – A state-of-the-art video generation model. Using Amazon Nova Reel, you can produce short videos through text prompts and images, control visual style and pacing, and generate professional-quality video content for marketing, advertising, and entertainment. Amazon Nova Reel outperforms existing models on human evaluations of video quality and video consistency.

All Amazon Nova models include built-in safety controls and creative content generation models include watermarking capabilities to promote responsible AI use.

Let’s see how these models work in practice for a few use cases.

Using Amazon Nova Pro for document analysis
To demonstrate the capabilities of document analysis, I downloaded the Choosing a generative AI service decision guide in PDF format from the AWS documentation.

First, I choose Model access in the Amazon Bedrock console navigation pane and request access to the new Amazon Nova models. Then, I choose Chat/text in the Playground section of the navigation pane and select the Amazon Nova Pro model. In the chat, I upload the decision guide PDF and ask:

Write a summary of this doc in 100 words. Then, build a decision tree.

The output follows my instructions producing a structured decision tree that gives me a glimpse of the document before reading it.

Using Amazon Nova Pro for video analysis
To demonstrate video analysis, I prepared a video by joining two short clips (more on this in the next section):

This time, I use the AWS SDK for Python (Boto3) to invoke the Amazon Nova Pro model using the Amazon Bedrock Converse API and analyze the video:

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-pro-v1:0"
VIDEO_FILE = "the-sea.mp4"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)
with open(VIDEO_FILE, "rb") as f:
    video = f.read()

user_message = "Describe this video."

messages = [ { "role": "user", "content": [
    {"video": {"format": "mp4", "source": {"bytes": video}}},
    {"text": user_message}
] } ]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
    inferenceConfig={"temperature": 0.0}
 )

response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)

Amazon Nova Pro can analyze videos that are uploaded with the API (as in the previous code) or that are stored in an Amazon Simple Storage Service (Amazon S3) bucket.

In the script, I ask to describe the video. I run the script from the command line. Here’s the result:

The video begins with a view of a rocky shore on the ocean, and then transitions to a close-up of a large seashell resting on a sandy beach.

I can use a more detailed prompt to extract specific information from the video such as objects or text. Note that Amazon Nova currently does not process audio in a video.

Using Amazon Nova for video creation
Now, let’s create a video using Amazon Nova Reel, starting from a text-only prompt and then providing a reference image.

Because generating a video takes a few minutes, the Amazon Bedrock API introduced three new operations:

StartAsyncInvoke – To start an asynchronous invocation

GetAsyncInvoke – To get the current status of a specific asynchronous invocation

ListAsyncInvokes – To list the status of all asynchronous invocations with optional filters such as status or date

Amazon Nova Reel supports camera control actions such as zooming or moving the camera. This Python script creates a video from this text prompt:

Closeup of a large seashell in the sand. Gentle waves flow all around the shell. Sunset light. Camera zoom in very close.

After the first invocation, the script periodically checks the status until the creation of the video has been completed. I pass a random seed to get a different result each time the code runs.

import random
import time

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
S3_DESTINATION_BUCKET = "<BUCKET>"

video_prompt = "Closeup of a large seashell in the sand. Gentle waves flow all around the shell. Sunset light. Camera zoom in very close."

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)
model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {"text": video_prompt},
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        "seed": random.randint(0, 2147483648)
    }
}

invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}}
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split('/')[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"
print(f"\nS3 URI: {s3_location}")

while True:
    response = bedrock_runtime.get_async_invoke(
        invocationArn=invocation_arn
    )
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)

if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")

I run the script:

Status: InProgress
. . .
Status: Completed

Video is ready at s3://BUCKET/PREFIX/output.mp4

After a few minutes, the script completes and prints the output Amazon Simple Storage Service (Amazon S3) location. I download the output video using the AWS Command Line Interface (AWS CLI):

aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-text.mp4

This is the resulting video. As requested, the camera zooms in on the subject.

Using Amazon Nova Reel with a reference image
To have better control over the creation of the video, I can provide Amazon Nova Reel a reference image such as the following:

This script uses the reference image and a text prompt with a camera action (drone view flying over a coastal landscape) to create a video:

import base64
import random
import time

import boto3

S3_DESTINATION_BUCKET = "<BUCKET>"
AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
input_image_path = "seascape.png"
video_prompt = "drone view flying over a coastal landscape"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Load the input image as a Base64 string.
with open(input_image_path, "rb") as f:
    input_image_bytes = f.read()
    input_image_base64 = base64.b64encode(input_image_bytes).decode("utf-8")

model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {
        "text": video_prompt,
        "images": [{ "format": "png", "source": { "bytes": input_image_base64 } }]
        },
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        "seed": random.randint(0, 2147483648)
    }
}

invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}}
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split('/')[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"

print(f"\nS3 URI: {s3_location}")

while True:
    response = bedrock_runtime.get_async_invoke(
        invocationArn=invocation_arn
    )
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)
if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")

Again, I download the output using the AWS CLI:

aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-image.mp4

This is the resulting video. The camera starts from the reference image and moves forward.

Building AI responsibly
Amazon Nova models are built with a focus on customer safety, security, and trust throughout the model development stages, offering you peace of mind as well as an adequate level of control to enable your unique use cases.

We’ve built in comprehensive safety features and content moderation capabilities, giving you the controls you need to use AI responsibly. Every generated image and video include digital watermarking.

The Amazon Nova foundation models are built with protections that match its increased capabilities. Amazon Nova extends our safety measures to combat the spread of misinformation, child sexual abuse material (CSAM), and chemical, biological, radiological, or nuclear (CBRN) risks.

Things to know
Amazon Nova models are available in Amazon Bedrock in the US East (N. Virginia) AWS region. Amazon Nova Micro, Lite, and Pro are also available in the US West (Oregon), and US East (Ohio) regions via cross-Region inference. As usual with Amazon Bedrock, the pricing follows a pay-as-you-go model. For more information, see Amazon Bedrock pricing.

The new generation of Amazon Nova understanding models speaks your language. These models understand and generate content in over 200 languages, with particularly strong capabilities in English, German, Spanish, French, Italian, Japanese, Korean, Arabic, Simplified Chinese, Russian, Hindi, Portuguese, Dutch, Turkish, and Hebrew. This means you can build truly global applications without worrying about language barriers or maintaining separate models for different regions. Amazon Nova models for creative content generation support English prompts.

As you explore Amazon Nova, you’ll discover its ability to handle increasingly complex tasks. You can use these models to process lengthy documents up to 300K tokens, analyze multiple images in a single request, understand up to 30 minutes of video content, and generate images and videos at scale from natural language. This makes these models suitable for a variety of business use cases, from quick customer service interactions to deep analysis of corporate documentation and asset creation for advertising, ecommerce, and social media applications.

Integration with Amazon Bedrock makes deployment and scaling straightforward. You can leverage features like Amazon Bedrock Knowledge Bases to enhance your model with proprietary information, use Amazon Bedrock Agents to automate complex workflows, and implement Amazon Bedrock Guardrails to promote responsible AI use. The platform supports real-time streaming for interactive applications, batch processing for high-volume workloads, and detailed monitoring to help you optimize performance.

Ready to start building with Amazon Nova? Give the new models a try in the Amazon Bedrock console today, visit the Amazon Nova models section of the Amazon Bedrock documentation, and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with these new models!

— Danilo

New RAG evaluation and LLM-as-a-judge capabilities in Amazon Bedrock

2024-12-02 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-rag-evaluation-and-llm-as-a-judge-capabilities-in-amazon-bedrock/

Today, we’re announcing two new evaluation capabilities in Amazon Bedrock that can help you streamline testing and improve generative AI applications:

Amazon Bedrock Knowledge Bases now supports RAG evaluation (preview) – You can now run an automatic knowledge base evaluation to assess and optimize Retrieval Augmented Generation (RAG) applications using Amazon Bedrock Knowledge Bases. The evaluation process uses a large language model (LLM) to compute the metrics for the evaluation. With RAG evaluations, you can compare different configurations and tune your settings to get the results you need for your use case.

Amazon Bedrock Model Evaluation now includes LLM-as-a-judge (preview) – You can now perform tests and evaluate other models with humanlike quality at a fraction of the cost and time of running human evaluations.

These new capabilities make it easier to go into production by providing fast, automated evaluation of AI-powered applications, shortening feedback loops and speeding up improvements. These evaluations assess multiple quality dimensions including correctness, helpfulness, and responsible AI criteria such as answer refusal and harmfulness.

To make it easy and intuitive, the evaluation results provide natural language explanations for each score in the output and on console, and the scores are normalized from 0 to 1 for ease of interpretability. Rubrics are published in full with the judge prompts in the documentation so non-scientists can understand how scores are derived.

Let’s see how they work in practice.

Using RAG evaluations in Amazon Bedrock Knowledge Bases
In the Amazon Bedrock console, I choose Evaluations in the Inference and Assessment section. There, I see the new Knowledge Bases tab.

I choose Create, enter a name and a description for the evaluation, and select the Evaluator model that will compute the metrics. In this case, I use Anthropic’s Claude 3.5 Sonnet.

I select the knowledge base to evaluate. I previously created a knowledge base containing only the AWS Lambda Developer Guide PDF file. In this way, for the evaluation, I can ask questions about the AWS Lambda service.

I can evaluate either the retrieval function alone or the complete retrieve-and-generate workflow. This choice affects the metrics that are available in the next step. I choose to evaluate both retrieval and response generation and select the model to use. In this case, I use Anthropic’s Claude 3 Haiku. I can also use Amazon Bedrock Guardrails and adjust runtime inference settings by choosing the configurations link after the response generator model.

Now, I can choose which metrics to evaluate. I select Helpfulness and Correctness in the Quality section and Harmfulness in the Responsible AI metrics section.

Now, I select the dataset that will be used for evaluation. This is the JSONL file I prepared and uploaded to Amazon Simple Storage Service (Amazon S3) for this evaluation. Each line provides a conversation, and for each message there is a reference response.

{"conversationTurns":[{"referenceResponses":[{"content":[{"text":"A trigger is a resource or configuration that invokes a Lambda function such as an AWS service."}]}],"prompt":{"content":[{"text":"What is an AWS Lambda trigger?"}]}}]}
{"conversationTurns":[{"referenceResponses":[{"content":[{"text":"An event is a JSON document defined by the AWS service or the application invoking a Lambda function that is provided in input to the Lambda function."}]}],"prompt":{"content":[{"text":"What is an AWS Lambda event?"}]}}]}

I specify the S3 location in which to store the results of the evaluation. The evaluation job requires that the S3 bucket is configured with the cross-origin resource sharing (CORS) permissions described in the Amazon Bedrock User Guide.

For service access, I need to create or provide an AWS Identity and Access Management (IAM) service role that Amazon Bedrock can assume and that allows access to the Amazon Bedrock and Amazon S3 resources used by the evaluation.

After a few minutes, the evaluation has completed, and I browse the results. The actual duration of an evaluation depends on the size of the prompt dataset and on the generator and the evaluator models used.

At the top, the Metric summary evaluates the overall performance using the average score across all conversations.

After that, the Generation metrics breakdown gives me details about each of the selected evaluation metrics. My evaluation dataset was small (two lines), so there isn’t a large distribution to look at.

From here, I can also see example conversations and how they were rated. To view all conversations, I can visit the full output in the S3 bucket.

I’m curious why Helpfulness is slightly below one. I expand and zoom Example conversations for Helpfulness. There, I see the generated output, the ground truth that I provided with the evaluation dataset, and the score. I choose the score to see the model reasoning. According to the model, it would have helped to have more in-depth information. Models really are strict judges.

Comparing RAG evaluations
The result of a knowledge base evaluation can be difficult to interpret by itself. For this reason, the console allows comparing results from multiple evaluations to understand the differences. In this way, you can understand if you’re improving or not for the metrics you care about.

For example, I previously ran two other knowledge base evaluations. They’re related to knowledge bases with the same data sources but different chunking and parsing configurations and different embedding models.

I select the two evaluations and choose Compare. To be comparable in the console, the evaluations need to cover the same metrics.

In the At a glance tab, I see a visual comparison of the metrics using a spider chart. In this case, the results are not much different. The main difference is the Faithfulness score.

In the Evaluation details tab, I find a detailed comparison of the results for each metric, including the difference in scores.

Using LLM-as-a-judge in Amazon Bedrock Model Evaluation (preview)
In the Amazon Bedrock console, I choose Evaluations in the Inference and Assessment section of the navigation pane. After I choose Create, I select the new Automatic: Model as a judge option.

I enter a name and a description for the evaluation and select the Evaluator model that is used to generate evaluation metrics. I use Anthropic’s Claude 3.5 Sonnet.

Then, I select the Generator model, which is the model I want to evaluate. Model evaluation can help me understand if a smaller and more cost-effective model meets the needs of my use case. I use Anthropic’s Claude 3 Haiku.

In the next section I select the Metrics to evaluate. I select Helpfulness and Correctness in the Quality section and Harmfulness in the Responsible AI metrics section.

In the Datasets section I specify the Amazon S3 location where my evaluation dataset is stored and the folder in an S3 bucket where the results of the model evaluation job are stored.

For the evaluation dataset, I prepared another JSONL file. Each line provides a prompt and a reference answer. Note that the format is different compared to knowledge base evaluations.

{"prompt":"Write a 15 words summary of this text:\n\nAWS Fargate is a technology that you can use to run containers without having to manage servers or clusters. With AWS Fargate, you no longer have to provision, configure, or scale clusters of virtual machines to run containers. This removes the need to choose server types, decide when to scale your clusters, or optimize cluster packing.","referenceResponse":"AWS Fargate allows running containers without managing servers or clusters, simplifying container deployment and scaling."}
{"prompt":"Give me a list of the top 3 benefits from this text:\n\nAWS Fargate is a technology that you can use to run containers without having to manage servers or clusters. With AWS Fargate, you no longer have to provision, configure, or scale clusters of virtual machines to run containers. This removes the need to choose server types, decide when to scale your clusters, or optimize cluster packing.","referenceResponse":"- No need to manage servers or clusters.\n- Simplified infrastructure management.\n- Improved focus on application development."}

Finally, I can choose an IAM service role that gives Amazon Bedrock access to the resources used by this evaluation job.

I complete the creation of the evaluation. After a few minutes, the evaluation is complete. Similar to the knowledge base evaluation, the result starts with a Metrics Summary.

The Generation metrics breakdown details each metric, and I can look at details for a few sample prompts. I look at Helpfulness to better understand the evaluation score.

The prompts in the evaluation have been correctly processed by the model, and I can apply the results for my use case. If my application needs to manage prompts similar to the ones used in this evaluation, the evaluated model is a good choice.

Things to know
These new evaluation capabilities are available in preview in the following AWS Regions:

RAG evaluation in US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris), and South America (São Paulo)
LLM-as-a-judge in US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Seoul, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris, Zurich), and South America (São Paulo)

Note that the available evaluator models depend on the Region.

Pricing is based on the standard Amazon Bedrock pricing for model inference. There are no additional charges for evaluation jobs themselves. The evaluator models and models being evaluated are billed according to their normal on-demand or provisioned pricing. The judge prompt templates are part of the input tokens, and those judge prompts can be found in the AWS documentation for transparency.

The evaluation service is optimized for English language content at launch, though the underlying models can work with content in other languages they support.

To get started, visit the Amazon Bedrock console. To learn more, you can access the Amazon Bedrock documentation and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with these new capabilities!

— Danilo

Introducing new PartyRock capabilities and free daily usage

2024-12-01 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-new-partyrock-capabilities-and-free-daily-usage/

PartyRock is an Amazon Bedrock playground that anyone can use to create generative AI-powered applications by simply describing the app you want to build without the need to write any code.

Since its launch in November 2023, over half a million apps have been built by users worldwide. These apps range from simple text generators to sophisticated productivity tools that combine multiple AI capabilities.

Throughout this year, we observed that as PartyRock users build skills and intuition by using the playground, they find interesting and useful ways to build apps for improving their daily lives. PartyRock apps increased their individual productivity, and they returned to PartyRock to use them regularly.

Today, we’re introducing improvements meeting the most requested customer needs:

Free daily usage – Previously, PartyRock offered a free trial period for a limited time. Starting in 2025, all users will have a recurring free daily usage granted, with no credit card required.

Search the app catalog – You can now explore hundreds of thousands of apps in the PartyRock catalog and find the right app for your use case by category or functionality. Relevant and popular apps are highlighted to showcase the creativity of the community. Results include app previews and last modified date to help you pick what’s best for you.

Do more with docs – You can upload and process multiple documents simultaneously, making it easier to build apps that handle batch processing, document comparison, or content aggregation.

Let’s see these some new features in action.

Searching and remixing a PartyRock app
I open PartyRock and sign in with my social credentials. In the Home section, I can use the search box to look for apps for a specific use case. I love traveling, and I’d like to improve the way I share my trips with family and friends. I enter travel and vlog in the search box. In the search results, I see an app that gets my attention.

I choose the Travel vlog script writer app and open it in a browser tab. The app generates a travel log script starting from a few inputs: the destination, the itinerary, and the tone.

I like to prepare some travel notes before a trip so that I know what the options are and what I want to visit. What if I can upload my notes and other documents to better personalize the vlog?

One of the key capabilities of PartyRock is that I can start with an existing app and “remix” it to tailor it to my needs. The resulting app can then be shared for others to use.

I choose Remix and then Edit to customize this app. I add a Document widget and edit it:

For Widget title, I use Notes.
For Instruction, I enter Upload your notes and documents with travel tips.

I save the new widget and move just after the other input fields.

To use these images in the app, I edit the Your Vlog script widget. I want the script to include the content of those images. In the prompt generating the script, I add a sentence to analyze and consider the image of the destination:

Get inspiration from what you see in @Notes.

I also update the Vlog cover widget prompt to consider the whole script when generating the cover image:

A portrait of a trip to @Destination considering the @Your Vlog script.

I save and leave edit. The remixed app is now ready to be tested.

Using the remixed PartyRock app
Let’s try the customized version of the app. I enter:

Rome, Italy as Destination
A walk in the old city center as Itinerary
Peaceful and relaxing as Tone.

Then, I upload my travel notes.

I choose the Play button to start the app. The app takes a few seconds to generate its output.

I like the result. The script is quite detailed, and the image cover a nice addition. I can further extend the app to use the image cover in a social media post generator for posting about the vlog to different platforms with different tones and styles. The possibilities are endless!

Things to know
PartyRock with these new capabilities is available at https://partyrock.aws.

No credit card or AWS account is required to use PartyRock, and you can explore hundreds of thousands of published apps even without signing in.

With PartyRock, everyone can become a builder. Apps can be generated from a textual description and then customized and extended with additional capabilities using the visual editor. All apps are automatically optimized for mobile devices and can be shared with others. To make it easier for others to view and use your apps, you can create your personalized playlist page.

For examples of how PartyRock can help you be more productive, refer to How 3 small businesses use PartyRock to help customers. And don’t forget to share your best apps with me!

— Danilo

Amazon FSx for Lustre increases throughput to GPU instances by up to 12x

2024-11-28 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-fsx-for-lustre-unlocks-full-network-bandwidth-and-gpu-performance/

Today, we are announcing support for Elastic Fabric Adapter (EFA) and NVIDIA GPUDirect Storage (GDS) on Amazon FSx for Lustre. EFA is a network interface for Amazon EC2 instances that makes it possible to run applications requiring high levels of inter-node communications at scale. GDS is a technology that creates a direct data path between local or remote storage and GPU memory. With these enhancements, Amazon FSx for Lustre with EFA/GDS support provides up to 12 times higher (up to 1200 Gbps) per-client throughput compared to the previous FSx for Lustre version.

You can use FSx for Lustre to build and run the most performance demanding applications, such as deep learning training, drug discovery, financial modeling, and autonomous vehicle development. As datasets grow and new technologies emerge, you can adopt increasingly powerful GPU and HPC instances such as Amazon EC2 P5, Trn1, and Hpc7a. Until now, when accessing FSx for Lustre file systems, the use of traditional TCP networking limited throughput to 100 Gbps for individual client instances. This adoption is driving the need for FSx for Lustre file systems to provide the performance necessary to optimally utilize the increasing network bandwidth of these cutting-edge EC2 instances when accessing large datasets.

With EFA and GDS support in FSx for Lustre, you can now achieve up to 1,200 Gbps throughput per client instance (twelve times more throughput than previously) when using P5 GPU instances and NVIDIA CUDA in your applications.

With this new capability, you can fully utilize the network bandwidth of the most powerful compute instances and accelerate your machine learning (ML) and HPC workloads. EFA enhances performance by bypassing the operating system and using the AWS Scalable Reliable Datagram (SRD) protocol to optimize data transfer. GDS further improves performance by enabling direct data transfer between the file system and GPU memory, bypassing the CPU and eliminating redundant memory copies.

Let’s see how this works in practice.

Creating an Amazon FSx for Lustre file system with EFA enabled
To get started, in the Amazon FSx console, I choose Create file system and then Amazon FSx for Lustre.

I enter a name for the file system. In the Deployment and storage type section, I select Persistent, SSD and the new with EFA enabled option. I select 1000 MB/s/TiB in the Throughput per unit of storage section. With these settings, I enter 4.8 TiB for Storage capacity, which is the minimum supported with these settings.

For networking, I use the default virtual private cloud (VPC) and an EFA-enabled security group. I leave all other options to their default values.

I review all the options and proceed to create the file system. After a few minutes, the file system is ready to be used.

Mounting an Amazon FSx for Lustre file system with EFA enabled from an Amazon EC2 instance
In the Amazon EC2 console, I choose Launch instance, enter a name for the instance, and select the Ubuntu Amazon Machine Image (AMI). For Instance type, I select trn1.32xlarge.

In Network settings, I edit the default settings and select the same subnet used by the FSx Lustre file system. In Firewall (security groups), I select three existing security groups: the EFA-enabled security group used by the FSx for Lustre file system, the default security group, and a security group that provides Secure Shell (SSH) access.

In Advanced network configuration, I select ENA and EFA as Interface type. Without this setting, the instance would use traditional TCP networking and the connection with the FSx for Lustre file system would still be limited to 100 Gbps in throughput.

To have more throughput, I can add more EFA network interfaces, depending on the instance type.

I launch the instance and, when the instance is ready, I connect using EC2 Instance Connect and follow the instructions for installing the Lustre client in the FSx for Lustre User Guide and configuring EFA clients.

Then, I follow the instructions for mounting an FSx for Lustre file system from an EC2 instance.

I create a folder to use as mount point:

sudo mkdir -p /fsx

I select the file system in the FSx console and lookup the DNS name and Mount name. Using these values, I mount the file system:

sudo mount -t lustre -o relatime,flock file_system_dns_name@tcp:/mountname /fsx

EFA is automatically used when you access an EFA-enabled file system from client instances that support EFA and are using Lustre version 2.15 or higher.

Things to know
EFA and GDS support is available today with no additional cost on new Amazon FSx for Lustre file systems in all AWS Regions where persistent 2 is offered. FSx for Lustre automatically uses EFA when customers access an EFA-enabled file system from client instances that support EFA, without requiring any additional configuration. For a list of EC2 client instances that support EFA, see supported instance types in the Amazon EC2 User Guide. This network specifications table describes network bandwidths and EFA support for instance types in the accelerated computing category.

To use EFA-enabled instances with FSx for Lustre file systems, you must use Lustre 2.15 clients on Ubuntu 22.04 with kernel 6.8 or higher.

Note that your client instances and your file systems must be located in the same subnet within your Amazon Virtual Private Cloud (Amazon VPC) connection.

GDS is automatically supported on EFA-enabled file systems. To use GDS with your FSx for Lustre file systems, you need the NVIDIA Compute Unified Device Architecture (CUDA) package, the open source NVIDIA driver, and the NVIDIA GPUDirect Storage Driver installed on your client instance. These packages come preinstalled on the AWS Deep Learning AMI. You can then use your CUDA-enabled application to use GPUDirect storage for data transfer between your file system and GPUs.

When planning your deployment, note that EFA-enabled file systems have larger minimum storage capacity increments than file systems that are not EFA-enabled. For instance, if you choose the 1,000 MB/s/TiB throughput tier, the minimum storage capacity for EFA-enabled file systems starts at 4.8 TiB as compared to 1.2TB for FSx for Lustre file systems not enabling EFA. If you’re looking to migrate your existing workloads, you can use AWS DataSync to move your data from an existing file system to a new one that supports EFA and GDS.

For maximum flexibility, FSx for Lustre maintains compatibility with both EFA and non-EFA workloads. When accessing an EFA-enabled file system, traffic from non-EFA client instances automatically flows over traditional TCP/IP networking using Elastic Network Adapter (ENA), allowing seamless access for all workloads without any additional configuration.

To learn more about EFA and GDS support on FSx for Lustre, including detailed setup instructions and best practices, visit the Amazon FSx for Lustre documentation. Get started today and experience the fastest storage performance available for your GPU instances in the cloud.

— Danilo

Update 11/27: post updated to reflect 12x throughput

Upgraded Claude 3.5 Sonnet from Anthropic (available now), computer use (public beta), and Claude 3.5 Haiku (coming soon) in Amazon Bedrock

2024-10-22 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/upgraded-claude-3-5-sonnet-from-anthropic-available-now-computer-use-public-beta-and-claude-3-5-haiku-coming-soon-in-amazon-bedrock/

Four months ago, we introduced Anthropic’s Claude 3.5 in Amazon Bedrock, raising the industry bar for AI model intelligence while maintaining the speed and cost of Claude 3 Sonnet.

Today, I am excited to announce three new capabilities for the Claude 3.5 model family in Amazon Bedrock:

Upgraded Claude 3.5 Sonnet – You now have access to an upgraded Claude 3.5 Sonnet model that builds upon its predecessor’s strengths, offering even more intelligence at the same cost. Claude 3.5 Sonnet continues to improve its capability to solve real-world software engineering tasks and follow complex, agentic workflows. The upgraded Claude 3.5 Sonnet helps across the entire software development lifecycle, from initial design to bug fixes, maintenance, and optimizations. With these capabilities, the upgraded Claude 3.5 Sonnet model can help build more advanced chatbots with a warm, human-like tone. Other use cases in which the upgraded model excels include knowledge Q&A platforms, data extraction from visuals like charts and diagrams, and automation of repetitive tasks and operations.

Computer use – Claude 3.5 Sonnet now offers computer use capabilities in Amazon Bedrock in public beta, allowing Claude to perceive and interact with computer interfaces. Developers can direct Claude to use computers the way people do: by looking at a screen, moving a cursor, clicking buttons, and typing text. This works by giving the model access to integrated tools that can return computer actions, like keystrokes and mouse clicks, editing text files, and running shell commands. Software developers can integrate computer use in their solutions by building an action-execution layer and grant screen access to Claude 3.5 Sonnet. In this way, software developers can build applications with the ability to perform computer actions, follow multiple steps, and check their results. Computer use opens new possibilities for AI-powered applications. For example, it can help automate software testing and back office tasks and implement more advanced software assistants that can interact with applications. Given this technology is early, developers are encouraged to explore lower-risk tasks and use it in a sandbox environment.

Claude 3.5 Haiku – The new Claude 3.5 Haiku is coming soon and combines rapid response times with improved reasoning capabilities, making it ideal for tasks that require both speed and intelligence. Claude 3.5 Haiku improves on its predecessor and matches the performance of Claude 3 Opus (previously Claude’s largest model) at the speed and cost of Claude 3 Haiku. Claude 3.5 Haiku can help with use cases such as fast and accurate code suggestions, highly interactive chatbots that need rapid response times for customer service, e-commerce solutions, and educational platforms. For customers dealing with large volumes of unstructured data in finance, healthcare, research, and more, Claude 3.5 Haiku can help efficiently process and categorize information.

According to Anthropic, the upgraded Claude 3.5 Sonnet delivers across-the-board improvements over its predecessor, with significant gains in coding, an area where it already excelled. The upgraded Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks. On coding, it improves performance on SWE-bench Verified from 33% to 49%, scoring higher than all publicly available models. It also improves performance on TAU-bench, an agentic tool use task, from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the airline domain. The following table includes the model evaluations provided by Anthropic.

Computer use, a new frontier in AI interaction
Instead of restricting the model to use APIs, Claude has been trained on general computer skills, allowing it to use a wide range of standard tools and software programs. In this way, applications can use Claude to perceive and interact with computer interfaces. Software developers can integrate this API to enable Claude to translate prompts (for example, “find me a hotel in Rome”) into specific computer commands (open a browser, navigate this website, and so on).

More specifically, when invoking the model, software developers now have access to three new integrated tools that provide a virtual set of hands to operate a computer:

Computer tool – This tool can receive as input a screenshot and a goal and returns a description of the mouse and keyboard actions that should be performed to achieve that goal. For example, this tool can ask to move the cursor to a specific position, click, type, and take screenshots.
Text editor tool – Using this tool, the model can ask to perform operations like viewing file contents, creating new files, replacing text, and undoing edits.
Bash tool – This tool returns commands that can be run on a computer system to interact at a lower level as a user typing in a terminal.

These tools open up a world of possibilities for automating complex tasks, from data analysis and software testing to content creation and system administration. Imagine an application powered by Claude 3.5 Sonnet interacting with the computer just as a human would, navigating through multiple desktop tools including terminals, text editors, internet browsers, and also capable of filling out forms and even debugging code.

We’re excited to help software developers explore these new capabilities with Amazon Bedrock. We expect this capability to improve rapidly in the coming months, and Claude’s current ability to use computers has limits. Some actions such as scrolling, dragging, or zooming can present challenges for Claude, and we encourage you to start exploring low-risk tasks.

When looking at OSWorld, a benchmark for multimodal agents in real computer environments, the upgraded Claude 3.5 Sonnet currently gets 14.9%. While human-level skill is far ahead with about 70-75%, this result is much better than the 7.7% obtained by the next-best model in the same category.

Using the upgraded Claude 3.5 Sonnet in the Amazon Bedrock console
To get started with the upgraded Claude 3.5 Sonnet, I navigate to the Amazon Bedrock console and choose Model access in the navigation pane. There, I request access for the new Claude 3.5 Sonnet V2 model.

To test the new vision capability, I open another browser tab and download from the Our World in Data website the Wind power generation chart in PNG format.

Back in the Amazon Bedrock console, I choose Chat/text under Playgrounds in the navigation pane. For the model, I select Anthropic as the model provider and then Claude 3.5 Sonnet V2.

I use the three vertical dots in the input section of the chat to upload the image file from my computer. Then I enter this prompt:

Which are the top countries for wind power generation? Answer only in JSON.

The result follows my instructions and returns the list extracting the information from the image.

Using the upgraded Claude 3.5 Sonnet with AWS CLI and SDKs
Here’s a sample AWS Command Line Interface (AWS CLI) command using the Amazon Bedrock Converse API. I use the --query parameter of the CLI to filter the result and only show the text content of the output message:

aws bedrock-runtime converse \
    --model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
    --messages '[{ "role": "user", "content": [ { "text": "What do you throw out when you want to use it, but take in when you do not want to use it?" } ] }]' \
    --query 'output.message.content[*].text' \
    --output text

In output, I get this text in the response.

An anchor! You throw an anchor out when you want to use it to stop a boat, but you take it in (pull it up) when you don't want to use it and want to move the boat.

The AWS SDKs implement a similar interface. For example, you can use the AWS SDK for Python (Boto3) to analyze the same image as in the console example:

import boto3

MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"
IMAGE_NAME = "wind-generation.png"

bedrock_runtime = boto3.client("bedrock-runtime")

with open(IMAGE_NAME, "rb") as f:
    image = f.read()

user_message = "Which are the top countries for wind power generation? Answer only in JSON."

messages = [
    {
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image}}},
            {"text": user_message},
        ],
    }
]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
)
response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)

Integrating computer use with your application
Let’s see how computer use works in practice. First, I take a snapshot of the desktop of a Ubuntu system:

This screenshot is the starting point for the steps that will be implemented by computer use. To see how that works, I run a Python script passing in input to the model the screenshot image and this prompt:

Find me a hotel in Rome.

This script invokes the upgraded Claude 3.5 Sonnet in Amazon Bedrock using the new syntax required for computer use:

import base64
import json
import boto3

MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"

IMAGE_NAME = "ubuntu-screenshot.png"

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

with open(IMAGE_NAME, "rb") as f:
    image = f.read()

image_base64 = base64.b64encode(image).decode("utf-8")

prompt = "Find me a hotel in Rome."

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.5,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_base64,
                    },
                },
            ],
        }
    ],
    "tools": [
        { # new
            "type": "computer_20241022", # literal / constant
            "name": "computer", # literal / constant
            "display_height_px": 1280, # min=1, no max
            "display_width_px": 800, # min=1, no max
            "display_number": 0 # min=0, max=N, default=None
        },
        { # new
            "type": "bash_20241022", # literal / constant
            "name": "bash", # literal / constant
        },
        { # new
            "type": "text_editor_20241022", # literal / constant
            "name": "str_replace_editor", # literal / constant
        }
    ],
    "anthropic_beta": ["computer-use-2024-10-22"],
}

# Convert the native request to JSON.
request = json.dumps(body)

try:
    # Invoke the model with the request.
    response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=request)

except Exception as e:
    print(f"ERROR: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())
print(model_response)

The body of the request includes new options:

anthropic_beta with value ["computer-use-2024-10-22"] to enable computer use.
The tools section supports a new type option (set to custom for the tools you configure).
Note that the computer tool needs to know the resolution of the screen (display_height_px and display_width_px).

To follow my instructions with computer use, the model provides actions that operate on the desktop described by the input screenshot.

The response from the model includes a tool_use section from the computer tool that provides the first step. The model has found in the screenshot the Firefox browser icon and the position of the mouse arrow. Because of that, it now asks to move the mouse to specific coordinates to start the browser.

{
    "id": "msg_bdrk_01WjPCKnd2LCvVeiV6wJ4mm3",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-5-sonnet-20241022",
    "content": [
        {
            "type": "text",
            "text": "I'll help you search for a hotel in Rome. I see Firefox browser on the desktop, so I'll use that to access a travel website.",
        },
        {
            "type": "tool_use",
            "id": "toolu_bdrk_01CgfQ2bmQsPFMaqxXtYuyiJ",
            "name": "computer",
            "input": {"action": "mouse_move", "coordinate": [35, 65]},
        },
    ],
    "stop_reason": "tool_use",
    "stop_sequence": None,
    "usage": {"input_tokens": 3443, "output_tokens": 106},
}

This is just the first step. As with usual tool use requests, the script should reply with the result of using the tool (moving the mouse in this case). Based on the initial request to book a hotel, there would be a loop of tool use interactions that will ask to click on the icon, type a URL in the browser, and so on until the hotel has been booked.

A more complete example is available in this repository shared by Anthropic.

Things to know
The upgraded Claude 3.5 Sonnet is available today in Amazon Bedrock in the US West (Oregon) AWS Region and is offered at the same cost as the original Claude 3.5 Sonnet. For up-to-date information on regional availability, refer to the Amazon Bedrock documentation. For detailed cost information for each Claude model, visit the Amazon Bedrock pricing page.

In addition to the greater intelligence of the upgraded model, software developers can now integrate computer use (available in public beta) in their applications to automate complex desktop workflows, enhance software testing processes, and create more sophisticated AI-powered applications.

Claude 3.5 Haiku will be released in the coming weeks, initially as a text-only model and later with image input.

You can see how computer use can help with coding in this video with Alex Albert, Head of Developer Relations at Anthropic.

This other video describes computer use for automating operations.

To learn more about these new features, visit the Claude models section of the Amazon Bedrock documentation. Give the upgraded Claude 3.5 Sonnet a try in the Amazon Bedrock console today, and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with these new capabilities!

– Danilo

Introducing Llama 3.2 models from Meta in Amazon Bedrock: A new generation of multimodal vision and lightweight models

2024-09-25 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-llama-3-2-models-from-meta-in-amazon-bedrock-a-new-generation-of-multimodal-vision-and-lightweight-models/

In July, we announced the availability of Llama 3.1 models in Amazon Bedrock. Generative AI technology is improving at incredible speed and today, we are excited to introduce the new Llama 3.2 models from Meta in Amazon Bedrock.

Llama 3.2 offers multimodal vision and lightweight models representing Meta’s latest advancement in large language models (LLMs) and providing enhanced capabilities and broader applicability across various use cases. With a focus on responsible innovation and system-level safety, these new models demonstrate state-of-the-art performance on a wide range of industry benchmarks and introduce features that help you build a new generation of AI experiences.

These models are designed to inspire builders with image reasoning and are more accessible for edge applications, unlocking more possibilities with AI.

The Llama 3.2 collection of models are offered in various sizes, from lightweight text-only 1B and 3B parameter models suitable for edge devices to small and medium-sized 11B and 90B parameter models capable of sophisticated reasoning tasks including multimodal support for high resolution images. Llama 3.2 11B and 90B are the first Llama models to support vision tasks, with a new model architecture that integrates image encoder representations into the language model. The new models are designed to be more efficient for AI workloads, with reduced latency and improved performance, making them suitable for a wide range of applications.

All Llama 3.2 models support a 128K context length, maintaining the expanded token capacity introduced in Llama 3.1. Additionally, the models offer improved multilingual support for eight languages including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

In addition to the existing text capable Llama 3.1 8B, 70B, and 405B models, Llama 3.2 supports multimodal use cases. You can now use four new Llama 3.2 models — 90B, 11B, 3B, and 1B — from Meta in Amazon Bedrock to build, experiment, and scale your creative ideas:

Llama 3.2 90B Vision (text + image input) – Meta’s most advanced model, ideal for enterprise-level applications. This model excels at general knowledge, long-form text generation, multilingual translation, coding, math, and advanced reasoning. It also introduces image reasoning capabilities, allowing for image understanding and visual reasoning tasks. This model is ideal for the following use cases: image captioning, image-text retrieval, visual grounding, visual question answering and visual reasoning, and document visual question answering.

Llama 3.2 11B Vision (text + image input) – Well-suited for content creation, conversational AI, language understanding, and enterprise applications requiring visual reasoning. The model demonstrates strong performance in text summarization, sentiment analysis, code generation, and following instructions, with the added ability to reason about images. This model use cases are similar to the 90B version: image captioning, image-text-retrieval, visual grounding, visual question answering and visual reasoning, and document visual question answering.

Llama 3.2 3B (text input) – Designed for applications requiring low-latency inferencing and limited computational resources. It excels at text summarization, classification, and language translation tasks. This model is ideal for the following use cases: mobile AI-powered writing assistants and customer service applications.

Llama 3.2 1B (text input) – The most lightweight model in the Llama 3.2 collection of models, perfect for retrieval and summarization for edge devices and mobile applications. This model is ideal for the following use cases: personal information management and multilingual knowledge retrieval.

In addition, Llama 3.2 is built on top of the Llama Stack, a standardized interface for building canonical toolchain components and agentic applications, making building and deploying easier than ever. Llama Stack API adapters and distributions are designed to most effectively leverage the Llama model capabilities and it gives customers the ability to benchmark Llama models across different vendors.

Meta has tested Llama 3.2 on over 150 benchmark datasets spanning multiple languages and conducted extensive human evaluations, demonstrating competitive performance with other leading foundation models. Let’s see how these models work in practice.

Using Llama 3.2 models in Amazon Bedrock
To get started with Llama 3.2 models, I navigate to the Amazon Bedrock console and choose Model access on the navigation pane. There, I request access for the new Llama 3.2 models: Llama 3.2 1B, 3B, 11B Vision, and 90B Vision.

To test the new vision capability, I open another browser tab and download from the Our World in Data website the Share of electricity generated by renewables chart in PNG format. The chart is very high resolution and I resize it to be 1024 pixel wide.

Back in the Amazon Bedrock console, I choose Chat under Playgrounds in the navigation pane, select Meta as the category, and choose the Llama 3.2 90B Vision model.

I use Choose files to select the resized chart image and use this prompt:

Based on this chart, which countries in Europe have the highest share?

I choose Run and the model analyzes the image and returns its results:

I can also access the models programmatically using the AWS Command Line Interface (AWS CLI) and AWS SDKs. Compared to using the Llama 3.1 models, I only need to update the model IDs as described in the documentation. I can also use the new cross-region inference endpoint for the US and the EU Regions. These endpoints work for any Region within the US and the EU respectively. For example, the cross-region inference endpoints for the Llama 3.2 90B Vision model are:

us.meta.llama3-2-90b-instruct-v1:0
eu.meta.llama3-2-90b-instruct-v1:0

Here’s a sample AWS CLI command using the Amazon Bedrock Converse API. I use the --query parameter of the CLI to filter the result and only show the text content of the output message:

aws bedrock-runtime converse --messages '[{ "role": "user", "content": [ { "text": "Tell me the three largest cities in Italy." } ] }]' --model-id us.meta.llama3-2-90b-instruct-v1:0 --query 'output.message.content[*].text' --output text

In output, I get the response message from the "assistant".

The three largest cities in Italy are:

1. Rome (Roma) - population: approximately 2.8 million
2. Milan (Milano) - population: approximately 1.4 million
3. Naples (Napoli) - population: approximately 970,000

It’s not much different if you use one of the AWS SDKs. For example, here’s how you can use Python with the AWS SDK for Python (Boto3) to analyze the same image as in the console example:

import boto3

MODEL_ID = "us.meta.llama3-2-90b-instruct-v1:0"
# MODEL_ID = "eu.meta.llama3-2-90b-instruct-v1:0"

IMAGE_NAME = "share-electricity-renewable-small.png"

bedrock_runtime = boto3.client("bedrock-runtime")

with open(IMAGE_NAME, "rb") as f:
    image = f.read()

user_message = "Based on this chart, which countries in Europe have the highest share?"

messages = [
    {
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image}}},
            {"text": user_message},
        ],
    }
]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
)
response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)

Llama 3.2 models are also available in Amazon SageMaker JumpStart, a machine learning (ML) hub that makes it easy to deploy pre-trained models using the console or programmatically through the SageMaker Python SDK. From SageMaker JumpStart, you can also access and deploy new safeguard models that can help classify the safety level of model inputs (prompts) and outputs (responses), including Llama Guard 3 11B Vision, which are designed to support responsible innovation and system-level safety.

In addition, you can easily fine-tune Llama 3.2 1B and 3B models with SageMaker JumpStart today. Fine-tuned models can then be imported as custom models into Amazon Bedrock. Fine-tuning for the full collection of Llama 3.2 models in Amazon Bedrock and Amazon SageMaker JumpStart is coming soon.

The publicly available weights of Llama 3.2 models make it easier to deliver tailored solutions for custom needs. For example, you can fine-tune a Llama 3.2 model for a specific use case and bring it into Amazon Bedrock as a custom model, potentially outperforming other models in domain-specific tasks. Whether you’re fine-tuning for enhanced performance in areas like content creation, language understanding, or visual reasoning, Llama 3.2’s availability in Amazon Bedrock and SageMaker empowers you to create unique, high-performing AI capabilities that can set your solutions apart.

More on Llama 3.2 model architecture
Llama 3.2 builds upon the success of its predecessors with an advanced architecture designed for optimal performance and versatility:

Auto-regressive language model – At its core, Llama 3.2 uses an optimized transformer architecture, allowing it to generate text by predicting the next token based on the previous context.

Fine-tuning techniques – The instruction-tuned versions of Llama 3.2 employ two key techniques:

Supervised fine-tuning (SFT) – This process adapts the model to follow specific instructions and generate more relevant responses.
Reinforcement learning with human feedback (RLHF) – This advanced technique aligns the model’s outputs with human preferences, enhancing helpfulness and safety.

Multimodal capabilities – For the 11B and 90B Vision models, Llama 3.2 introduces a novel approach to image understanding:

Separately trained image reasoning adaptor weights are integrated with the core LLM weights.
These adaptors are connected to the main model through cross-attention mechanisms. Cross-attention allows one section of the model to focus on relevant parts of another component’s output, enabling information flow between different sections of the model.
When an image is input, the model treats the image reasoning process as a “tool use” operation, allowing for sophisticated visual analysis alongside text processing. In this context, tool use is the generic term used when a model uses external resources or functions to augment its capabilities and complete tasks more effectively.

Optimized inference – All models support grouped-query attention (GQA), which enhances inference speed and efficiency, particularly beneficial for the larger 90B model.

This architecture enables Llama 3.2 to handle a wide range of tasks, from text generation and understanding to complex reasoning and image analysis, all while maintaining high performance and adaptability across different model sizes.

Things to know
Llama 3.2 models from Meta are now generally available in Amazon Bedrock in the following AWS Regions:

Llama 3.2 1B and 3B models are available in the US West (Oregon) and Europe (Frankfurt) Regions, and are available in the US East (Ohio, N. Virginia) and Europe (Ireland, Paris) Regions via cross-region inference.
Llama 3.2 11B Vision and 90B Vision models are available in the US West (Oregon) Region, and are available in the US East (Ohio, N. Virginia) Regions via cross-region inference.

Check the full AWS Region list for future updates. To estimate your costs, visit the Amazon Bedrock pricing page.

To learn more about Llama 3.2 features and capabilities, visit the Llama models section of the Amazon Bedrock documentation. Give Llama 3.2 a try in the Amazon Bedrock console today, and send feedback to AWS re:Post for Amazon Bedrock.

You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with Llama 3.2 in Amazon Bedrock!

— Danilo

AWS Weekly Roundup: Amazon DynamoDB, AWS AppSync, Storage Browser for Amazon S3, and more (September 9, 2024)

2024-09-09 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-dynamodb-aws-appsync-storage-browser-for-amazon-s3-and-more-september-9-2024/

Last week, the latest AWS Heroes arrived! AWS Heroes are amazing technical experts who generously share their insights, best practices, and innovative solutions to help others.

The AWS GenAI Lofts are in full swing with San Francisco and São Paulo open now, and London, Paris, and Seoul coming in the next couple of months. Here’s an insider view from a workshop in San Francisco last week.

Last week’s launches
Here are the launches that got my attention.

Storage Browser for Amazon S3 (alpha release) – An open source Amplify UI React component that you can add to your web applications to provide your end users with a simple interface for data stored in S3. The component uses the new ListCallerAccessGrants API to list all S3 buckets, prefixes, and objects they can access, as defined by their S3 Access Grants.

AWS Network Load Balancer – Now supports a configurable TCP idle timeout. For more information, see this Networking & Content Devliery Blog post.

AWS Gateway Load Balancer – Also supports a configurable TCP idle timeout. More info is available in this blog post.

Amazon ECS – Now supports AWS Graviton-based Spot compute with AWS Fargate. This allows to run fault-tolerant Arm-based applications with up to 70% lower costs compared to on-demand.

Zone Groups for Availability Zones in AWS Regions – We are working on extending the Zone Group construct to Availability Zones (AZs) with a consistent naming format across all AWS Regions.

Amazon Managed Service for Apache Flink – Now supports Apache Flink 1.20. You can upgrade to benefit from bug fixes, performance improvements, and new functionality added by the Flink community.

AWS Glue – Now provides job queuing. If quotas or limits are insufficient to start a Glue job, AWS Glue will now automatically queue the job and wait for limits to free up.

Amazon DynamoDB – Now supports Attribute-Based Access Control (ABAC) for tables and indexes (limited preview). ABAC is an authorization strategy that defines access permissions based on tags attached to users, roles, and AWS resources. Read more in this Database Blog post.

Amazon Bedrock – Stability AI’s top text-to-image models (Stable Image Ultra, Stable Diffusion 3 Large, and Stable Image Core) are now available to generate high-quality visuals with speed and precision.

Amazon Bedrock Agents – Now supports Anthropic Claude 3.5 Sonnet, including Anthropic recommended tool use for function calling which can improve developer and end user experience.

Amazon Sagemaker Studio – You can now use Amazon EMR Serverless directly from your Studio Notebooks to interactively query, explore and visualize data, and run Apache Spark jobs.

Amazon SageMaker – Introducing sagemaker-core, a new Python SDK that provides an object-oriented interface for interacting with SageMaker resources such as TrainingJob, Model, and Endpoint resource classes.

AWS AppSync – Improves monitoring by including DEBUG and INFO logging levels for its GraphQL APIs. You now have more granular control over log verbosity to make it easier to troubleshoot your APIs while optimizing readability and costs.

Amazon WorkSpaces Pools – You can now bring your Windows 10 or 11 licenses and provide a consistent desktop experience when switching between on-premise and virtual desktops.

Amazon SES – A new enhanced onboarding experience to help discover and activate key SES features, including recommendations for optimal setup and the option to enable the Virtual Deliverability Manager to enhance email deliverability.

Amazon Redshift – Now the Amazon Redshift Data API support session reuse to retain the context of a session from one query execution to another, reducing connection setup latency on repeated queries to the same data warehouse.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional projects, blog posts, and news items that you might find interesting:

Amazon Q Developer Code Challenge – At the 2024 AWS Summit in Sydney, we put two teams (one using Amazon Q Developer, one not) in a battle of coding prowess, starting with basic math and string manipulation, up to including complex algorithms and intricate ciphers. Here are the results.

AWS named as a Leader in the first Gartner Magic Quadrant for AI Code Assistants – It’s great to see how new technologies make the whole software development lifecycle easier and increase developer productivity.

Build powerful RAG pipelines with LlamaIndex and Amazon Bedrock – A deep dive tutorial that covers simple and advanced use cases.

Evaluating prompts at scale with Prompt Management and Prompt Flows for Amazon Bedrock – To implement an automated prompt evaluation system to streamline prompt development and improve the overall quality of AI-generated content.

Amazon Redshift data ingestion options – An overview of the available ingestion methods and how they work for different use cases.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. AWS Summits for this year are coming to an end. There are two more left that you can still register: Toronto (September 11), and Ottawa (October 9).

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs driven by expert AWS users and industry leaders from around the world. Upcoming AWS Community Days are in the SF Bay Area (September 13), where our own Antje Barth is a keynote speaker, Argentina (September 14), Armenia (September 14), and DACH (in Munich on September 17).

AWS GenAI Lofts – Collaborative spaces and immersive experiences that showcase AWS’s cloud and AI expertise, while providing startups and developers with hands-on access to AI products and services, exclusive sessions with industry leaders, and valuable networking opportunities with investors and peers. Find a GenAI Loft location near you and don’t forget to register.

Browse all upcoming AWS-led in-person and virtual events here.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Danilo

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Stability AI’s best image generating models now in Amazon Bedrock

2024-09-04 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/stability-ais-best-image-generating-models-now-in-amazon-bedrock/

Starting today, you can use three new text-to-image models from Stability AI in Amazon Bedrock: Stable Image Ultra, Stable Diffusion 3 Large, and Stable Image Core. These models greatly improve performance in multi-subject prompts, image quality, and typography and can be used to rapidly generate high-quality visuals for a wide range of use cases across marketing, advertising, media, entertainment, retail, and more.

These models excel in producing images with stunning photorealism, boasting exceptional detail, color, and lighting, addressing common challenges like rendering realistic hands and faces. The models’ advanced prompt understanding allows it to interpret complex instructions involving spatial reasoning, composition, and style.

The three new Stability AI models available in Amazon Bedrock cover different use cases:

Stable Image Ultra – Produces the highest quality, photorealistic outputs perfect for professional print media and large format applications. Stable Image Ultra excels at rendering exceptional detail and realism.

Stable Diffusion 3 Large – Strikes a balance between generation speed and output quality. Ideal for creating high-volume, high-quality digital assets like websites, newsletters, and marketing materials.

Stable Image Core – Optimized for fast and affordable image generation, great for rapidly iterating on concepts during ideation.

This table summarizes the model’s key features:

Features	Stable Image Ultra	Stable Diffusion 3 Large	Stable Image Core
Parameters	16 billion	8 billion	2.6 billion
Input	Text	Text or image	Text
Typography	Tailored for large-scale display	Tailored for large-scale display	Versatility and readability across different sizes and applications
Visual aesthetics	Photorealistic image output	Highly realistic with finer attention to detail	Good rendering; not as detail-oriented

One of the key improvements of Stable Image Ultra and Stable Diffusion 3 Large compared to Stable Diffusion XL (SDXL) is text quality in generated images, with fewer errors in spelling and typography thanks to its innovative Diffusion Transformer architecture, which implements two separate sets of weights for image and text but enables information flow between the two modalities.

Here are a few images created with these models.

Stable Image Ultra – Prompt: photo, realistic, a woman sitting in a field watching a kite fly in the sky, stormy sky, highly detailed, concept art, intricate, professional composition.

Stable Diffusion 3 Large – Prompt: comic-style illustration, male detective standing under a streetlamp, noir city, wearing a trench coat, fedora, dark and rainy, neon signs, reflections on wet pavement, detailed, moody lighting.

Stable Image Core – Prompt: professional 3d render of a white and orange sneaker, floating in center, hovering, floating, high quality, photorealistic.

Use cases for the new Stability AI models in Amazon Bedrock
Text-to-image models offer transformative potential for businesses across various industries and can significantly streamline creative workflows in marketing and advertising departments, enabling rapid generation of high-quality visuals for campaigns, social media content, and product mockups. By expediting the creative process, companies can respond more quickly to market trends and reduce time-to-market for new initiatives. Additionally, these models can enhance brainstorming sessions, providing instant visual representations of concepts that can spark further innovation.

For e-commerce businesses, AI-generated images can help create diverse product showcases and personalized marketing materials at scale. In the realm of user experience and interface design, these tools can quickly produce wireframes and prototypes, accelerating the design iteration process. The adoption of text-to-image models can lead to significant cost savings, increased productivity, and a competitive edge in visual communication across various business functions.

Here are some example use cases across different industries:

Advertising and Marketing

Stable Image Ultra for luxury brand advertising and photorealistic product showcases
Stable Diffusion 3 Large for high-quality product marketing images and print campaigns
Use Stable Image Core for rapid A/B testing of visual concepts for social media ads

E-commerce

Stable Image Ultra for high-end product customization and made-to-order items
Stable Diffusion 3 Large for most product visuals across an e-commerce site
Stable Image Core to quickly generate product images and keep listings up-to-date

Media and Entertainment

Stable Image Ultra for ultra-realistic key art, marketing materials, and game visuals
Stable Diffusion 3 Large for environment textures, character art, and in-game assets
Stable Image Core for rapid prototyping and concept art exploration

Now, let’s see these new models in action, first using the AWS Management Console, then with the AWS Command Line Interface (AWS CLI) and AWS SDKs.

Using the new Stability AI models in the Amazon Bedrock console
In the Amazon Bedrock console, I choose Model access from the navigation pane to enable access the three new models in the Stability AI section.

Now that I have access, I choose Image in the Playgrounds section of the navigation pane. For the model, I choose Stability AI and Stable Image Ultra.

As prompt, I type:

A stylized picture of a cute old steampunk robot with in its hands a sign written in chalk that says "Stable Image Ultra in Amazon Bedrock".

I leave all other options to their default values and choose Run. After a few seconds, I get what I asked. Here’s the image:

Using Stable Image Ultra with the AWS CLI
While I am still in the console Image playground, I choose the three small dots in the corner of the playground window and then View API request. In this way, I can see the AWS Command Line Interface (AWS CLI) command equivalent to what I just did in the console:

aws bedrock-runtime invoke-model \
--model-id stability.stable-image-ultra-v1:0 \
--body "{\"prompt\":\"A stylized picture of a cute old steampunk robot with in its hands a sign written in chalk that says \\\"Stable Image Ultra in Amazon Bedrock\\\".\",\"mode\":\"text-to-image\",\"aspect_ratio\":\"1:1\",\"output_format\":\"jpeg\"}" \
--cli-binary-format raw-in-base64-out \
--region us-west-2 \
invoke-model-output.txt

To use Stable Image Core or Stable Diffusion 3 Large, I can replace the model ID.

The previous command outputs the image in Base64 format inside a JSON object in a text file.

To get the image with a single command, I write the output JSON file to standard output and use the jq tool to extract the encoded image so that it can be decoded on the fly. The output is written in the img.png file. Here’s the full command:

aws bedrock-runtime invoke-model \
--model-id stability.stable-image-ultra-v1:0 \
--body "{\"prompt\":\"A stylized picture of a cute old steampunk robot with in its hands a sign written in chalk that says \\\"Stable Image Ultra in Amazon Bedrock\\\".\",\"mode\":\"text-to-image\",\"aspect_ratio\":\"1:1\",\"output_format\":\"jpeg\"}" \
--cli-binary-format raw-in-base64-out \
--region us-west-2 \
/dev/stdout | jq -r '.images[0]' | base64 --decode > img.png

Using Stable Image Ultra with AWS SDKs
Here’s how you can use Stable Image Ultra with the AWS SDK for Python (Boto3). This simple application interactively asks for a text-to-image prompt and then calls Amazon Bedrock to generate the image.

import base64
import boto3
import json
import os

MODEL_ID = "stability.stable-image-ultra-v1:0"

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

print("Enter a prompt for the text-to-image model:")
prompt = input()

body = {
    "prompt": prompt,
    "mode": "text-to-image"
}
response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=json.dumps(body))

model_response = json.loads(response["body"].read())

base64_image_data = model_response["images"][0]

i, output_dir = 1, "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
while os.path.exists(os.path.join(output_dir, f"img_{i}.png")):
    i += 1

image_data = base64.b64decode(base64_image_data)

image_path = os.path.join(output_dir, f"img_{i}.png")
with open(image_path, "wb") as file:
    file.write(image_data)

print(f"The generated image has been saved to {image_path}")

The application writes the resulting image in an output directory that is created if not present. To not overwrite existing files, the code checks for existing files to find the first file name available with the img_<number>.png format.

More examples of how to use Stable Diffusion models are available in the Code Library of the AWS Documentation.

Customer voices
Learn from Ken Hoge, Global Alliance Director, Stability AI, how Stable Diffusion models are reshaping the industry from text-to-image to video, audio, and 3D, and how Amazon Bedrock empowers customers with an all-in-one, secure, and scalable solution.

Step into a world where reading comes alive with Nicolette Han, Product Owner, Stride Learning. With support from Amazon Bedrock and AWS, Stride Learning’s Legend Library is transforming how young minds engage with and comprehend literature using AI to create stunning, safe illustrations for children stories.

Things to know
The new Stability AI models – Stable Image Ultra, Stable Diffusion 3 Large, and Stable Image Core – are available today in Amazon Bedrock in the US West (Oregon) AWS Region. With this launch, Amazon Bedrock offers a broader set of solutions to boost your creativity and accelerate content generation workflows. See the Amazon Bedrock pricing page to understand costs for your use case.

You can find more information on Stable Diffusion 3 in the research paper that describes in detail the underlying technology.

To start, see the Stability AI’s models section of the Amazon Bedrock User Guide. To discover how others are using generative AI in their solutions and learn with deep-dive technical content, visit community.aws.

— Danilo

Agents for Amazon Bedrock now support memory retention and code interpretation (preview)

2024-07-10 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/agents-for-amazon-bedrock-now-support-memory-retention-and-code-interpretation-preview/

With Agents for Amazon Bedrock, generative artificial intelligence (AI) applications can run multistep tasks across different systems and data sources. A couple of months back, we simplified the creation and configuration of agents. Today, we are introducing in preview two new fully managed capabilities:

Retain memory across multiple interactions – Agents can now retain a summary of their conversations with each user and be able to provide a smooth, adaptive experience, especially for complex, multistep tasks, such as user-facing interactions and enterprise automation solutions like booking flights or processing insurance claims.

Support for code interpretation – Agents can now dynamically generate and run code snippets within a secure, sandboxed environment and be able to address complex use cases such as data analysis, data visualization, text processing, solving equations, and optimization problems. To make it easier to use this feature, we also added the ability to upload documents directly to an agent.

Let’s see how these new capabilities work in more detail.

Memory retention across multiple interactions
With memory retention, you can build agents that learn and adapt to each user’s unique needs and preferences over time. By maintaining a persistent memory, agents can pick up right where the users left off, providing a smooth flow of conversations and workflows, especially for complex, multistep tasks.

Imagine a user booking a flight. Thanks to the ability to retain memory, the agent can learn their travel preferences and use that knowledge to streamline subsequent booking requests, creating a personalized and efficient experience. For example, it can automatically propose the right seat to a user or a meal similar to their previous choices.

Using memory retention to be more context-aware also simplifies business process automation. For example, an agent used by an enterprise to process customer feedback can now be aware of previous and on-going interactions with the same customer without having to handle custom integrations.

Each user’s conversation history and context are securely stored under a unique memory identifier (ID), ensuring complete separation between users. With memory retention, it’s easier to build agents that provide seamless, adaptive, and personalized experiences that continuously improve over time. Let’s see how this works in practice.

Using memory retention in Agents for Amazon Bedrock
In the Amazon Bedrock console, I choose Agents from the Builder Tools section of the navigation pane and start creating an agent.

For the agent, I use agent-book-flight as the name with this as description:

Help book a flight.

Then, in the agent builder, I select the Anthropic’s Claude 3 Sonnet model and enter these instructions:

To book a flight, you should know the origin and destination airports and the day and time the flight takes off.

In Additional settings, I enable User input to allow the agent to ask clarifying questions to capture necessary inputs. This will help when a request to book a flight misses some necessary information such as the origin and destination or the date and time of the flight.

In the new Memory section, I enable memory to generate and store a session summary at the end of each session and use the default 30 days for memory duration.

Then, I add an action group to search and book flights. I use search-and-book-flights as name and this description:

Search for flights between two destinations on a given day and book a specific flight.

Then, I choose to define the action group with function details and then to create a new Lambda function. The Lambda function will implement the business logic for all the functions in this action group.

I add two functions to this action group: one to search for flights and another to book flights.

The first function is search-for-flights and has this description:

Search for flights on a given date between two destinations.

All parameters of this function are required and of type string. Here are the parameters’ names and descriptions:

origin_airport – Origin IATA airport code
destination_airport – Destination IATA airport code
date – Date of the flight in YYYYMMDD format

The second function is book-flight and uses this description:

Book a flight at a given date and time between two destinations.

Again, all parameters are required and of type string. These are the names and descriptions for the parameters:

origin_airport – Origin IATA airport code
destination_airport – Destination IATA airport code
date – Date of the flight in YYYYMMDD format
time – Time of the flight in HHMM format

To complete the creation of the agent, I choose Create.

To access the source code of the Lambda function, I choose the search-and-book-flights action group and then View (near the Select Lambda function settings). Normally, I’d use this Lambda function to integrate with an existing system such as a travel booking platform. In this case, I use this code to simulate a booking platform for the agent.

import json
import random
from datetime import datetime, time, timedelta


def convert_params_to_dict(params_list):
    params_dict = {}
    for param in params_list:
        name = param.get("name")
        value = param.get("value")
        if name is not None:
            params_dict[name] = value
    return params_dict


def generate_random_times(date_str, num_flights, min_hours, max_hours):
    # Set seed based on input date
    seed = int(date_str)
    random.seed(seed)

    # Convert min_hours and max_hours to minutes
    min_minutes = min_hours * 60
    max_minutes = max_hours * 60

    # Generate random times
    random_times = set()
    while len(random_times) < num_flights:
        minutes = random.randint(min_minutes, max_minutes)
        hours, mins = divmod(minutes, 60)
        time_str = f"{hours:02d}{mins:02d}"
        random_times.add(time_str)

    return sorted(random_times)


def get_flights_for_date(date):
    num_flights = random.randint(1, 6) # Between 1 and 6 flights per day
    min_hours = 6 # 6am
    max_hours = 22 # 10pm
    flight_times = generate_random_times(date, num_flights, min_hours, max_hours)
    return flight_times
    
    
def get_days_between(start_date, end_date):
    # Convert string dates to datetime objects
    start = datetime.strptime(start_date, "%Y%m%d")
    end = datetime.strptime(end_date, "%Y%m%d")
    
    # Calculate the number of days between the dates
    delta = end - start
    
    # Generate a list of all dates between start and end (inclusive)
    date_list = [start + timedelta(days=i) for i in range(delta.days + 1)]
    
    # Convert datetime objects back to "YYYYMMDD" string format
    return [date.strftime("%Y%m%d") for date in date_list]


def lambda_handler(event, context):
    print(event)
    agent = event['agent']
    actionGroup = event['actionGroup']
    function = event['function']
    param = convert_params_to_dict(event.get('parameters', []))

    if actionGroup == 'search-and-book-flights':
        if function == 'search-for-flights':
            flight_times = get_flights_for_date(param['date'])
            body = f"On {param['date']} (YYYYMMDD), these are the flights from {param['origin_airport']} to {param['destination_airport']}:\n{json.dumps(flight_times)}"
        elif function == 'book-flight':
            body = f"Flight from {param['origin_airport']} to {param['destination_airport']} on {param['date']} (YYYYMMDD) at {param['time']} (HHMM) booked and confirmed."
        elif function == 'get-flights-in-date-range':
            days = get_days_between(param['start_date'], param['end_date'])
            flights = {}
            for day in days:
                flights[day] = get_flights_for_date(day)
            body = f"These are the times (HHMM) for all the flights from {param['origin_airport']} to {param['destination_airport']} between {param['start_date']} (YYYYMMDD) and {param['end_date']} (YYYYMMDD) in JSON format:\n{json.dumps(flights)}"
        else:
            body = f"Unknown function {function} for action group {actionGroup}."
    else:
        body = f"Unknown action group {actionGroup}."
    
    # Format the output as expected by the agent
    responseBody =  {
        "TEXT": {
            "body": body
        }
    }

    action_response = {
        'actionGroup': actionGroup,
        'function': function,
        'functionResponse': {
            'responseBody': responseBody
        }

    }

    function_response = {'response': action_response, 'messageVersion': event['messageVersion']}
    print(f"Response: {function_response}")

    return function_response

I prepare the agent to test it in the console and ask this question:

Which flights are available from London Heathrow to Rome Fiumicino on July 20th, 2024?

The agent replies with a list of times. I choose Show trace to get more information about how the agent processed my instructions.

In the Trace tab, I explore the trace steps to understand the chain of thought used by the agent’s orchestration. For example, here I see that the agent handled the conversion of the airport names to codes (LHR for London Heathrow, FCO for Rome Fiumicino) before calling the Lambda function.

In the new Memory tab, I see what’s the content of the memory. The console uses a specific test memory ID. In an application, to keep memory separated for each user, I can use a different memory ID for every user.

I look at the list of flights and ask to book one:

Book the one at 6:02pm.

The agent replies confirming the booking.

After a few minutes, after the session has expired, I see a summary of my conversation in the Memory tab.

I choose the broom icon to start with a new conversation and ask a question that, by itself, doesn’t provide a full context to the agent:

Which other flights are available on the day of my flight?

The agent recalls the flight that I booked from our previous conversation. To provide me with an answer, the agent asks me to confirm the flight details. Note that the Lambda function is just a simulation and didn’t store the booking information in any database. The flight details were retrieved from the agent’s memory.

I confirm those values and get the list of the other flights with the same origin and destination on that day.

Yes, please.

To better demonstrate the benefits of memory retention, let’s call the agent using the AWS SDK for Python (Boto3). To do so, I first need to create an agent alias and version. I write down the agent ID and the alias ID because they are required when invoking the agent.

In the agent invocation, I add the new memoryId option to use memory. By including this option, I get two benefits:

The memory retained for that memoryId (if any) is used by the agent to improve its response.
A summary of the conversation for the current session is retained for that memoryId so that it can be used in another session.

Using an AWS SDK, I can also get the content or delete the content of the memory for a specific memoryId.

import random
import string
import boto3
import json

DEBUG = False # Enable debug to see all trace steps
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

AGENT_ID = 'URSVOGLFNX'
AGENT_ALIAS_ID = 'JHLX9ERCMD'

SESSION_ID_LENGTH = 10
SESSION_ID = "".join(
    random.choices(string.ascii_uppercase + string.digits, k=SESSION_ID_LENGTH)
)

# A unique identifier for each user
MEMORY_ID = 'danilop-92f79781-a3f3-4192-8de6-890b67c63d8b' 
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')


def invoke_agent(prompt, end_session=False):
    response = bedrock_agent_runtime.invoke_agent(
        agentId=AGENT_ID,
        agentAliasId=AGENT_ALIAS_ID,
        sessionId=SESSION_ID,
        inputText=prompt,
        memoryId=MEMORY_ID,
        enableTrace=DEBUG,
        endSession=end_session,
    )

    completion = ""

    for event in response.get('completion'):
        if DEBUG:
            print(event)
        if 'chunk' in event:
            chunk = event['chunk']
            completion += chunk['bytes'].decode()

    return completion


def delete_memory():
    try:
        response = bedrock_agent_runtime.delete_agent_memory(
            agentId=AGENT_ID,
            agentAliasId=AGENT_ALIAS_ID,
            memoryId=MEMORY_ID,
        )
    except Exception as e:
        print(e)
        return None
    if DEBUG:
        print(response)


def get_memory():
    response = bedrock_agent_runtime.get_agent_memory(
        agentId=AGENT_ID,
        agentAliasId=AGENT_ALIAS_ID,
        memoryId=MEMORY_ID,
        memoryType='SESSION_SUMMARY',
    )
    memory = ""
    for content in response['memoryContents']:
        if 'sessionSummary' in content:
            s = content['sessionSummary']
            memory += f"Session ID {s['sessionId']} from {s['sessionStartTime'].strftime(DATE_FORMAT)} to {s['sessionExpiryTime'].strftime(DATE_FORMAT)}\n"
            memory += s['summaryText'] + "\n"
    if memory == "":
        memory = "<no memory>"
    return memory


def main():
    print("Delete memory? (y/n)")
    if input() == 'y':
        delete_memory()

    print("Memory content:")
    print(get_memory())

    prompt = input('> ')
    if len(prompt) > 0:
        print(invoke_agent(prompt, end_session=False)) # Start a new session
        invoke_agent('end', end_session=True) # End the session

if __name__ == "__main__":
    main()

I run the Python script from my laptop. I choose to delete the current memory (even if it should be empty for now) and then ask to book a morning flight on a specific date.

Delete memory? (y/n)
y
Memory content:
<no memory>
> Book me on a morning flight on July 20th, 2024 from LHR to FCO.
I have booked you on the morning flight from London Heathrow (LHR) to Rome Fiumicino (FCO) on July 20th, 2024 at 06:44.

I wait a couple of minutes and run the script again. The script creates a new session every time it’s run. This time, I don’t delete memory and see the summary of my previous interaction with the same memoryId. Then, I ask on which date my flight is scheduled. Even though this is a new session, the agent finds the previous booking in the content of the memory.

Delete memory? (y/n)
n
Memory content:
Session ID MM4YYW0DL2 from 2024-07-09 15:35:47 to 2024-07-09 15:35:58
The user's goal was to book a morning flight from LHR to FCO on July 20th, 2024. The assistant booked a 0644 morning flight from LHR to FCO on the requested date of July 20th, 2024. The assistant successfully booked the requested morning flight for the user. The user requested a morning flight booking on July 20th, 2024 from London Heathrow (LHR) to Rome Fiumicino (FCO). The assistant booked a 0644 flight for the specified route and date.

> Which date is my flight on?
I recall from our previous conversation that you booked a morning flight from London Heathrow (LHR) to Rome Fiumicino (FCO) on July 20th, 2024. Please confirm if this date of July 20th, 2024 is correct for the flight you are asking about.

Yes, that’s my flight!

Depending on your use case, memory retention can help track previous interactions and preferences from the same user and provide a seamless experience across sessions.

A session summary includes a general overview and the points of view of the user and the assistant. For a short session as this one, this can cause some repetition.

Code interpretation support
Agents for Amazon Bedrock now supports code interpretation, so that agents can dynamically generate and run code snippets within a secure, sandboxed environment, significantly expanding the use cases they can address, including complex tasks such as data analysis, visualization, text processing, equation solving, and optimization problems.

Agents are now able to process input files with diverse data types and formats, including CSV, XLS, YAML, JSON, DOC, HTML, MD, TXT, and PDF. Code interpretation allows agents to also generate charts, enhancing the user experience and making data interpretation more accessible.

Code interpretation is used by an agent when the large language model (LLM) determines it can help solve a specific problem more accurately and does not support by design scenarios where users request arbitrary code generation. For security, each user session is provided with an isolated, sandboxed code runtime environment.

Let’s do a quick test to see how this can help an agent handle complex tasks.

Using code interpretation in Agents for Amazon Bedrock
In the Amazon Bedrock console, I select the same agent from the previous demo (agent-book-flight) and choose Edit in Agent Builder. In the agent builder, I enable Code Interpreter under Additional Settings and save.

I prepare the agent and test it straight in the console. First, I ask a mathematical question.

Compute the sum of the first 10 prime numbers.

After a few seconds, I get the answer from the agent:

The sum of the first 10 prime numbers is 129.

That’s accurate. Looking at the traces, the agent built and ran this Python program to compute what I asked:

import math

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

primes = []
n = 2
while len(primes) < 10:
    if is_prime(n):
        primes.append(n)
    n += 1
    
print(f"The first 10 prime numbers are: {primes}")
print(f"The sum of the first 10 prime numbers is: {sum(primes)}")

Now, let’s go back to the agent-book-flight agent. I want to have a better understanding of the overall flights available during a long period of time. To do so, I start by adding a new function to the same action group to get all the flights available in a date range.

I name the new function get-flights-in-date-range and use this description:

Get all the flights between two destinations for each day in a date range.

All the parameters are required and of type string. These are the parameters names and descriptions:

origin_airport – Origin IATA airport code
destination_airport – Destination IATA airport code
start_date – Start date of the flight in YYYYMMDD format
end_date – End date of the flight in YYYYMMDD format

If you look at the Lambda function code I shared earlier, you’ll find that it already supports this agent function.

Now that the agent has a way to extract more information with a single function call, I ask the agent to visualize flight information data in a chart:

Draw a chart with the number of flights each day from JFK to SEA for the first ten days of August, 2024.

The agent reply includes a chart:

I choose the link to download the image on my computer:

That’s correct. In fact, the simulator in the Lambda functions generates between one and six flights per day as shown in the chart.

Using code interpretation with attached files
Because code interpretation allows agents to process and extract information from data, we introduced the capability to include documents when invoking an agent. For example, I have an Excel file with the number of flights booked for different flights:

Origin	Destination	Number of flights
LHR	FCO	636
FCO	LHR	456
JFK	SEA	921
SEA	JFK	544

Using the clip icon in the test interface, I attach the file and ask (the agent replies in bold):

What is the most popular route? And the least one?

Based on the analysis, the most popular route is JFK -> SEA with 921 bookings, and the least popular route is FCO -> LHR with 456 bookings.

How many flights in total have been booked?

The total number of booked flights across all routes is 2557.

Draw a chart comparing the % of flights booked for these routes compared to the total number.

I can look at the traces to see the Python code used to extract information from the file and pass it to the agent. I can attach more than one file and use different file formats. These options are available in AWS SDKs to let agents use files in your applications.

Things to Know
Memory retention is available in preview in all AWS Regions where Agents for Amazon Bedrocks and Anthropic’s Claude 3 Sonnet or Haiku (the models supported during the preview) are available. Code interpretation is available in preview in the US East (N. Virginia), US West (Oregon), and Europe (Frankfurt) Regions.

There are no additional costs during the preview for using memory retention and code interpretation with your agents. When using agents with these features, normal model use charges apply. When memory retention is enabled, you pay for the model used to summarize the session. For more information, see the Amazon Bedrock Pricing page.

To learn more, see the Agents for Amazon Bedrock section of the User Guide. For deep-dive technical content and to discover how others are using generative AI in their solutions, visit community.aws.

— Danilo

AWS Weekly Roundup: Claude 3.5 Sonnet in Amazon Bedrock, CodeCatalyst updates, SageMaker with MLflow, and more (June 24, 2024)

2024-06-24 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-claude-3-5-sonnet-in-amazon-bedrock-codecatalyst-updates-sagemaker-with-mlflow-and-more-june-24-2024/

This week, I had the opportunity to try the new Anthropic Claude 3.5 Sonnet model in Amazon Bedrock just before it launched, and I was really impressed by its speed and accuracy! It was also the week of AWS Summit Japan; here’s a nice picture of the busy AWS Community stage.

Last week’s launches
With many new capabilities, from recommendations on the size of your Amazon Relational Database Services (Amazon RDS) databases to new built-in transformations in AWS Glue, here’s what got my attention:

Amazon Bedrock – Now supports Anthropic’s Claude 3.5 Sonnet and compressed embeddings from Cohere Embed.

AWS CodeArtifact – With support for Rust packages with Cargo, developers can now store and access their Rust libraries (known as crates).

Amazon CodeCatalyst – Many updates from this unified software development service. You can now assign issues in CodeCatalyst to Amazon Q and direct it to work with source code hosted in GitHub Cloud and Bitbucket Cloud and ask Amazon Q to analyze issues and recommend granular tasks. These tasks can then be individually assigned to users or to Amazon Q itself. You can now also use Amazon Q to help pick the best blueprint for your needs. You can now securely store, publish, and share Maven, Python, and NuGet packages. You can also link an issue to other issues. This allows customers to link issues in CodeCatalyst as blocked by, duplicate of, related to, or blocks another issue. You can now configure a single CodeBuild webhook at organization or enterprise level to receive events from all repositories in your organizations, instead of creating webhooks for each individual repository. Finally, you can now add a default IAM role to an environment.

Amazon EC2 – C7g and R7g instances (powered by AWS Graviton3 processors) are now available in Europe (Milan), Asia Pacific (Hong Kong), and South America (São Paulo) Regions. C7i-flex instances are now available in US East (Ohio) Region.

AWS Compute Optimizer – Now provides rightsizing recommendations for Amazon RDS MySQL, and RDS PostgreSQL. More info in this Cloud Financial Management blog post.

Amazon OpenSearch Service – With JSON Web Token (JWT) authentication and authorization, it’s now easier to integrate identity providers and isolate tenants in a multi-tenant application.

Amazon SageMaker – Now helps you manage machine learning (ML) experiments and the entire ML lifecycle with a fully managed MLflow capability.

AWS Glue – The serverless data integration service now offers 13 new built-in transforms: flag duplicates in column, format Phone Number, format case, fill with mode, flag duplicate rows, remove duplicates, month name, iIs even, cryptographic hash, decrypt, encrypt, int to IP, and IP to int.

Amazon MWAA – Amazon Managed Workflows for Apache Airflow (MWAA) now supports custom domain names for the Airflow web server, allowing to use private web servers with load balancers, custom DNS entries, or proxies to point users to a user-friendly web address.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional projects, blog posts, and news items that you might find interesting:

AWS re:Inforce 2024 re:Cap – A summary of our annual, immersive, cloud-security learning event by my colleague Wojtek.

Three ways Amazon Q Developer agent for code transformation accelerates Java upgrades – This post offers interesting details on how Amazon Q Developer handles major version upgrades of popular frameworks, replacing deprecated API calls on your behalf, and explainability on code changes.

Five ways Amazon Q simplifies AWS CloudFormation development – For template code generation, querying CloudFormation resource requirements, explaining existing template code, understanding deployment options and issues, and querying CloudFormation documentation.

Improving air quality with generative AI – A nice solution that uses artificial intelligence (AI) to standardize air quality data, addressing the air quality data integration problem of low-cost sensors.

Deploy a Slack gateway for Amazon Bedrock – A solution bringing the power of generative AI directly into your Slack workspace.

An agent-based simulation of Amazon’s inbound supply chain – Simulating the entire US inbound supply chain, including the “first-mile” of distribution and tracking the movement of hundreds of millions of individual products through the network.

AWS CloudFormation Linter (cfn-lint) v1 – This upgrade is particularly significant because it converts from using the CloudFormation spec to using CloudFormation registry resource provider schemas.

A practical approach to using generative AI in the SDLC – Learn how an AI assistant like Amazon Q Developer helps my colleague Jenna figure out what to build and how to build it.

AWS open source news and updates – My colleague Ricardo writes about open source projects, tools, and events from the AWS Community. Check out Ricardo’s page for the latest updates.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. This week, you can join the AWS Summit in Washington, DC, June 26–27. Learn here about future AWS Summit events happening in your area.

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world. This week there are AWS Community Days in Switzerland (June 27), Sri Lanka (June 27), and the Gen AI Edition in Ahmedabad, India (June 29).

Browse all upcoming AWS led in-person and virtual events and developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Danilo

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Anthropic’s Claude 3.5 Sonnet model now available in Amazon Bedrock: Even more intelligence than Claude 3 Opus at one-fifth the cost

2024-06-20 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/anthropics-claude-3-5-sonnet-model-now-available-in-amazon-bedrock-the-most-intelligent-claude-model-yet/

It’s been just 3 months since Anthropic launched Claude 3, a family of state-of-the-art artificial intelligence (AI) models that allows you to choose the right combination of intelligence, speed, and cost that suits your needs.

Today, Anthropic introduced Claude 3.5 Sonnet, its first release in the forthcoming Claude 3.5 model family. We are happy to announce that Claude 3.5 Sonnet is now available in Amazon Bedrock.

Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming other generative AI models on a wide range of evaluations, including Anthropic’s previously most intelligent model, Claude 3 Opus. Claude 3.5 Sonnet is available with the speed and cost of the original Claude 3 Sonnet model. In fact, you can now get intelligence and speed better than Claude 3 Opus at one-fifth of the price because Claude 3.5 Sonnet is 80 percent cheaper than Opus.

The frontier intelligence displayed by Claude 3.5 Sonnet combined with cost-effective pricing, makes the model ideal for complex tasks such as context-sensitive customer support, orchestrating multi-step workflows, and streamlining code translations.

Claude 3.5 Sonnet sets new industry benchmarks for undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), code (HumanEval), and more. As you can see in the following table, according to Anthropic, Claude 3.5 Sonnet outperforms OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro in nearly every benchmark.

Claude 3.5 Sonnet is also Anthropic’s strongest vision model yet, performing an average of 10 percent better than Claude 3 Opus across the majority of vision benchmarks. According to Anthropic, Claude 3.5 Sonnet also outperforms other generative AI models in nearly every category.

Anthropic’s Claude 3.5 Sonnet key improvements
The release of Claude 3.5 Sonnet brings significant improvements across multiple domains, empowering software developers and businesses with new generative AI-powered capabilities. Here are some of the key strengths of this new model:

Visual processing and understanding – Claude 3.5 Sonnet demonstrates remarkable capabilities in processing images, particularly in interpreting charts and graphs. It accurately transcribes text from imperfect images, a core capability for industries such as retail, logistics, and financial services, to gather more insights from graphics or illustrations than from text alone. Use Claude 3.5 Sonnet to automate visual data processing tasks, extract valuable information, and enhance data analysis pipelines.

Writing and content generation – Claude 3.5 Sonnet represents a significant leap in its ability to understand nuance and humor. The model produces high-quality written content with a more natural, human tone that feels more authentic and relatable. Use the model to generate engaging and compelling content, streamline your writing workflows, and enhance your storytelling capabilities.

Customer support and natural language processing – With its improved understanding of context and multistep workflow orchestration, Claude 3.5 Sonnet excels at handling intricate customer inquiries. This capability enables round-the-clock support, faster response times, and more natural-sounding interactions, ultimately leading to improved customer satisfaction. Use this model to automate and enhance customer support processes and provide a seamless experience for end users. For an example of a similar implementation, see how DoorDash built a generative AI self-service contact center solution using Anthropic’s Claude 3 models in Amazon Bedrock.

Analytics and insights – Claude 3.5 Sonnet augments human expertise in data science by effortlessly navigating unstructured data and using multiple tools to generate insights. It produces high-quality statistical visualizations and actionable predictions, ranging from business strategies to real-time product trends. Use Claude 3.5 Sonnet to simplify data analysis workflows, uncover valuable insights, and drive data-driven decision-making.

Coding and software development – Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities when provided with the relevant tools. Use Claude 3.5 Sonnet to streamline developer workflows, accelerate coding tasks, reduce manual effort, and enhance overall productivity.

Using Anthropic’s Claude 3.5 Sonnet in the Amazon Bedrock console
Because this is a new model, I go to the Amazon Bedrock console and choose Model access from the navigation pane to enable access to Claude 3.5 Sonnet.

Now that I have access to the model, I’d like to use it to extract information from the following image from the Our World in Data website:

I choose Chat in the Playgrounds section of the navigation pane and select Anthropic and Claude 3.5 Sonnet as the model. I then choose the Image upload button to upload the previous image, type the following prompt, and choose Run:

Which countries consume more than 1000 TWh from hydropower? Think step by step and look at all regions. Output in JSON.

Here’s the result and an animated image recapping what I did.

To determine which countries consume more than 1,000 TWh from hydropower in 2022, I need to carefully examine the map and color-coding. The scale at the bottom shows that dark blue represents consumption between 1,000 TWh and 2,000 TWh.

Let's look at each region systematically:

1. North America: Canada is dark blue, indicating it consumes between 1,000-2,000 TWh from hydropower.
2. South America: Brazil is dark blue, also consuming between 1,000-2,000 TWh.
3. Europe: No country is dark blue; all are light green or have no data.
4. Africa: No data available for any country.
5. Asia: China is dark blue, indicating consumption between 1,000-2,000 TWh.
6. Oceania: No country is dark blue; Australia is light green.

In summary, there are three countries that consume more than 1,000 TWh from hydropower in 2022:
1. Canada (North America)
2. Brazil (South America)
3. China (Asia)

Here's the response in JSON format:

```json
{
  "countries_consuming_over_1000TWh_hydropower": [
    {
      "name": "Canada",
      "region": "North America",
      "consumption_range": "1,000-2,000 TWh"
    },
    {
      "name": "Brazil",
      "region": "South America",
      "consumption_range": "1,000-2,000 TWh"
    },
    {
      "name": "China",
      "region": "Asia",
      "consumption_range": "1,000-2,000 TWh"
    }
  ]
}
```

These three countries stand out as the highest consumers of hydropower energy, each using between 1,000 and 2,000 terawatt-hours in 2022.

The model’s ability to reliably extract information from unstructured data, like images, opens up a world of new possibilities.

I choose the three small dots in the corner of the playground window and then View API request to see code examples using the model in the AWS Command Line Interface (AWS CLI) and AWS SDKs. Let’s have a better look at the code syntax.

Using Claude 3.5 Sonnet with AWS SDKs
You can use Claude 3.5 Sonnet with any AWS SDK using the new Amazon Bedrock Converse API or Anthropic Claude Messages API.

To update code already using a Claude 3 model, I just need to replace the model ID with:

anthropic.claude-3-5-sonnet-20240620-v1:0

Here’s a sample implementation with the AWS SDK for Python (Boto3) using the same image as before to show how to use images and text with the Converse API.

import boto3
from botocore.exceptions import ClientError

MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

IMAGE_NAME = "primary-energy-hydro.png"

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open(IMAGE_NAME, "rb") as f:
    image = f.read()

user_message = "Which countries consume more than 1000 TWh from hydropower? Think step by step and look at all regions. Output in JSON."

messages = [
    {
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image}}},
            {"text": user_message},
        ],
    }
]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
)
response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)

When I run it, I get a similar output as in the console:

Let's approach this step-by-step:

1. First, I'll examine the color scale at the bottom of the map. The darkest blue color represents consumption of 2,000 TWh or more.

2. Now, I'll scan the map region by region:

   North America: Canada is dark blue, indicating over 1,000 TWh.
   South America: Brazil is also dark blue, over 1,000 TWh.
   Europe: No country appears to be dark blue.
   Africa: No country appears to be dark blue.
   Asia: China stands out as dark blue, indicating over 1,000 TWh.
   Oceania: No country appears to be dark blue.

3. To be thorough, I'll double-check for any medium blue countries that might be close to or over 1,000 TWh, but I don't see any that appear to reach that threshold.

4. Based on this analysis, there are three countries that clearly consume more than 1,000 TWh from hydropower.

Now, I'll format the answer in JSON:

```json
{
  "countries_consuming_over_1000TWh_hydropower": [
    "Canada",
    "Brazil",
    "China"
  ]
}
```

This JSON output lists the three countries that visually appear to consume more than 1,000 TWh of primary energy from hydropower according to the 2022 data presented in the map.

Because I didn’t specify a JSON syntax, the two answers use a different format. In your applications, you can describe in the prompt the JSON properties you want or provide a sample to get a standard format in output.

For more examples, see the code samples in the Amazon Bedrock User Guide. For a more advanced use case, here’s a fully functional tool use demo illustrating how to connect a generative AI model with a custom tool or API.

Using Claude 3.5 Sonnet with the AWS CLI
There are times when nothing beats the speed of the command line. This is how you can use the AWS CLI with the new model:

aws bedrock-runtime converse \
    --model-id anthropic.claude-3-5-sonnet-20240620-v1:0 \
    --messages '{"role": "user", "content": [{"text": "Alice has N brothers and she also has M sisters. How many sisters does Alice’s brother have?"}]}' \
    --region us-east-1
    --query output.message.content

In the output, I use the query option to only get the content of the output message:

[
    {
        "text": "Let's approach this step-by-step:\n\n1. First, we need to understand the relationships:\n   - Alice has N brothers\n   - Alice has M sisters\n\n2. Now, let's consider Alice's brother:\n   - He is one of Alice's N brothers\n   - He has the same parents as Alice\n\n3. This means that Alice's brother has:\n   - The same sisters as Alice\n   - One sister more than Alice (because Alice herself is his sister)\n\n4. Therefore, the number of sisters Alice's brother has is:\n   M + 1\n\n   Where M is the number of sisters Alice has.\n\nSo, the answer is: Alice's brother has M + 1 sisters."
    }
]

I copy the text into a small Python program to see it printed on multiple lines:

print("Let's approach this step-by-step:\n\n1. First, we need to understand the relationships:\n   - Alice has N brothers\n   - Alice has M sisters\n\n2. Now, let's consider Alice's brother:\n   - He is one of Alice's N brothers\n   - He has the same parents as Alice\n\n3. This means that Alice's brother has:\n   - The same sisters as Alice\n   - One sister more than Alice (because Alice herself is his sister)\n\n4. Therefore, the number of sisters Alice's brother has is:\n   M + 1\n\n   Where M is the number of sisters Alice has.\n\nSo, the answer is: Alice's brother has M + 1 sisters.")

Let's approach this step-by-step:

1. First, we need to understand the relationships:
   - Alice has N brothers
   - Alice has M sisters

2. Now, let's consider Alice's brother:
   - He is one of Alice's N brothers
   - He has the same parents as Alice

3. This means that Alice's brother has:
   - The same sisters as Alice
   - One sister more than Alice (because Alice herself is his sister)

4. Therefore, the number of sisters Alice's brother has is:
   M + 1

   Where M is the number of sisters Alice has.

So, the answer is: Alice's brother has M + 1 sisters.

Even if this was a quite nuanced question, Claude 3.5 Sonnet got it right and described its reasoning step by step.

Things to know
Anthropic’s Claude 3.5 Sonnet is available in Amazon Bedrock today in the US East (N. Virginia) AWS Region. More information on Amazon Bedrock model support by Region is available in the documentation. View the Amazon Bedrock pricing page to determine the costs for your specific use case.

By providing access to a faster and more powerful model at a lower cost, Claude 3.5 Sonnet makes generative AI easier and more effective to use for many industries, such as:

Healthcare and life sciences – In the medical field, Claude 3.5 Sonnet shows promise in enhancing imaging analysis, acting as a diagnostic assistant for patient triage, and summarizing the latest research findings in an easy-to-digest format.

Financial services – The model can provide valuable assistance in identifying financial trends and creating personalized debt repayment plans tailored to clients’ unique situations.

Legal – Law firms can use the model to accelerate legal research by quickly surfacing relevant precedents and statutes. Additionally, the model can increase paralegal efficiency through contract analysis and assist with drafting standard legal documents.

Media and entertainment – The model can expedite research for journalists, support the creative process of scriptwriting and character development, and provide valuable audience sentiment analysis.

Technology – For software developers, Claude 3.5 Sonnet offers opportunities in rapid application prototyping, legacy code migration, innovative feature ideation, user experience optimization, and identification of friction points.

Education – Educators can use the model to streamline grant proposal writing, develop comprehensive curricula incorporating emerging trends, and receive research assistance through database queries and insight generation.

It’s an exciting time for for generative AI. To start using this new model, see the Anthropic Claude models section of the Amazon Bedrock User Guide. You can also visit our community.aws site to find deep-dive technical content and to discover how our Builder communities are using Amazon Bedrock in their solutions. Let me know what you do with these enhanced capabilities!

— Danilo

Simplify risk and compliance assessments with the new common control library in AWS Audit Manager

2024-06-06 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/simplify-risk-and-compliance-assessments-with-the-new-common-control-library-in-aws-audit-manager/

With AWS Audit Manager, you can map your compliance requirements to AWS usage data and continually audit your AWS usage as part of your risk and compliance assessment. Today, Audit Manager introduces a common control library that provides common controls with predefined and pre-mapped AWS data sources.

The common control library is based on extensive mapping and reviews conducted by AWS certified auditors, verifying that the appropriate data sources are identified for evidence collection. Governance, Risk and Compliance (GRC) teams can use the common control library to save time time when mapping enterprise controls into Audit Manager for evidence collection, reducing their dependence on information technology (IT) teams.

Using the common control library, you can view the compliance requirements for multiple frameworks (such as PCI or HIPAA) associated with the same common control in one place, making it easier to understand your audit readiness across multiple frameworks simultaneously. In this way, you don’t need to implement different compliance standard requirements individually and then review the resulting data multiple times for different compliance regimes.

Additionally, by using controls from this library, you automatically inherit improvements as Audit Manager updates or adds new data sources, such as additional AWS CloudTrail events, AWS API calls, AWS Config rules, or maps additional compliance frameworks to common controls. This eliminates the efforts required by GRC and IT teams to constantly update and manage evidence sources and makes it easier to benefit from additional compliance frameworks that Audit Manager adds to its library.

Let’s see how this works in practice with an example.

Using AWS Audit Manager common control library
A common scenario for an airline is to implement a policy so that their customer payments, including in-flight meals and internet access, can only be taken via credit card. To implement this policy, the airline develops an enterprise control for IT operations that says that “customer transactions data is always available.” How can they monitor whether their applications on AWS meet this new control?

Acting as their compliance officer, I open the Audit Manager console and choose Control library from the navigation bar. The control library now includes the new Common category. Each common control maps to a group of core controls that collect evidence from AWS managed data sources and makes it easier to demonstrate compliance with a range of overlapping regulations and standards. I look through the common control library and search for “availability.” Here, I realize the airline’s expected requirements map to common control High availability architecture in the library.

I expand the High availability architecture common control to see the underlying core controls. There, I notice this control doesn’t adequately meet all the company’s needs because Amazon DynamoDB is not in this list. DynamoDB is a fully managed database, but given extensive usage of DynamoDB in their application architecture, they definitely want their DynamoDB tables to be available when their workload grows or shrinks. This might not be the case if they configured a fixed throughput for a DynamoDB table.

I look again through the common control library and search for “redundancy.” I expand the Fault tolerance and redundancy common control to see how it maps to core controls. There, I see the Enable Auto Scaling for Amazon DynamoDB tables core control. This core control is relevant for the architecture that the airline has implemented but the whole common control is not needed.

Additionally, common control High availability architecture already includes a couple of core controls that check that Multi-AZ replication on Amazon Relational Database Service (RDS) is enabled, but these core controls rely on an AWS Config rule. This rule doesn’t work for this use case because the airline does not use AWS Config. One of these two core controls also uses a CloudTrail event, but that event does not cover all scenarios.

As the compliance officer, I would like to collect the actual resource configuration. To collect this evidence, I briefly consult with an IT partner and create a custom control using a Customer managed source. I select the api-rds_describedbinstances API call and set a weekly collection frequency to optimize costs.

Implementing the custom control can be handled by the compliance team with minimal interaction needed from the IT team. If the compliance team has to reduce their reliance on IT, they can implement the entire second common control (Fault tolerance and redundancy) instead of only selecting the core control related to DynamoDB. It might be more than what they need based on their architecture, but the acceleration of velocity and reduction of time and effort for both the compliance and IT teams is often a bigger benefit than optimizing the controls in place.

I now choose Framework library in the navigation pane and create a custom framework that includes these controls. Then, I choose Assessments in the navigation pane and create an assessment that includes the custom framework. After I create the assessment, Audit Manager starts collecting evidence about the selected AWS accounts and their AWS usage.

By following these steps, a compliance team can precisely report on the enterprise control “customer transactions data is always available” using an implementation in line with their system design and their existing AWS services.

Things to know
The common control library is available today in all AWS Regions where AWS Audit Manager is offered. There is no additional cost for using the common control library. For more information, see AWS Audit Manager pricing.

This new capability streamlines the compliance and risk assessment process, reducing the workload for GRC teams and simplifying the way they can map enterprise controls into Audit Manager for evidence collection. To learn more, see the AWS Audit Manager User Guide.

— Danilo

AWS Weekly Roundup: Amazon Bedrock, AWS CodeBuild, Amazon CodeCatalyst, and more (April 29, 2024)

2024-04-29 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-bedrock-aws-codebuild-amazon-codecatalyst-and-more-april-29-2024/

This was a busy week for Amazon Bedrock with many new features! Using GitHub Actions with AWS CodeBuild is much easier. Also, Amazon Q in Amazon CodeCatalyst can now manage more complex issues.

I was amazed to meet so many new and old friends at the AWS Summit London. To give you a quick glimpse, here’s AWS Hero Yan Cui starting his presentation at the AWS Community stage.

Last week’s launches
With so many interesting new features, I start with generative artificial intelligence (generative AI) and then move to the other topics. Here’s what got my attention:

Amazon Bedrock – For supported architectures such as Llama, Mistral, or Flan T5, you can now import custom models and access them on demand. Model evaluation is now generally available to help you evaluate, compare, and select the best foundation models (FMs) for your specific use case. You can now access Meta’s Llama 3 models.

Agents for Amazon Bedrock – A simplified agent creation and return of control, so that you can define an action schema and get the control back to perform those action without needing to create a specific AWS Lambda function. Agents also added support for Anthropic Claude 3 Haiku and Sonnet to help build faster and more intelligent agents.

Knowledge Bases for Amazon Bedrock – You can now ingest data from up to five data sources and provide more complete answers. In the console, you can now chat with one of your documents without needing to set up a vector database (read more in this Machine Learning blog post).

Guardrails for Amazon Bedrock – The capability to implement safeguards based on your use cases and responsible AI policies is now available with new safety filters and privacy controls.

Amazon Titan – The new watermark detection feature is now generally available in Amazon Bedrock. In this way, you can identify images generated by Amazon Titan Image Generator using an invisible watermark present in all images generated by Amazon Titan.

Amazon CodeCatalyst – Amazon Q can now split complex issues into separate, simpler tasks that can then be assigned to a user or back to Amazon Q. CodeCatalyst now also supports approval gates within a workflow. Approval gates pause a workflow that is building, testing, and deploying code so that a user can validate whether it should be allowed to proceed.

Amazon EC2 – You can now remove an automatically assigned public IPv4 address from an EC2 instance. If you no longer need the automatically assigned public IPv4 (for example, because you are migrating to using a private IPv4 address for SSH with EC2 instance connect), you can use this option to quickly remove the automatically assigned public IPv4 address and reduce your public IPv4 costs.

Network Load Balancer – Now supports Resource Map in AWS Management Console, a tool that displays all your NLB resources and their relationships in a visual format on a single page. Note that Application Load Balancer already supports Resource Map in the console.

AWS CodeBuild – Now supports managed GitHub Action self-hosted runners. You can configure CodeBuild projects to receive GitHub Actions workflow job events and run them on CodeBuild ephemeral hosts.

Amazon Route 53 – You can now define a standard DNS configuration in the form of a Profile, apply this configuration to multiple VPCs, and share it across AWS accounts.

AWS Direct Connect – Hosted connections now support capacities up to 25 Gbps. Before, the maximum was 10 Gbps. Higher bandwidths simplify deployments of applications such as advanced driver assistance systems (ADAS), media and entertainment (M&E), artificial intelligence (AI), and machine learning (ML).

NoSQL Workbench for Amazon DynamoDB – A revamped operation builder user interface to help you better navigate, run operations, and browse your DynamoDB tables.

Amazon GameLift – Now supports in preview end-to-end development of containerized workloads, including deployment and scaling on premises, in the cloud, or for hybrid configurations. You can use containers for building, deploying, and running game server packages.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional projects, blog posts, and news items that you might find interesting:

GQL, the new ISO standard for graphs, has arrived – GQL, which stands for Graph Query Language, is the first new ISO database language since the introduction of SQL in 1987.

Authorize API Gateway APIs using Amazon Verified Permissions and Amazon Cognito – Externalizing authorization logic for application APIs can yield multiple benefits. Here’s an example of how to use Cedar policies to secure a REST API.

Build and deploy a 1 TB/s file system in under an hour – Very nice walkthrough for something that used to be not so easy to do in the recent past.

Let’s Architect! Discovering Generative AI on AWS – A new episode in this amazing series of posts that provides a broad introduction to the domain and then shares a mix of videos, blog posts, and hands-on workshops.

Building scalable, secure, and reliable RAG applications using Knowledge Bases for Amazon Bedrock – This post explores the new features (including AWS CloudFormation support) and how they align with the AWS Well-Architected Framework.

Using the unified CloudWatch Agent to send traces to AWS X-Ray – With added support for the collection of AWS X-Ray and OpenTelemetry traces, you can now provision a single agent to capture metrics, logs, and traces.

The executive’s guide to generative AI for sustainability – A guide for implementing a generative AI roadmap within sustainability strategies.

AWS open source news and updates – My colleague Ricardo writes about open source projects, tools, and events from the AWS Community. Check out Ricardo’s page for the latest updates.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Singapore (May 7), Seoul (May 16–17), Hong Kong (May 22), Milan (May 23), Stockholm (June 4), and Madrid (June 5).

AWS re:Inforce – Explore 2.5 days of immersive cloud security learning in the age of generative AI at AWS re:Inforce, June 10–12 in Pennsylvania.

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Turkey (May 18), Midwest | Columbus (June 13), Sri Lanka (June 27), Cameroon (July 13), Nigeria (August 24), and New York (August 28).

GOTO EDA Day London – Join us in London on May 14 to learn about event-driven architectures (EDA) for building highly scalable, fault tolerant, and extensible applications. This conference is organized by GOTO, AWS, and partners.

Browse all upcoming AWS led in-person and virtual events and developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Danilo

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Agents for Amazon Bedrock: Introducing a simplified creation and configuration experience

2024-04-23 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/agents-for-amazon-bedrock-introducing-a-simplified-creation-and-configuration-experience/

With Agents for Amazon Bedrock, applications can use generative artificial intelligence (generative AI) to run tasks across multiple systems and data sources. Starting today, these new capabilities streamline the creation and management of agents:

Quick agent creation – You can now quickly create an agent and optionally add instructions and action groups later, providing flexibility and agility for your development process.

Agent builder – All agent configurations can be operated in the new agent builder section of the console.

Simplified configuration – Action groups can use a simplified schema that just lists functions and parameters without having to provide an API schema.

Return of control –You can skip using an AWS Lambda function and return control to the application invoking the agent. In this way, the application can directly integrate with systems outside AWS or call internal endpoints hosted in any Amazon Virtual Private Cloud (Amazon VPC) without the need to integrate the required networking and security configurations with a Lambda function.

Infrastructure as code – You can use AWS CloudFormation to deploy and manage agents with the new simplified configuration, ensuring consistency and reproducibility across environments for your generative AI applications.

Let’s see how these enhancements work in practice.

Creating an agent using the new simplified console
To test the new experience, I want to build an agent that can help me reply to an email containing customer feedback. I can use generative AI, but a single invocation of a foundation model (FM) is not enough because I need to interact with other systems. To do that, I use an agent.

In the Amazon Bedrock console, I choose Agents from the navigation pane and then Create Agent. I enter a name for the agent (customer-feedback) and a description. Using the new interface, I proceed and create the agent without providing additional information at this stage.

I am now presented with the Agent builder, the place where I can access and edit the overall configuration of an agent. In the Agent resource role, I leave the default setting as Create and use a new service role so that the AWS Identity and Access Management (IAM) role assumed by the agent is automatically created for me. For the model, I select Anthropic and Claude 3 Sonnet.

In Instructions for the Agent, I provide clear and specific instructions for the task the agent has to perform. Here, I can also specify the style and tone I want the agent to use when replying. For my use case, I enter:

Help reply to customer feedback emails with a solution tailored to the customer account settings.

In Additional settings, I select Enabled for User input so that the agent can ask for additional details when it does not have enough information to respond. Then, I choose Save to update the configuration of the agent.

I now choose Add in the Action groups section. Action groups are the way agents can interact with external systems to gather more information or perform actions. I enter a name (retrieve-customer-settings) and a description for the action group:

Retrieve customer settings including customer ID.

The description is optional but, when provided, is passed to the model to help choose when to use this action group.

In Action group type, I select Define with function details so that I only need to specify functions and their parameters. The other option here (Define with API schemas) corresponds to the previous way of configuring action groups using an API schema.

Action group functions can be associated to a Lambda function call or configured to return control to the user or application invoking the agent so that they can provide a response to the function. The option to return control is useful for four main use cases:

When it’s easier to call an API from an existing application (for example, the one invoking the agent) than building a new Lambda function with the correct authentication and network configurations as required by the API
When the duration of the task goes beyond the maximum Lambda function timeout of 15 minutes so that I can handle the task with an application running in containers or virtual servers or use a workflow orchestration such as AWS Step Functions
When I have time-consuming actions because, with the return of control, the agent doesn’t wait for the action to complete before proceeding to the next step, and the invoking application can run actions asynchronously in the background while the orchestration flow of the agent continues
When I need a quick way to mock the interaction with an API during the development and testing and of an agent

In Action group invocation, I can specify the Lambda function that will be invoked when this action group is identified by the model during orchestration. I can ask the console to quickly create a new Lambda function, to select an existing Lambda function, or return control so that the user or application invoking the agent will ask for details to generate a response. I select Return Control to show how that works in the console.

I configure the first function of the action group. I enter a name (retrieve-customer-settings-from-crm) and the following description for the function:

Retrieve customer settings from CRM including customer ID using the customer email in the sender/from fields of the email.

In Parameters, I add email with Customer email as the description. This is a parameter of type String and is required by this function. I choose Add to complete the creation of the action group.

Because, for my use case, I expect many customers to have issues when logging in, I add another action group (named check-login-status) with the following description:

Check customer login status.

This time, I select the option to create a new Lambda function so that I can handle these requests in code.

For this action group, I configure a function (named check-customer-login-status-in-login-system) with the following description:

Check customer login status in login system using the customer ID from settings.

In Parameters, I add customer_id, another required parameter of type String. Then, I choose Add to complete the creation of the second action group.

When I open the configuration of this action group, I see the name of the Lambda function that has been created in my account. There, I choose View to open the Lambda function in the console.

In the Lambda console, I edit the starting code that has been provided and implement my business case:

import json

def lambda_handler(event, context):
    print(event)
    
    agent = event['agent']
    actionGroup = event['actionGroup']
    function = event['function']
    parameters = event.get('parameters', [])

    # Execute your business logic here. For more information,
    # refer to: https://docs.aws.amazon.com/bedrock/latest/userguide/agents-lambda.html
    if actionGroup == 'check-login-status' and function == 'check-customer-login-status-in-login-system':
        response = {
            "status": "unknown"
        }
        for p in parameters:
            if p['name'] == 'customer_id' and p['type'] == 'string' and p['value'] == '12345':
                response = {
                    "status": "not verified",
                    "reason": "the email address has not been verified",
                    "solution": "please verify your email address"
                }
    else:
        response = {
            "error": "Unknown action group {} or function {}".format(actionGroup, function)
        }
    
    responseBody =  {
        "TEXT": {
            "body": json.dumps(response)
        }
    }

    action_response = {
        'actionGroup': actionGroup,
        'function': function,
        'functionResponse': {
            'responseBody': responseBody
        }

    }

    dummy_function_response = {'response': action_response, 'messageVersion': event['messageVersion']}
    print("Response: {}".format(dummy_function_response))

    return dummy_function_response

I choose Deploy in the Lambda console. The function is configured with a resource-based policy that allows Amazon Bedrock to invoke the function. For this reason, I don’t need to update the IAM role used by the agent.

I am ready to test the agent. Back in the Amazon Bedrock console, with the agent selected, I look for the Test Agent section. There, I choose Prepare to prepare the agent and test it with the latest changes.

As input to the agent, I provide this sample email:

From: [email protected]

Subject: Problems logging in

Hi, when I try to log into my account, I get an error and cannot proceed further. Can you check? Thank you, Danilo

In the first step, the agent orchestration decides to use the first action group (retrieve-customer-settings) and function (retrieve-customer-settings-from-crm). This function is configured to return control, and in the console, I am asked to provide the output of the action group function. The customer email address is provided as the input parameter.

To simulate an interaction with an application, I reply with a JSON syntax and choose Submit:

{ "customer id": 12345 }

In the next step, the agent has the information required to use the second action group (check-login-status) and function (check-customer-login-status-in-login-system) to call the Lambda function. In return, the Lambda function provides this JSON payload:

{
  "status": "not verified",
  "reason": "the email address has not been verified",
  "solution": "please verify your email address"
}

Using this content, the agent can complete its task and suggest the correct solution for this customer.

I am satisfied with the result, but I want to know more about what happened under the hood. I choose Show trace where I can see the details of each step of the agent orchestration. This helps me understand the agent decisions and correct the configurations of the agent groups if they are not used as I expect.

Things to know
You can use the new simplified experience to create and manage Agents for Amazon Bedrock in the US East (N. Virginia) and US West (Oregon) AWS Regions.

You can now create an agent without having to specify an API schema or provide a Lambda function for the action groups. You just need to list the parameters that the action group needs. When invoking the agent, you can choose to return control with the details of the operation to perform so that you can handle the operation in your existing applications or if the duration is longer than the maximum Lambda function timeout.

CloudFormation support for Agents for Amazon Bedrock has been released recently and is now being updated to support the new simplified syntax.

To learn more:

See the Agents for Amazon Bedrock section of the User Guide.
Visit our community.aws site to find deep-dive technical content and discover how others are using Amazon Bedrock in their solutions.

— Danilo

Import custom models in Amazon Bedrock (preview)

2024-04-23 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/import-custom-models-in-amazon-bedrock-preview/

With Amazon Bedrock, you have access to a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies that make it easier to build and scale generative AI applications. Some of these models provide publicly available weights that can be fine-tuned and customized for specific use cases. However, deploying customized FMs in a secure and scalable way is not an easy task.

Starting today, Amazon Bedrock adds in preview the capability to import custom weights for supported model architectures (such as Meta Llama 2, Llama 3, and Mistral) and serve the custom model using On-Demand mode. You can import models with weights in Hugging Face safetensors format from Amazon SageMaker and Amazon Simple Storage Service (Amazon S3).

In this way, you can use Amazon Bedrock with existing customized models such as Code Llama, a code-specialized version of Llama 2 that was created by further training Llama 2 on code-specific datasets, or use your data to fine-tune models for your own unique business case and import the resulting model in Amazon Bedrock.

Let’s see how this works in practice.

Bringing a custom model to Amazon Bedrock
In the Amazon Bedrock console, I choose Imported models from the Foundation models section of the navigation pane. Now, I can create a custom model by importing model weights from an Amazon Simple Storage Service (Amazon S3) bucket or from an Amazon SageMaker model.

I choose to import model weights from an S3 bucket. In another browser tab, I download the MistralLite model from the Hugging Face website using this pull request (PR) that provides weights in safetensors format. The pull request is currently Ready to merge, so it might be part of the main branch when you read this. MistralLite is a fine-tuned Mistral-7B-v0.1 language model with enhanced capabilities of processing long context up to 32K tokens.

When the download is complete, I upload the files to an S3 bucket in the same AWS Region where I will import the model. Here are the MistralLite model files in the Amazon S3 console:

Back at the Amazon Bedrock console, I enter a name for the model and keep the proposed import job name.

I select Model weights in the Model import settings and browse S3 to choose the location where I uploaded the model weights.

To authorize Amazon Bedrock to access the files on the S3 bucket, I select the option to create and use a new AWS Identity and Access Management (IAM) service role. I use the View permissions details link to check what will be in the role. Then, I submit the job.

About ten minutes later, the import job is completed.

Now, I see the imported model in the console. The list also shows the model Amazon Resource Name (ARN) and the creation date.

I choose the model to get more information, such as the S3 location of the model files.

In the model detail page, I choose Open in playground to test the model in the console. In the text playground, I type a question using the prompt template of the model:

<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>

The MistralLite imported model is quick to reply and describe some of those challenges.

In the playground, I can tune responses for my use case using configurations such as temperature and maximum length or add stop sequences specific to the imported model.

To see the syntax of the API request, I choose the three small vertical dots at the top right of the playground.

I choose View API syntax and run the command using the AWS Command Line Interface (AWS CLI):

aws bedrock-runtime invoke-model \
--model-id arn:aws:bedrock:us-east-1:123412341234:imported-model/a82bkefgp20f \
--body "{\"prompt\":\"<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>\",\"max_tokens\":512,\"top_k\":200,\"top_p\":0.9,\"stop\":[],\"temperature\":0.5}" \
--cli-binary-format raw-in-base64-out \
--region us-east-1 \
invoke-model-output.txt

The output is similar to what I got in the playground. As you can see, for imported models, the model ID is the ARN of the imported model. I can use the model ID to invoke the imported model with the AWS CLI and AWS SDKs.

Things to know
You can bring your own weights for supported model architectures to Amazon Bedrock in the US East (N. Virginia) AWS Region. The model import capability is currently available in preview.

When using custom weights, Amazon Bedrock serves the model with On-Demand mode, and you only pay for what you use with no time-based term commitments. For detailed information, see Amazon Bedrock pricing.

The ability to import models is managed using AWS Identity and Access Management (IAM), and you can allow this capability only to the roles in your organization that need to have it.

With this launch, it’s now easier to build and scale generative AI applications using custom models with security and privacy built in.

To learn more:

See the Amazon Bedrock User Guide.
Visit our community.aws site to find deep-dive technical content and discover how others are using Amazon Bedrock in their solutions.

— Danilo

Run large-scale simulations with AWS Batch multi-container jobs

2024-03-25 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/run-large-scale-simulations-with-aws-batch-multi-container-jobs/

Industries like automotive, robotics, and finance are increasingly implementing computational workloads like simulations, machine learning (ML) model training, and big data analytics to improve their products. For example, automakers rely on simulations to test autonomous driving features, robotics companies train ML algorithms to enhance robot perception capabilities, and financial firms run in-depth analyses to better manage risk, process transactions, and detect fraud.

Some of these workloads, including simulations, are especially complicated to run due to their diversity of components and intensive computational requirements. A driving simulation, for instance, involves generating 3D virtual environments, vehicle sensor data, vehicle dynamics controlling car behavior, and more. A robotics simulation might test hundreds of autonomous delivery robots interacting with each other and other systems in a massive warehouse environment.

AWS Batch is a fully managed service that can help you run batch workloads across a range of AWS compute offerings, including Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), AWS Fargate, and Amazon EC2 Spot or On-Demand Instances. Traditionally, AWS Batch only allowed single-container jobs and required extra steps to merge all components into a monolithic container. It also did not allow using separate “sidecar” containers, which are auxiliary containers that complement the main application by providing additional services like data logging. This additional effort required coordination across multiple teams, such as software development, IT operations, and quality assurance (QA), because any code change meant rebuilding the entire container.

Now, AWS Batch offers multi-container jobs, making it easier and faster to run large-scale simulations in areas like autonomous vehicles and robotics. These workloads are usually divided between the simulation itself and the system under test (also known as an agent) that interacts with the simulation. These two components are often developed and optimized by different teams. With the ability to run multiple containers per job, you get the advanced scaling, scheduling, and cost optimization offered by AWS Batch, and you can use modular containers representing different components like 3D environments, robot sensors, or monitoring sidecars. In fact, customers such as IPG Automotive, MORAI, and Robotec.ai are already using AWS Batch multi-container jobs to run their simulation software in the cloud.

Let’s see how this works in practice using a simplified example and have some fun trying to solve a maze.

Building a Simulation Running on Containers
In production, you will probably use existing simulation software. For this post, I built a simplified version of an agent/model simulation. If you’re not interested in code details, you can skip this section and go straight to how to configure AWS Batch.

For this simulation, the world to explore is a randomly generated 2D maze. The agent has the task to explore the maze to find a key and then reach the exit. In a way, it is a classic example of pathfinding problems with three locations.

Here’s a sample map of a maze where I highlighted the start (S), end (E), and key (K) locations.

The separation of agent and model into two separate containers allows different teams to work on each of them separately. Each team can focus on improving their own part, for example, to add details to the simulation or to find better strategies for how the agent explores the maze.

Here’s the code of the maze model (app.py). I used Python for both examples. The model exposes a REST API that the agent can use to move around the maze and know if it has found the key and reached the exit. The maze model uses Flask for the REST API.

import json
import random
from flask import Flask, request, Response

ready = False

# How map data is stored inside a maze
# with size (width x height) = (4 x 3)
#
#    012345678
# 0: +-+-+ +-+
# 1: | |   | |
# 2: +-+ +-+-+
# 3: | |   | |
# 4: +-+-+ +-+
# 5: | | | | |
# 6: +-+-+-+-+
# 7: Not used

class WrongDirection(Exception):
    pass

class Maze:
    UP, RIGHT, DOWN, LEFT = 0, 1, 2, 3
    OPEN, WALL = 0, 1
    

    @staticmethod
    def distance(p1, p2):
        (x1, y1) = p1
        (x2, y2) = p2
        return abs(y2-y1) + abs(x2-x1)


    @staticmethod
    def random_dir():
        return random.randrange(4)


    @staticmethod
    def go_dir(x, y, d):
        if d == Maze.UP:
            return (x, y - 1)
        elif d == Maze.RIGHT:
            return (x + 1, y)
        elif d == Maze.DOWN:
            return (x, y + 1)
        elif d == Maze.LEFT:
            return (x - 1, y)
        else:
            raise WrongDirection(f"Direction: {d}")


    def __init__(self, width, height):
        self.width = width
        self.height = height        
        self.generate()
        

    def area(self):
        return self.width * self.height
        

    def min_lenght(self):
        return self.area() / 5
    

    def min_distance(self):
        return (self.width + self.height) / 5
    

    def get_pos_dir(self, x, y, d):
        if d == Maze.UP:
            return self.maze[y][2 * x + 1]
        elif d == Maze.RIGHT:
            return self.maze[y][2 * x + 2]
        elif d == Maze.DOWN:
            return self.maze[y + 1][2 * x + 1]
        elif d ==  Maze.LEFT:
            return self.maze[y][2 * x]
        else:
            raise WrongDirection(f"Direction: {d}")


    def set_pos_dir(self, x, y, d, v):
        if d == Maze.UP:
            self.maze[y][2 * x + 1] = v
        elif d == Maze.RIGHT:
            self.maze[y][2 * x + 2] = v
        elif d == Maze.DOWN:
            self.maze[y + 1][2 * x + 1] = v
        elif d ==  Maze.LEFT:
            self.maze[y][2 * x] = v
        else:
            WrongDirection(f"Direction: {d}  Value: {v}")


    def is_inside(self, x, y):
        return 0 <= y < self.height and 0 <= x < self.width


    def generate(self):
        self.maze = []
        # Close all borders
        for y in range(0, self.height + 1):
            self.maze.append([Maze.WALL] * (2 * self.width + 1))
        # Get a random starting point on one of the borders
        if random.random() < 0.5:
            sx = random.randrange(self.width)
            if random.random() < 0.5:
                sy = 0
                self.set_pos_dir(sx, sy, Maze.UP, Maze.OPEN)
            else:
                sy = self.height - 1
                self.set_pos_dir(sx, sy, Maze.DOWN, Maze.OPEN)
        else:
            sy = random.randrange(self.height)
            if random.random() < 0.5:
                sx = 0
                self.set_pos_dir(sx, sy, Maze.LEFT, Maze.OPEN)
            else:
                sx = self.width - 1
                self.set_pos_dir(sx, sy, Maze.RIGHT, Maze.OPEN)
        self.start = (sx, sy)
        been = [self.start]
        pos = -1
        solved = False
        generate_status = 0
        old_generate_status = 0                    
        while len(been) < self.area():
            (x, y) = been[pos]
            sd = Maze.random_dir()
            for nd in range(4):
                d = (sd + nd) % 4
                if self.get_pos_dir(x, y, d) != Maze.WALL:
                    continue
                (nx, ny) = Maze.go_dir(x, y, d)
                if (nx, ny) in been:
                    continue
                if self.is_inside(nx, ny):
                    self.set_pos_dir(x, y, d, Maze.OPEN)
                    been.append((nx, ny))
                    pos = -1
                    generate_status = len(been) / self.area()
                    if generate_status - old_generate_status > 0.1:
                        old_generate_status = generate_status
                        print(f"{generate_status * 100:.2f}%")
                    break
                elif solved or len(been) < self.min_lenght():
                    continue
                else:
                    self.set_pos_dir(x, y, d, Maze.OPEN)
                    self.end = (x, y)
                    solved = True
                    pos = -1 - random.randrange(len(been))
                    break
            else:
                pos -= 1
                if pos < -len(been):
                    pos = -1
                    
        self.key = None
        while(self.key == None):
            kx = random.randrange(self.width)
            ky = random.randrange(self.height)
            if (Maze.distance(self.start, (kx,ky)) > self.min_distance()
                and Maze.distance(self.end, (kx,ky)) > self.min_distance()):
                self.key = (kx, ky)


    def get_label(self, x, y):
        if (x, y) == self.start:
            c = 'S'
        elif (x, y) == self.end:
            c = 'E'
        elif (x, y) == self.key:
            c = 'K'
        else:
            c = ' '
        return c

                    
    def map(self, moves=[]):
        map = ''
        for py in range(self.height * 2 + 1):
            row = ''
            for px in range(self.width * 2 + 1):
                x = int(px / 2)
                y = int(py / 2)
                if py % 2 == 0: #Even rows
                    if px % 2 == 0:
                        c = '+'
                    else:
                        v = self.get_pos_dir(x, y, self.UP)
                        if v == Maze.OPEN:
                            c = ' '
                        elif v == Maze.WALL:
                            c = '-'
                else: # Odd rows
                    if px % 2 == 0:
                        v = self.get_pos_dir(x, y, self.LEFT)
                        if v == Maze.OPEN:
                            c = ' '
                        elif v == Maze.WALL:
                            c = '|'
                    else:
                        c = self.get_label(x, y)
                        if c == ' ' and [x, y] in moves:
                            c = '*'
                row += c
            map += row + '\n'
        return map


app = Flask(__name__)

@app.route('/')
def hello_maze():
    return "<p>Hello, Maze!</p>"

@app.route('/maze/map', methods=['GET', 'POST'])
def maze_map():
    if not ready:
        return Response(status=503, retry_after=10)
    if request.method == 'GET':
        return '<pre>' + maze.map() + '</pre>'
    else:
        moves = request.get_json()
        return maze.map(moves)

@app.route('/maze/start')
def maze_start():
    if not ready:
        return Response(status=503, retry_after=10)
    start = { 'x': maze.start[0], 'y': maze.start[1] }
    return json.dumps(start)

@app.route('/maze/size')
def maze_size():
    if not ready:
        return Response(status=503, retry_after=10)
    size = { 'width': maze.width, 'height': maze.height }
    return json.dumps(size)

@app.route('/maze/pos/<int:y>/<int:x>')
def maze_pos(y, x):
    if not ready:
        return Response(status=503, retry_after=10)
    pos = {
        'here': maze.get_label(x, y),
        'up': maze.get_pos_dir(x, y, Maze.UP),
        'down': maze.get_pos_dir(x, y, Maze.DOWN),
        'left': maze.get_pos_dir(x, y, Maze.LEFT),
        'right': maze.get_pos_dir(x, y, Maze.RIGHT),

    }
    return json.dumps(pos)


WIDTH = 80
HEIGHT = 20
maze = Maze(WIDTH, HEIGHT)
ready = True

The only requirement for the maze model (in requirements.txt) is the Flask module.

To create a container image running the maze model, I use this Dockerfile.

FROM --platform=linux/amd64 public.ecr.aws/docker/library/python:3.12-alpine

WORKDIR /app

COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt

COPY . .

CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0", "--port=5555"]

Here’s the code for the agent (agent.py). First, the agent asks the model for the size of the maze and the starting position. Then, it applies its own strategy to explore and solve the maze. In this implementation, the agent chooses its route at random, trying to avoid following the same path more than once.

import random
import requests
from requests.adapters import HTTPAdapter, Retry

HOST = '127.0.0.1'
PORT = 5555

BASE_URL = f"http://{HOST}:{PORT}/maze"

UP, RIGHT, DOWN, LEFT = 0, 1, 2, 3
OPEN, WALL = 0, 1

s = requests.Session()

retries = Retry(total=10,
                backoff_factor=1)

s.mount('http://', HTTPAdapter(max_retries=retries))

r = s.get(f"{BASE_URL}/size")
size = r.json()
print('SIZE', size)

r = s.get(f"{BASE_URL}/start")
start = r.json()
print('START', start)

y = start['y']
x = start['x']

found_key = False
been = set((x, y))
moves = [(x, y)]
moves_stack = [(x, y)]

while True:
    r = s.get(f"{BASE_URL}/pos/{y}/{x}")
    pos = r.json()
    if pos['here'] == 'K' and not found_key:
        print(f"({x}, {y}) key found")
        found_key = True
        been = set((x, y))
        moves_stack = [(x, y)]
    if pos['here'] == 'E' and found_key:
        print(f"({x}, {y}) exit")
        break
    dirs = list(range(4))
    random.shuffle(dirs)
    for d in dirs:
        nx, ny = x, y
        if d == UP and pos['up'] == 0:
            ny -= 1
        if d == RIGHT and pos['right'] == 0:
            nx += 1
        if d == DOWN and pos['down'] == 0:
            ny += 1
        if d == LEFT and pos['left'] == 0:
            nx -= 1 

        if nx < 0 or nx >= size['width'] or ny < 0 or ny >= size['height']:
            continue

        if (nx, ny) in been:
            continue

        x, y = nx, ny
        been.add((x, y))
        moves.append((x, y))
        moves_stack.append((x, y))
        break
    else:
        if len(moves_stack) > 0:
            x, y = moves_stack.pop()
        else:
            print("No moves left")
            break

print(f"Solution length: {len(moves)}")
print(moves)

r = s.post(f'{BASE_URL}/map', json=moves)

print(r.text)

s.close()

The only dependency of the agent (in requirements.txt) is the requests module.

This is the Dockerfile I use to create a container image for the agent.

FROM --platform=linux/amd64 public.ecr.aws/docker/library/python:3.12-alpine

WORKDIR /app

COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt

COPY . .

CMD [ "python3", "agent.py"]

You can easily run this simplified version of a simulation locally, but the cloud allows you to run it at larger scale (for example, with a much bigger and more detailed maze) and to test multiple agents to find the best strategy to use. In a real-world scenario, the improvements to the agent would then be implemented into a physical device such as a self-driving car or a robot vacuum cleaner.

Running a simulation using multi-container jobs
To run a job with AWS Batch, I need to configure three resources:

The compute environment in which to run the job
The job queue in which to submit the job
The job definition describing how to run the job, including the container images to use

In the AWS Batch console, I choose Compute environments from the navigation pane and then Create. Now, I have the choice of using Fargate, Amazon EC2, or Amazon EKS. Fargate allows me to closely match the resource requirements that I specify in the job definitions. However, simulations usually require access to a large but static amount of resources and use GPUs to accelerate computations. For this reason, I select Amazon EC2.

I select the Managed orchestration type so that AWS Batch can scale and configure the EC2 instances for me. Then, I enter a name for the compute environment and select the service-linked role (that AWS Batch created for me previously) and the instance role that is used by the ECS container agent (running on the EC2 instances) to make calls to the AWS API on my behalf. I choose Next.

In the Instance configuration settings, I choose the size and type of the EC2 instances. For example, I can select instance types that have GPUs or use the Graviton processor. I do not have specific requirements and leave all the settings to their default values. For Network configuration, the console already selected my default VPC and the default security group. In the final step, I review all configurations and complete the creation of the compute environment.

Now, I choose Job queues from the navigation pane and then Create. Then, I select the same orchestration type I used for the compute environment (Amazon EC2). In the Job queue configuration, I enter a name for the job queue. In the Connected compute environments dropdown, I select the compute environment I just created and complete the creation of the queue.

I choose Job definitions from the navigation pane and then Create. As before, I select Amazon EC2 for the orchestration type.

To use more than one container, I disable the Use legacy containerProperties structure option and move to the next step. By default, the console creates a legacy single-container job definition if there’s already a legacy job definition in the account. That’s my case. For accounts without legacy job definitions, the console has this option disabled.

I enter a name for the job definition. Then, I have to think about which permissions this job requires. The container images I want to use for this job are stored in Amazon ECR private repositories. To allow AWS Batch to download these images to the compute environment, in the Task properties section, I select an Execution role that gives read-only access to the ECR repositories. I don’t need to configure a Task role because the simulation code is not calling AWS APIs. For example, if my code was uploading results to an Amazon Simple Storage Service (Amazon S3) bucket, I could select here a role giving permissions to do so.

In the next step, I configure the two containers used by this job. The first one is the maze-model. I enter the name and the image location. Here, I can specify the resource requirements of the container in terms of vCPUs, memory, and GPUs. This is similar to configuring containers for an ECS task.

I add a second container for the agent and enter name, image location, and resource requirements as before. Because the agent needs to access the maze as soon as it starts, I use the Dependencies section to add a container dependency. I select maze-model for the container name and START as the condition. If I don’t add this dependency, the agent container can fail before the maze-model container is running and able to respond. Because both containers are flagged as essential in this job definition, the overall job would terminate with a failure.

I review all configurations and complete the job definition. Now, I can start a job.

In the Jobs section of the navigation pane, I submit a new job. I enter a name and select the job queue and the job definition I just created.

In the next steps, I don’t need to override any configuration and create the job. After a few minutes, the job has succeeded, and I have access to the logs of the two containers.

The agent solved the maze, and I can get all the details from the logs. Here’s the output of the job to see how the agent started, picked up the key, and then found the exit.

SIZE {'width': 80, 'height': 20}
START {'x': 0, 'y': 18}
(32, 2) key found
(79, 16) exit
Solution length: 437
[(0, 18), (1, 18), (0, 18), ..., (79, 14), (79, 15), (79, 16)]

In the map, the red asterisks (*) follow the path used by the agent between the start (S), key (K), and exit (E) locations.

Increasing observability with a sidecar container
When running complex jobs using multiple components, it helps to have more visibility into what these components are doing. For example, if there is an error or a performance problem, this information can help you find where and what the issue is.

To instrument my application, I use AWS Distro for OpenTelemetry:

First, I update the agent and maze-model containers to use Python auto-instrumentation as described in this article. There are similar tutorials for other platforms, such as Go, Java, and JavaScript. To get information specific to my application, I have the option to manually instrument the code.
Then, I add a sidecar container using the AWS Distro for OpenTelemetry Collector from the Amazon ECR Public Gallery. The collector container will receive telemetry data from the other containers and send traces to AWS X-Ray and metrics to Amazon CloudWatch, Amazon Managed Service for Prometheus, or self-managed Prometheus. OpenTelemetry makes it easier to share data with multiple monitoring and observability platforms.

Using telemetry data collected in this way, I can set up dashboards (for example, using CloudWatch or Amazon Managed Grafana) and alarms (with CloudWatch or Prometheus) that help me better understand what is happening and reduce the time to solve an issue. More generally, a sidecar container can help integrate telemetry data from AWS Batch jobs with your monitoring and observability platforms.

Things to know
AWS Batch support for multi-container jobs is available today in the AWS Management Console, AWS Command Line Interface (AWS CLI), and AWS SDKs in all AWS Regions where Batch is offered. For more information, see the AWS Services by Region list.

There is no additional cost for using multi-container jobs with AWS Batch. In fact, there is no additional charge for using AWS Batch. You only pay for the AWS resources you create to store and run your application, such as EC2 instances and Fargate containers. To optimize your costs, you can use Reserved Instances, Savings Plan, EC2 Spot Instances, and Fargate in your compute environments.

Using multi-container jobs accelerates development times by reducing job preparation efforts and eliminates the need for custom tooling to merge the work of multiple teams into a single container. It also simplifies DevOps by defining clear component responsibilities so that teams can quickly identify and fix issues in their own areas of expertise without distraction.

To learn more, see how to set up multi-container jobs in the AWS Batch User Guide.

— Danilo

AWS Weekly Roundup — New models for Amazon Bedrock, CloudFront embedded POPs, and more — March 4, 2024

2024-03-04 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-new-models-for-amazon-bedrock-cloudfront-embedded-pops-and-more-march-4-2024/

This has been a busy week – we introduced a new kind of Amazon CloudFront infrastructure, more efficient ways to analyze data stored on Amazon Simple Storage Service (Amazon S3), and new generative AI capabilities.

Last week’s launches
Here’s what got my attention:

Amazon Bedrock – Mistral AI’s Mixtral 8x7B and Mistral 7B foundation models are now generally available on Amazon Bedrock. More details in Donnie’s post. Here’s a deep dive into Mistral 7B and Mixtral 8x7B models, by my colleague Mike.

Knowledge Bases for Amazon Bedrock – With hybrid search support, you can improve the relevance of retrieved results, especially for keyword searches. More information and examples in this post on the AWS Machine Learning Blog.

Amazon CloudFront – We announced the availability of embedded Points of Presence (POPs), a new type of CloudFront infrastructure deployed closest to end viewers, within internet service provider (ISP) and mobile network operator (MNO) networks. Embedded POPs are custom-built to deliver large scale live-stream video, video-on-demand (VOD), and game downloads. Today, CloudFront has 600+ embedded POPs deployed across 200+ cities globally.

Amazon Kinesis Data Streams – To help you analyze and visualize the data in your streams in real-time, you can now run SQL queries with one click in the AWS Management Console.

Amazon EventBridge – API destinations now supports content-type header customization. By defining your own content-type, you can unlock more HTTP targets for API destinations, including support for CloudEvents. Read more in this X/Twitter thread by Nik, principal engineer at AWS Lambda.

Amazon MWAA – You can now create Apache Airflow version 2.8 environments on Amazon Managed Workflows for Apache Airflow (MWAA). More in this AWS Big Data blog post.

Amazon CloudWatch Logs – With CloudWatch Logs support for IPv6, you can simplify your network stack by running Amazon CloudWatch log groups on a dual-stack network that supports both IPv4 and IPv6. You can find more information on AWS services that support IPv6 in the documentation.

SQL Workbench for Amazon DynamoDB – As you use this client-side application to help you visualize and build scalable, high-performance data models, you can now clone tables between development environments. With this feature, you can develop and test your code with Amazon DynamoDB tables in the same state across multiple development environments.

AWS Cloud Development Kit (AWS CDK) – The new AWS AppConfig Level 2 (L2) constructs simplify provisioning of AWS AppConfig resources, including feature flags and dynamic configuration data.

Amazon Location Service – You can now use the authentication libraries for iOS and Android platforms to simplify the integration of Amazon Location Service into mobile apps. The libraries support API key and Amazon Cognito authentication.

Amazon SageMaker – You can now accelerate Amazon SageMaker Model Training using the Amazon S3 Express One Zone storage class to gain faster load times for training data, checkpoints, and model outputs. S3 Express One Zone is purpose-built to deliver the fastest cloud object storage for performance-critical applications, and delivers consistent single-digit millisecond request latency and high throughput.

Amazon Data Firehose – Now supports message extraction for CloudWatch Logs. CloudWatch log records use a nested JSON structure, and the message in each record is embedded within header information. It’s now easier to filter out the header information and deliver only the embedded message to the destination, reducing the cost of subsequent processing and storage.

Amazon OpenSearch – Terraform now supports Amazon OpenSearch Ingestion deployments, a fully managed data ingestion tier for Amazon OpenSearch Service that allows you to ingest and process petabyte-scale data before indexing it in Amazon OpenSearch-managed clusters and serverless collections. Read more in this AWS Big Data blog post.

AWS Mainframe Modernization – AWS Blu Age Runtime is now available for seamless deployment on Amazon ECS on AWS Fargate to run modernized applications in serverless containers.

AWS Local Zones – A new Local Zone in Atlanta helps applications that require single-digit millisecond latency for use cases such as real-time gaming, hybrid migrations, media and entertainment content creation, live video streaming, engineering simulations, and more.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional projects, programs, and news items that you might find interesting.

The PartyRock Hackathon is closing this month, and there is still time to join and make apps without code! Here’s the screenshot of a quick app that I built to help me plan what to do when I visit a new place.

Use RAG for drug discovery with Knowledge Bases for Amazon Bedrock – A very interesting use case for generative AI.

Here’s a complete solution to build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources.

A nice overview of .NET 8 Support on AWS, the latest Long Term Support (LTS) version of cross-platform .NET.

Introducing the AWS WAF traffic overview dashboard – A new tool to help you make informed decisions about your security posture for applications protected by AWS WAF.

Some tips on how to improve the speed and cost of high performance computing (HPC) deployment with Mountpoint for Amazon S3, an open source file client that you can use to mount an S3 bucket on your compute instances, accessing it as a local file system.

My colleague Ricardo writes this weekly open source newsletter, in which he highlights new open source projects, tools, and demos from the AWS Community.

Upcoming AWS events
You can feel it in the air–the AWS Summits season is coming back! The first ones will be in Europe, you can join us in Paris (April 3), Amsterdam (April 9), and London (April 24). On March 12, you can meet public sector industry leaders and AWS experts at the AWS Public Sector Symposium in Brussels.

AWS Innovate are an online events designed to help you develop the right skills to design, deploy, and operate infrastructure and applications. AWS Innovate Generative AI + Data Edition for Americas is on March 14. It follows the ones for Asia Pacific & Japan and EMEA that we held in February.

There are still a few AWS Community re:Invent re:Cap events organized by volunteers from AWS User Groups and AWS Cloud Clubs around the world to learn about the latest announcements from AWS re:Invent.

You can browse all upcoming in-person and virtual events here.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Danilo

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS.

DNS over HTTPS is now available in Amazon Route 53 Resolver

2023-12-20 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/dns-over-https-is-now-available-in-amazon-route-53-resolver/

Starting today, Amazon Route 53 Resolver supports using the DNS over HTTPS (DoH) protocol for both inbound and outbound Resolver endpoints. As the name suggests, DoH supports HTTP or HTTP/2 over TLS to encrypt the data exchanged for Domain Name System (DNS) resolutions.

Using TLS encryption, DoH increases privacy and security by preventing eavesdropping and manipulation of DNS data as it is exchanged between a DoH client and the DoH-based DNS resolver.

This helps you implement a zero-trust architecture where no actor, system, network, or service operating outside or within your security perimeter is trusted and all network traffic is encrypted. Using DoH also helps follow recommendations such as those described in this memorandum of the US Office of Management and Budget (OMB).

DNS over HTTPS support in Amazon Route 53 Resolver
You can use Amazon Route 53 Resolver to resolve DNS queries in hybrid cloud environments. For example, it allows AWS services access for DNS requests from anywhere within your hybrid network. To do so, you can set up inbound and outbound Resolver endpoints:

Inbound Resolver endpoints allow DNS queries to your VPC from your on-premises network or another VPC.
Outbound Resolver endpoints allow DNS queries from your VPC to your on-premises network or another VPC.

After you configure the Resolver endpoints, you can set up rules that specify the name of the domains for which you want to forward DNS queries from your VPC to an on-premises DNS resolver (outbound) and from on-premises to your VPC (inbound).

Now, when you create or update an inbound or outbound Resolver endpoint, you can specify which protocols to use:

DNS over port 53 (Do53), which is using either UDP or TCP to send the packets.
DNS over HTTPS (DoH), which is using TLS to encrypt the data.
Both, depending on which one is used by the DNS client.
For FIPS compliance, there is a specific implementation (DoH-FIPS) for inbound endpoints.

Let’s see how this works in practice.

Using DNS over HTTPS with Amazon Route 53 Resolver
In the Route 53 console, I choose Inbound endpoints from the Resolver section of the navigation pane. There, I choose Create inbound endpoint.

I enter a name for the endpoint, select the VPC, the security group, and the endpoint type (IPv4, IPv6, or dual-stack). To allow using both encrypted and unencrypted DNS resolutions, I select Do53, DoH, and DoH-FIPS in the Protocols for this endpoint option.

After that, I configure the IP addresses for DNS queries. I select two Availability Zones and, for each, a subnet. For this setup, I use the option to have the IP addresses automatically selected from those available in the subnet.

After I complete the creation of the inbound endpoint, I configure the DNS server in my network to forward requests for the amazonaws.com domain (used by AWS service endpoints) to the inbound endpoint IP addresses.

Similarly, I create an outbound Resolver endpoint and and select both Do53 and DoH as protocols. Then, I create forwarding rules that tell for which domains the outbound Resolver endpoint should forward requests to the DNS servers in my network.

Now, when the DNS clients in my hybrid environment use DNS over HTTPS in their requests, DNS resolutions are encrypted. Optionally, I can enforce encryption and select only DoH in the configuration of inbound and outbound endpoints.

Things to know
DNS over HTTPS support for Amazon Route 53 Resolver is available today in all AWS Regions where Route 53 Resolver is offered, including GovCloud Regions and Regions based in China.

DNS over port 53 continues to be the default for inbound or outbound Resolver endpoints. In this way, you don’t need to update your existing automation tooling unless you want to adopt DNS over HTTPS.

There is no additional cost for using DNS over HTTPS with Resolver endpoints. For more information, see Route 53 pricing.

Start using DNS over HTTPS with Amazon Route 53 Resolver to increase privacy and security for your hybrid cloud environments.

— Danilo

Noise

All posts by Danilo Poccia

Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and prompt caching (preview)

Amazon Bedrock Marketplace: Access over 100 foundation models in one place

Introducing Amazon Nova: Frontier intelligence and industry leading price performance

New RAG evaluation and LLM-as-a-judge capabilities in Amazon Bedrock

Introducing new PartyRock capabilities and free daily usage

Amazon FSx for Lustre increases throughput to GPU instances by up to 12x

Upgraded Claude 3.5 Sonnet from Anthropic (available now), computer use (public beta), and Claude 3.5 Haiku (coming soon) in Amazon Bedrock

Introducing Llama 3.2 models from Meta in Amazon Bedrock: A new generation of multimodal vision and lightweight models

AWS Weekly Roundup: Amazon DynamoDB, AWS AppSync, Storage Browser for Amazon S3, and more (September 9, 2024)

Stability AI’s best image generating models now in Amazon Bedrock

Agents for Amazon Bedrock now support memory retention and code interpretation (preview)

AWS Weekly Roundup: Claude 3.5 Sonnet in Amazon Bedrock, CodeCatalyst updates, SageMaker with MLflow, and more (June 24, 2024)

Anthropic’s Claude 3.5 Sonnet model now available in Amazon Bedrock: Even more intelligence than Claude 3 Opus at one-fifth the cost

Simplify risk and compliance assessments with the new common control library in AWS Audit Manager

AWS Weekly Roundup: Amazon Bedrock, AWS CodeBuild, Amazon CodeCatalyst, and more (April 29, 2024)

Agents for Amazon Bedrock: Introducing a simplified creation and configuration experience

Import custom models in Amazon Bedrock (preview)

Run large-scale simulations with AWS Batch multi-container jobs

AWS Weekly Roundup — New models for Amazon Bedrock, CloudFront embedded POPs, and more — March 4, 2024

DNS over HTTPS is now available in Amazon Route 53 Resolver

The collective thoughts of the interwebz