Introducing Amazon Nova: Frontier intelligence and industry leading price performance

2024-12-03 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/

Today, we’re thrilled to announce Amazon Nova, a new generation of state-of-the-art foundation models (FMs) that deliver frontier intelligence and industry leading price performance, available exclusively in Amazon Bedrock.

You can use Amazon Nova to lower costs and latency for almost any generative AI task. You can build on Amazon Nova to analyze complex documents and videos, understand charts and diagrams, generate engaging video content, and build sophisticated AI agents, choosing from a range of intelligence classes optimized for enterprise workloads.

Whether you’re developing document processing applications that need to process images and text, creating marketing content at scale, or building AI assistants that can understand and act on visual information, Amazon Nova provides the intelligence and flexibility you need with two categories of models: understanding and creative content generation.

Amazon Nova understanding models accept text, image, or video inputs to generate text output. Amazon Nova creative content generation models accept text and image inputs to generate image or video output.

Understanding models: Text and visual intelligence
The Amazon Nova models include three understanding models (with a fourth one coming soon) designed to meet different needs:

Amazon Nova Micro – A text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat and brainstorming, and simple mathematical reasoning and coding. Amazon Nova Micro also supports customization on proprietary data using fine-tuning and model distillation to boost accuracy.

Amazon Nova Lite – A very low-cost multimodal model that is lightning fast for processing image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy. The model processes inputs up to 300K tokens in length and can analyze multiple images or up to 30 minutes of video in a single request. Amazon Nova Lite also supports text and multimodal fine-tuning and can be optimized to deliver the best quality and costs for your use case with techniques such as model distillation.

Amazon Nova Pro – A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Pro is capable of processing up to 300K input tokens and sets new standards in multimodal intelligence and agentic workflows that require calling APIs and tools to complete complex workflows. It achieves state-of-the-art performance on key benchmarks including visual question answering (TextVQA) and video understanding (VATEX). Amazon Nova Pro demonstrates strong capabilities in processing both visual and textual information and excels at analyzing financial documents. With an input context of 300K tokens, it can process code bases with over fifteen thousand lines of code. Amazon Nova Pro also serves as a teacher model to distill custom variants of Amazon Nova Micro and Lite.

Amazon Nova Premier – Our most capable multimodal model for complex reasoning tasks and for use as the best teacher for distilling custom models. Amazon Nova Premier is still in training. We’re targeting availability in early 2025.

Amazon Nova understanding models excel in Retrieval-Augmented Generation (RAG), function calling, and agentic applications. This is reflected in Amazon Nova model scores in the Comprehensive RAG Benchmark (CRAG) evaluation, Berkeley Function Calling Leaderboard (BFCL), VisualWebBench, and Mind2Web.

What makes Amazon Nova particularly powerful for enterprises is its customization capabilities. Think of it as tailoring a suit: you start with a high-quality foundation and adjust it to fit your exact needs. You can fine-tune the models with text, image, and video to understand your industry’s terminology, align with your brand voice, and optimize for your specific use cases. For instance, a legal firm might customize Amazon Nova to better understand legal terminology and document structures.

You can see the latest benchmark scores for these models on the Amazon Nova product page.

Creative content generation: Bringing concepts to life
The Amazon Nova models also include two creative content generation models:

Amazon Nova Canvas – A state-of-the-art image generation model producing studio-quality images with precise control over style and content, including rich editing features such as inpainting, outpainting, and background removal. Amazon Nova Canvas excels on human evaluations and key benchmarks such as text-to-image faithfulness evaluation with question answering (TIFA) and ImageReward.

Amazon Nova Reel – A state-of-the-art video generation model. Using Amazon Nova Reel, you can produce short videos through text prompts and images, control visual style and pacing, and generate professional-quality video content for marketing, advertising, and entertainment. Amazon Nova Reel outperforms existing models on human evaluations of video quality and video consistency.

All Amazon Nova models include built-in safety controls and creative content generation models include watermarking capabilities to promote responsible AI use.

Let’s see how these models work in practice for a few use cases.

Using Amazon Nova Pro for document analysis
To demonstrate the capabilities of document analysis, I downloaded the Choosing a generative AI service decision guide in PDF format from the AWS documentation.

First, I choose Model access in the Amazon Bedrock console navigation pane and request access to the new Amazon Nova models. Then, I choose Chat/text in the Playground section of the navigation pane and select the Amazon Nova Pro model. In the chat, I upload the decision guide PDF and ask:

Write a summary of this doc in 100 words. Then, build a decision tree.

The output follows my instructions, producing a summary and a structured decision tree that give me a glimpse of the document before reading it.
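The same request can also be sent through the Amazon Bedrock Converse API instead of the console. The following is a minimal sketch, not the code used for the demo above: the helper names `build_document_message` and `summarize_pdf` and the document name `decision-guide` are hypothetical, and `client` is assumed to be a Boto3 `bedrock-runtime` client.

```python
from pathlib import Path

MODEL_ID = "amazon.nova-pro-v1:0"


def build_document_message(pdf_bytes, prompt, name="decision-guide"):
    """Build a Converse API user message holding a PDF and a text prompt."""
    return [{
        "role": "user",
        "content": [
            # The document block carries the raw PDF bytes.
            {"document": {"format": "pdf", "name": name, "source": {"bytes": pdf_bytes}}},
            {"text": prompt},
        ],
    }]


def summarize_pdf(client, pdf_path, prompt):
    """Send the PDF to the model and return the first text block of the reply.

    `client` is assumed to be boto3.client("bedrock-runtime", region_name=...).
    """
    messages = build_document_message(Path(pdf_path).read_bytes(), prompt)
    response = client.converse(modelId=MODEL_ID, messages=messages)
    return response["output"]["message"]["content"][0]["text"]
```

Keeping the message construction separate from the API call makes it possible to inspect the request before sending it.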

Using Amazon Nova Pro for video analysis
To demonstrate video analysis, I prepared a video by joining two short clips (more on this in the next section):

This time, I use the AWS SDK for Python (Boto3) to invoke the Amazon Nova Pro model using the Amazon Bedrock Converse API and analyze the video:

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-pro-v1:0"
VIDEO_FILE = "the-sea.mp4"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)
with open(VIDEO_FILE, "rb") as f:
    video = f.read()

user_message = "Describe this video."

messages = [ { "role": "user", "content": [
    {"video": {"format": "mp4", "source": {"bytes": video}}},
    {"text": user_message}
] } ]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
    inferenceConfig={"temperature": 0.0}
)

response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)

Amazon Nova Pro can analyze videos that are uploaded with the API (as in the previous code) or that are stored in an Amazon Simple Storage Service (Amazon S3) bucket.

In the script, I ask the model to describe the video. I run the script from the command line. Here’s the result:

The video begins with a view of a rocky shore on the ocean, and then transitions to a close-up of a large seashell resting on a sandy beach.

I can use a more detailed prompt to extract specific information from the video such as objects or text. Note that Amazon Nova currently does not process audio in a video.
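For a video that is already stored in Amazon S3, the message can point to the object instead of embedding its bytes. This is a sketch under assumptions: the helper name `build_s3_video_message` and the example URI are hypothetical, and the `s3Location` field names follow the Converse API video source as I understand it, so check them against the current API reference.

```python
def build_s3_video_message(s3_uri, prompt, bucket_owner=None):
    """Build a Converse API user message that references a video in S3."""
    s3_location = {"uri": s3_uri}
    if bucket_owner:
        # Account ID of the bucket owner, if the bucket is in another account.
        s3_location["bucketOwner"] = bucket_owner
    return [{
        "role": "user",
        "content": [
            {"video": {"format": "mp4", "source": {"s3Location": s3_location}}},
            {"text": prompt},
        ],
    }]


# Hypothetical usage with the bedrock_runtime client from the script above:
# messages = build_s3_video_message("s3://<BUCKET>/the-sea.mp4", "Describe this video.")
# response = bedrock_runtime.converse(modelId="amazon.nova-pro-v1:0", messages=messages)
```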

Using Amazon Nova for video creation
Now, let’s create a video using Amazon Nova Reel, starting from a text-only prompt and then providing a reference image.

Because generating a video takes a few minutes, the Amazon Bedrock API introduced three new operations:

StartAsyncInvoke – To start an asynchronous invocation

GetAsyncInvoke – To get the current status of a specific asynchronous invocation

ListAsyncInvokes – To list the status of all asynchronous invocations with optional filters such as status or date
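As a small illustration of the third operation, this sketch lists the invocations that are still running. The helper name `in_progress_invocations` is hypothetical, `client` is assumed to be a Boto3 `bedrock-runtime` client, and the filter and response field names reflect my reading of the API, so verify them against the SDK reference.

```python
def in_progress_invocations(client, max_results=10):
    """Return (ARN, status) pairs for asynchronous invocations still running.

    `client` is assumed to be boto3.client("bedrock-runtime", region_name=...).
    """
    response = client.list_async_invokes(
        statusEquals="InProgress",
        maxResults=max_results,
    )
    return [
        (summary["invocationArn"], summary["status"])
        for summary in response.get("asyncInvokeSummaries", [])
    ]
```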

Amazon Nova Reel supports camera control actions such as zooming or moving the camera. This Python script creates a video from this text prompt:

Closeup of a large seashell in the sand. Gentle waves flow all around the shell. Sunset light. Camera zoom in very close.

After the first invocation, the script periodically checks the status until the creation of the video has been completed. I pass a random seed to get a different result each time the code runs.

import random
import time

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
S3_DESTINATION_BUCKET = "<BUCKET>"

video_prompt = "Closeup of a large seashell in the sand. Gentle waves flow all around the shell. Sunset light. Camera zoom in very close."

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)
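model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {"text": video_prompt},
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        # randint includes both endpoints; 2147483646 is assumed to be
        # the largest seed that Amazon Nova Reel accepts.
        "seed": random.randint(0, 2147483646)
    }
}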

invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}}
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split('/')[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"
print(f"\nS3 URI: {s3_location}")

while True:
    response = bedrock_runtime.get_async_invoke(
        invocationArn=invocation_arn
    )
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)

if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")

I run the script:

Status: InProgress
. . .
Status: Completed

Video is ready at s3://BUCKET/PREFIX/output.mp4

After a few minutes, the script completes and prints the output Amazon Simple Storage Service (Amazon S3) location. I download the output video using the AWS Command Line Interface (AWS CLI):

aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-text.mp4

This is the resulting video. As requested, the camera zooms in on the subject.
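The download step can also be done from Python rather than the AWS CLI. This is a minimal sketch: the helper names `split_s3_uri` and `download_video` are hypothetical, and `s3_client` is assumed to be a Boto3 `s3` client.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_video(s3_client, s3_uri, local_path):
    """Download the generated video to a local file.

    `s3_client` is assumed to be boto3.client("s3").
    """
    bucket, key = split_s3_uri(s3_uri)
    s3_client.download_file(bucket, key, local_path)
    return local_path


# Hypothetical usage, reusing s3_location from the script above:
# download_video(boto3.client("s3"), f"{s3_location}/output.mp4", "output-from-text.mp4")
```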

Using Amazon Nova Reel with a reference image
To have better control over the creation of the video, I can provide Amazon Nova Reel a reference image such as the following:

This script uses the reference image and a text prompt with a camera action (drone view flying over a coastal landscape) to create a video:

import base64
import random
import time

import boto3

S3_DESTINATION_BUCKET = "<BUCKET>"
AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
input_image_path = "seascape.png"
video_prompt = "drone view flying over a coastal landscape"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Load the input image as a Base64 string.
with open(input_image_path, "rb") as f:
    input_image_bytes = f.read()
    input_image_base64 = base64.b64encode(input_image_bytes).decode("utf-8")

model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {
        "text": video_prompt,
        "images": [{ "format": "png", "source": { "bytes": input_image_base64 } }]
    },
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        # randint includes both endpoints; 2147483646 is assumed to be
        # the largest seed that Amazon Nova Reel accepts.
        "seed": random.randint(0, 2147483646)
    }
}

invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}}
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split('/')[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"

print(f"\nS3 URI: {s3_location}")

while True:
    response = bedrock_runtime.get_async_invoke(
        invocationArn=invocation_arn
    )
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)

if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")

Again, I download the output using the AWS CLI:

aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-image.mp4

This is the resulting video. The camera starts from the reference image and moves forward.

Building AI responsibly
Amazon Nova models are built with a focus on customer safety, security, and trust throughout the model development stages, offering you peace of mind as well as an adequate level of control to enable your unique use cases.

We’ve built in comprehensive safety features and content moderation capabilities, giving you the controls you need to use AI responsibly. Every generated image and video includes digital watermarking.

The Amazon Nova foundation models are built with protections that match their increased capabilities. Amazon Nova extends our safety measures to combat the spread of misinformation, child sexual abuse material (CSAM), and chemical, biological, radiological, or nuclear (CBRN) risks.

Things to know
Amazon Nova models are available in Amazon Bedrock in the US East (N. Virginia) AWS Region. Amazon Nova Micro, Lite, and Pro are also available in the US West (Oregon) and US East (Ohio) Regions via cross-Region inference. As usual with Amazon Bedrock, the pricing follows a pay-as-you-go model. For more information, see Amazon Bedrock pricing.

The new generation of Amazon Nova understanding models speaks your language. These models understand and generate content in over 200 languages, with particularly strong capabilities in English, German, Spanish, French, Italian, Japanese, Korean, Arabic, Simplified Chinese, Russian, Hindi, Portuguese, Dutch, Turkish, and Hebrew. This means you can build truly global applications without worrying about language barriers or maintaining separate models for different regions. Amazon Nova models for creative content generation support English prompts.

As you explore Amazon Nova, you’ll discover its ability to handle increasingly complex tasks. You can use these models to process lengthy documents up to 300K tokens, analyze multiple images in a single request, understand up to 30 minutes of video content, and generate images and videos at scale from natural language. This makes these models suitable for a variety of business use cases, from quick customer service interactions to deep analysis of corporate documentation and asset creation for advertising, ecommerce, and social media applications.

Integration with Amazon Bedrock makes deployment and scaling straightforward. You can leverage features like Amazon Bedrock Knowledge Bases to enhance your model with proprietary information, use Amazon Bedrock Agents to automate complex workflows, and implement Amazon Bedrock Guardrails to promote responsible AI use. The platform supports real-time streaming for interactive applications, batch processing for high-volume workloads, and detailed monitoring to help you optimize performance.
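As an illustration of real-time streaming, the following sketch reads text fragments from the Converse API streaming operation as they arrive. The helper name `stream_text` is hypothetical, and `client` is assumed to be a Boto3 `bedrock-runtime` client; treat it as a starting point rather than a complete implementation (it ignores non-text events and error events).

```python
def stream_text(client, model_id, prompt):
    """Yield text fragments as the model produces them.

    `client` is assumed to be boto3.client("bedrock-runtime", region_name=...).
    """
    response = client.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    for event in response["stream"]:
        # Only contentBlockDelta events carry incremental text.
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            yield delta["text"]


# Hypothetical usage:
# for fragment in stream_text(bedrock_runtime, "amazon.nova-pro-v1:0", "Tell me a story."):
#     print(fragment, end="", flush=True)
```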

Ready to start building with Amazon Nova? Give the new models a try in the Amazon Bedrock console today, visit the Amazon Nova models section of the Amazon Bedrock documentation, and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with these new models!

— Danilo

Introducing multi-agent collaboration capability for Amazon Bedrock (preview)

2024-12-03 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/introducing-multi-agent-collaboration-capability-for-amazon-bedrock/

Today, we’re announcing the multi-agent collaboration capability for Amazon Bedrock (preview). With multi-agent collaboration, you can build, deploy, and manage multiple AI agents working together on complex multi-step tasks that require specialized skills.

When you need more than a single agent to handle a complex task, you can create additional specialized agents to address different aspects of the process. However, managing these agents becomes technically challenging as tasks grow in complexity. As a developer using open source solutions, you may find yourself navigating the complexities of agent orchestration, session handling, memory management, and other technical aspects that require manual implementation.

With the fully managed multi-agent collaboration capability on Amazon Bedrock, specialized agents work within their domains of expertise, coordinated by a supervisor agent. The supervisor breaks down requests, delegates tasks, and consolidates outputs into a final response. For example, an investment advisory multi-agent system might include agents specialized in financial data analysis, research, forecasting, and investment recommendations. Similarly, a retail operations multi-agent system could handle demand forecasting, inventory allocation, supply chain coordination, and pricing optimization.

Amazon Bedrock Agents manages the collaboration, communication, and task delegation behind the scenes. By enabling agents to work together, you can achieve higher task success rates, accuracy, and enhanced productivity. In internal benchmark testing, multi-agent collaboration has shown marked improvements compared to single-agent systems for handling complex, multi-step tasks.

Highlights of multi-agent collaboration in Amazon Bedrock
A key challenge in building effective multi-agent collaboration systems is managing the complexity and overhead of coordinating multiple specialized agents at scale. Amazon Bedrock simplifies the process of building, deploying, and orchestrating effective multi-agent collaboration systems while addressing efficiency challenges through several key features and optimizations:

Quick setup – Create, deploy, and manage AI agents working together in minutes without the need for complex coding.
Composability – Integrate your existing agents as subagents within a larger agent system, allowing them to seamlessly work together to tackle complex workflows.
Efficient inter-agent communication – The supervisor agent can interact with subagents using a consistent interface, supporting parallel communication for more efficient task completion.
Optimized collaboration modes – Choose between supervisor mode and supervisor with routing mode. With routing mode, the supervisor agent will route simple requests directly to specialized subagents, bypassing full orchestration. For complex queries or when no clear intention is detected, it automatically falls back to the full supervisor mode, where the supervisor agent analyzes, breaks down problems, and coordinates multiple subagents as needed.
Integrated trace and debug console – Visualize and analyze multi-agent interactions behind the scenes using the integrated trace and debug console.

These features collectively improve coordination capabilities, communication speed, and overall effectiveness of the multi-agent collaboration framework in tackling complex, real-world problems.

Here’s how to get started.

Using multi-agent collaboration in Amazon Bedrock
For this demo, I create a social media campaign manager agent that’s composed of a content strategist agent creating posts and an engagement predictor agent optimizing their timing and reach. The following figure shows the team of agents that I’m creating and how multi-agent collaboration works in this scenario.

To get started, you can use the Amazon Bedrock console or APIs to create a supervisor agent and associate specialist subagents in just a few steps.

Create subagents
First, I create the two subagents using the existing agent builder workflow. I open the Amazon Bedrock console, select Agents in the left navigation panel, then choose Create Agent. I create one agent that I name content-strategist, an agent that generates creative social media content ideas. Note the new option to enable the agent for multi-agent collaboration. I leave this option unchecked for now; we need to enable this option later for the supervisor agent. Next, I choose Create.

In the Agent builder dialog box, I choose to create and use a new service role, select Anthropic’s Claude 3.5 Sonnet v2 as the model, and provide the following instructions for the agent:

You are a social media content strategist with expertise in converting business goals into engaging social posts. Your task is to generate creative, on-brand content ideas that align with specified campaign goals and target audience. Each suggestion should include a topic, content type (image/video/text/poll), specific copy, and relevant hashtags. Focus on variety, authenticity, and ensuring each post serves a strategic purpose.

I also create and attach a knowledge base that contains high-performing post templates. As with any other agent, you could also configure additional settings, such as action groups to perform tasks, enable code interpretation, or add guardrails. I leave all other settings to their defaults.

Then, I choose Save and exit.

I repeat the steps to create a second agent that I name engagement-predictor, an agent that predicts social media post performance and optimal posting times. For this agent, I provide the following instructions:

You are a social media analytics expert who predicts post performance and optimal timing. For each content idea, analyze potential reach and engagement based on content type, industry benchmarks, and audience behavior patterns. Your task is to estimate reach, engagement rate, and determine the best posting time (day/hour). Support each prediction with data-driven reasoning and industry-specific insights. Focus on actionable metrics that will maximize campaign impact.

I create and attach a knowledge base that contains platform-specific peak engagement times, industry benchmark metrics, and content performance multipliers for predicting and optimizing social media post performance. Again, I choose Save and exit.

I now have my two specialist subagents.

Before moving on, test each agent individually, and once you’ve confirmed their functionality, create an alias for each one. This approach will streamline the process of creating supervisor agents in the future.

Create supervisor agent and associate subagents
Next, I create the supervisor agent. I name this agent social-media-campaign-manager, an agent that combines the outputs from the content strategy agent and the engagement predictor agent into a comprehensive campaign plan.

This time, I turn on Enable Multi-agent collaboration before I choose Create.

In the Agent builder dialog box, I again choose to create and use a new service role, select Anthropic’s Claude 3.5 Sonnet v2 as the model, and provide the following instructions for the agent:

You are a strategic campaign manager who orchestrates social media campaigns from concept to execution.

I create and attach a knowledge base that contains a collection of proven campaign templates, content mix ratios, and cross-platform posting requirements.

Next, I scroll down to Multi-agent collaboration and choose Edit.

The option to turn on multi-agent collaboration should already be checked because I enabled this option when I started creating the agent.

Then, you can choose between two collaboration configurations that determine how information is handled across the agent’s team to coordinate a final response.

In Supervisor mode, the supervisor agent analyzes the input, breaking down complex problems or paraphrasing the request. It then invokes subagents either serially or in parallel, and it might consult knowledge bases or invoke action groups. After receiving responses from subagents, the supervisor agent processes them to determine if the problem is solved or if further action is needed.

Alternatively, in Supervisor with routing mode, the supervisor agent first attempts to route simple requests directly to a relevant subagent, whose response is then forwarded to the user. For complex or ambiguous inputs, the system switches to supervisor mode, where the supervisor agent breaks down the problem or asks follow-up questions before proceeding similarly to standard supervisor mode. This approach allows for efficient handling of both straightforward and complex queries within a single framework.

For my demo, I choose Supervisor mode.
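To make the difference between the two modes concrete, here is a toy sketch of the routing-with-fallback idea in plain Python. It is not how Amazon Bedrock implements collaboration: in the service a model decides where a request goes, while this illustration uses a hypothetical keyword match as a stand-in for intent detection, and all names are invented for the example.

```python
from typing import Callable, Dict, List

Subagent = Callable[[str], str]


def supervisor_with_routing(
    request: str,
    subagents: Dict[str, Subagent],
    keywords: Dict[str, List[str]],
) -> str:
    """Route a simple request to one subagent, else fall back to full orchestration."""
    text = request.lower()
    matches = [
        name for name, words in keywords.items()
        if any(word in text for word in words)
    ]
    if len(matches) == 1:
        # Clear intent: forward the subagent's response directly.
        return subagents[matches[0]](request)
    # No clear intent (zero or several matches): behave like supervisor mode,
    # consulting every subagent and consolidating the outputs.
    parts = [f"{name}: {agent(request)}" for name, agent in subagents.items()]
    return "\n".join(parts)
```

The point of the fallback branch is that ambiguous requests never get sent to a single, possibly wrong, specialist.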

As a last step, I associate the two subagents by adding each subagent in Agent collaborator. I provide a collaborator name for each agent and a collaborator instruction.

I select the content-strategist agent and provide the collaborator name content-strategist along with the following instruction:

You can invoke this agent for social media content strategy tasks such as converting business goals into engaging social posts. The agent generates creative, on-brand content ideas that align with specified campaign goals and target audience.

Then, I choose Add collaborator, select the engagement-predictor agent, and provide the collaborator name engagement-predictor along with the following instructions:

You can invoke this agent for social media analytics to predict post performance and optimal timing.

Note: Enable conversation history sharing allows the supervisor agent to pass the full context of a user interaction to subagents. This helps maintain coherence and avoid repeating questions, especially when routing or switching between agents. Keep in mind, it might confuse simpler subagents with complex task histories. We recommend enabling this feature when you need continuity and disabling it when you’re focusing on task simplification or using specialized agents. I keep it disabled for my demo.

Choose Save and complete the Agent builder workflow.
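The association step can also be scripted. The sketch below assumes the Boto3 `bedrock-agent` client exposes `associate_agent_collaborator` with the parameters shown; I have not verified every parameter name against the current SDK reference, so treat them as assumptions. The helper name `associate_collaborators`, the dictionary keys, and the alias ARNs are hypothetical.

```python
def associate_collaborators(client, supervisor_agent_id, collaborators):
    """Associate each subagent alias with the supervisor's working draft.

    `client` is assumed to be boto3.client("bedrock-agent", region_name=...).
    Each collaborator is a dict with hypothetical keys:
    "name", "alias_arn", and "instruction".
    """
    responses = []
    for collaborator in collaborators:
        responses.append(client.associate_agent_collaborator(
            agentId=supervisor_agent_id,
            agentVersion="DRAFT",
            agentDescriptor={"aliasArn": collaborator["alias_arn"]},
            collaboratorName=collaborator["name"],
            collaborationInstruction=collaborator["instruction"],
            # Conversation history sharing is kept disabled, as in the demo.
            relayConversationHistory="DISABLED",
        ))
    return responses


# Hypothetical usage (the alias ARN is a placeholder):
# associate_collaborators(bedrock_agent, "<SUPERVISOR_AGENT_ID>", [
#     {"name": "engagement-predictor",
#      "alias_arn": "<ENGAGEMENT_PREDICTOR_ALIAS_ARN>",
#      "instruction": "You can invoke this agent for social media analytics "
#                     "to predict post performance and optimal timing."},
# ])
```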

Let’s test it!

Test multi-agent collaboration
Prepare the social media campaign manager agent and choose Test.

I use the following input prompt:

Create a 2-week social campaign for EcoTech's new solar panel launch. Target: B2B (facility managers, sustainability directors) Key points: 30% more efficient, AI-optimized, 2-year ROI Need: 4 posts/week on LinkedIn/Twitter (40% educational, 30% product, 30% thought leadership).

After the response comes back, I choose Show trace to inspect the workflow. In the Multi-agent collaboration trace timeline, you can observe that each subagent got invoked. You can also inspect the trace steps to check the orchestration details.
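Outside the console, a similar test can be run with the `invoke_agent` operation of the Boto3 `bedrock-agent-runtime` client, with tracing turned on. This is a minimal sketch: the helper name `ask_agent` is hypothetical, and the agent ID and alias ID are values you would supply from your own account.

```python
import uuid


def ask_agent(client, agent_id, agent_alias_id, prompt):
    """Invoke an agent and return (answer_text, trace_events).

    `client` is assumed to be boto3.client("bedrock-agent-runtime", region_name=...).
    """
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=str(uuid.uuid4()),  # a fresh session for each call
        inputText=prompt,
        enableTrace=True,
    )
    answer, traces = [], []
    # The completion is an event stream mixing response chunks and trace events.
    for event in response["completion"]:
        if "chunk" in event:
            answer.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "trace" in event:
            traces.append(event["trace"])
    return "".join(answer), traces
```

The collected trace events are the programmatic counterpart of what Show trace displays in the console.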

You can find more examples of how to work with Amazon Bedrock Agents and the new multi-agent collaboration capability in the Amazon Bedrock Agent Samples GitHub repo.

Things to know

During preview, multi-agent collaboration supports real-time chat assistant (synchronous) use cases.
Subagents can have collaboration enabled themselves with an overall soft limit of three hierarchical agent team layers.

Join the preview
Multi-agent collaboration in Amazon Bedrock is available today in preview in all AWS Regions that support Amazon Bedrock Agents, except AWS GovCloud (US-West). Check the full Region list for future updates. To learn more, visit Amazon Bedrock Agents.

Give multi-agent collaboration a try in the Amazon Bedrock console today and let us know what you think! Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

I’m excited to see what you build with multi-agent collaboration.

— Antje

Prevent factual errors from LLM hallucinations with mathematically sound Automated Reasoning checks (preview)

2024-12-03 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/prevent-factual-errors-from-llm-hallucinations-with-mathematically-sound-automated-reasoning-checks-preview/

Today, we’re adding Automated Reasoning checks (preview) as a new safeguard in Amazon Bedrock Guardrails to help you mathematically validate the accuracy of responses generated by large language models (LLMs) and prevent factual errors from hallucinations.

Amazon Bedrock Guardrails lets you implement safeguards for generative AI applications by filtering undesirable content, redacting personally identifiable information (PII), and enhancing content safety and privacy. You can configure policies for denied topics, content filters, word filters, PII redaction, contextual grounding checks, and now Automated Reasoning checks.

Automated Reasoning checks help prevent factual errors from hallucinations using sound mathematical, logic-based algorithmic verification and reasoning processes to verify the information generated by a model, so outputs align with known facts and aren’t based on fabricated or inconsistent data.

Amazon Bedrock Guardrails is the only responsible AI capability offered by a major cloud provider that helps customers to build and customize safety, privacy, and truthfulness for their generative AI applications within a single solution.

Primer on automated reasoning
Automated reasoning is a field of computer science that uses mathematical proofs and logical deduction to verify the behavior of systems and programs. Automated reasoning differs from machine learning (ML), which makes predictions, in that it provides mathematical guarantees about a system’s behavior. Amazon Web Services (AWS) already uses automated reasoning in key service areas such as storage, networking, virtualization, identity, and cryptography. For example, automated reasoning is used to formally verify the correctness of cryptographic implementations, improving both performance and development speed. To learn more, check out Provable Security and the Automated reasoning research area in the Amazon Science Blog.

Now AWS is applying a similar approach to generative AI. The new Automated Reasoning checks (preview) in Amazon Bedrock Guardrails is the first and only generative AI safeguard that helps prevent factual errors due to hallucinations using logically accurate and verifiable reasoning that explains why generative AI responses are correct. Automated Reasoning checks are particularly useful for use cases where factual accuracy and explainability are important. For example, you could use Automated Reasoning checks to validate LLM-generated responses about human resources (HR) policies, company product information, or operational workflows.

Used alongside other techniques such as prompt engineering, Retrieval-Augmented Generation (RAG), and contextual grounding checks, Automated Reasoning checks add a more rigorous and verifiable approach to making sure that LLM-generated output is factually accurate. By encoding your domain knowledge into structured policies, you can have confidence that your conversational AI applications are providing reliable and trustworthy information to your users.

Using Automated Reasoning checks (preview) in Amazon Bedrock Guardrails
With Automated Reasoning checks in Amazon Bedrock Guardrails, you can create Automated Reasoning policies that encode your organization’s rules, procedures, and guidelines into a structured, mathematical format. These policies can then be used to verify that the content generated by your LLM-powered applications is consistent with your guidelines.

Automated Reasoning policies are composed of a set of variables, defined with a name, type, and description, and the logical rules that operate on the variables. Behind the scenes, rules are expressed in formal logic, but they’re translated to natural language to make it easier for a user without formal logic expertise to refine a model. Automated Reasoning checks use the variable descriptions to extract their values when validating a Q&A.
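To picture what "variables plus rules" means, here is a toy model in plain Python. It is only an illustration of the idea, not the format or the engine that Amazon Bedrock uses: the variable names, the single rule (based on the airline example used later in this post), and the `validate` helper are all hypothetical, and a real policy is checked with formal logic rather than Python lambdas.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Variable:
    name: str
    type: type
    description: str  # used to extract the value from a Q&A


@dataclass(frozen=True)
class Rule:
    text: str  # natural-language rendering of the rule
    holds: Callable[[Dict[str, object]], bool]


VARIABLES = [
    Variable("change_type", str, "Kind of ticket change requested"),
    Variable("submission_method", str, "How the change request is submitted"),
    Variable("hours_since_purchase", int, "Hours elapsed since ticket purchase"),
    Variable("change_allowed", bool, "Whether the answer says the change is allowed"),
]

RULES = [
    Rule(
        "A name-spelling change is allowed only if it is submitted via email "
        "within 24 hours of ticket purchase.",
        lambda v: not (v["change_type"] == "name_spelling" and v["change_allowed"])
        or (v["submission_method"] == "email" and v["hours_since_purchase"] <= 24),
    ),
]


def validate(values: Dict[str, object]) -> List[str]:
    """Return the text of every rule that the extracted values violate."""
    for var in VARIABLES:
        if not isinstance(values[var.name], var.type):
            raise TypeError(f"{var.name} must be of type {var.type.__name__}")
    return [rule.text for rule in RULES if not rule.holds(values)]
```

An answer claiming that a misspelled name can be fixed at the airport would violate the rule, while an answer saying it cannot would pass.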

Here’s how it works.

Create Automated Reasoning policies
Using the Amazon Bedrock console, you can upload documents that describe your organization’s rules and procedures. Amazon Bedrock will analyze these documents and automatically create an initial Automated Reasoning policy, which represents the key concepts and their relationships in a mathematical format.

Navigate to the new Automated Reasoning menu item in Safeguards. Create a new policy and give it a name. Upload an existing document that defines the right solution space, such as an HR guideline or an operational manual. For this demo, I’m using an example airline ticket policy document that includes the airline’s policies for ticket changes.

Then, define the policy’s intent and any processing parameters. For example, specify if it will validate airport staff inquiries and identify any elements to exclude from processing, such as internal reference numbers. Include one or more sample Q&As to help the system understand typical interactions.

Here’s my intent description:

Ignore the policy ID number, it's irrelevant. Airline employees will ask questions about whether customers are allowed to modify their tickets providing the customer details. Below is an example question:

QUESTION: I’m flying to Wonder City with Unicorn Airlines and noticed my last name is misspelled on the ticket, can I modify it at the airport?
ANSWER: No. Changes to the spelling of the names on the ticket must be submitted via email within 24 hours of ticket purchase.

Then, choose Create.

The system now initiates an automated process to create your Automated Reasoning policy. This process involves analyzing your document, identifying key concepts, breaking down the document into individual units, translating these natural language units into formal logic, validating the translations, and finally combining them into a comprehensive logical model. Once complete, review the generated structure, including the rules and variables. You can edit these for accuracy through the user interface.

To test the Automated Reasoning policy, you first have to create a guardrail.

Create a guardrail and configure Automated Reasoning checks
When building your conversational AI application with Amazon Bedrock Guardrails, you can enable Automated Reasoning checks and specify which Automated Reasoning policies to use for validation.

Navigate to the Guardrails menu item in Safeguards. Create a new guardrail and give it a name. Choose Enable Automated Reasoning policy and select the policy and policy version you want to use. Then, complete your guardrail configuration.

Test Automated Reasoning checks
You can use the Test playground in the Automated Reasoning console to verify the effectiveness of your Automated Reasoning policy. Enter a test question just like a user of your application would, together with an example answer to validate.

For this demo, I enter an incorrect answer to see what will happen.

Question: I'm flying to Wonder City with Unicorn Airlines and noticed my last name is misspelled on the ticket, I'm currently in person at the airport, can I submit the change in person?

Answer: Yes. You are allowed to change names on tickets at any time, even in person at the airport.

Then, select the guardrail you’ve just created and choose Submit.

Automated Reasoning checks will analyze the content and validate it against the Automated Reasoning policies you’ve configured. The checks will identify any factual inaccuracies or inconsistencies and provide an explanation for the validation results.

In my demo, the Automated Reasoning checks correctly identified the response as Invalid. It shows which rule led to the finding, along with the extracted variables and suggestions.

When the validation result is invalid, the suggestions show a set of variable assignments that would make the conclusion valid. In my scenario, the suggestions show that the change submission method needs to be email for the validation result to be valid.

If no factual inaccuracies are detected and the validation result is Valid, suggestions show a list of assignments that are necessary for the result to hold; these are unstated assumptions in the answer. In my scenario, this might be assumptions such as that it’s the original ticket on which name corrections must be made or that the type of ticket stock is eligible for changes.

If factual inconsistencies are detected, the console will display Mixed results as the validation result. In the API response, you will see a list of findings, with some marked as valid and others as invalid. If this happens, review the system’s findings and suggestions and edit any unclear policy rules.
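The three outcomes described above (Valid, Invalid, and Mixed results) can be thought of as a summary over the individual findings. The following Python sketch shows one way to compute such a summary. The function name and the uppercase labels are assumptions made for illustration, not the documented response format.

```python
from typing import Iterable

KNOWN_RESULTS = {"VALID", "INVALID"}


def overall_result(results: Iterable[str]) -> str:
    """Collapse per-finding results into a single label.

    Assumes each finding carries either "VALID" or "INVALID".
    """
    seen = set(results)
    unknown = seen - KNOWN_RESULTS
    if unknown:
        raise ValueError(f"unexpected result values: {sorted(unknown)}")
    if not seen:
        return "NO_FINDINGS"
    if len(seen) == 1:
        return seen.pop()   # all findings agree
    return "MIXED"          # some valid, some invalid
```

A Mixed summary is the signal to look at the individual findings and the policy rules behind them, rather than at the answer alone.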

You can also use the validation results to enhance LLM-generated responses based on the feedback. For example, the following code snippet demonstrates how you can ask the model to regenerate its answer based on the received feedback:

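# Collect the descriptions of the rules behind every INVALID finding.
feedback = ""
for f in findings:
    if f.result == "INVALID":
        if f.rules is not None:
            for r in f.rules:
                feedback += f"<feedback>{r.description}</feedback>\n"

# Ask the model to regenerate its answer using the collected feedback.
new_prompt = (
    "The answer you generated is inaccurate. Consider the feedback below within "
    f"<feedback> tags and rewrite your answer.\n\n{feedback}"
)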

Achieving high validation accuracy is an iterative process. As a best practice, regularly review policy performance and adjust it as needed. You can edit rules in natural language and the system will automatically update the logical model.

For example, updating variable descriptions can significantly improve validation accuracy. Consider a scenario where a question states, “I’m a full-time employee…,” and the description of the is_full_time variable only states, “works more than 20 hours per week.” In this case, Automated Reasoning checks might not recognize the phrase “full-time.” To enhance accuracy, you should update the variable description to be more comprehensive, such as: “Works more than 20 hours per week. Users may refer to this as full-time or part-time. The value should be true for full-time and false for part-time.” This detailed description helps the system pick up all relevant factual claims for validation in natural language questions and answers, providing more accurate results.

Available in preview
The new Automated Reasoning checks safeguard is available today in preview in Amazon Bedrock Guardrails in the US West (Oregon) AWS Region. To request to be considered for access to the preview today, contact your AWS account team. In the next few weeks, look for a sign-up form in the Amazon Bedrock console. To learn more, visit Amazon Bedrock Guardrails.

— Antje

Build faster, more cost-efficient, highly accurate models with Amazon Bedrock Model Distillation (preview)

2024-12-03 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/build-faster-more-cost-efficient-highly-accurate-models-with-amazon-bedrock-model-distillation-preview/

Today, we’re announcing the availability of Amazon Bedrock Model Distillation in preview. It automates the process of creating a distilled model for your specific use case by generating responses from a large foundation model (FM), called a teacher model, and fine-tuning a smaller FM, called a student model, with the generated responses. It uses data synthesis techniques to improve the responses from the teacher model. Amazon Bedrock then hosts the final distilled model for inference, giving you a faster and more cost-efficient model with accuracy close to that of the teacher model for your use case.

Customers are excited to use the most powerful and accurate FMs on Amazon Bedrock for their generative AI applications. But for some use cases, the latency associated with these models isn’t ideal. In addition, customers are looking for better price performance as they scale their generative AI applications to many billions of user interactions. To reduce latency and be more cost-efficient for their use case, customers are turning to smaller models. However, for some use cases, smaller models can’t provide optimal accuracy. Fine-tuning models requires an additional skillset to create the high-quality labeled datasets needed to increase model accuracy for customers’ use cases.

With Amazon Bedrock Model Distillation, you can increase the accuracy of a smaller student model so that it mimics a higher-performance teacher model through knowledge transfer. You can create distilled models that, for a certain use case, are up to five times faster and up to 75 percent less expensive than the original large models, with less than two percent accuracy loss for use cases such as Retrieval Augmented Generation (RAG), by transferring knowledge from a teacher model of your choice to a student model in the same family.

How does it work?
Amazon Bedrock Model Distillation generates responses from teacher models, improves response generation from a teacher model by adding proprietary data synthesis, and fine-tunes a student model.

Amazon Bedrock employs various data synthesis techniques to enhance response generation from the teacher model and create high-quality fine-tuning datasets. These techniques are tailored to specific use cases. For instance, Amazon Bedrock may augment the training dataset by generating similar prompts, effectively increasing the volume of the fine-tuning dataset.

Alternatively, it can produce high-quality teacher responses by using provided prompt-response pairs as golden examples. At preview, Amazon Bedrock Model Distillation supports Anthropic, Meta, and Amazon models.

Get started with Amazon Bedrock Model Distillation
To get started, go to the Amazon Bedrock console and choose Custom models in the left navigation pane. Now you have three customization methods: Fine-tuning, Distillation, and Continued pre-training.

Choose Create Distillation job to start fine-tuning your model using model distillation.

Enter your distilled model name and job name.

Then, choose the teacher model and, based on your choice of the teacher model, select a student model from the list of available student models. The teacher and the student model must be from the same family. For example, if you choose Meta Llama 3.1 405B Instruct model as a teacher model, you can only choose either Llama 3.1 70B or 8B Instruct model as a student model.
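The same-family constraint can be pictured as a simple lookup from a teacher model to its permitted student models. The Python sketch below encodes only the Llama 3.1 example given above. The table is hypothetical and incomplete, the strings are display names rather than Bedrock model IDs, and the console remains the source of truth for which pairs are actually offered.

```python
from typing import Dict, List

# Hypothetical table built from the single example in the text.
STUDENTS_BY_TEACHER: Dict[str, List[str]] = {
    "Llama 3.1 405B Instruct": [
        "Llama 3.1 70B Instruct",
        "Llama 3.1 8B Instruct",
    ],
}


def allowed_students(teacher: str) -> List[str]:
    """Return the student models that may be paired with this teacher."""
    return list(STUDENTS_BY_TEACHER.get(teacher, []))


def validate_pair(teacher: str, student: str) -> None:
    """Raise ValueError if the student is not offered for the teacher."""
    if student not in allowed_students(teacher):
        raise ValueError(
            f"{student!r} is not an available student model for {teacher!r}"
        )
```

For example, `validate_pair("Llama 3.1 405B Instruct", "Llama 3.1 8B Instruct")` passes silently, while pairing the same teacher with a model from another family raises `ValueError`.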

To generate synthetic data, set the value of Max response length, an inference parameter to determine the response generated by the teacher model. Choose the distillation input dataset located in your Amazon Simple Storage Service (Amazon S3) bucket. This input dataset presents the prompts or golden prompt-response pairs for your use case. The input files must be in the dataset format according to your model. To learn more, visit Prepare the datasets in the Amazon Bedrock User Guide.

Then, choose Create Distillation job after setting up the Amazon S3 location to store the distillation output metrics data and permissions to write to Amazon S3 on your behalf.

After the distillation job is created successfully, you can track the training progress on the Jobs tab, and the model will be available on the Models tab.

Using production data with Amazon Bedrock Model Distillation
If you want to reuse your production data for distillation and skip generating teacher responses again, you can do so by turning on model invocation logging, which collects invocation logs, model input data, and model output data for all Amazon Bedrock invocations in your AWS account. Adding request metadata helps you to easily filter invocation logs at a later point.

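import boto3
from pprint import pprint

# Amazon Bedrock runtime client; uses your default credentials and Region.
bedrock_runtime_client = boto3.client('bedrock-runtime')

request_params = {
    'modelId': 'meta.llama3-1-405b-instruct-v1:0',
    'messages': [
        {
            'role': 'user',
            'content': [
                {
                    "text": "What is model distillation in generative AI?"
                }
            ]
        }
    ],
    # Request metadata is stored with the invocation log for later filtering.
    'requestMetadata': {
        "ProjectName": "myLlamaDistilledModel",
        "CodeName": "myDistilledCode"
    }
}
response = bedrock_runtime_client.converse(**request_params)
pprint(response)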
---
'output': {'message': {'content': [{'text': '\n''\n'
    'Model distillation is a technique in generative AI that involves training a smaller, '
    'more efficient model (the "student") to mimic the behavior of a larger, '
    'more complex model (the "teacher"). The goal of model distillation is to '
    'transfer the knowledge and capabilities of the teacher model to the student model, '
    'allowing the student to perform similarly well on a given task, but with much less computational '
    'resources and memory.\n'
    '\n'}]
    }
}

Next, when using Amazon Bedrock Model Distillation, select a teacher model whose accuracy you want to match for your use case and a student model that you want to fine-tune. Then give Amazon Bedrock access to read your invocation logs. Here, you can specify the request metadata filters so that only specific logs, which are valid for your use case, are read to fine-tune the student model. The teacher model selected for distillation and the model used in the invocation logs must be the same if you want Amazon Bedrock to reuse the responses from invocation logs.
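Conceptually, a request metadata filter keeps only the invocation logs whose metadata matches the key-value pairs you specify. The Python sketch below illustrates that idea on plain dictionaries. The record shape (a `requestMetadata` key on each log record) and the function names are assumptions for illustration. The actual filtering is configured in Amazon Bedrock and is not something you implement yourself.

```python
from typing import Dict, List


def matches(record: Dict, required: Dict[str, str]) -> bool:
    """True when every required key/value pair appears in the record's metadata."""
    metadata = record.get("requestMetadata") or {}
    return all(metadata.get(key) == value for key, value in required.items())


def select_logs(records: List[Dict], required: Dict[str, str]) -> List[Dict]:
    """Keep only the log records that satisfy the metadata filter."""
    return [record for record in records if matches(record, required)]


# Hypothetical log records, reusing the metadata keys from the request above.
logs = [
    {"id": 1, "requestMetadata": {"ProjectName": "myLlamaDistilledModel",
                                  "CodeName": "myDistilledCode"}},
    {"id": 2, "requestMetadata": {"ProjectName": "anotherProject"}},
    {"id": 3},  # invocation made without request metadata
]

selected = select_logs(logs, {"ProjectName": "myLlamaDistilledModel"})
print([record["id"] for record in selected])
```

Records without any request metadata never match a non-empty filter, which is one reason to attach metadata consistently from the start.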

Inference from your distilled model
Before using the distilled model, you need to purchase Provisioned Throughput for Amazon Bedrock and then use the resulting distilled model for inference. When you purchase Provisioned Throughput, you can select a commitment term, choose the number of model units, and check estimated hourly, daily, and monthly costs.
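The hourly, daily, and monthly estimates follow from simple arithmetic on the per-hour, per-model-unit price. Here is a small Python sketch of that calculation. The price used in the example is a made-up placeholder, and the 30-day month is an assumption, so treat the console and the pricing page as the authoritative figures.

```python
from typing import Dict


def estimate_costs(price_per_unit_hour: float, model_units: int,
                   days_per_month: int = 30) -> Dict[str, float]:
    """Estimate Provisioned Throughput cost from an hourly per-unit price.

    days_per_month is an assumption used only for a rough monthly figure.
    """
    if price_per_unit_hour < 0 or model_units < 1:
        raise ValueError("price must be non-negative and model_units at least 1")
    hourly = price_per_unit_hour * model_units
    daily = hourly * 24
    monthly = daily * days_per_month
    return {"hourly": hourly, "daily": daily, "monthly": monthly}


# Placeholder price of 10.0 per model unit per hour, with 2 model units.
print(estimate_costs(10.0, 2))
```

With these placeholder inputs the estimate is 20.0 per hour, 480.0 per day, and 14400.0 per 30-day month.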

You can complete the model distillation job using AWS APIs, AWS SDKs, or the AWS Command Line Interface (AWS CLI). To learn more about using the AWS CLI, visit Code samples for model customization in the AWS documentation.

Things to know
Here are a few important things to know.

Model distillation aims to increase the accuracy of the student model to match the performance of the teacher model for your specific use case. Before you begin model distillation, we recommend that you evaluate different teacher models for your use case and select the teacher model that works well for your use case.
We recommend optimizing the prompts for your use case until you find the teacher model’s accuracy on them to be acceptable. Submit these prompts as the distillation input data.
To choose a corresponding student model to fine-tune, evaluate the latency profiles of different student model options for your use case. The final distilled model will have the same latency profile as the student model that you select.
If a specific student model already performs well for your use case, then we recommend using the student model as is instead of creating a distilled model.

Join the preview!
Amazon Bedrock Model Distillation is now available in preview in the US East (N. Virginia) and US West (Oregon) AWS Regions. Check the full Region list for future updates. To learn more, visit Model Distillation in the Amazon Bedrock User Guide.

You pay the cost to generate synthetic data by the teacher model and the cost to fine-tune the student model during model distillation. After the distilled model is created, you pay the cost to store the distilled model monthly. Inference from the distilled model is charged under Provisioned Throughput per hour per model unit. To learn more, visit the Amazon Bedrock Pricing page.

Give Amazon Bedrock Model Distillation a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— Channy

Introducing queryable object metadata for Amazon S3 buckets (preview)

2024-12-03 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/introducing-queryable-object-metadata-for-amazon-s3-buckets-preview/

AWS customers make use of Amazon Simple Storage Service (Amazon S3) at an incredible scale, regularly creating individual buckets that contain billions or trillions of objects! At that scale, finding the objects which meet particular criteria — objects with keys that match a pattern, objects of a particular size, or objects with a specific tag — becomes challenging. Our customers have had to build systems that capture, store, and query for this information. These systems can become complex and hard to scale, and can fall out of sync with the actual state of the bucket and the objects within.

Rich Metadata
Today we are enabling in preview automatic generation of metadata that is captured when S3 objects are added or modified, and stored in fully managed Apache Iceberg tables. This allows you to use Iceberg-compatible tools such as Amazon Athena, Amazon Redshift, Amazon QuickSight, and Apache Spark to easily and efficiently query the metadata (and find the objects of interest) at any scale. As a result, you can quickly find the data that you need for your analytics, data processing, and AI training workloads.

For video inference responses stored in S3, Amazon Bedrock will annotate the content it generates with metadata that will allow you to identify the content as AI-generated, and to know which model was used to generate it.

The metadata schema contains over 20 elements including the bucket name, object key, creation/modification time, storage class, encryption status, tags, and user metadata. You can also store additional, application-specific descriptive information in a separate table and then join it with the metadata table as part of your query.

How it Works
You can enable capture of rich metadata for any of your S3 buckets by specifying the location (an S3 table bucket and a table name) where you want the metadata to be stored. Capture of updates (object creations, object deletions, and changes to object metadata) begins right away and will be stored in the table within minutes. Each update generates a new row in the table, with a record type (CREATE, UPDATE_METADATA, or DELETE) and a sequence number. You can retrieve the historical record for a given object by running a query that orders the results by sequence number.
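Because every update is a new row, the current state of an object is whatever its latest record says. The Python sketch below works on rows that have already been fetched into dictionaries, using the `bucket`, `key`, `sequence_number`, and `record_type` column names from the metadata table. The helper names are hypothetical, and the sketch assumes that sorting the sequence numbers as strings yields the order of events.

```python
from typing import Dict, List

Record = Dict[str, str]


def object_history(records: List[Record], bucket: str, key: str) -> List[Record]:
    """Return the records for one object, oldest first."""
    matching = [r for r in records if r["bucket"] == bucket and r["key"] == key]
    # Assumption: string order of sequence_number matches event order.
    return sorted(matching, key=lambda r: r["sequence_number"])


def object_exists(records: List[Record], bucket: str, key: str) -> bool:
    """True when the most recent record for the object is not a DELETE."""
    history = object_history(records, bucket, key)
    return bool(history) and history[-1]["record_type"] != "DELETE"


# Hypothetical rows, deliberately out of order.
rows = [
    {"bucket": "b", "key": "a.png", "sequence_number": "0003", "record_type": "DELETE"},
    {"bucket": "b", "key": "a.png", "sequence_number": "0001", "record_type": "CREATE"},
    {"bucket": "b", "key": "a.png", "sequence_number": "0002", "record_type": "UPDATE_METADATA"},
    {"bucket": "b", "key": "z.png", "sequence_number": "0001", "record_type": "CREATE"},
]

print([r["record_type"] for r in object_history(rows, "b", "a.png")])
print(object_exists(rows, "b", "a.png"), object_exists(rows, "b", "z.png"))
```

In practice you would express the same ordering in SQL with an `ORDER BY sequence_number` clause in your query engine of choice.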

Enabling and Querying Metadata
I start by creating a table bucket for my metadata using the create-table-bucket command (this can also be done from the AWS Management Console or with an API call):

$ aws s3tables create-table-bucket --name jbarr-table-bucket-1 --region us-east-2
--------------------------------------------------------------------------------
|                               CreateTableBucket                              |
+-----+------------------------------------------------------------------------+
|  arn|  arn:aws:s3tables:us-east-2:123456789012:bucket/jbarr-table-bucket-1   |
+-----+------------------------------------------------------------------------+

Then I specify the table bucket (by ARN) and the desired table name by putting this JSON into a file (I’ll call it config.json):

{
  "S3TablesDestination": {
    "TableBucketArn": "arn:aws:s3tables:us-east-2:123456789012:bucket/jbarr-table-bucket-1",
    "TableName": "jbarr_data_bucket_1_table"
  }
}

And then I attach this configuration to my data bucket (the one that I want to capture metadata for):

$ aws s3tables create-bucket-metadata-table-configuration \
  --bucket jbarr-data-bucket-1 \
  --metadata-table-configuration file://./config.json \
  --region us-east-2

For testing purposes I installed Apache Spark on an EC2 instance and after a little bit of setup I was able to run queries by referencing the Amazon S3 Tables Catalog for Apache Iceberg package and adding the metadata table (as mytablebucket) to the command line:

$ bin/spark-shell \
--packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.6.0 \
--jars ~/S3TablesCatalog.jar \
--master yarn \
--conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
--conf "spark.sql.catalog.mytablebucket=org.apache.iceberg.spark.SparkCatalog" \
--conf "spark.sql.catalog.mytablebucket.catalog-impl=com.amazon.s3tables.iceberg.S3TablesCatalog" \
--conf "spark.sql.catalog.mytablebucket.warehouse=arn:aws:s3tables:us-east-2:123456789012:bucket/jbarr-table-bucket-1"

Here is the current schema for the Iceberg table:

scala> spark.sql("describe table mytablebucket.aws_s3_metadata.jbarr_data_bucket_1_table").show(100,35)

+---------------------+------------------+-----------------------------------+
|             col_name|         data_type|                            comment|
+---------------------+------------------+-----------------------------------+
|               bucket|            string|   The general purpose bucket name.|
|                  key|            string|The object key name (or key) tha...|
|      sequence_number|            string|The sequence number, which is an...|
|          record_type|            string|The type of this record, one of ...|
|     record_timestamp|     timestamp_ntz|The timestamp that's associated ...|
|           version_id|            string|The object's version ID. When yo...|
|     is_delete_marker|           boolean|The object's delete marker statu...|
|                 size|            bigint|The object size in bytes, not in...|
|   last_modified_date|     timestamp_ntz|The object creation date or the ...|
|                e_tag|            string|The entity tag (ETag), which is ...|
|        storage_class|            string|The storage class that's used fo...|
|         is_multipart|           boolean|The object's upload type. If the...|
|    encryption_status|            string|The object's server-side encrypt...|
|is_bucket_key_enabled|           boolean|The object's S3 Bucket Key enabl...|
|          kms_key_arn|            string|The Amazon Resource Name (ARN) f...|
|   checksum_algorithm|            string|The algorithm that's used to cre...|
|          object_tags|map<string,string>|The object tags that are associa...|
|        user_metadata|map<string,string>|The user metadata that's associa...|
|            requester|            string|The AWS account ID of the reques...|
|    source_ip_address|            string|The source IP address of the req...|
|           request_id|            string|The request ID. For records that...|
+---------------------+------------------+-----------------------------------+

Here’s a simple query that shows some of the metadata for the ten most recent updates:

scala> spark.sql("""SELECT key, size, storage_class, encryption_status
  FROM mytablebucket.aws_s3_metadata.jbarr_data_bucket_1_table
  ORDER BY last_modified_date DESC LIMIT 10""").show(false)
+--------------------+------+-------------+-----------------+                   
|key                 |size  |storage_class|encryption_status|
+--------------------+------+-------------+-----------------+
|wnt_itco_2.png      |36923 |STANDARD     |SSE-S3           |
|wnt_itco_1.png      |37274 |STANDARD     |SSE-S3           |
|wnt_imp_new_1.png   |15361 |STANDARD     |SSE-S3           |
|wnt_imp_change_3.png|67639 |STANDARD     |SSE-S3           |
|wnt_imp_change_2.png|67639 |STANDARD     |SSE-S3           |
|wnt_imp_change_1.png|71182 |STANDARD     |SSE-S3           |
|wnt_email_top_4.png |135164|STANDARD     |SSE-S3           |
|wnt_email_top_2.png |117171|STANDARD     |SSE-S3           |
|wnt_email_top_3.png |55913 |STANDARD     |SSE-S3           |
|wnt_email_top_1.png |140937|STANDARD     |SSE-S3           |
+--------------------+------+-------------+-----------------+

In a real-world situation I would query the table using one of the AWS or open source analytics tools that I mentioned earlier.

Console Access
I can also set up and manage the metadata configuration for my buckets using the Amazon S3 Console by clicking the Metadata tab:

Available Now
Amazon S3 Metadata is available in preview now and you can start using it today in the US East (Ohio, N. Virginia) and US West (Oregon) AWS Regions.

Integration with AWS Glue Data Catalog is in preview, allowing you to query and visualize data—including S3 Metadata tables—using AWS Analytics services such as Amazon Athena, Amazon Redshift, Amazon EMR, and Amazon QuickSight.

Pricing is based on the number of updates (object creations, object deletions, and changes to object metadata) with an additional charge for storage of the metadata table. For more pricing information, visit the S3 Pricing page.

I’m confident that you will be able to make use of this metadata in many powerful ways, and am looking forward to hearing about your use cases. Let me know what you think!

— Jeff;

New Amazon S3 Tables: Storage optimized for analytics workloads

2024-12-03 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-s3-tables-storage-optimized-for-analytics-workloads/

Amazon S3 Tables give you storage that is optimized for tabular data such as daily purchase transactions, streaming sensor data, and ad impressions in Apache Iceberg format, for easy queries using popular query engines like Amazon Athena, Amazon EMR, and Apache Spark. When compared to self-managed table storage, you can expect up to 3x faster query performance and up to 10x more transactions per second, along with the operational efficiency that is part and parcel of using a fully managed service.

Iceberg has become the most popular way to manage Parquet files, with thousands of AWS customers using Iceberg to query across often billions of files containing petabytes or even exabytes of data.

Table Buckets, Tables, and Namespaces
Table buckets are the third type of S3 bucket, taking their place alongside the existing general purpose and directory buckets. You can think of a table bucket as an analytics warehouse that can store Iceberg tables with various schemas. Additionally, S3 Tables deliver the same durability, availability, scalability, and performance characteristics as S3 itself, and automatically optimize your storage to maximize query performance and to minimize cost.

Each table bucket resides in a specific AWS Region and has a name that must be unique within the AWS account with respect to the region. Buckets are referenced by ARN and also have a resource policy. Finally, each bucket uses namespaces to logically group the tables in the bucket.
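Since table buckets are referenced by ARN, it can be handy to pull the Region, account ID, and bucket name out of one. The Python sketch below parses ARNs of the shape shown in this post (`arn:aws:s3tables:<region>:<account>:bucket/<name>`). The parser and its type are hypothetical helpers written against that observed shape, not part of any AWS SDK.

```python
from typing import NamedTuple


class TableBucketArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    bucket_name: str


def parse_table_bucket_arn(arn: str) -> TableBucketArn:
    """Split a table bucket ARN into its parts, raising ValueError if malformed."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3tables":
        raise ValueError(f"not an s3tables ARN: {arn!r}")
    _, partition, _, region, account_id, resource = parts
    prefix = "bucket/"
    name = resource[len(prefix):] if resource.startswith(prefix) else ""
    if not name or "/" in name:
        raise ValueError(f"not a table bucket ARN: {arn!r}")
    return TableBucketArn(partition, region, account_id, name)


parsed = parse_table_bucket_arn(
    "arn:aws:s3tables:us-east-2:123456789012:bucket/jbarr-table-bucket-2"
)
print(parsed.region, parsed.account_id, parsed.bucket_name)
```

The Region and account ID together give the scope within which the bucket name must be unique.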

Tables are structured datasets stored in a table bucket. Like table buckets, they have ARNs and resource policies, and exist within one of the bucket’s namespaces. Tables are fully managed, with automatic, configurable continuous maintenance including compaction, management of aged snapshots, and removal of unreferenced files. Each table has an S3 API endpoint for storage operations.

Namespaces can be referenced from access policies in order to simplify access management.

Buckets and Tables from the Command Line
Ok, let’s dive right in, create a bucket, and put a table or two inside. I’ll use the AWS Command Line Interface (AWS CLI), but AWS Management Console and API support is also available. For conciseness, I will pipe the output of the more verbose commands through jq and show you only the most relevant values.

The first step is to create a table bucket:

$ aws s3tables create-table-bucket --name jbarr-table-bucket-2 | jq .arn
"arn:aws:s3tables:us-east-2:123456789012:bucket/jbarr-table-bucket-2"

For convenience, I create an environment variable with the ARN of the table bucket:

$ export ARN="arn:aws:s3tables:us-east-2:123456789012:bucket/jbarr-table-bucket-2"

And then I list my table buckets:

$ aws s3tables list-table-buckets | jq .tableBuckets[].arn
"arn:aws:s3tables:us-east-2:123456789012:bucket/jbarr-table-bucket-1"
"arn:aws:s3tables:us-east-2:123456789012:bucket/jbarr-table-bucket-2"

I can access and populate the table in many different ways. For testing purposes I installed Apache Spark, then invoked the Spark shell with command-line arguments to use the Amazon S3 Tables Catalog for Apache Iceberg package and to set mytablebucket to the ARN of my table bucket.

I create a namespace (mydata) that I will use to group my tables:

scala> spark.sql("""CREATE NAMESPACE IF NOT EXISTS mytablebucket.mydata""")

Then I create a simple Iceberg table in the namespace:

spark.sql("""CREATE TABLE IF NOT EXISTS mytablebucket.mydata.table1
 (id INT,
  name STRING,
  value INT)
  USING iceberg
  """)

I use some s3tables commands to check my work:

$ aws s3tables list-namespaces --table-bucket-arn $ARN | jq .namespaces[].namespace[] 
"mydata"
$
$ aws s3tables list-tables --table-bucket-arn $ARN | jq .tables[].name
"table1"

Then I return to the Spark shell and add a few rows of data to my table:

spark.sql("""INSERT INTO mytablebucket.mydata.table1
  VALUES
  (1, 'Jeff', 100),
  (2, 'Carmen', 200),
  (3, 'Stephen', 300),
  (4, 'Andy', 400),
  (5, 'Tina', 500),
  (6, 'Bianca', 600),
  (7, 'Grace', 700)
  """)

Buckets and Tables from the Console
I can also create and work on table buckets using the S3 Console. I click Table buckets to get started:

Before creating my first bucket I click Enable integration so that I can access my table buckets from Amazon Athena, Amazon Redshift, Amazon EMR, and other AWS query engines (I can do this later if I don’t do it now):

I read the fine print and click Enable integration to create the specified IAM role and an entry in the AWS Glue Data Catalog:

After a few seconds the integration is enabled and I click Create table bucket to move ahead:

I enter a name (jbarr-table-bucket-3) and click Create table bucket:

From here I can create and use tables as I showed you earlier in the CLI section.

Table Maintenance
Table buckets take care of some important maintenance duties that would be your responsibility if you were creating and managing your own Iceberg tables. To relieve you of these duties so that you can spend more time on your table, the following maintenance operations are performed automatically:

Compaction – This process combines multiple small table objects into a larger object to improve query performance, in pursuit of a target file size that can be configured to be between 64 MiB and 512 MiB. The new object is rewritten as a new snapshot.

Snapshot Management – This process expires and ultimately removes table snapshots, with configuration options for the minimum number of snapshots to retain and the maximum age of a snapshot to retain. Expired snapshots are marked as non-current, then later deleted after a specified number of days.

Unreferenced File Removal – This process removes and deletes objects that are not referenced by any table snapshots.
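To give a feel for what compaction and snapshot management involve, here is a rough Python sketch of both decisions. It is not how S3 Tables implements them. The greedy grouping strategy, the function names, and the reading of the retention settings (always keep the newest N snapshots, expire older ones past a maximum age) are all assumptions made for illustration.

```python
from typing import List

MIB = 1024 * 1024


def plan_compaction(file_sizes: List[int], target_bytes: int) -> List[List[int]]:
    """Greedily group small files so each group stays within the target size.

    Files already at or above the target are left alone, and groups of one
    file are dropped because rewriting them would gain nothing.
    """
    if not 64 * MIB <= target_bytes <= 512 * MIB:
        raise ValueError("target must be between 64 MiB and 512 MiB")
    groups: List[List[int]] = []
    current: List[int] = []
    total = 0
    for size in file_sizes:
        if size >= target_bytes:
            continue
        if current and total + size > target_bytes:
            groups.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        groups.append(current)
    return [group for group in groups if len(group) > 1]


def snapshots_to_expire(ages_hours: List[float], min_to_keep: int,
                        max_age_hours: float) -> List[float]:
    """Return the ages of snapshots that are both beyond the newest
    min_to_keep and older than max_age_hours."""
    newest_first = sorted(ages_hours)
    return [age for index, age in enumerate(newest_first)
            if index >= min_to_keep and age > max_age_hours]


sizes = [40 * MIB, 40 * MIB, 40 * MIB, 200 * MIB, 10 * MIB]
print(plan_compaction(sizes, 128 * MIB))
print(snapshots_to_expire([1, 50, 200, 400], min_to_keep=1, max_age_hours=120))
```

In the example, the three 40 MiB files form one group to rewrite, the 200 MiB file is left untouched, and the lone 10 MiB file is skipped. The snapshots aged 200 and 400 hours are selected for expiry.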

Things to Know
Here are a couple of important things that you should know about table buckets and tables:

AWS Integration – S3 Tables integration with AWS Glue Data Catalog is in preview, allowing you to query and visualize data using AWS Analytics services such as Amazon Athena, Amazon Redshift, Amazon EMR, and Amazon QuickSight.

S3 API Support – Table buckets support relevant S3 API functions including GetObject, HeadObject, PutObject, and the multi-part upload operations.

Security – All objects stored in table buckets are automatically encrypted. Table buckets are configured to enforce Block Public Access.

Pricing – You pay for storage, requests, an object monitoring fee, and fees for compaction. See the S3 Pricing page for more info.

Regions – You can use this new feature in the US East (Ohio, N. Virginia) and US West (Oregon) AWS Regions.

— Jeff;

Amazon EC2 Trn2 Instances and Trn2 UltraServers for AI/ML training and inference are now available

2024-12-03 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-ec2-trn2-instances-and-trn2-ultraservers-for-aiml-training-and-inference-is-now-available/

The new Amazon Elastic Compute Cloud (Amazon EC2) Trn2 instances and Trn2 UltraServers are the most powerful EC2 compute options for ML training and inference. Powered by the second generation of AWS Trainium chips (AWS Trainium2), the Trn2 instances are 4x faster, offer 4x more memory bandwidth, and 3x more memory capacity than the first-generation Trn1 instances. Trn2 instances offer 30-40% better price performance than the current generation of GPU-based EC2 P5e and P5en instances.

In addition to the 16 Trainium2 chips, each Trn2 instance features 192 vCPUs, 2 TiB of memory, and 3.2 Tbps of Elastic Fabric Adapter (EFA) v3 network bandwidth, which offers up to 50% lower latency than the previous generation.

The Trn2 UltraServers, which are a completely new compute offering, feature 64 Trainium2 chips connected with a high-bandwidth, low-latency NeuronLink interconnect, for peak inference and training performance on frontier foundation models.

Tens of thousands of Trainium chips are already powering Amazon and AWS services. For example, over 80,000 AWS Inferentia and Trainium1 chips supported the Rufus shopping assistant on the most recent Prime Day. Trainium2 chips are already powering the latency-optimized versions of Llama 3.1 405B and Claude 3.5 Haiku models on Amazon Bedrock.

Up and Out and Up
Sustained growth in the size and complexity of the frontier models is enabled by innovative forms of compute power, assembled into equally innovative architectural forms. In simpler times we could talk about architecting for scalability in two ways: scaling up (using a bigger computer) and scaling out (using more computers). Today, when I look at the Trainium2 chip, the Trn2 instance, and the even larger compute offerings that I will talk about in a minute, it seems like both models apply, but at different levels of the overall hierarchy. Let’s review the Trn2 building blocks, starting at the NeuronCore and scaling to an UltraCluster:

NeuronCores are at the heart of the Trainium2 chip. Each third-generation NeuronCore includes a scalar engine (1 input to 1 output), a vector engine (multiple inputs to multiple outputs), a tensor engine (systolic array multiplication, convolution, and transposition), and a GPSIMD (general purpose single instruction multiple data) core.

Each Trainium2 chip is home to eight NeuronCores and 96 GiB of High Bandwidth Memory (HBM), and supports 2.9 TB/second of HBM bandwidth. The cores can be addressed and used individually, or pairs of physical cores can be grouped into a single logical core. A single Trainium2 chip delivers up to 1.3 petaflops of dense FP8 compute and up to 5.2 petaflops of sparse FP8 compute, and can drive 95% utilization of memory bandwidth thanks to automated reordering of the HBM queue.

Each Trn2 instance is, in turn, home to 16 Trainium2 chips. That’s a total of 128 NeuronCores, 1.5 TiB of HBM, and 46 TB/second of HBM bandwidth. Altogether this multiplies out to up to 20.8 petaflops of dense FP8 compute and up to 83.2 petaflops of sparse FP8 compute. The Trainium2 chips are connected across NeuronLink in a 2D torus for high bandwidth, low latency chip-to-chip communication at 1 GB/second.

An UltraServer is home to four Trn2 instances connected with low-latency, high-bandwidth NeuronLink. That’s 512 NeuronCores, 64 Trainium2 chips, 6 TiB of HBM, and 185 TB/second of HBM bandwidth. Doing the math, this results in up to 83 petaflops of dense FP8 compute and up to 332 petaflops of sparse FP8 compute. In addition to the 2D torus that connects NeuronCores within an instance, cores at corresponding XY positions in each of the four instances are connected in a ring. For inference, UltraServers help deliver industry-leading response time to create the best real-time experiences. For training, UltraServers boost model training speed and efficiency with faster collective communication for model parallelism when compared to standalone instances. UltraServers are designed to support training and inference at the trillion parameter level and beyond; they are available in preview form and you can contact us to join the preview.

Trn2 instances and UltraServers are being deployed in EC2 UltraClusters to enable scale-out distributed training across tens of thousands of Trainium chips on a single petabit scale, non-blocking network, with access to Amazon FSx for Lustre high performance storage.
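The totals quoted for each level of the hierarchy follow directly from the per-chip figures. As a quick check, here is a small sketch that multiplies them out; the per-chip constants come from the text above, while the function and dictionary key names are just illustrative choices.

```python
# Per-chip figures quoted in the text for a single Trainium2 chip.
CORES_PER_CHIP = 8
HBM_GIB_PER_CHIP = 96
HBM_TB_PER_SEC_PER_CHIP = 2.9
DENSE_FP8_PFLOPS_PER_CHIP = 1.3
SPARSE_FP8_PFLOPS_PER_CHIP = 5.2


def totals(chips):
    """Scale the per-chip figures to a building block made of `chips` chips."""
    return {
        "neuron_cores": chips * CORES_PER_CHIP,
        "hbm_tib": chips * HBM_GIB_PER_CHIP / 1024,
        "hbm_tb_per_sec": round(chips * HBM_TB_PER_SEC_PER_CHIP, 1),
        "dense_fp8_pflops": round(chips * DENSE_FP8_PFLOPS_PER_CHIP, 1),
        "sparse_fp8_pflops": round(chips * SPARSE_FP8_PFLOPS_PER_CHIP, 1),
    }


trn2_instance = totals(16)      # 16 chips per Trn2 instance
trn2_ultraserver = totals(64)   # 4 instances x 16 chips per UltraServer
```

The products come out to 46.4 TB/second for an instance and 185.6 TB/second, 83.2 petaflops, and 332.8 petaflops for an UltraServer, which the text rounds down to 46, 185, 83, and 332.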

Using Trn2 Instances
Trn2 instances are available today for production use in the US East (Ohio) AWS Region and can be reserved by using Amazon EC2 Capacity Blocks for ML. You can reserve up to 64 instances for up to six months, with reservations accepted up to eight weeks in advance, with instant start times and the ability to extend your reservations if needed. To learn more, read Announcing Amazon EC2 Capacity Blocks for ML to reserve GPU capacity for your machine learning workloads.

On the software side, you can start with the AWS Deep Learning AMIs. These images are preconfigured with the frameworks and tools that you probably already know and use: PyTorch, JAX, and a lot more.

If you used the AWS Neuron SDK to build your apps, you can bring them over and recompile them for use on Trn2 instances. This SDK integrates natively with JAX, PyTorch, and essential libraries like Hugging Face, PyTorch Lightning, and NeMo. Neuron includes out-of-the-box optimizations for distributed training and inference with the open source PyTorch libraries NxD Training and NxD Inference, while providing deep insights for profiling and debugging. Neuron also supports OpenXLA, including stable HLO and GSPMD, enabling PyTorch/XLA and JAX developers to utilize Neuron’s compiler optimizations for Trainium2.

— Jeff;

New Amazon EC2 P5en instances with NVIDIA H200 Tensor Core GPUs and EFAv3 networking

2024-12-03 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-amazon-ec2-p5en-instances-with-nvidia-h200-tensor-core-gpus-and-efav3-networking/

Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P5en instances, powered by NVIDIA H200 Tensor Core GPUs and custom 4th generation Intel Xeon Scalable processors with an all-core turbo frequency of 3.2 GHz (max core turbo frequency of 3.8 GHz) available only on AWS. These processors offer 50 percent higher memory bandwidth and up to four times the throughput between CPU and GPU with PCIe Gen5, which helps boost performance for machine learning (ML) training and inference workloads.

P5en, with up to 3200 Gbps of third-generation Elastic Fabric Adapter (EFAv3) networking using Nitro v5, shows up to 35% improvement in latency compared to P5, which uses the previous generation of EFA and Nitro. This helps improve collective communications performance for distributed training workloads such as deep learning, generative AI, real-time data processing, and high-performance computing (HPC) applications.

Here are the specs for P5en instances:

Instance size	vCPUs	Memory (GiB)	GPUs (H200)	Network bandwidth (Gbps)	GPU Peer to peer (GB/s)	Instance storage (TB)	EBS bandwidth (Gbps)
p5en.48xlarge	192	2048	8	3200	900	8 x 3.84	100

On September 9, we introduced Amazon EC2 P5e instances, powered by 8 NVIDIA H200 GPUs with 1128 GB of high bandwidth GPU memory, 3rd Gen AMD EPYC processors, 2 TiB of system memory, and 30 TB of local NVMe storage. These instances provide up to 3,200 Gbps of aggregate network bandwidth with EFAv2 and support GPUDirect RDMA, enabling lower latency and efficient scale-out performance by bypassing the CPU for internode communication.

With P5en instances, you can increase the overall efficiency in a wide range of GPU-accelerated applications by further reducing the inference and network latency. P5en instances increase local storage performance by up to two times and Amazon Elastic Block Store (Amazon EBS) bandwidth by up to 25 percent compared with P5 instances, which will further improve inference latency performance for those of you who are using local storage for caching model weights.

The transfer of data between CPUs and GPUs can be time-consuming, especially for large datasets or workloads that require frequent data exchanges. With PCIe Gen 5 providing up to four times the bandwidth between CPU and GPU compared with P5 and P5e instances, you can further improve latency for model training, fine-tuning, and running inference for complex large language models (LLMs) and multimodal foundation models (FMs), and memory-intensive HPC applications such as simulations, pharmaceutical discovery, weather forecasting, and financial modeling.

Getting started with Amazon EC2 P5en instances
You can use EC2 P5en instances available in the US East (Ohio), US West (Oregon), and Asia Pacific (Tokyo) AWS Regions through EC2 Capacity Blocks for ML, On Demand, and Savings Plan purchase options.

Let me show you how to use P5en instances with EC2 Capacity Blocks as one option. To reserve your EC2 Capacity Blocks, choose Capacity Reservations on the Amazon EC2 console in the US East (Ohio) AWS Region.

Select Purchase Capacity Blocks for ML and then choose your total capacity and specify how long you need the EC2 Capacity Block for p5en.48xlarge instances. The total number of days that you can reserve EC2 Capacity Blocks is 1–14, 21, or 28 days. EC2 Capacity Blocks can be purchased up to 8 weeks in advance.

When you select Find Capacity Blocks, AWS returns the lowest-priced offering available that meets your specifications in the date range you have specified. After reviewing EC2 Capacity Blocks details, tags, and total price information, choose Purchase.

Now, your EC2 Capacity Block will be scheduled successfully. The total price of an EC2 Capacity Block is charged up front, and the price does not change after purchase. The payment will be billed to your account within 12 hours after you purchase the EC2 Capacity Blocks. To learn more, visit Capacity Blocks for ML in the Amazon EC2 User Guide.
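The duration and lead-time rules described above (1–14, 21, or 28 days, purchased up to 8 weeks in advance) are easy to check before you open the console. The following is a minimal sketch of such a pre-check; the function name and messages are hypothetical, and the actual service may apply additional constraints that are not modeled here.

```python
from datetime import date, timedelta

# Durations and lead time as stated in the text; everything else is illustrative.
ALLOWED_DURATION_DAYS = set(range(1, 15)) | {21, 28}
MAX_ADVANCE = timedelta(weeks=8)


def validate_capacity_block_request(duration_days, start, today):
    """Return a list of problems with a planned reservation (empty if none)."""
    problems = []
    if duration_days not in ALLOWED_DURATION_DAYS:
        problems.append(f"duration of {duration_days} days is not 1-14, 21, or 28")
    if start < today:
        problems.append("start date is in the past")
    elif start - today > MAX_ADVANCE:
        problems.append("start date is more than 8 weeks in advance")
    return problems
```

For example, a 14-day block starting one week from today passes the check, while a 15-day block or a start date 60 days out would each be reported as a problem.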

To run instances within your purchased Capacity Block, you can use the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDKs.

Here is a sample AWS CLI command to run 16 P5en instances to maximize EFAv3 benefits. This configuration provides up to 3200 Gbps of EFA networking bandwidth and up to 800 Gbps of IP networking bandwidth with eight private IP addresses:

$ aws ec2 run-instances --image-id ami-abc12345 \
  --instance-type p5en.48xlarge \
  --count 16 \
  --key-name MyKeyPair \
  --instance-market-options MarketType='capacity-block' \
  --capacity-reservation-specification CapacityReservationTarget={CapacityReservationId=cr-a1234567} \
--network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
"NetworkCardIndex=1,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=2,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=3,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=4,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
"NetworkCardIndex=5,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=6,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=7,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=8,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
"NetworkCardIndex=9,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=10,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=11,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=12,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
"NetworkCardIndex=13,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=14,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=15,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=16,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
"NetworkCardIndex=17,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=18,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=19,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=20,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
"NetworkCardIndex=21,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=22,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=23,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=24,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
"NetworkCardIndex=25,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=26,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=27,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=28,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
"NetworkCardIndex=29,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=30,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
"NetworkCardIndex=31,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only"
...
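The 32 network interface arguments in the command above follow a regular pattern: card 0 uses DeviceIndex=0 and every other card uses DeviceIndex=1, and every fourth card is an efa interface while the rest are efa-only. Rather than typing them by hand, you could generate them. This sketch assumes that pattern holds exactly as shown; the function name is hypothetical, and security_group_id and subnet_id are the same placeholders used in the command.

```python
def p5en_network_interfaces(security_group_id, subnet_id, cards=32):
    """Build the --network-interfaces arguments following the pattern in the sample command."""
    specs = []
    for card in range(cards):
        device_index = 0 if card == 0 else 1                    # only the first card is device 0
        interface_type = "efa" if card % 4 == 0 else "efa-only"  # one efa interface per group of four
        specs.append(
            f"NetworkCardIndex={card},DeviceIndex={device_index},"
            f"Groups={security_group_id},SubnetId={subnet_id},"
            f"InterfaceType={interface_type}"
        )
    return specs


# Join the arguments with shell line continuations for pasting after --network-interfaces.
cli_fragment = " \\\n".join(
    f'"{spec}"' for spec in p5en_network_interfaces("security_group_id", "subnet_id")
)
```

The eight efa interfaces in the generated list line up with the eight private IP addresses mentioned above.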

When launching P5en instances, you can use AWS Deep Learning AMIs (DLAMI) to support EC2 P5en instances. DLAMI provides ML practitioners and researchers with the infrastructure and tools to quickly build scalable, secure, distributed ML applications in preconfigured environments.

You can run containerized ML applications on P5en instances with AWS Deep Learning Containers using libraries for Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS).

For fast access to large datasets, you can use up to 30 TB of local NVMe SSD storage or virtually unlimited cost-effective storage with Amazon Simple Storage Service (Amazon S3). You can also use Amazon FSx for Lustre file systems in P5en instances so you can access data at the hundreds of GB/s of throughput and millions of input/output operations per second (IOPS) required for large-scale deep learning and HPC workloads.

Now available
Amazon EC2 P5en instances are available today in the US East (Ohio), US West (Oregon), and Asia Pacific (Tokyo) AWS Regions and US East (Atlanta) Local Zone us-east-1-atl-2a through EC2 Capacity Blocks for ML, On Demand, and Savings Plan purchase options. For more information, visit the Amazon EC2 pricing page.

Give Amazon EC2 P5en instances a try in the Amazon EC2 console. To learn more, see the Amazon EC2 P5 instance page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

— Channy

Top announcements of AWS re:Invent 2024

2024-12-02 AWS Editorial Team

Post Syndicated from AWS Editorial Team original https://aws.amazon.com/blogs/aws/top-announcements-of-aws-reinvent-2024/

AWS re:Invent 2024, our flagship annual conference, is taking place Dec. 2-6, 2024, in Las Vegas. This premier cloud computing event brings together the global cloud computing community for a week of keynotes, technical sessions, product launches, and networking opportunities. As AWS continues to unveil its latest innovations and services throughout the conference, we’ll keep you updated here with all the major product announcements.

Additional re:Invent resources:

AWS News Blog: Chief Evangelist Jeff Barr and colleagues keep you posted on the biggest and best new AWS offerings.
What’s New with AWS: A comprehensive list of all AWS launches.
The Official AWS Podcast: A podcast for developers and IT professionals looking for the latest news and trends from AWS.
AWS On Air: Live-streamed announcements and hands-on demos.
AWS re:Post: Join the community in conversation through Q&A.

(This post was last updated: 9:08 p.m. PST, Dec. 1, 2024.)

Quick category links:

Analytics

AWS Clean Rooms now supports multiple clouds and data sources
With expanded data sources, AWS Clean Rooms helps customers securely collaborate with their partners’ data across clouds, eliminating data movement, safeguarding sensitive information, promoting data freshness, and streamlining cross-company insights.

Application Integration

Securely share AWS resources across VPC and account boundaries with PrivateLink, VPC Lattice, EventBridge, and Step Functions

Orchestrate hybrid workflows accessing private HTTPS endpoints – no more Lambda/SQS workarounds. EventBridge and Step Functions natively support private resources, simplifying cloud modernization.

Business Applications

Newly enhanced Amazon Connect adds generative AI, WhatsApp Business, and secure data collection
Use innovative tools like generative AI for segmentation and campaigns, WhatsApp Business, data privacy controls for chat, AI guardrails, conversational AI bot management, and enhanced analytics to elevate customer experiences securely and efficiently.

Compute

Introducing storage optimized Amazon EC2 I8g instances powered by AWS Graviton4 processors and 3rd gen AWS Nitro SSDs
Elevate storage performance with AWS’s newest I8g instances, which deliver unparalleled speed and efficiency for I/O-intensive workloads.

Now available: Storage optimized Amazon EC2 I7ie instances
New AWS I7ie instances deliver unbeatable storage performance: up to 120TB NVMe, 40% better compute performance and up to 65% better real-time storage performance.

Containers

Use your on-premises infrastructure in Amazon EKS clusters with Amazon EKS Hybrid Nodes
Unify Kubernetes management across your cloud and on-premises environments with Amazon EKS Hybrid Nodes – use existing hardware while offloading control plane responsibilities to EKS for consistent operations.

Streamline Kubernetes cluster management with new Amazon EKS Auto Mode
With EKS Auto Mode, AWS simplifies Kubernetes cluster management, automating compute, storage, and networking, enabling higher agility and performance while reducing operational overhead.

Database

Amazon MemoryDB Multi-Region is now generally available
Build highly available, globally distributed apps with microsecond latencies across Regions, automatic conflict resolution, and up to 99.999% availability.

Generative AI / Machine Learning

New RAG evaluation and LLM-as-a-judge capabilities in Amazon Bedrock
Evaluate AI models and applications efficiently with Amazon Bedrock’s new LLM-as-a-judge capability for model evaluation and RAG evaluation for Knowledge Bases, offering a variety of quality and responsible AI metrics at scale.

Enhance your productivity with new extensions and integrations in Amazon Q Business
Seamlessly access AI assistance within work applications with Amazon Q Business’s new browser extensions and integrations.

New APIs in Amazon Bedrock to enhance RAG applications, now available
With custom connectors and reranking models, you can enhance RAG applications by enabling direct ingestion to knowledge bases without requiring a full sync, and improving response relevance through advanced reranking models.

Introducing new PartyRock capabilities and free daily usage
Unleash your creativity with PartyRock’s new AI capabilities: generate images, analyze visuals, search hundreds of thousands of apps, and process multiple docs simultaneously – no coding required.

Amazon Q Business adds support to extract insights from visual elements within documents

Users can now query information embedded in various types of visuals, including diagrams, infographics, charts, and other image-based content.

Management & Governance

Container Insights with enhanced observability now available in Amazon ECS
With granular visibility into container workloads, CloudWatch Container Insights with enhanced observability for Amazon ECS enables proactive monitoring and faster troubleshooting, enhancing observability and improving application performance.

New Amazon CloudWatch Database Insights: Comprehensive database observability from fleets to instances
Monitor Amazon Aurora databases and gain comprehensive visibility into MySQL and PostgreSQL fleets and instances, analyze performance bottlenecks, track slow queries, set SLOs, and explore rich telemetry.

New Amazon CloudWatch and Amazon OpenSearch Service launch an integrated analytics experience
Unlock out-of-the-box OpenSearch dashboards and two additional query languages, OpenSearch SQL and PPL, for analyzing CloudWatch logs. OpenSearch customers can now analyze CloudWatch Logs without having to duplicate data.

Migration & Transfer Services

AWS Database Migration Service now automates time-intensive schema conversion tasks using generative AI
AWS DMS Schema Conversion converts up to 90% of your schema to accelerate your database migrations and reduce manual effort with the power of generative AI.

Announcing AWS Transfer Family web apps for fully managed Amazon S3 file transfers
AWS Transfer Family web apps are a new resource that you can use to create a simple interface for authorized line-of-business users to access data in Amazon S3 through a customizable web browser.

Introducing default data integrity protections for new objects in Amazon S3
Amazon S3 updates the default behavior of object upload requests with new data integrity protections that build upon S3’s existing durability posture.

Security, Identity, & Compliance

New AWS Security Incident Response helps organizations respond to and recover from security events
AWS introduces a new service to streamline security event response, providing automated triage, coordinated communication, and expert guidance to recover from cybersecurity threats.

Introducing Amazon GuardDuty Extended Threat Detection: AI/ML attack sequence identification for enhanced cloud security
AWS extends GuardDuty with AI/ML capabilities to detect complex attack sequences across workloads, applications, and data, correlating multiple security signals over time for proactive cloud security.

Simplify governance with declarative policies
With only a few steps, create declarative policies and enforce desired configuration for AWS services across your organization, reducing ongoing governance overhead and providing transparency for administrators and end users.

AWS Verified Access now supports secure access to resources over non-HTTP(S) protocols (preview)

Introducing Amazon OpenSearch Service and Amazon Security Lake integration to simplify security analytics
Analyze security logs without data duplication; Amazon OpenSearch Service now offers zero-ETL integration with Amazon Security Lake for efficient threat hunting and investigations.

Storage

Announcing Amazon FSx Intelligent-Tiering, a new storage class for FSx for OpenZFS
Delivering NAS capabilities with automatic data tiering among frequently accessed, infrequent, and archival storage tiers, Amazon FSx Intelligent-Tiering offers high performance up to 400K IOPS, 20 GB/s throughput, seamless integration with AWS services.

New physical AWS Data Transfer Terminals let you upload to the cloud faster
Rapidly upload large datasets to AWS at blazing speeds with the new AWS Data Transfer Terminal, secure physical locations offering high throughput connection.

Connect users to data through your apps with Storage Browser for Amazon S3
Storage Browser for Amazon S3 is an open source interface component that you can add to your web applications to provide your authorized end users, such as customers, partners, and employees, with access to easily browse, upload, download, copy, and delete data in S3.

Introducing Amazon GuardDuty Extended Threat Detection: AI/ML attack sequence identification for enhanced cloud security

2024-12-02 Esra Kayabali

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/introducing-amazon-guardduty-extended-threat-detection-aiml-attack-sequence-identification-for-enhanced-cloud-security/

Today, I’m happy to introduce advanced AI/ML threat detection capabilities in Amazon GuardDuty. This new feature uses the extensive cloud visibility and scale of AWS to provide improved threat detection for your applications, workloads, and data. GuardDuty Extended Threat Detection employs sophisticated AI/ML to identify both known and previously unknown attack sequences, offering a more comprehensive and proactive approach to cloud security. This enhancement addresses the growing complexity of modern cloud environments and the evolving landscape of security threats, simplifying threat detection and response.

Many organizations face challenges in efficiently analyzing and responding to the high volume of security events generated across their cloud environments. With the increasing frequency and sophistication of security threats, it has become more challenging to effectively detect and respond to attacks that occur as sequences of events over time. Security teams often struggle to piece together related activities that might be part of a larger attack, potentially missing critical threats or responding too late to prevent significant impact.

To address these challenges, we have expanded GuardDuty threat detection capabilities to include new AI/ML capabilities that correlate security signals to identify active attack sequences in your AWS environment. These sequences can include multiple steps taken by an adversary, such as privilege discovery, API manipulation, persistence activities, and data exfiltration. These detections are represented as attack sequence findings, a new type of GuardDuty finding with critical severity. Previously, GuardDuty had never used critical severity, reserving this level for findings with the utmost confidence and urgency. These new findings introduce critical severity and include a natural language summary of the threat’s nature and significance, observed activities mapped to tactics and techniques from the MITRE ATT&CK® framework, and prescriptive remediation recommendations based on AWS best practices.

GuardDuty Extended Threat Detection introduces new attack sequence findings and improves actionability for existing detections in areas such as credential exfiltration, privilege escalation, and data exfiltration. This enhancement enables GuardDuty to offer composite detections that span multiple data sources, time periods, and resources within an account, providing you with a more comprehensive understanding of sophisticated cloud attacks.

Let me show you how the new capabilities work.

How to use the new AI/ML threat detection in Amazon GuardDuty
To experience the new AI/ML threat detection in GuardDuty, go to the Amazon GuardDuty console and explore the new widgets on the Summary page. The overview widget now helps you view the number of attack sequences you have and consider the details of those attack sequences. Cloud environment findings often reveal multistage attacks, but these sophisticated attack sequences are low volume and account for a small fraction of the total number of findings. For this particular account, you can observe a variety of findings in the cloud environment, but only a handful of actual attack sequences. In a larger cloud environment, you may see hundreds or even thousands of findings, yet the number of attack sequences will likely remain relatively small in comparison.

We’ve also added a new widget that helps you view the findings broken down by severity. This makes it easier to quickly pivot into and investigate specific findings that are of interest to you. The findings are now sorted by Severity, providing you with a clear overview of the most critical issues, including an additional Critical severity category, ensuring that the most urgent detections are immediately brought to your attention. You can also filter just for the attack sequences by choosing Top attack sequences only.

This new capability is enabled by default, so you don’t need to take any additional steps for it to start working. There are no extra costs for this feature beyond the underlying charges for GuardDuty and its associated protection plans. As you enable additional GuardDuty protection plans, this capability will provide more integrated security value, helping you gain deeper insights.

You can observe two types of findings. The first type is data compromise, which indicates a potential data compromise that can be a part of a larger ransomware attack. Data is the most critical organizational asset for most customers, making this an important area of concern. The second type is compromised credentials, which helps you detect the misuse of compromised credentials, typically during the earlier stages of an attack in your cloud environment.

Let me dive into one of the data compromise findings. I’ll focus on “Potential data compromise of one or more S3 buckets involving a sequence of actions over multiple signals associated with a user in your account”. This finding indicates that we have observed data being compromised across multiple Amazon Simple Storage Service (Amazon S3) buckets with multiple associated signals.

The summary provided with this finding gives you key details, including the specific user (identified by their principal ID) who performed the actions, the account and resources affected, and the extended time period (nearly a full day) over which the activity occurred. This information can help you quickly understand the scope and severity of the potential compromise.

This finding has eight distinct signals observed over a nearly 24-hour period, indicating the use of multiple tactics and techniques mapped to the MITRE ATT&CK® framework. This broad coverage across the attack chain—from credential access, to discovery, evasion, persistence, and even impact and exfiltration—suggests this may indeed be a true positive incident. The finding also surfaces a concerning technique of data destruction, which is particularly alarming.

Additionally, GuardDuty provides further security context by highlighting sensitive API calls, such as the user deleting the AWS CloudTrail trail. This type of evasive behavior, combined with the creation of new access keys and actions targeting Amazon S3 objects, further reinforces the severity and potential scope of the incident. Based on the information presented in this finding, you would likely want to investigate this incident more thoroughly.

Reviewing the ATT&CK tactics associated with the findings provides visibility into the specific tactics involved, whether it’s a single tactic or multiple. GuardDuty also offers security indicators that explain why the activity was flagged as suspicious and assigned a critical severity, including the high-risk APIs called and the tactics observed.

Diving deeper, you can view details about the actor responsible. The information includes how the user connected to and carried out these actions, including the network locations. This additional context helps you better understand the full scope and nature of the incident, which is crucial for investigation and response. You can follow prescriptive remediation recommendations based on AWS best practices, offering you actionable insights to swiftly address and resolve identified detections. These tailored recommendations help you improve your cloud security posture and ensure alignment with security guidelines.

The Signals tab can be sorted by newest or oldest first. If responding to an active attack, you’ll want to start with the latest signals to quickly understand and mitigate the situation. For a post-incident review, you can trace back from the initial activities. Diving into each activity provides detailed information about the specific finding. We also offer a quick view through Indicators, Actors, and Endpoints to summarize what occurred and who took action.

Another way to follow the details is to access the Resources tab, where you can check the different buckets that are involved and the access keys. For each resource, you can check which tactics and techniques happened. Select the open resource to pivot directly to the relevant console and learn more details.

We’ve introduced a full-page view for GuardDuty findings, making it easier to see all the contextual data in one place. However, the traditional findings page with the side panel is still available if you prefer that layout, which provides a quick view of the details for specific findings.

GuardDuty Extended Threat Detection is automatically enabled for all GuardDuty accounts in a Region, leveraging foundational data sources without requiring additional protection plans. Enabling additional protection plans expands the range of security signals analyzed, improving the service’s ability to identify complex attack sequences. GuardDuty specifically recommends activating S3 Protection to detect potential data compromises in Amazon S3 buckets. Without S3 Protection enabled, GuardDuty cannot generate S3-specific findings or identify attack sequences involving S3 resources, limiting its capacity to detect data compromise scenarios in your Amazon S3 environment.

GuardDuty Extended Threat Detection integrates with existing GuardDuty workflows, including AWS Security Hub, Amazon EventBridge, and third-party security event management systems.
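
As an illustration of the EventBridge integration, the following Python sketch builds an event pattern that a rule could use to match only the most severe GuardDuty findings. It only produces JSON and does not call AWS. It assumes that GuardDuty findings arrive with the source aws.guardduty and the detail type "GuardDuty Finding", and that critical findings carry a numeric severity of 9.0 or higher; the function name and the default threshold are illustrative, so check them against the GuardDuty documentation before relying on them.

```python
import json


def critical_finding_pattern(min_severity: float = 9.0) -> str:
    """Return an EventBridge event pattern (as JSON text) for severe findings.

    Assumption: critical GuardDuty findings use severity >= 9.0.
    """
    pattern = {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        # Numeric matching keeps only findings at or above the threshold.
        "detail": {"severity": [{"numeric": [">=", min_severity]}]},
    }
    return json.dumps(pattern, indent=2)


if __name__ == "__main__":
    print(critical_finding_pattern())
```

The resulting JSON can be pasted into an EventBridge rule, with a target such as a notification topic or a ticketing integration.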

Now available
Amazon GuardDuty Extended Threat Detection significantly enhances cloud security by automating the analysis of complex attack sequences and providing actionable insights. This helps you focus on the most critical threats and reduces the time and effort required for manual analysis.

These capabilities are automatically enabled for all new and existing GuardDuty customers at no additional cost in all commercial AWS Regions where GuardDuty is supported.

To learn more and start benefiting from these new capabilities, visit the Amazon GuardDuty documentation.

— Esra

Container Insights with enhanced observability now available in Amazon ECS

2024-12-02 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/container-insights-with-enhanced-observability-now-available-in-amazon-ecs/

Last year, we announced enhanced observability in Amazon CloudWatch Container Insights, a new capability to improve your observability for Amazon Elastic Kubernetes Service (Amazon EKS). This capability helps you detect and fix container issues faster by providing detailed performance metrics and logs.

Expanding this capability, today we’re launching enhanced observability for your container workloads running on Amazon Elastic Container Service (Amazon ECS). This new capability will help reduce your mean time to detect (MTTD) and mean time to repair (MTTR) for your overall applications, helping prevent issues that could negatively impact your user experience.

Here’s a quick look at Container Insights with enhanced observability for Amazon ECS.

Container Insights with enhanced observability addresses a critical gap in container monitoring. Previously, correlating metrics with logs and events was a time-consuming process, often requiring manual searches and expertise in application architecture. Now, with this capability, CloudWatch and Amazon ECS automatically collect granular performance metrics such as CPU utilization at both the task and container levels while providing visual drill downs enabling easy root-cause analysis.

This new capability enables the following use cases:

Quickly identify root causes by viewing granular resource usage patterns and correlating telemetry data.
Proactively manage your ECS resources using curated dashboards based on AWS best practices.
Track your recent deployments and root causes of your deployment failures with the matching infrastructure anomalies enabling faster issue detection and quicker rollbacks when necessary.
Effortlessly monitor resources across multiple accounts without manual setup. Built-in cross-account support reduces operational overhead with single pane of glass observability.
Integration with other CloudWatch services such as Application Signals and CloudWatch Logs provides a seamless experience to correlate infrastructure with the services running and identify the impacted services.

Using Container Insights with enhanced observability for Amazon ECS
There are two ways to enable Container Insights with enhanced observability:

Cluster-level onboarding – You can enable it for specific clusters individually.
Account-level onboarding – You can also enable it at the account level, which automatically enables observability for all new clusters created in your account. This approach saves time and effort by eliminating the need to manually enable it for each new cluster.

To enable this feature at the account level, I navigate to the Amazon ECS console and select Account settings. Under the CloudWatch Container Insights observability section, I can see it’s currently disabled. I choose Update.

On this page, I find a new option called Container Insights with enhanced observability. I select this option and then choose Save changes.

If I need to enable this capability at the cluster level, I can do so when creating a new cluster.

I can also enable this capability for my existing clusters. To do so, I select Update cluster, and then choose the option.
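
The same two onboarding paths can also be scripted. The following Python sketch builds the request parameters for the account-level default and for a single cluster, and only calls AWS if apply_settings is invoked explicitly. It assumes the boto3 ECS operations put_account_setting_default and update_cluster_settings, with the setting name containerInsights and the value enhanced; the cluster name and the helper function names are hypothetical, and the setting values should be verified against the current Amazon ECS API reference.

```python
def account_default_request() -> dict:
    """Parameters for the account-level default (assumed setting name and value)."""
    return {"name": "containerInsights", "value": "enhanced"}


def cluster_update_request(cluster_name: str) -> dict:
    """Parameters for enabling the capability on one existing cluster."""
    return {
        "cluster": cluster_name,
        "settings": [{"name": "containerInsights", "value": "enhanced"}],
    }


def apply_settings(cluster_name: str) -> None:
    """Send both requests to Amazon ECS. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above run without AWS access

    ecs = boto3.client("ecs")
    ecs.put_account_setting_default(**account_default_request())
    ecs.update_cluster_settings(**cluster_update_request(cluster_name))


if __name__ == "__main__":
    # "my-demo-cluster" is a placeholder name used only for illustration.
    print(account_default_request())
    print(cluster_update_request("my-demo-cluster"))
```

Keeping the request construction separate from the API calls makes it easy to review what will be sent before changing account settings.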

Once enabled, I can see task-level metrics by navigating to the Metrics tab in my cluster overview console. To access health and performance metrics across my clusters, I can select View Container Insights, which will redirect me to the Container Insights page.

To get a big picture of all my workloads across different clusters, I can navigate to Amazon CloudWatch and then to Container Insights.

This view addresses the challenge of effectively monitoring clusters, services, tasks, and containers by providing a honeycomb visualization that offers an intuitive, high-level summary of cluster health. The dashboard employs a dual-state monitoring approach:

Alarm state (red or green) – Reflects customer-defined thresholds and alerts, allowing teams to configure monitoring based on their specific requirements
Utilization state (dark blue or light blue) – Uses CloudWatch built-in best practices to monitor resource usage patterns across containers. The darker blue indicates clusters operating under higher utilization, enabling teams to proactively identify potential resource constraints before they impact performance

Let’s say there’s an issue in one of my clusters. I can hover over the cluster to display all the alarms created under that cluster at different layers, from the cluster layer down to the container layer.

I also have the option to view all clusters in a list format. The list format is essential for cross-account observability, displaying account IDs and labels for cluster ownership. This helps DevOps engineers quickly identify and collaborate with account owners to resolve potential application issues.

Now, I’d like to explore further. I select my cluster link, which redirects me to the Container Insights detailed dashboard view. Here, I can see a spike in memory utilization for this cluster.

I can dive deeper into container-level details, which help me quickly identify which services are causing this issue.

Another useful feature I found is the Filters option, which helps me conduct more thorough investigations across containers, services, or tasks in this cluster.

If I need to delve deeper into the application logs to understand the root cause of this issue, I can select the task, choose Actions, and choose which logs I would like to view.

In addition to using AWS X-Ray traces, I can investigate two types of logs here. First, I can use performance logs (structured logs containing metric data) to drill down and identify container-level root causes. Second, I can examine collected application or container logs. These logs give me detailed insights into application behavior within the container, helping me trace the sequence of events that led to any issues.

In this case, I use application logs.

This streamlines my journey to troubleshoot my application. In this case, the issue is in the downstream calls to third-party applications, which time out.

This enhanced capability also works with Amazon CloudWatch Application Signals to automatically instrument my application. I can monitor current application health and track long-term application performance against service-level objectives.

I select the Application Signals tab.

This integration with Amazon CloudWatch Application Signals provides me with end-to-end visibility, helping me correlate container performance with end-user experience.

When I select datapoints in the graphs, I can see associated traces, which show me all correlated services and their impact. I can also access relevant logs to understand root causes.

Additional things to know
Here are a couple of important points to note:

Availability – Container Insights with enhanced observability for ECS is now available in all AWS Regions, including the China Regions.
Pricing – Container Insights with enhanced observability for ECS comes with flat metric pricing. To learn more, visit the Amazon CloudWatch Pricing page.

Get started today and experience improved observability for your container workloads. Learn more on the Amazon CloudWatch documentation page.

Happy monitoring,
— Donnie Prakoso

AWS Clean Rooms now supports multiple clouds and data sources

2024-12-02 Esra Kayabali

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/aws-clean-rooms-now-supports-multiple-clouds-and-data-sources/

Today, we are announcing support for Snowflake and Amazon Athena as new sources for AWS Clean Rooms data collaborations. AWS Clean Rooms helps you and your partners more seamlessly and securely analyze your collective datasets without sharing or copying one another’s underlying data. This enhancement helps you collaborate with datasets stored in Snowflake or those queryable through Athena features, such as AWS Lake Formation permissions or AWS Glue Data Catalog views, without moving or revealing the source data.

You often need to collaborate with partners to analyze datasets to get insights for research and development, investments, or marketing and advertising campaigns. In some cases, your partners’ datasets are stored or managed outside of Amazon Simple Storage Service (Amazon S3), and companies want to reduce or eliminate the complexity, cost, compliance risks, and delays that are associated with moving or copying data. Companies also find that copying data can result in them using outdated information, potentially reducing the quality of the insights gained.

This launch helps companies to collaborate on the most up-to-date collective datasets in an AWS Clean Rooms collaboration with zero extract, transform, and load (zero-ETL). This eliminates the cost and complexity associated with migrating datasets out of existing environments. For example, an advertiser with data stored in Amazon S3 and a media publisher with data stored in Snowflake can run an audience overlap analysis to determine the percentage of users present in their collective datasets without having to build ETL data pipelines, or share underlying data with one another. No underlying data from external data sources is permanently stored in AWS Clean Rooms during the collaboration process and any data temporarily read into the AWS Clean Rooms analysis environment is deleted upon query completion. You can now work with your partners regardless of where their data is stored, streamlining the process of generating insights.

Let me show you how to use this feature.

How to use multiple clouds and data sources in AWS Clean Rooms
To demonstrate this feature, I use a scenario between an advertiser, Company A, and a publisher, Company B. Company A wants to know how many of their high-value users can be reached on Company B’s website before running an ad campaign. Company A stores their data in Amazon S3. Company B stores their data in Snowflake. To use AWS Clean Rooms, both parties must have their own AWS accounts.

In this demo, Company A, the advertiser, is the collaboration creator. Company A creates the AWS Clean Rooms collaboration and invites Company B, who has data hosted in Snowflake, to collaborate. You can follow the specific steps to create a collaboration in the AWS Clean Rooms general availability announcement blog post.

Next, I show how Company B, the publisher, creates a configured table in AWS Clean Rooms, specifying Snowflake as the data source and providing the Secrets Manager Amazon Resource Name (ARN). AWS Secrets Manager helps you manage, retrieve, and rotate secrets such as database credentials throughout their lifecycles. Your secret must contain the credentials for a Snowflake user with read-only permission to the data you want to collaborate with. AWS Clean Rooms uses the ARN to read your secret and access the data stored in Snowflake. See the Secrets Manager documentation for step-by-step instructions for creating your secret.

Using Company B’s AWS account, I go to the AWS Clean Rooms console and choose Tables under Configured resources. I choose Configure new table. I choose Snowflake under Third-party clouds and data sources. I enter the Secret ARN for the secret that contains Snowflake credentials for a role with read access to the dataset stored in Snowflake I want to collaborate with. These are the credentials that you use to verify the identity of the entity trying to access the Snowflake table and schema. If you don’t have a secret ARN, you can create a new secret using the Store a new secret for this table option.

To define the table and schema details, I use the Import from file option and choose the Columns View Information Schema CSV file I exported from Snowflake to populate the information for me. You can also enter the information manually.

For this demo, I choose All columns under the Columns allowed in collaborations. Next, I choose Configure new table.

I go to the configured table and observe the table details, such as AWS accounts allowed to create queries and columns available for querying. On this page, I can edit the table name, description, and analysis rule.

As part of configuring a table to use in AWS Clean Rooms for collaboration analysis, I need to configure an analysis rule. An analysis rule is a privacy-enhancing control that each data owner sets up on a configured table. An analysis rule determines how the configured table can be analyzed. I choose Configure analysis rule to configure a custom analysis rule that allows custom queries to be run on the configured table.

In Step 1, I proceed with the selections. You can use the JSON editor to create, paste, or import an analysis rule definition in JSON format. I choose Next.

In Step 2, I choose Allow any queries created by specific collaborators to run without review on this table under Analyses for direct querying. With this option, only queries provided by the AWS accounts that I specify in the list of allowed accounts can be run on the table. All analysis templates created by the allowed accounts will automatically be allowed to be run on this table without requiring a review. I choose the allowed account under AWS account ID and choose Next.

In Step 3, I proceed with the selections. I choose None under Columns not allowed in output to allow all columns to be shown in the query output. I choose Not allowed under Additional analyses applied to output, so no additional analyses can be run on this table. I choose Next.

In the final step, I review the configuration and choose Configure analysis rule.
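
For readers who prefer the JSON editor mentioned in Step 1, the following Python sketch assembles a rule definition that mirrors the Step 2 choice (any query from specific collaborators runs without review). The field names allowedAnalyses and allowedAnalysisProviders and the value ANY_QUERY reflect my understanding of the custom analysis rule format and are assumptions to check against the AWS Clean Rooms documentation. The account ID is a placeholder, and the Step 3 output settings are left out because their exact field names are not confirmed here.

```python
import json


def custom_analysis_rule(allowed_account_ids):
    """Build a custom analysis rule definition as a plain dictionary.

    Assumption: "ANY_QUERY" lets the listed accounts run queries without review.
    """
    account_ids = list(allowed_account_ids)
    if not account_ids:
        raise ValueError("at least one allowed AWS account ID is required")
    return {
        "allowedAnalyses": ["ANY_QUERY"],
        "allowedAnalysisProviders": account_ids,
    }


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID, not a real collaborator.
    print(json.dumps(custom_analysis_rule(["111122223333"]), indent=2))
```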

Next, I choose Associate to collaboration to associate the table with the collaboration that Company A, the advertiser, created.

On the pop-up window, I choose a collaboration from the ones with active memberships and select Choose collaboration.

On the next page, I choose the Configured table name and enter the Name under Table associations details. I choose a method for granting AWS Clean Rooms permission to query the table. I choose Associate table.

Company A, the advertiser, and Company B, the publisher, can now run an audience overlap analysis to determine the percentage of users present in their collective datasets without accessing each other’s raw data. The analysis helps determine how much of the advertiser’s audience can be reached by the publisher. By evaluating the overlap, advertisers can determine whether the publisher provides unique reach or if the publisher’s audience predominantly overlaps with the advertiser’s existing audience, without either party having to move or share their source data. I switch to Company A’s account and go to the AWS Clean Rooms console. I choose the collaboration I created and run the following query to get the audience overlap analysis result:

select count (distinct advertiser.emailaddress)
from customer_data_example as advertiser
inner join synthetic_customer_data as publisher
on advertiser.emailaddress = publisher.publisher_hashed_email_address
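
To see what an audience overlap query of this kind computes, here is a small, self-contained Python sketch that runs an equivalent join on in-memory SQLite tables. SQLite stands in for the AWS Clean Rooms query engine, the sample values are made up, and the join compares the two email columns directly; only the table and column names are taken from the query.

```python
import sqlite3


def audience_overlap(advertiser_emails, publisher_emails):
    """Count distinct advertiser emails that also appear in the publisher table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_data_example (emailaddress TEXT)")
    conn.execute(
        "CREATE TABLE synthetic_customer_data (publisher_hashed_email_address TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_data_example VALUES (?)",
        [(email,) for email in advertiser_emails],
    )
    conn.executemany(
        "INSERT INTO synthetic_customer_data VALUES (?)",
        [(email,) for email in publisher_emails],
    )
    (count,) = conn.execute(
        """
        SELECT COUNT(DISTINCT advertiser.emailaddress)
        FROM customer_data_example AS advertiser
        INNER JOIN synthetic_customer_data AS publisher
          ON advertiser.emailaddress = publisher.publisher_hashed_email_address
        """
    ).fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    # Hypothetical hashed emails; "hash_c" is duplicated on purpose.
    advertiser = ["hash_a", "hash_b", "hash_c", "hash_c"]
    publisher = ["hash_b", "hash_c", "hash_d"]
    overlap = audience_overlap(advertiser, publisher)
    print(overlap)  # 2
    print(f"{overlap / len(set(advertiser)):.0%} of the advertiser audience")  # 67%
```

Counting distinct values matters here: the duplicated advertiser row would otherwise inflate the overlap.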

In this example, I used Snowflake as a data source. You can also run queries on this data using Athena while following AWS Lake Formation permissions. This helps you do row- and column-level filtering with Lake Formation fine-grained access control and transform data using AWS Glue Data Catalog views before the datasets are associated to the collaboration.

Customer and partner voices
“Data security and privacy is essential to our work at Kinective Media by United Airlines, the world’s first traveler media network,” said Khatidja Ajania, Director, Strategic Partnerships, Kinective Media by United Airlines. “AWS Clean Rooms support of source data in multiple clouds and AWS sources enables us to securely and seamlessly work with more brands to deliver on closed loop measurement and other key use cases. This enhancement will make it easier for us to securely deliver personalized experiences, content, and relevant offerings to millions of United travelers through privacy-enhanced collaboration with our advertisers and partners.”

“Snowflake recognizes the challenges of source data interoperability across tech stacks when using data clean room technology; we are excited to see the progress and one more step taken in the direction of a shared goal to empower users to unlock the full potential of their data partnerships through their solution of choice, safely and effectively” – Kamakshi Sivaramakrishnan, General Manager, Snowflake Data Clean Rooms

Now available
Support for Snowflake and Athena as data sources in AWS Clean Rooms offers significant benefits for cross-cloud collaboration. This launch eliminates the need for data movement across clouds and data sources and simplifies the collaboration process. This is a first step in our efforts to expand the ways in which customers can securely collaborate with any of their partners while protecting sensitive information, regardless of where their data is stored.

Get started with AWS Clean Rooms today. To learn more about collaborating with multiple data sources, visit the AWS Clean Rooms documentation.

— Esra

New physical AWS Data Transfer Terminals let you upload to the cloud faster

2024-12-02 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-physical-aws-data-transfer-terminals-let-you-upload-to-the-cloud-faster/

Today, we’re announcing the general availability of AWS Data Transfer Terminal, a secure physical location where you can bring your storage devices and upload data faster to the AWS Cloud.

The first Data Transfer Terminals are located in Los Angeles and New York, with plans to add more locations globally. You can reserve a time slot to visit your nearest location and upload data rapidly and securely to any AWS public endpoint, such as Amazon Simple Storage Service (Amazon S3) or Amazon Elastic File System (Amazon EFS), using a high-throughput connection. Using AWS Data Transfer Terminal, you can significantly reduce data ingestion time with high-throughput connectivity at a location near you. For example, you can upload large datasets from fleets of vehicles operating and collecting data in metro areas for training machine learning (ML) models, digital audio and video files from content creators for media processing workloads, and mapping or imagery data from local government organizations for geographic analysis.

After the data is uploaded to AWS, you can use the extensive suite of AWS services to generate value from your data and accelerate innovation. You can also bring your AWS Snowball devices to the location for upload and retain the devices for continued use, rather than relying on traditional shipping methods.

Getting started with AWS Data Transfer Terminal
You can find the availability of a location in the AWS Management Console and reserve the date and time to visit. Then, you can visit the location, make a connection between your storage device and S3 bucket, initiate the transfer of your data, and validate that your transfer is complete.

Go to the AWS Data Transfer Terminal console, then choose Get started.

Choose Create Transfer Team, enter a name and description for the team, and agree to the service terms and conditions. In the team settings, you can add team members for individual or group reservations.

To reserve your time and location, choose Create Reservation.

In the first step, choose your team, a process owner to manage your reservation, and the team members who will visit the location for the data transfer job. Next, choose a Data Transfer Terminal location and set your preferred visiting time. You’ll pay for the space reservation at an hourly rate for your reserved time.

To secure your reservation, choose Next and Create after reviewing the reservation details.

After your reservation is requested, you can find your upcoming reservations on the team page. You can check the reservation status or cancel your reservation.

On your reserved date and time, visit the location and confirm access with the building reception. You’re escorted by building staff to the floor and your reserved room of the Data Transfer Terminal location.

Don’t be surprised if there are no AWS signs in the building or room. This is for security reasons to keep your work location as secret as possible.

Visiting a pilot Terminal
Because I live in Seoul, Jeff Barr visited a pilot location near him in Seattle in my place to test uploading data as my team member.

The room is equipped with a patch panel, fiber optic cable, and a personal computer. The patch panel is installed inside a wall mount rack or small floor rack to leave more space on the desk. With the personal computer, you can remotely access the server during the data transfer process.

Here is Jeff’s feedback about visiting and working at the pilot facility.

When I arrived at the building, I was kindly escorted in and able to work easily using the instructions provided at the time of reservation. This location provides me with direct access to AWS global network infrastructure in a secure and on-demand format. I am excited to see how customers use AWS Data Transfer Terminal to more quickly get data into the cloud where they can more rapidly innovate and build on AWS.

Thanks, Jeff, for visiting the facility and doing the uploading job in my place!

Now available
AWS Data Transfer Terminal is available today in Los Angeles and New York, with plans to add more locations globally.

You’ll be charged for on-demand use per hour for each location. There is no per-GB charge for the data transfer if you upload data into AWS Regions on the same continent as your location. To learn more, visit the Data Transfer Terminal pricing page.

Give AWS Data Transfer Terminal a try in the AWS Management Console. To learn more, refer to the Data Transfer Terminal page and send feedback through your usual AWS Support contacts.

— Channy

Enhance your productivity with new extensions and integrations in Amazon Q Business

2024-12-02 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/enhance-your-productivity-with-new-extensions-and-integrations-in-amazon-q-business/

Today, we’re announcing a new capability from Amazon Q Business to seamlessly access your assistant within popular web browsers and productivity tools. This helps you save time and complete your work and tasks more efficiently without having to leave your preferred applications.

Now, you can use Amazon Q Business directly from your web browser and other supported messaging and collaboration applications. You can quickly gather insights, review information, and ask questions. For example, you can effortlessly analyze and summarize content, get explanations on complex topics, or create meeting summaries without switching between applications.

Let’s get started
Let me walk you through how to get started with the new browser extensions and integrations. First, let’s look at the browser extensions. The following screenshot shows how it looks.

As an administrator, I need to enable the browser extensions for users of my Amazon Q Business application. To do that, I navigate to my Amazon Q Business application dashboard and select Integrations under the Enhancements section in the left navigation pane.

Then, on the Integrations page, I select Edit in the Browser extensions section.

I select the available options in the Browsers section and choose Save. After I’ve enabled these options, my users will receive notification emails prompting them to install the extension.

Now, I’m switching to a user perspective of the Amazon Q Business application. I’ve received an email with a link to the Amazon Q Business web application. I visit the link and sign in to the Amazon Q Business web application. Here, I see a banner with information and a link to install the extension for my browser. I select the Install extension button.

Then, I navigate to the Chrome Web Store and install the browser extension.

After I have installed the browser extension, I sign in to my Amazon Q Business application using the same URL and credentials I use to access the web application.

Now, I can chat with Amazon Q Business apps whenever I visit any webpage. For example, I can ask it to summarize the current website for me.

The following image shows the result.

Application integration with Amazon Q Business
With Amazon Q Business, you can get AI-powered assistance and information not only when browsing, but also when collaborating with your teams. Now, you can integrate Amazon Q Business with supported third-party applications, making it an always-ready productivity and creativity teammate in your conversations.

To add third-party applications to Amazon Q Business, I need to navigate to the Integrations page and choose Add integration.

Here, I find all available integrations that I can use. For this demo, I select Slack.

I fill in all the required details, including the Slack workspace team ID, which you can obtain by following the steps outlined on the Slack documentation page.

After the integration is successfully created, I need to deploy this integration as a Slack bot. From the Integrations page, I select the integration and complete the integration process in the Slack platform. With all the required steps completed, I can now add the app to my Slack workspace.

Here’s a quick video showing how I use this integration to interact with Amazon Q Business on Slack.

As someone who juggles multiple tools and platforms daily, this new capability unlocks various possibilities for me to improve my productivity. The ability to access AI assistance and perform cross-application tasks without leaving my current workspace helps me save time and maintain focus.

Additional things to know

Supported browser extensions – At launch, the Amazon Q Business browser extension supports Chromium-based web browsers such as Google Chrome and Microsoft Edge. It also supports the Mozilla Firefox web browser.
Application integration support – For third-party applications, at launch, Amazon Q Business integrations support Slack and Microsoft Teams.
Availability – This new capability is available in AWS Regions where Amazon Q Business is available.

Get started today and experience an exciting opportunity to enhance your productivity and streamline cross-application workflows. Learn more on the Amazon Q Business page.

Happy building,
— Donnie

Announcing Amazon FSx Intelligent-Tiering, a new storage class for FSx for OpenZFS

2024-12-02 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/announcing-amazon-fsx-intelligent-tiering-a-new-storage-class-for-fsx-for-openzfs/

When I speak to customers who are planning to migrate massive amounts of on-premises data to AWS, they tell me that they want to simplify their storage management, reduce their costs, and make the data more accessible so that it can be used for analytics, machine learning training, genomics, and other use cases. Customers are already using Network Attached Storage (NAS) on premises, and are looking for a cloud-based upgrade that offers similar capabilities including point-in-time snapshots, data clones, and user management.

AWS customers such as Amdocs, Vela Games, and Astera Labs have been running their mission-critical and performance-intensive NAS workloads like databases, game development and streaming, and semiconductor chip design on Amazon FSx for OpenZFS. They’ve been using the existing SSD storage class on FSx to provide the predictable, high performance these workloads need. However, many other customers have large data sets stored on HDD-based or hybrid SSD/HDD-based NAS storage on premises and find it cost-prohibitive to move their data sets to all-SSD storage. Additionally, these customers are finding it increasingly challenging and expensive to manage provisioned storage on premises for unpredictable data sets and avoid running out of space. They are also keeping their NAS data around for longer because it could have future value for building their next model, investment strategy, or product, which means they need to spend more time and effort monitoring access patterns and moving data between hot and cold storage media to optimize costs.

FSx Intelligent-Tiering
Taking all of this into account, I am happy to be able to tell you about the new Amazon FSx Intelligent-Tiering storage class, available today for use with Amazon FSx for OpenZFS file systems. The new storage class is priced 85% lower than the existing SSD storage class and 20% lower than traditional HDD-based deployments on premises, and brings full elasticity and intelligent tiering to NAS data sets.

Your data moves between three storage tiers (Frequent Access, Infrequent Access, and Archive) with no effort on your part, so you get automatic cost savings with no upfront costs or commitments. Here’s how the tiers work:

Frequent Access – Data that has been accessed within the last 30 days is stored in this tier.

Infrequent Access – Data that has not been accessed for 30 to 90 days is stored in this tier, at a 44% cost reduction from Frequent Access.

Archive – Data that has not been accessed for 90 or more days is stored in this tier, at a 65% cost reduction from Infrequent Access.
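
The tier rules and discounts above can be worked through with a short Python sketch. It maps the number of days since last access to a tier and expresses each tier's storage cost relative to Frequent Access. The exact handling of the 30-day and 90-day boundaries is an assumption, the function names are illustrative, and the figures are relative ratios derived from the percentages above, not actual prices.

```python
def storage_tier(days_since_last_access: int) -> str:
    """Return the tier for data last accessed the given number of days ago.

    Assumption: day 30 starts Infrequent Access and day 90 starts Archive.
    """
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access < 30:
        return "Frequent Access"
    if days_since_last_access < 90:
        return "Infrequent Access"
    return "Archive"


def relative_cost(tier: str) -> float:
    """Storage cost of a tier relative to Frequent Access (1.0)."""
    infrequent = 1.0 * (1 - 0.44)      # 44% below Frequent Access
    archive = infrequent * (1 - 0.65)  # 65% below Infrequent Access
    return {
        "Frequent Access": 1.0,
        "Infrequent Access": infrequent,
        "Archive": archive,
    }[tier]


if __name__ == "__main__":
    for days in (7, 45, 200):
        tier = storage_tier(days)
        print(f"{days:>3} days -> {tier}: {relative_cost(tier):.3f}x")
```

Because the two discounts compound, Archive works out to roughly 0.196 times the Frequent Access cost, or about 80% lower.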

Regardless of the storage tier, your data is stored across multiple AWS Availability Zones (AZs) for redundancy and availability, and can be retrieved instantly in milliseconds.

There’s no need to manage or pre-provision storage, making this storage class a great fit for use cases such as genomics, financial data analytics, seismic imagery analysis, and machine learning where storage requirements can change dramatically over the course of days or weeks.

Along with the potential for cost savings, you get high performance: up to 400K IOPS and 20 GB/second of throughput for each OpenZFS file system, with a time-to-first-byte of tens of milliseconds for all data, regardless of storage class. You can also configure an SSD-based read cache (64 GiB to 512 TiB) to reduce the time-to-first-byte by 10x to 100x for cached data.

Creating a File System
I can create a file system using the AWS Management Console, CLI, API, or AWS CloudFormation. From the Console I click Create file system to get started:

I choose Amazon FSx for OpenZFS and click Next:

Then I enter a name (jeff_fsx_openzfs_1) for my file system and select the Intelligent-Tiering storage class. I choose the desired Throughput capacity, and I select one of the three sizing mode options for the read cache, click Next, and confirm my choices in order to create my file system:

It is ready within minutes, and I can NFS mount it to my EC2 instance:

$ sudo mkdir /fsx_zfs
$ sudo mount -t nfs -o noatime,nfsvers=4.2,sync,nconnect=16,rsize=1048576,wsize=1048576 \
  fs-00fc74f020d1e6f4e.fsx.us-east-2.aws.internal:/fsx/ /fsx_zfs/
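
If I mount file systems from a provisioning script, I might assemble the same command programmatically. The helper below is a hypothetical sketch (the function name and parameters are mine, not part of any AWS tool); it only builds the argument list with the mount options shown above and does not run anything.

```python
import shlex


def nfs_mount_command(dns_name: str, export_path: str, mount_point: str,
                      nconnect: int = 16, io_size: int = 1048576) -> list[str]:
    """Build the argument list for the NFS mount command shown above."""
    options = ",".join([
        "noatime",
        "nfsvers=4.2",
        "sync",
        f"nconnect={nconnect}",
        f"rsize={io_size}",
        f"wsize={io_size}",
    ])
    return ["sudo", "mount", "-t", "nfs", "-o", options,
            f"{dns_name}:{export_path}", mount_point]


cmd = nfs_mount_command(
    "fs-00fc74f020d1e6f4e.fsx.us-east-2.aws.internal", "/fsx/", "/fsx_zfs/")
print(shlex.join(cmd))  # review the command, then pass cmd to subprocess.run if desired
```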

After I run a representative workload for a while I can look at the metrics and review the performance of my file system:

It appears that I have plenty of throughput, but my read cache may be larger than needed. I created it in Automatically Provisioned mode, which allocated 3200 GiB of cache. I can change that (and save some money) with a couple of clicks:

I can also change the throughput capacity as needed:

Amazon FSx NAS Features and Attributes
Let’s take a quick look at some of the features which make FSx for OpenZFS and the FSx Intelligent-Tiering storage class a great fit for your NAS-level storage needs:

Built-in Backups – Amazon FSx automatically makes a daily backup of each file system during a specified backup window and retains them for a specified retention period. The backups are file-system consistent, highly durable, and incremental. You can also create backups on your own and retain them for as long as needed.

Point-In-Time Snapshots – You can create a read-only image of an OpenZFS volume at any time. The snapshots are stored within the file system and consume storage; they can be used to restore a volume, restore individual files and folders, or to create a new volume as either a clone or a full-copy.

Replication – You can replicate a point-in-time view of an OpenZFS volume to another volume across file systems, AWS Regions, and AWS accounts. FSx uses ZFS send/receive technology behind the scenes to perform this replication and automatically establishes and maintains network connectivity between file systems to handle interruptions and resume data transfer as needed.

Data Compression – You can enable ZSTD or LZ4 compression on your OpenZFS volumes to reduce storage cost and speed up data transfer.

User and Volume Quotas – You can limit the amount of storage consumed by an individual volume or user.

Things to Know
Here are a couple of things to keep in mind before we wrap up:

Regions – This new storage class is available in the US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Singapore, Sydney, Tokyo), Canada (Central), and Europe (Frankfurt, Ireland) AWS Regions.

Pricing – Pricing is based on the amount of primary storage consumed (GB/Month) and read cache provisioned (GB/Month). See the Amazon FSx for OpenZFS Pricing page for more information.

— Jeff;

New RAG evaluation and LLM-as-a-judge capabilities in Amazon Bedrock

2024-12-02 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-rag-evaluation-and-llm-as-a-judge-capabilities-in-amazon-bedrock/

Today, we’re announcing two new evaluation capabilities in Amazon Bedrock that can help you streamline testing and improve generative AI applications:

Amazon Bedrock Knowledge Bases now supports RAG evaluation (preview) – You can now run an automatic knowledge base evaluation to assess and optimize Retrieval Augmented Generation (RAG) applications using Amazon Bedrock Knowledge Bases. The evaluation process uses a large language model (LLM) to compute the metrics for the evaluation. With RAG evaluations, you can compare different configurations and tune your settings to get the results you need for your use case.

Amazon Bedrock Model Evaluation now includes LLM-as-a-judge (preview) – You can now perform tests and evaluate other models with humanlike quality at a fraction of the cost and time of running human evaluations.

These new capabilities make it easier to go into production by providing fast, automated evaluation of AI-powered applications, shortening feedback loops and speeding up improvements. These evaluations assess multiple quality dimensions including correctness, helpfulness, and responsible AI criteria such as answer refusal and harmfulness.

To make it easy and intuitive, the evaluation results provide natural language explanations for each score in the output and on console, and the scores are normalized from 0 to 1 for ease of interpretability. Rubrics are published in full with the judge prompts in the documentation so non-scientists can understand how scores are derived.

Let’s see how they work in practice.

Using RAG evaluations in Amazon Bedrock Knowledge Bases
In the Amazon Bedrock console, I choose Evaluations in the Inference and Assessment section. There, I see the new Knowledge Bases tab.

I choose Create, enter a name and a description for the evaluation, and select the Evaluator model that will compute the metrics. In this case, I use Anthropic’s Claude 3.5 Sonnet.

I select the knowledge base to evaluate. I previously created a knowledge base containing only the AWS Lambda Developer Guide PDF file. In this way, for the evaluation, I can ask questions about the AWS Lambda service.

I can evaluate either the retrieval function alone or the complete retrieve-and-generate workflow. This choice affects the metrics that are available in the next step. I choose to evaluate both retrieval and response generation and select the model to use. In this case, I use Anthropic’s Claude 3 Haiku. I can also use Amazon Bedrock Guardrails and adjust runtime inference settings by choosing the configurations link after the response generator model.

Now, I can choose which metrics to evaluate. I select Helpfulness and Correctness in the Quality section and Harmfulness in the Responsible AI metrics section.

Now, I select the dataset that will be used for evaluation. This is the JSONL file I prepared and uploaded to Amazon Simple Storage Service (Amazon S3) for this evaluation. Each line provides a conversation, and for each message there is a reference response.

{"conversationTurns":[{"referenceResponses":[{"content":[{"text":"A trigger is a resource or configuration that invokes a Lambda function such as an AWS service."}]}],"prompt":{"content":[{"text":"What is an AWS Lambda trigger?"}]}}]}
{"conversationTurns":[{"referenceResponses":[{"content":[{"text":"An event is a JSON document defined by the AWS service or the application invoking a Lambda function that is provided in input to the Lambda function."}]}],"prompt":{"content":[{"text":"What is an AWS Lambda event?"}]}}]}
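
For larger datasets, I would generate this file rather than write it by hand. Here is a minimal sketch that produces records with the same structure as the two lines above; the function name is hypothetical, and it covers only single-turn conversations with one reference response each.

```python
import json


def kb_eval_record(prompt: str, reference: str) -> str:
    """Return one JSONL line holding a single conversation turn."""
    record = {
        "conversationTurns": [
            {
                "referenceResponses": [{"content": [{"text": reference}]}],
                "prompt": {"content": [{"text": prompt}]},
            }
        ]
    }
    return json.dumps(record)


lines = [
    kb_eval_record(
        "What is an AWS Lambda trigger?",
        "A trigger is a resource or configuration that invokes a Lambda "
        "function such as an AWS service.",
    ),
    kb_eval_record(
        "What is an AWS Lambda event?",
        "An event is a JSON document defined by the AWS service or the "
        "application invoking a Lambda function that is provided in input "
        "to the Lambda function.",
    ),
]
dataset = "\n".join(lines) + "\n"  # write this string to a .jsonl file and upload it to S3
```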

I specify the S3 location in which to store the results of the evaluation. The evaluation job requires that the S3 bucket is configured with the cross-origin resource sharing (CORS) permissions described in the Amazon Bedrock User Guide.

For service access, I need to create or provide an AWS Identity and Access Management (IAM) service role that Amazon Bedrock can assume and that allows access to the Amazon Bedrock and Amazon S3 resources used by the evaluation.

After a few minutes, the evaluation has completed, and I browse the results. The actual duration of an evaluation depends on the size of the prompt dataset and on the generator and the evaluator models used.

At the top, the Metric summary evaluates the overall performance using the average score across all conversations.

After that, the Generation metrics breakdown gives me details about each of the selected evaluation metrics. My evaluation dataset was small (two lines), so there isn’t a large distribution to look at.

From here, I can also see example conversations and how they were rated. To view all conversations, I can visit the full output in the S3 bucket.

I’m curious why Helpfulness is slightly below one. I expand and zoom Example conversations for Helpfulness. There, I see the generated output, the ground truth that I provided with the evaluation dataset, and the score. I choose the score to see the model reasoning. According to the model, it would have helped to have more in-depth information. Models really are strict judges.

Comparing RAG evaluations
The result of a knowledge base evaluation can be difficult to interpret by itself. For this reason, the console allows comparing results from multiple evaluations to understand the differences. In this way, you can understand if you’re improving or not for the metrics you care about.

For example, I previously ran two other knowledge base evaluations. They’re related to knowledge bases with the same data sources but different chunking and parsing configurations and different embedding models.

I select the two evaluations and choose Compare. To be comparable in the console, the evaluations need to cover the same metrics.

In the At a glance tab, I see a visual comparison of the metrics using a spider chart. In this case, the results are not much different. The main difference is the Faithfulness score.

In the Evaluation details tab, I find a detailed comparison of the results for each metric, including the difference in scores.
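
The same comparison can be reproduced offline from the per-conversation scores. The sketch below is not the console's implementation; the function names and the score values are made up for illustration. It averages the scores (each between 0 and 1) for every metric and reports the difference between two runs, insisting, as the console does, that both runs cover the same metrics.

```python
from statistics import mean


def average_scores(scores_by_metric):
    """Average the per-conversation scores (each from 0 to 1) for every metric."""
    return {metric: mean(scores) for metric, scores in scores_by_metric.items()}


def compare_evaluations(first, second):
    """Return second-minus-first differences in average score, metric by metric."""
    if first.keys() != second.keys():
        raise ValueError("evaluations must cover the same metrics to be compared")
    a, b = average_scores(first), average_scores(second)
    return {metric: b[metric] - a[metric] for metric in a}


# Made-up scores for two hypothetical evaluation runs of two conversations each.
run_a = {"Helpfulness": [0.75, 1.0], "Correctness": [1.0, 1.0]}
run_b = {"Helpfulness": [1.0, 1.0], "Correctness": [1.0, 0.5]}
differences = compare_evaluations(run_a, run_b)
for metric, delta in differences.items():
    print(f"{metric}: {delta:+.3f}")
```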

Using LLM-as-a-judge in Amazon Bedrock Model Evaluation (preview)
In the Amazon Bedrock console, I choose Evaluations in the Inference and Assessment section of the navigation pane. After I choose Create, I select the new Automatic: Model as a judge option.

I enter a name and a description for the evaluation and select the Evaluator model that is used to generate evaluation metrics. I use Anthropic’s Claude 3.5 Sonnet.

Then, I select the Generator model, which is the model I want to evaluate. Model evaluation can help me understand if a smaller and more cost-effective model meets the needs of my use case. I use Anthropic’s Claude 3 Haiku.

In the next section I select the Metrics to evaluate. I select Helpfulness and Correctness in the Quality section and Harmfulness in the Responsible AI metrics section.

In the Datasets section I specify the Amazon S3 location where my evaluation dataset is stored and the folder in an S3 bucket where the results of the model evaluation job are stored.

For the evaluation dataset, I prepared another JSONL file. Each line provides a prompt and a reference answer. Note that the format is different compared to knowledge base evaluations.

{"prompt":"Write a 15 words summary of this text:\n\nAWS Fargate is a technology that you can use to run containers without having to manage servers or clusters. With AWS Fargate, you no longer have to provision, configure, or scale clusters of virtual machines to run containers. This removes the need to choose server types, decide when to scale your clusters, or optimize cluster packing.","referenceResponse":"AWS Fargate allows running containers without managing servers or clusters, simplifying container deployment and scaling."}
{"prompt":"Give me a list of the top 3 benefits from this text:\n\nAWS Fargate is a technology that you can use to run containers without having to manage servers or clusters. With AWS Fargate, you no longer have to provision, configure, or scale clusters of virtual machines to run containers. This removes the need to choose server types, decide when to scale your clusters, or optimize cluster packing.","referenceResponse":"- No need to manage servers or clusters.\n- Simplified infrastructure management.\n- Improved focus on application development."}
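
Because the two dataset formats are easy to mix up, a quick check before uploading can save a failed job. This is a minimal sketch that validates only the two fields used in the lines above; the function name is hypothetical, and treating the reference answer as optional is my assumption, so check the documentation for the full schema.

```python
import json


def parse_model_eval_line(line: str) -> dict:
    """Parse one JSONL line and check the two fields used in the examples above."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    prompt = record.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("'prompt' must be a non-empty string")
    reference = record.get("referenceResponse")
    # Assumption: the reference answer is optional; tighten this if your job needs it.
    if reference is not None and not isinstance(reference, str):
        raise ValueError("'referenceResponse' must be a string when present")
    return record


sample = json.dumps({
    "prompt": "Write a 15 words summary of this text:\n\nAWS Fargate is a "
              "technology that you can use to run containers without having "
              "to manage servers or clusters.",
    "referenceResponse": "AWS Fargate allows running containers without "
                         "managing servers or clusters, simplifying container "
                         "deployment and scaling.",
})
record = parse_model_eval_line(sample)
```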

Finally, I can choose an IAM service role that gives Amazon Bedrock access to the resources used by this evaluation job.

I complete the creation of the evaluation. After a few minutes, the evaluation is complete. Similar to the knowledge base evaluation, the result starts with a Metrics Summary.

The Generation metrics breakdown details each metric, and I can look at details for a few sample prompts. I look at Helpfulness to better understand the evaluation score.

The prompts in the evaluation have been correctly processed by the model, and I can apply the results for my use case. If my application needs to manage prompts similar to the ones used in this evaluation, the evaluated model is a good choice.

Things to know
These new evaluation capabilities are available in preview in the following AWS Regions:

RAG evaluation in US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris), and South America (São Paulo)
LLM-as-a-judge in US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Seoul, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris, Zurich), and South America (São Paulo)

Note that the available evaluator models depend on the Region.

Pricing is based on the standard Amazon Bedrock pricing for model inference. There are no additional charges for evaluation jobs themselves. The evaluator models and models being evaluated are billed according to their normal on-demand or provisioned pricing. The judge prompt templates are part of the input tokens, and those judge prompts can be found in the AWS documentation for transparency.

The evaluation service is optimized for English language content at launch, though the underlying models can work with content in other languages they support.

To get started, visit the Amazon Bedrock console. To learn more, you can access the Amazon Bedrock documentation and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with these new capabilities!

— Danilo

Newly enhanced Amazon Connect adds generative AI, WhatsApp Business, and secure data collection

2024-12-02 Elizabeth Fuentes

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/newly-enhanced-amazon-connect-adds-generative-ai-whatsapp-business-and-secure-data-collection/

Today, Amazon Connect introduces a set of new features that help businesses enhance their contact center operations through generative AI, advanced security features, and streamlined bot management. These innovations help businesses deliver better customer experiences by creating more time and space for meaningful human interactions, while maintaining security and compliance.

Contact center managers continually face challenges in optimizing self-service resolution rates, evaluating agent performance efficiently, and maintaining data privacy compliance. Additionally, creating and managing conversational AI experiences often requires specialized expertise and complex integrations across multiple services.

To address these challenges, Amazon Connect introduced key features such as generative AI–powered customer segmentation for targeted campaigns, native WhatsApp Business messaging for omnichannel support, secure collection of sensitive customer data in chat interactions, simplified conversational AI bot management in the Amazon Connect interface, and new enhancements to Amazon Q in Connect. Amazon Connect also added new analytics capabilities through Amazon Connect Contact Lens to help optimize bot performance and contact center operations.

Here are the new capabilities that will help you create more personalized and efficient customer experiences while maintaining the highest standards of data security and operational excellence.

Generative AI powered features
Amazon Connect integrates new generative AI capabilities to automate and enhance customer interactions, enabling smarter targeting and more efficient contact center management.

Generative AI segmentation and trigger-based campaigns – Uses generative AI–powered assistance to create customer segments using conversational prompts. This allows businesses to create precise customer segments using natural language descriptions, making it easier to identify and reach specific customer groups. Trigger campaigns enable organizations to communicate with their customers based on specific customer events, such as cart abandonment.

You can also start with ready-to-use suggestions.

Simplify conversational AI bot creation and enhance them with Amazon Q in Connect – Create, edit, and manage conversational AI bots powered by Amazon Lex directly within the Amazon Connect web interface. You can now enhance these bots with Amazon Q in Connect, a generative AI–powered assistant for customer service. Amazon Q in Connect now supports end-customer self-service interactions across interactive voice response (IVR) and digital channels, in addition to assisting contact center agents with recommended responses and actions.

This integration extends beyond traditional voice and chatbot Amazon Lex capabilities by providing advanced conversational abilities via large language models (LLMs). The system intelligently searches configured knowledge bases, customer information, web content, and third-party application data to respond to customer questions when they don’t match predefined intents. Administrators can set custom guardrails for their instance, defining restrictions on response generation and monitoring Amazon Q in Connect performance.

Generative AI–powered automated evaluations – Supervisors can automatically evaluate up to 100 percent of contacts using generative AI.

Generative AI–powered contact categorization – Improves existing semantic match functionality using natural language intents.

Improved interfaces and tools
Enhanced capabilities for bot management and monitoring, simplifying the creation and optimization of automated experiences.

Amazon Connect for WhatsApp Business messaging – Natively integrate with WhatsApp Business messaging so customers can receive support over WhatsApp in addition to existing Amazon Connect channels such as voice, SMS, chat, and Apple Messages for Business. This addition to Amazon Connect omnichannel capabilities helps businesses meet customers on their preferred communication channel while maintaining consistent service delivery and management within the Amazon Connect application.

Contact Lens conversational AI bot dashboards – Offers analytics to monitor the performance of your conversational AI bots built in Amazon Connect.

Self-service voice (IVR) recording and interaction logs on contact details – Provides comprehensive records of self-service interactions, including audio recordings.

Improved intraday forecasts – Allows comparison of intraday forecasts against previously published forecasts.

Salesforce Contact Center with Amazon Connect (Preview) – Natively integrates the digital channels and unified routing of Amazon Connect into the Salesforce customer relationship management (CRM) system. This new offering allows companies to use a single routing and workflow system for both Amazon Connect and Salesforce channels, intelligently directing calls, chats, and cases to the appropriate self-service or agent interaction. If you’re interested, sign up to join the preview.

Enhanced security for chat
New features that enhance security and compliance in chat interactions, enabling secure handling of sensitive information.

Collection of sensitive customer data within chats – Amazon Connect chat and messaging now includes a data privacy option that enables secure handling of sensitive customer information during chat interactions. This feature protects personally identifiable information (PII) and payment card industry (PCI) data, promoting compliance with data protection regulations.

Key benefits
The latest features of Amazon Connect combine generative AI, enhanced security, and streamlined bot management to help businesses:

Transform customer experience – Amazon Connect elevates customer interactions through AI–powered segmentation, enabling personalized engagement strategies. The new WhatsApp Business messaging expands omnichannel support capabilities, meeting customers on their preferred channel. Additionally, advanced bot capabilities, including Amazon Q in Connect, enhance self-service resolution rates, delivering more efficient customer experiences.

Enhance security and operations – Contact centers can now strengthen their security posture with PCI-compliant chat interactions while maintaining operational efficiency. Custom AI guardrails promote appropriate response generation, while the simplified bot management interface eliminates the need for specialized expertise. Analytics and forecasting capabilities provide comprehensive performance monitoring, enabling data-driven decision-making for optimal contact center operations.

Pricing and availability – These features are available today in all AWS Regions where Amazon Connect is supported. For pricing, visit the Amazon Connect Pricing page. For implementation guidance, visit the Amazon Connect documentation.

— Eli

Securely share AWS resources across VPC and account boundaries with PrivateLink, VPC Lattice, EventBridge, and Step Functions

2024-12-02 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/securely-share-aws-resources-across-vpc-and-account-boundaries-with-privatelink-vpc-lattice-eventbridge-and-step-functions/

At some point, every AWS customer tells me that they have the desire to move into the future as quickly as possible. They want to simplify their modernization efforts, drive growth, and adapt to the cloud, while also reducing costs as they proceed. These customers typically have a large suite of legacy applications, possibly running on-premises, that are running on diverse technology stacks managed by disparate parts of the organization. To make things even more challenging, these organizations often have to meet stringent security and compliance requirements.

Prepare to Share
You can now share AWS resources such as Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS) container services, and your own HTTPS services across Amazon Virtual Private Cloud (Amazon VPC) and AWS account boundaries and use them to build event-driven apps via Amazon EventBridge and orchestrate workflows with AWS Step Functions. You can update your existing workloads and connect your modern cloud-native apps to on-premises legacy systems, with all communication routed across private endpoints and networks.

These new features build on Amazon VPC Lattice and AWS PrivateLink, and give you a lot of new options to design and control your network, along with some cool new ways to integrate and orchestrate across all of your technology stacks. For example, you can build hybrid event-driven architectures that make use of your existing on-premises applications.

Today, some customers use AWS Lambda functions or Amazon Simple Queue Service (Amazon SQS) queues to transfer data into VPCs. This undifferentiated heavy lifting can now be replaced with a simpler and more efficient solution.

Bringing all of this together, you get a set of services that will help you to accelerate your modernization efforts and simplify integration between your applications, regardless of where they are situated. EventBridge and Step Functions work hand-in-hand with PrivateLink and VPC Lattice to enable integration of public and private HTTPS-based applications into your event-driven architectures and workflows.

Here are the essential terms and concepts:

Resource Owner VPC – A VPC that has resources to be shared. The owner of this VPC creates a Resource Gateway with one or more associated Resource Configurations, then uses AWS Resource Access Manager (RAM) to share the Resource Configuration with the Resource Consumer, such as another AWS account, or a developer building event-driven architectures and workflows using EventBridge and Step Functions. Let’s define the Resource Owner as the person (maybe you) in your organization who is responsible for the care and feeding of this VPC.

Resource Gateway – Provides a point of ingress to a VPC so that clients can access resources in the Resource Owner VPC, as indicated by the Resource Configurations that are associated with the gateway. One Resource Gateway can make multiple resources available.

Resource – This can be an HTTPS endpoint, a database, a database cluster, an EC2 instance, an Application Load Balancer in front of multiple EC2 instances, an ECS service discoverable via AWS Cloud Map, an Amazon Elastic Kubernetes Service (Amazon EKS) service behind a Network Load Balancer, or a legacy service running in the Resource Owner VPC or running on premises across AWS Site-to-Site VPN or AWS Direct Connect.

Resource Configuration – Defines a set of resources that can be accessed through a particular Resource Gateway. The resources can be referenced by IP address, DNS name, or (for AWS resources) an ARN.

Resource Consumer – The person in your organization who is responsible for building applications that connect with and consume services provided by resources in a Resource Owner VPC.
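
To keep these terms straight, it can help to model how they relate. The dataclasses below are a conceptual sketch only: they are not an AWS API, and the class names, field names, VPC ID, account ID, and DNS name are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    name: str
    address: str             # IP address, DNS name, or (for AWS resources) an ARN
    ports: tuple[int, ...]


@dataclass
class ResourceConfiguration:
    name: str
    resources: list[Resource] = field(default_factory=list)
    shared_with: set[str] = field(default_factory=set)  # consumer account IDs (shared via RAM)


@dataclass
class ResourceGateway:
    name: str
    vpc_id: str              # the Resource Owner VPC
    configurations: list[ResourceConfiguration] = field(default_factory=list)

    def reachable_by(self, account_id: str) -> list[Resource]:
        """Resources a consumer account can reach through this gateway."""
        return [resource
                for config in self.configurations
                if account_id in config.shared_with
                for resource in config.resources]


service = Resource("service1", "service1.example.com", (80, 443))
config = ResourceConfiguration("rc-service1", [service], {"111122223333"})
gateway = ResourceGateway("main-rg", "vpc-0123456789abcdef0", [config])
print(gateway.reachable_by("111122223333"))
```

One gateway can hold several configurations, which matches the point above that a single Resource Gateway can make multiple resources available.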

Sharing Resources
You can put all of this power to use in a lot of different ways; I’ll focus on one for this post.

First, I will play the role of the Resource Owner. I click Resource gateways in the VPC Console, see that I don’t have a gateway, and click Create resource gateway to get started:

I assign a name (main-rg) and an IP address type, then pick the VPC and the private subnets where the gateway will have a presence (this is a one-shot selection that cannot be changed without creating a new Resource Gateway). I also choose up to five security groups to control inbound traffic:

I scroll down, assign any desired tags, and click Create resource gateway to proceed:

My new gateway is active within seconds; I nod in appreciation and click Create resource configuration to move ahead:

Now I need to create my first Resource Configuration. Let’s say that I have an HTTPS service running on an EC2 instance on a private subnet in my Resource Owner VPC. I assign a DNS name to the service and use an Amazon Route 53 Alias record which returns the IP address of the instance:

I am using a public hosted zone in this example. We are already working on support for private hosted zones.

With DNS all set up, I click Create resource configuration to move ahead. I enter a name (rc-service1), choose Resource as the type, and select the Resource Gateway that I created earlier:

I scroll down and define my EC2 instance as a resource, entering the DNS name and setting up sharing for ports 80 and 443:

Now I take a small detour, and hop over to the RAM Console to create a Resource Share so that other AWS accounts can access the resources (this is optional, and only relevant for cross-account scenarios). I could create one Resource Share for each service, but in most cases I would create one share and use it to package up a collection of related services. I’ll do that, and call it shared-services:

Returning from my detour, I refresh the list of resource shares, pick the one that I created, and click Create resource configuration:

The resource configuration is ready within seconds.

Recap and Planning Time
Before moving ahead, let’s do a quick recap and make some plans. Here’s what I (in the role of Resource Owner) have so far:

MainVPC – My Resource Owner VPC.
main-rg – A Resource Gateway in MainVPC.
rc-service1 – The Resource Configuration for main-rg.
service1 – An HTTPS service hosted on an EC2 instance in a private subnet of MainVPC, at a fixed IP address.

Ok, so what’s next?

Share – This is the first and most obvious use case. I can use AWS Resource Access Manager (RAM) to share the Resource Configuration with another AWS account and access the service from another VPC. On the other side (as the Resource Consumer), I take a couple of quick steps to connect to the service that has been shared with me:

Service Network – I can create a service network, add the Resource Configuration to the Service Network, and create a VPC endpoint in a VPC to connect to the service network.
Endpoint – I can create a VPC endpoint in a VPC and access the shared resource via the endpoint.

Modernize – I can remove my legacy Lambda or SQS integration to get rid of some undifferentiated heavy lifting.

Build – I can use EventBridge and Step Functions to build event-driven architectures and orchestrate applications. I’ll take this option!

Accessing Private Resources with EventBridge and Step Functions
EventBridge and Step Functions already make it easy to access public HTTPS endpoints such as those from SaaS providers like Slack, Salesforce, and Adobe. With today’s launch, consuming private HTTPS services is just as easy.

As a Resource Consumer, I simply create an EventBridge connection, reference a Resource Configuration that was shared with me, and call the service from my event-driven application. Everything that I already know still applies, and I now have the new-found power to access private services.

To create the EventBridge connection, I open the EventBridge console and click Connections in the Integration menu:

I review my existing connections (none so far), then click Create connection to move ahead:

I enter a name (MyService1) and a description for my connection, select Private as the API type, and choose the Resource Configuration that I created earlier:

Scrolling down, I need to configure the authorization for the service that I am connecting to. I select Custom configuration and Basic authorization, and enter the Username and Password for my service. I also add Action=Forecast to the query string (as you can see there are a lot of options for authorization), and click Create:
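
The same connection could also be created through the API. The sketch below only builds the request and does not call AWS. The parameter names reflect my reading of the EventBridge CreateConnection API, in particular InvocationConnectivityParameters for the private Resource Configuration, so please verify them against the current API reference. The ARN, description, and credentials are placeholders.

```python
def private_connection_request(name: str, description: str,
                               resource_configuration_arn: str,
                               username: str, password: str) -> dict:
    """Build keyword arguments for an EventBridge CreateConnection call."""
    return {
        "Name": name,
        "Description": description,
        "AuthorizationType": "BASIC",
        "AuthParameters": {
            "BasicAuthParameters": {"Username": username, "Password": password},
            # Extra query string parameter added to each invocation.
            "InvocationHttpParameters": {
                "QueryStringParameters": [
                    {"Key": "Action", "Value": "Forecast", "IsValueSecret": False}
                ]
            },
        },
        # Assumed field names for the private (Resource Configuration) API type.
        "InvocationConnectivityParameters": {
            "ResourceParameters": {
                "ResourceConfigurationArn": resource_configuration_arn
            }
        },
    }


request = private_connection_request(
    "MyService1",
    "Connection to service1",
    "arn:aws:vpc-lattice:us-east-2:111122223333:resourceconfiguration/rcfg-EXAMPLE",
    "example-user",
    "example-password",  # placeholder; load real credentials from a secret store
)
# With credentials configured: boto3.client("events").create_connection(**request)
```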

The connection is created and ready within minutes. Then I use it in my Step Functions workflows by using the HTTP Task, selecting the connection, entering the URL of my API endpoint, and choosing an HTTP method:
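
In Amazon States Language terms, the HTTP Task is a Task state that uses the http:invoke resource and points at the connection. Here is a minimal sketch that generates such a definition; the state name, connection ARN, and endpoint URL are placeholders, and it uses the JSONPath-style Parameters field.

```python
import json


def http_task_state(connection_arn: str, api_endpoint: str, method: str = "GET") -> dict:
    """Build a Step Functions HTTP Task state that calls an endpoint via a connection."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::http:invoke",
        "Parameters": {
            "ApiEndpoint": api_endpoint,
            "Method": method,
            "Authentication": {"ConnectionArn": connection_arn},
        },
        "End": True,
    }


definition = {
    "StartAt": "CallService1",
    "States": {
        "CallService1": http_task_state(
            "arn:aws:events:us-east-2:111122223333:connection/MyService1/EXAMPLE",
            "https://service1.example.com/forecast",
        )
    },
}
print(json.dumps(definition, indent=2))
```

The authorization details and the Action=Forecast query string stay on the connection, so the workflow definition itself carries no credentials.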

And that’s all there is to it: your Step Functions workflows can now make use of Private Resources!

I can also use this connection as an EventBridge API destination target in Event Buses and Pipes.

Things to Know
Here are a couple of things to know about these cool new features:

Pricing – Existing pricing for Step Functions, EventBridge, PrivateLink, and VPC Lattice applies, including the per-GB charge for data transfer into the VPC.

Regions – You can create and use Resource Gateways and Resource Configurations in 21 AWS Regions: US East (Ohio, N. Virginia), US West (N. California, Oregon), Africa (Cape Town), Asia Pacific (Hong Kong, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Milan, Paris, Stockholm), Middle East (Bahrain), and South America (São Paulo).

In the Works – As I noted earlier, we are already working on support for private hosted zones. We are also planning to support access to other types of AWS resources through EventBridge and Step Functions.

— Jeff;

New AWS Security Incident Response helps organizations respond to and recover from security events

2024-12-02 Betty Zheng (郑予彬)

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/new-aws-security-incident-response-helps-organizations-respond-to-and-recover-from-security-events/

Today, we announce AWS Security Incident Response, a new service designed to help organizations manage security events quickly and effectively. The service is purpose-built to help customers prepare for, respond to, and recover from various security events, including account takeovers, data breaches, and ransomware attacks.

Security Incident Response automates the triage and investigation of security findings from Amazon GuardDuty and integrated third-party threat detection tools through AWS Security Hub. It facilitates communication and coordination and provides 24/7 access to security experts from the AWS Customer Incident Response Team (CIRT) who can assist during security events. The service aims to provide customers with more comprehensive support across the phases of the incident response lifecycle, from preparation to detection, analysis, and recovery.

Security events are becoming more pervasive and complex for customers. Security teams often face an overwhelming number of daily alerts, which can lead to misallocated resources and reduced effectiveness. Manual investigation of findings strains resources and may cause customers to overlook critical security alerts. Additionally, coordinating responses across multiple stakeholders, managing permissions in various environments, and documenting actions complicate the process. There is an opportunity to better support customers and remove various points of undifferentiated heavy lifting that customers face during security events.

Key capabilities

AWS Security Incident Response addresses these challenges through three core capabilities that help customers effectively prepare for, respond to, and recover from security events:

Security Incident Response automatically triages security findings from GuardDuty and supported third-party tools through Security Hub to identify high-priority incidents requiring immediate attention. The service uses automation and customer-specific information to filter and suppress security findings based on expected behavior, helping teams focus on critical security alerts.
The service simplifies incident response by offering preconfigured notification rules and permission settings that can be extended to both internal and external stakeholders, including third-party security providers. Customers can access a centralized console with integrated features, such as messaging, secure data transfer, and video conference scheduling, all accessible through service APIs or the AWS Management Console. Additional capabilities include automated case history tracking and reporting, allowing security teams to focus on remediation and recovery efforts.
Customers gain access to self-service investigation tools and 24/7 support from the AWS CIRT. Customers also have the ability to handle incidents independently or interoperate with third-party security vendors. These options allow customers to choose, manage, and conduct their incident response based on their specific needs and requirements.

In addition to the core capabilities, customers benefit from a service dashboard with metrics that help them measure, monitor, and improve their security incident response performance over time. These metrics include mean time to resolution (MTTR), number of active and closed cases within a specific period, number of triaged findings, and other key performance indicators. Customers can access these metrics instantly without needing to collate information or create one-time reports.
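
To make the MTTR metric concrete, here is a small worked example. It uses the usual definition (the average of close time minus open time across closed cases); the post does not describe how the service computes the figure internally, and the case timestamps below are made up.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(cases: list[tuple[datetime, datetime]]) -> timedelta:
    """Average (closed - opened) over a list of (opened, closed) pairs."""
    if not cases:
        raise ValueError("need at least one closed case")
    total = sum((closed - opened for opened, closed in cases), timedelta())
    return total / len(cases)


# Hypothetical closed cases: (opened, closed)
cases = [
    (datetime(2024, 12, 1, 9, 0), datetime(2024, 12, 1, 13, 0)),   # 4 hours
    (datetime(2024, 12, 2, 8, 0), datetime(2024, 12, 2, 10, 0)),   # 2 hours
    (datetime(2024, 12, 3, 14, 0), datetime(2024, 12, 3, 20, 0)),  # 6 hours
]

mttr = mean_time_to_resolution(cases)
print(mttr)  # 4:00:00
```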

How to get started

The onboarding process can be completed in a few steps. Security Incident Response integrates with AWS Organizations to provide comprehensive security coverage for your current and future accounts with an added layer of security. Customers begin by selecting a central account within their organization, where all active and historical security events can be created and managed.

Next, customers can enable the proactive incident response feature, which creates service-level permissions allowing Security Incident Response to monitor and investigate findings from GuardDuty or third-party detection tools through Security Hub. These findings are then automatically sorted and remediated using service automation and customer-specific data, including common IP addresses, AWS Identity and Access Management (IAM) principals, and other relevant attributes. For findings that can’t be automatically remediated, Security Incident Response creates a security case and notifies the appropriate stakeholders within the customer’s organization.

Customers can also configure permissions for the service to execute containment actions by deploying specific IAM roles. By using these Security Incident Response containment capabilities, customers can achieve faster incident response times and potentially minimize the impact of security events on accounts and resources.

Availability and getting started

AWS Security Incident Response is now available in 12 AWS Regions globally: US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Seoul, Singapore, Sydney, Tokyo), Canada (Central), and Europe (Frankfurt, Ireland, London, Stockholm).

Learn more about AWS Security Incident Response by visiting the product page.

–Betty

New APIs in Amazon Bedrock to enhance RAG applications, now available

2024-12-01 Veliswa Boya

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/new-apis-in-amazon-bedrock-to-enhance-rag-applications-now-available/

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock Knowledge Bases is a fully managed service that empowers developers to create highly accurate, low latency, secure, and customizable generative AI applications cost effectively. Amazon Bedrock Knowledge Bases connects foundation models (FMs) to a company’s internal data using Retrieval Augmented Generation (RAG). RAG helps FMs deliver more relevant, accurate, and customized responses.

In this post, we detail two announcements related to Amazon Bedrock Knowledge Bases:

Support for custom connectors and ingestion of streaming data.
Support for reranking models.

Support for custom connectors and ingestion of streaming data
Today, we announced support for custom connectors and ingestion of streaming data in Amazon Bedrock Knowledge Bases. Developers can now efficiently and cost-effectively ingest, update, or delete data directly using a single API call, without the need to perform a full sync with the data source periodically or after every change.
Customers are increasingly developing RAG-based generative AI applications for various use cases such as chatbots and enterprise search. However, they face challenges in keeping the data up-to-date in their knowledge bases so that the end users of the applications always have access to the latest information. The current process of data synchronization is time-consuming, requiring a full sync every time new data is added or removed.
Customers also face challenges in integrating data from unsupported sources, such as Google Drive or Quip, into their knowledge base. Typically, to make this data available in Amazon Bedrock Knowledge Bases, they must first move it to a supported source, such as Amazon Simple Storage Service (Amazon S3), and then start the ingestion process. This extra step not only creates additional overhead but also introduces delays in making the data accessible for querying. Additionally, customers who want to use streaming data (for example, news feeds or Internet of Things (IoT) sensor data) face delays in real-time data availability due to the need to store the data in a supported data source before ingestion.
As customers scale up their data, these inefficiencies and delays can become significant operational bottlenecks and increase costs. Keeping all these challenges in mind, it’s important to have a more efficient and cost-effective way to ingest and manage data from various sources to ensure that the knowledge base is up-to-date and available for querying in real-time.
With support for custom connectors and ingestion of streaming data, customers can now use direct APIs to efficiently add, check the status of, and delete data, without the need to list and sync the entire dataset.

How it works
Custom connectors and ingestion of streaming data can be accessed using the Amazon Bedrock console or the AWS SDK.

Add Document
The Add Document API is used to add new files to the knowledge base without having to perform a full sync after the document has been added. Customers can add content by specifying the Amazon S3 path of the document, by supplying text content inline, or by supplying the content as a Base64-encoded string. For example:

```
PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json
{
  "documents": [{
    "content": {
      "dataSourceType": "CUSTOM",
      "custom": {
        "customDocumentIdentifier": {
          "id": "MyDocument"
        },
        "inlineContent": {
          "textContent": {
            "data": "Hello world!"
          },
          "type": "TEXT"
        },
        "sourceType": "IN_LINE"
      }
    }
  }]
}
```
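
The request above can also be assembled programmatically. The following is a minimal sketch that reproduces the same body for inline text and adds a Base64 variant. The helper names are hypothetical, and the `byteContent`, `mimeType`, and `BYTE` field names in the second helper are assumptions that are not shown in this post, so check them against the API reference before relying on them.

```python
import base64
import json


def inline_text_document(doc_id: str, text: str) -> dict:
    """Build one entry of "documents" for inline text, mirroring the example above."""
    return {
        "content": {
            "dataSourceType": "CUSTOM",
            "custom": {
                "customDocumentIdentifier": {"id": doc_id},
                "inlineContent": {"textContent": {"data": text}, "type": "TEXT"},
                "sourceType": "IN_LINE",
            },
        }
    }


def inline_bytes_document(doc_id: str, raw: bytes, mime_type: str) -> dict:
    """Build an entry that carries Base64-encoded bytes.

    ASSUMPTION: the byteContent/mimeType/BYTE names are not taken from this post.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    return {
        "content": {
            "dataSourceType": "CUSTOM",
            "custom": {
                "customDocumentIdentifier": {"id": doc_id},
                "inlineContent": {
                    "byteContent": {"data": encoded, "mimeType": mime_type},
                    "type": "BYTE",
                },
                "sourceType": "IN_LINE",
            },
        }
    }


def build_ingest_request(kb_id: str, ds_id: str, documents: list[dict]) -> tuple[str, str, str]:
    """Return (method, path, JSON body) for the Add Document call."""
    path = f"/knowledgebases/{kb_id}/datasources/{ds_id}/documents"
    return "PUT", path, json.dumps({"documents": documents}, indent=2)


method, path, body = build_ingest_request(
    "KB12345678", "DS12345678", [inline_text_document("MyDocument", "Hello world!")]
)
print(method, path)
print(body)
```

Sending the request still requires AWS credentials and request signing, which an AWS SDK handles for you; this sketch only shows how the payload is shaped.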

Delete Document
The Delete Document API is used to delete data from the knowledge base without needing to perform a full sync after the document has been deleted. For example:

```
POST /knowledgebases/KB12345678/datasources/DS12345678/documents/deleteDocuments/ HTTP/1.1
Content-type: application/json
{
  "documentIdentifiers": [{
    "custom": {
      "id": "MyDocument"
    },
    "dataSourceType": "CUSTOM"
  }]
}
```

List Document(s)
The List Document API returns a list of records that match the criteria that are specified in the request parameters. For example:
```
POST /knowledgebases/KB12345678/datasources/DS12345678/documents/ HTTP/1.1
Content-type: application/json
{
  "maxResults": 10
}
```
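
Because `maxResults` caps each response, listing a large data source usually means paging. Here is a minimal sketch of that loop. It assumes the response carries a `nextToken` field and a `documentDetails` list, neither of which is shown in this post, and it takes the actual API call as a function argument so the paging logic can be tried without AWS access (the fake backend below is purely illustrative).

```python
from typing import Callable, Iterator, Optional


def iter_documents(list_page: Callable[[dict], dict], page_size: int = 10) -> Iterator[dict]:
    """Yield every document record, following nextToken until it is absent.

    ASSUMPTION: the "nextToken" and "documentDetails" names are not taken from this post.
    """
    token: Optional[str] = None
    while True:
        request = {"maxResults": page_size}
        if token:
            request["nextToken"] = token
        response = list_page(request)
        yield from response.get("documentDetails", [])
        token = response.get("nextToken")
        if not token:
            break


def fake_list_page(request: dict) -> dict:
    """Stand-in for the real List Document call: two pages of made-up records."""
    pages = {
        None: {"documentDetails": [{"id": "a"}, {"id": "b"}], "nextToken": "t1"},
        "t1": {"documentDetails": [{"id": "c"}]},
    }
    return pages[request.get("nextToken")]


all_ids = [doc["id"] for doc in iter_documents(fake_list_page, page_size=2)]
print(all_ids)  # ['a', 'b', 'c']
```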

Get Document
The Get Document API returns information about the document(s) that match the criteria that are specified in the request parameters. For example:

```
POST /knowledgebases/KB12345678/datasources/DS12345678/documents/getDocuments/ HTTP/1.1
Content-type: application/json
{
  "documentIdentifiers": [{
    "custom": {
      "id": "MyDocument"
    },
    "dataSourceType": "CUSTOM"
  }]
}
```
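
The Get Document and Delete Document examples share the same `documentIdentifiers` body and differ only in the request path, so one helper can serve both. This is a small sketch with hypothetical helper names; `MyOtherDocument` is a made-up second identifier.

```python
import json


def custom_document_identifiers(doc_ids: list[str]) -> dict:
    """Build the documentIdentifiers body used by the Get and Delete examples."""
    return {
        "documentIdentifiers": [
            {"custom": {"id": doc_id}, "dataSourceType": "CUSTOM"}
            for doc_id in doc_ids
        ]
    }


def documents_path(kb_id: str, ds_id: str, operation: str) -> str:
    """Return the request path for "get" or "delete", matching the examples above."""
    suffixes = {"get": "getDocuments", "delete": "deleteDocuments"}
    return f"/knowledgebases/{kb_id}/datasources/{ds_id}/documents/{suffixes[operation]}/"


body = custom_document_identifiers(["MyDocument", "MyOtherDocument"])
for operation in ("get", "delete"):
    print("POST", documents_path("KB12345678", "DS12345678", operation))
print(json.dumps(body, indent=2))
```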

Now available
Support for custom connectors and ingestion of streaming data in Amazon Bedrock Knowledge Bases is available today in all AWS Regions where Amazon Bedrock Knowledge Bases is available. Check the Region list for details and future updates. To learn more about Amazon Bedrock Knowledge Bases, visit the Amazon Bedrock product page. For pricing details, review the Amazon Bedrock pricing page.

Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts, and engage with the generative AI builder community at community.aws.

Support for reranking models
Today we also announced the new Rerank API in Amazon Bedrock, which gives developers a way to use reranking models to improve the relevance and accuracy of responses in their RAG-based applications. Semantic search, supported by vector embeddings, embeds documents and queries into a high-dimensional vector space where texts with related meanings lie close together, so it returns similar items even if they don’t share any words with the query. RAG applications use semantic search to retrieve a range of relevant documents from the vector store, because the relevance of the retrieved documents to a user’s query plays a critical role in providing accurate responses.
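
To illustrate the idea of "nearby in the vector space", here is a toy sketch that ranks documents by cosine similarity to a query. The three-dimensional vectors are invented for the example; real embedding models produce vectors with hundreds or thousands of dimensions, and a vector store would typically use an approximate nearest-neighbor index rather than a full sort.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings: the first and third documents are about account access.
documents = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Quarterly revenue report": [0.0, 0.2, 0.9],
    "Recovering a locked account": [0.8, 0.3, 0.1],
}

# Made-up embedding for the query "I can't log in", which shares no words
# with the documents but points in a similar direction to two of them.
query_vector = [0.85, 0.2, 0.05]

ranked = sorted(
    documents,
    key=lambda title: cosine_similarity(query_vector, documents[title]),
    reverse=True,
)
for title in ranked:
    print(f"{cosine_similarity(query_vector, documents[title]):.3f}  {title}")
```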

However, semantic search has limitations in prioritizing the most suitable documents based on user preferences or query context, especially when the user query is complex, ambiguous, or involves nuanced context. This can lead to retrieving documents that are only partially relevant to the user’s question. It also creates a second problem: citations may be attributed to the wrong sources, which erodes trust and transparency in the RAG-based application. To address these limitations, future RAG systems should prioritize developing robust ranking algorithms that can better understand user intent and context. Additionally, it is important to focus on improving source credibility assessment and citation practices to confirm the reliability and transparency of the generated responses.

Advanced reranking models address these challenges by prioritizing the most relevant content from a knowledge base for a given query and any additional context, so that foundation models receive the most relevant content, which leads to more accurate and contextually appropriate responses. Reranking models may also reduce response generation costs by prioritizing the information that is sent to the generation model.
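
The two-stage pattern described here (retrieve candidates, then rerank and keep only the best few for generation) can be sketched as follows. The scoring function is a deliberately simple word-overlap stand-in so that the example runs on its own; it is not how Amazon Rerank or Cohere Rerank score documents, and the sample documents are invented.

```python
from typing import Callable


def rerank(query: str, documents: list[str],
           score: Callable[[str, str], float], top_n: int) -> list[dict]:
    """Score every candidate against the query and keep the top_n results."""
    scored = [(score(query, doc), index) for index, doc in enumerate(documents)]
    # Highest score first; ties keep the original retrieval order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [
        {"index": index, "relevanceScore": value, "text": documents[index]}
        for value, index in scored[:top_n]
    ]


def overlap_score(query: str, doc: str) -> float:
    """Stand-in scorer: fraction of query words that also appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


# Candidates as a first-stage retriever might return them (made-up data).
candidates = [
    "A list of films from the 1990s",
    "Playlist of songs from the 1980s",
    "Songs for a road trip",
]

top = rerank("songs from the 1980s", candidates, overlap_score, top_n=2)
for result in top:
    print(result["index"], result["relevanceScore"], result["text"])
```

Only the `top_n` results would be passed to the generation model, which is where the potential cost saving comes from.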

How it works
At launch, we’re supporting Amazon Rerank 1.0 and Cohere Rerank 3.5 reranking models. For the walkthrough, I will use the Amazon Rerank 1.0 model. I start by requesting access to this model.

Once access has been granted, I create a knowledge base using the existing Amazon Bedrock Knowledge Bases Console experience (an API process is also available as an alternative). The knowledge base contains two data sources: a music playlist and a list of films.

As soon as the knowledge base has been created, I edit the Service Role to add the policy that contains the bedrock:Rerank action. The API takes the user query as input, along with the list of documents that need to be reranked. The output is a reranked, prioritized list of documents.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0"
            ]
        },
        {
            "Sid": "Statement2",
            "Effect": "Allow",
            "Action": [
                "bedrock:Rerank"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
```

The last step is to sync the data sources to index their contents for searching. A sync can take anywhere from a few minutes to a few hours.

The knowledge base is ready for use. The RetrieveAndGenerate API reranks the results retrieved from the vector datastore based on their relevance to the query.

For contrast, I ran the same query against the same data in a separate account that doesn’t have the Rerank API. There, the results aren’t reranked by their relevance to the query, which could affect performance and compromise the accuracy of the responses.

Now available
The Rerank API in Amazon Bedrock is available today in the following AWS Regions: US West (Oregon), Canada (Central), Europe (Frankfurt), and Asia Pacific (Tokyo). Check the Region list for details and future updates. The Rerank API can be used independently to rerank documents even if you are not using Amazon Bedrock Knowledge Bases. To learn more about Amazon Bedrock Knowledge Bases, visit the Amazon Bedrock product page. For pricing details, review the Amazon Bedrock pricing page.
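
For standalone use, a Rerank request needs the query, the candidate documents, and the reranking model to use. The sketch below only builds the request as a Python dictionary. The field names reflect my understanding of the `rerank` operation on the boto3 `bedrock-agent-runtime` client and are assumptions that should be verified against the API reference; the model ARN is the one from the policy shown earlier, and the sample documents are invented.

```python
import json


def build_rerank_request(query: str, documents: list[str],
                         model_arn: str, top_n: int) -> dict:
    """Assemble keyword arguments for a Rerank call.

    ASSUMPTION: field names follow the boto3 bedrock-agent-runtime "rerank"
    operation as I understand it; they are not shown in this post.
    """
    return {
        "queries": [{"type": "TEXT", "textQuery": {"text": query}}],
        "sources": [
            {
                "type": "INLINE",
                "inlineDocumentSource": {
                    "type": "TEXT",
                    "textDocument": {"text": doc},
                },
            }
            for doc in documents
        ],
        "rerankingConfiguration": {
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "numberOfResults": top_n,
                "modelConfiguration": {"modelArn": model_arn},
            },
        },
    }


request = build_rerank_request(
    query="songs from the 1980s",
    documents=["A list of films from the 1990s", "Playlist of songs from the 1980s"],
    model_arn="arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0",
    top_n=1,
)
print(json.dumps(request, indent=2))

# With AWS credentials and boto3 installed, the call would look roughly like:
#   import boto3
#   client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")
#   response = client.rerank(**request)
#   for result in response["results"]:
#       print(result["index"], result["relevanceScore"])
```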

Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts, and engage with the generative AI builder community at community.aws.

– Veliswa.
