All posts by Antje Barth

Amazon Bedrock Guardrails now supports multimodal toxicity detection with image support (preview)

2024-12-04 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/amazon-bedrock-guardrails-now-supports-multimodal-toxicity-detection-with-image-support/

Today, we’re announcing the preview of multimodal toxicity detection with image support in Amazon Bedrock Guardrails. This new capability detects and filters out undesirable image content in addition to text, helping you improve user experiences and manage model outputs in your generative AI applications.

Amazon Bedrock Guardrails helps you implement safeguards for generative AI applications by filtering undesirable content, redacting personally identifiable information (PII), and enhancing content safety and privacy. You can configure policies for denied topics, content filters, word filters, PII redaction, contextual grounding checks, and Automated Reasoning checks (preview), to tailor safeguards to your specific use cases and responsible AI policies.

With this launch, you can now use the existing content filter policy in Amazon Bedrock Guardrails to detect and block harmful image content across categories such as hate, insults, sexual, and violence. You can configure thresholds from low to high to match your application’s needs.

This new image support works with all foundation models (FMs) in Amazon Bedrock that support image data, as well as any custom fine-tuned models you bring. It provides a consistent layer of protection across text and image modalities, making it easier to build responsible AI applications.

Tero Hottinen, VP, Head of Strategic Partnerships at KONE, envisions the following use case:

In its ongoing evaluation, KONE recognizes the potential of Amazon Bedrock Guardrails as a key component in protecting gen AI applications, particularly for relevance and contextual grounding checks, as well as the multimodal safeguards. The company envisions integrating product design diagrams and manuals into its applications, with Amazon Bedrock Guardrails playing a crucial role in enabling more accurate diagnosis and analysis of multimodal content.

Here’s how it works.

Multimodal toxicity detection in action
To get started, create a guardrail in the AWS Management Console and configure the content filters for either text or image data or both. You can also use AWS SDKs to integrate this capability into your applications.

Create guardrail
On the console, navigate to Amazon Bedrock and select Guardrails. From there, you can create a new guardrail and use the existing content filters to detect and block image data in addition to text data. The categories for Hate, Insults, Sexual, and Violence under Configure content filters can be configured for either text or image content or both. The Misconduct and Prompt attacks categories can be configured for text content only.

After you’ve selected and configured the content filters you want to use, you can save the guardrail and start using it to build safe and responsible generative AI applications.

To test the new guardrail in the console, select the guardrail and choose Test. You have two options: test the guardrail by choosing and invoking a model or to test the guardrail without invoking a model by using the Amazon Bedrock Guardrails independent ApplyGuardail API.

With the ApplyGuardrail API, you can validate content at any point in your application flow before processing or serving results to the user. You can also use the API to evaluate inputs and outputs for any self-managed (custom), or third-party FMs, regardless of the underlying infrastructure. For example, you could use the API to evaluate a Meta Llama 3.2 model hosted on Amazon SageMaker or a Mistral NeMo model running on your laptop.

Test guardrail by choosing and invoking a model
Select a model that supports image inputs or outputs, for example, Anthropic’s Claude 3.5 Sonnet. Verify that the prompt and response filters are enabled for image content. Next, provide a prompt, upload an image file, and choose Run.

In my example, Amazon Bedrock Guardrails intervened. Choose View trace for more details.

The guardrail trace provides a record of how safety measures were applied during an interaction. It shows whether Amazon Bedrock Guardrails intervened or not and what assessments were made on both input (prompt) and output (model response). In my example, the content filters blocked the input prompt because they detected insults in the image with a high confidence.

Test guardrail without invoking a model
In the console, choose Use Guardrails independent API to test the guardrail without invoking a model. Choose whether you want to validate an input prompt or an example of a model generated output. Then, repeat the steps from before. Verify that the prompt and response filters are enabled for image content, provide the content to validate, and choose Run.

I reused the same image and input prompt for my demo, and Amazon Bedrock Guardrails intervened again. Choose View trace again for more details.

Join the preview
Multimodal toxicity detection with image support is available today in preview in Amazon Bedrock Guardrails in the US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Mumbai, Seoul, Singapore, Tokyo), Europe (Frankfurt, Ireland, London), and AWS GovCloud (US-West) AWS Regions. To learn more, visit Amazon Bedrock Guardrails.

Give the multimodal toxicity detection content filter a try today in the Amazon Bedrock console and let us know what you think! Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— Antje

Introducing the next generation of Amazon SageMaker: The center for all your data, analytics, and AI

2024-12-03 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/introducing-the-next-generation-of-amazon-sagemaker-the-center-for-all-your-data-analytics-and-ai/

Today, we’re announcing the next generation of Amazon SageMaker, a unified platform for data, analytics, and AI. The all-new SageMaker includes virtually all of the components you need for data exploration, preparation and integration, big data processing, fast SQL analytics, machine learning (ML) model development and training, and generative AI application development.

The current Amazon SageMaker has been renamed to Amazon SageMaker AI. SageMaker AI is integrated within the next generation of SageMaker while also being available as a standalone service for those who wish to focus specifically on building, training, and deploying AI and ML models at scale.

Highlights of the new Amazon SageMaker
At its core is SageMaker Unified Studio (preview), a single data and AI development environment. It brings together functionality and tools from the range of standalone “studios,” query editors, and visual tools that we have today in Amazon Athena, Amazon EMR, AWS Glue, Amazon Redshift, Amazon Managed Workflows for Apache Airflow (MWAA), and the existing SageMaker Studio. We’ve also integrated Amazon Bedrock IDE (preview), an updated version of Amazon Bedrock Studio, to build and customize generative AI applications. In addition, Amazon Q provides AI assistance throughout your workflows in SageMaker.

Here’s a list of key capabilities:

Amazon SageMaker Unified Studio (preview) – Build with all your data and tools for analytics and AI in a single environment.
Amazon SageMaker Lakehouse – Unify data across Amazon Simple Storage Service (Amazon S3) data lakes, Amazon Redshift data warehouses, and third-party and federated data sources with Amazon SageMaker Lakehouse.
Data and AI Governance – Securely discover, govern, and collaborate on data and AI with Amazon SageMaker Catalog, built on Amazon DataZone.
Data Processing – Analyze, prepare, and integrate data for analytics and AI using open source frameworks on Amazon Athena, Amazon EMR, and AWS Glue.
Model development – Build, train, and deploy ML and foundation models (FMs) with fully managed infrastructure, tools, and workflows with Amazon SageMaker AI.
Generative AI app development – Build and scale generative AI applications with Amazon Bedrock.
SQL analytics – Gain insights with Amazon Redshift, the most price-performant SQL engine.

In this post, I give you a quick tour of the new SageMaker Unified Studio experience and how to get started with data processing, model development, and generative AI app development.

Working with Amazon SageMaker Unified Studio (preview)
With SageMaker Unified Studio, you can discover your data and put it to work using familiar AWS tools to complete end-to-end development workflows, including data analysis, data processing, model training, and generative AI app building, in a single governed environment.

An integrated SQL editor lets you query data from multiple sources, and a visual extract, transform, and load (ETL) tool simplifies the creation of data integration and transformation workflows. New unified Jupyter notebooks enable seamless work across different compute services and clusters. With the new built-in data catalog functionality, you can find, access, and query data and AI assets across your organization. Amazon Q is integrated to streamline tasks across the development lifecycle.

Let’s explore the individual capabilities in more detail.

Data processing
SageMaker integrates with SageMaker Lakehouse and lets you analyze, prepare, integrate, and orchestrate your data in a unified experience. You can integrate and process data from various sources using the provided connectivity options.

Start by creating a project in SageMaker Unified Studio, choosing the SQL analytics or data analytics and AI-ML model development project profile. Projects are a place to collaborate with your colleagues, share data, and use tools to work with data in a secure way. Project profiles in SageMaker define the preconfigured set of resources and tools that are provisioned when you create a new project. In your project, choose Data in the left menu and start adding data sources.

The built-in SQL query editor lets you query your data stored in data lakes, data warehouses, databases, and applications directly within SageMaker Unified Studio. In the top menu of SageMaker Unified Studio, select Build and choose Query Editor to get started. Also, try creating SQL queries using natural language with Amazon Q while you’re at it.

You should also explore the built-in visual ETL tool to create data integration and transformation workflows using a visual, drag-and-drop interface. In the top menu, select Build and choose Visual ETL flow to get started.

If Amazon Q is enabled, you can also use generative AI to author flows. Visual ETL comes with a wide range of data connectors, pre-built transformations, and features such as scheduling, monitoring, and data previewing to streamline your data workflows.

Model development
SageMaker Unified Studio includes capabilities from SageMaker AI, which provides infrastructure, tools, and workflows for the entire ML lifecycle. From the top menu, select Build to access tools for data preparation, model training, experiment tracking, pipeline creation, and orchestration. You can also use these tools for model deployment and inference, machine learning operations (MLOps) implementation, model monitoring and evaluation, as well as governance and compliance.

To start your model development, create a project in SageMaker Unified Studio using the data analytics and AI-ML model development project profile and explore the new unified Jupyter notebooks. In the top menu, select Build and choose JupyterLab. You can use the new unified notebooks to seamlessly work across different compute services and clusters. You can use these notebooks to switch between environments without leaving your workspace, streamlining your model development process.

You can also use Amazon Q Developer to assist with tasks such as code generation, debugging, and optimization throughout your model development process.

Generative AI app development
Use the new Amazon Bedrock IDE to develop generative AI applications within Amazon SageMaker Unified Studio. The Amazon Bedrock IDE includes tools to build and customize generative AI applications using FMs and advanced capabilities such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, Amazon Bedrock Agents, and Amazon Bedrock Flows to create tailored solutions aligned with your requirements and responsible AI guidelines.

Choose Discover in the top menu of SageMaker Unified Studio to browse Amazon Bedrock models or experiment with the model playgrounds.

Create a project using the GenAI Application Development profile to start building generative AI applications. Choose Build in the top menu of SageMaker Unified Studio and select Chat agent.

With the Amazon Bedrock IDE, you can build chat agents and create knowledge bases from your proprietary data sources with just a few clicks, enabling Retrieval-Augmented Generation (RAG). You can add guardrails to promote safe AI interactions and create functions to integrate with any system. With built-in model evaluation features, you can test and optimize your AI applications’ performance while collaborating with your team. Design flows for deterministic genAI-powered workflows, and when ready, share your applications or prompts within the domain or export them for deployment anywhere—all while maintaining control of your project and domain assets.

For a detailed description of all Amazon SageMaker capabilities, check the SageMaker Unified Studio User Guide.

Getting started
To begin using SageMaker Unified Studio, administrators need to complete several setup steps. This includes setting up AWS IAM Identity Center, configuring the necessary virtual private cloud (VPC) and AWS Identity and Access Management (IAM) roles, creating a SageMaker domain, and enabling Amazon Q Developer Pro. Instead of IAM Identity Center, you can also configure SAML through IAM federation for user management.

After the environment is configured, users sign in through the provided SageMaker Unified Studio domain URL with single sign-on. You can create projects to collaborate with team members, choosing from pre-configured project profiles for different use cases. Each project connects to a Git repository for version control and includes an example unified Jupyter notebook to get you started.

For detailed setup instructions, check the SageMaker Unified Studio Administrator Guide.

Now available
The next generation of Amazon SageMaker is available today in the US East (N. Virginia, Ohio), US West (Oregon), Asia Paciﬁc (Tokyo), and Europe (Ireland) AWS Regions. Amazon SageMaker Uniﬁed Studio and Amazon Bedrock IDE are available today in preview in these AWS Regions. Check the full Region list for future updates.

For pricing information, visit Amazon SageMaker pricing and Amazon Bedrock pricing. To learn more, visit Amazon SageMaker, SageMaker Unified Studio, and Amazon Bedrock IDE.

Existing Amazon Bedrock Studio preview domains will be available until February 28, 2025, but you may not create new workspaces. To experience the advanced features of Bedrock IDE, create a new SageMaker domain following the instructions in the Administrator Guide.

Give the new Amazon SageMaker a try in the console today and let us know what you think! Send feedback to AWS re:Post for Amazon SageMaker or through your usual AWS Support contacts.

— Antje

Introducing multi-agent collaboration capability for Amazon Bedrock (preview)

2024-12-03 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/introducing-multi-agent-collaboration-capability-for-amazon-bedrock/

Today, we’re announcing the multi-agent collaboration capability for Amazon Bedrock (preview). With multi-agent collaboration, you can build, deploy, and manage multiple AI agents working together on complex multi-step tasks that require specialized skills.

When you need more than a single agent to handle a complex task, you can create additional specialized agents to address different aspects of the process. However, managing these agents becomes technically challenging as tasks grow in complexity. As a developer using open source solutions, you may find yourself navigating the complexities of agent orchestration, session handling, memory management, and other technical aspects that require manual implementation.

With the fully managed multi-agent collaboration capability on Amazon Bedrock, specialized agents work within their domains of expertise, coordinated by a supervisor agent. The supervisor breaks down requests, delegates tasks, and consolidates outputs into a final response. For example, an investment advisory multi-agent system might include agents specialized in financial data analysis, research, forecasting, and investment recommendations. Similarly, a retail operations multi-agent system could handle demand forecasting, inventory allocation, supply chain coordination, and pricing optimization.

Amazon Bedrock Agents manages the collaboration, communication, and task delegation behind the scenes. By enabling agents to work together, you can achieve higher task success rates, accuracy, and enhanced productivity. In internal benchmark testing, multi-agent collaboration has shown marked improvements compared to single-agent systems for handling complex, multi-step tasks.

Highlights of multi-agent collaboration in Amazon Bedrock
A key challenge in building eﬀective multi-agent collaboration systems is managing the complexity and overhead of coordinating multiple specialized agents at scale. Amazon Bedrock simplifies the process of building, deploying, and orchestrating effective multi-agent collaboration systems while addressing efficiency challenges through several key features and optimizations:

Quick setup – Create, deploy, and manage AI agents working together in minutes without the need for complex coding.
Composability – Integrate your existing agents as subagents within a larger agent system, allowing them to seamlessly work together to tackle complex workflows.
Efficient inter-agent communication – The supervisor agent can interact with subagents using a consistent interface, supporting parallel communication for more efficient task completion.
Optimized collaboration modes – Choose between supervisor mode and supervisor with routing mode. With routing mode, the supervisor agent will route simple requests directly to specialized subagents, bypassing full orchestration. For complex queries or when no clear intention is detected, it automatically falls back to the full supervisor mode, where the supervisor agent analyzes, breaks down problems, and coordinates multiple subagents as needed.
Integrated trace and debug console – Visualize and analyze multi-agent interactions behind the scenes using the integrated trace and debug console.

These features collectively improve coordination capabilities, communication speed, and overall effectiveness of the multi-agent collaboration framework in tackling complex, real-world problems.

Here’s how to get started.

Using multi-agent collaboration in Amazon Bedrock
For this demo, I create a social media campaign manager agent that’s composed of a content strategist agent creating posts and an engagement predictor agent optimizing their timing and reach. The following figure shows the team of agents that I’m creating and how multi-agent collaboration works in this scenario.

To get started, you can use the Amazon Bedrock console or APIs to create a supervisor agent and associate specialist subagents in just a few steps.

Create subagents
First, I create the two subagents using the existing agent builder workflow. I open the Amazon Bedrock console, select Agents in the left navigation panel, then choose Create Agent. I create one agent that I name content-strategist, an agent that generates creative social media content ideas. Note the new option to enable the agent for multi-agent collaboration. I leave this option unchecked for now; we need to enable this option later for the supervisor agent. Next, I choose Create.

In the Agent builder dialog box, I choose to create and use a new service role, select Anthropic’s Claude 3.5 Sonnet v2 as the model, and provide the following instructions for the agent:

You are a social media content strategist with expertise in converting business goals into engaging social posts. Your task is to generate creative, on-brand content ideas that align with specified campaign goals and target audience. Each suggestion should include a topic, content type (image/video/text/poll), specific copy, and relevant hashtags. Focus on variety, authenticity, and ensuring each post serves a strategic purpose.

I also create and attach a knowledge base that contains high-performing post templates. As with any other agent, you could also configure additional settings, such as action groups to perform tasks, enable code interpretation, or add guardrails. I leave all other settings to their defaults.

Then, I choose Save and exit.

I repeat the steps to create a second agent that I name engagement-predictor, an agent that predicts social media post performance and optimal posting times. For this agent, I provide the following instructions:

You are a social media analytics expert who predicts post performance and optimal timing. For each content idea, analyze potential reach and engagement based on content type, industry benchmarks, and audience behavior patterns. Your task is to estimate reach, engagement rate, and determine the best posting time (day/hour). Support each prediction with data-driven reasoning and industry-specific insights. Focus on actionable metrics that will maximize campaign impact.

I create and attach a knowledge base that contains platform-specific peak engagement times, industry benchmark metrics, and content performance multipliers for predicting and optimizing social media post performance. Again, I choose Save and exit.

I now have my two specialist subagents.

Before moving on, test each agent individually, and once you’ve confirmed their functionality, create an alias for each one. This approach will streamline the process of creating supervisor agents in the future.

Create supervisor agent and associate subagents
Next, I create the supervisor agent. I name this agent social-media-campaign-manager, an agent that combines the outputs from the content strategy agent and the engagement predictor agent into a comprehensive campaign plan.

This time, I turn on Enable Multi-agent collaboration before I choose Create.

In the Agent builder dialog box, I again choose to create and use a new service role, select Anthropic’s Claude 3.5 Sonnet v2 as the model, and provide the following instructions for the agent:

You are a strategic campaign manager who orchestrates social media campaigns from concept to execution.

I create and attach a knowledge base that contains a collection of proven campaign templates, content mix ratios, and cross-platform posting requirements.

Next, I scroll down to Multi-agent collaboration and choose Edit.

The option to turn on multi-agent collaboration should already be checked because I enabled this option when I started creating the agent.

Then, you can choose between two collaboration configurations that determine how information is handled across the agent’s team to coordinate a final response.

In Supervisor mode, the supervisor agent analyzes the input, breaking down complex problems or paraphrasing the request. It then invokes subagents either serially or in parallel, and it might consult knowledge bases or invoke action groups. After receiving responses from subagents, the supervisor agent processes them to determine if the problem is solved or if further action is needed.

Alternatively, in Supervisor with routing mode, the supervisor agent first attempts to route simple requests directly to a relevant subagent, whose response is then forwarded to the user. For complex or ambiguous inputs, the system switches to supervisor mode, where the supervisor agent breaks down the problem or asks follow-up questions before proceeding similarly to standard supervisor mode. This approach allows for efficient handling of both straightforward and complex queries within a single framework.

For my demo, I choose Supervisor mode.

As a last step, I associate the two subagents by adding each subagent in Agent collaborator. I provide a collaborator name for each agent and a collaborator instruction.

I select the content-strategist agent and provide the collaborator name content-strategist along with the following instruction:

You can invoke this agent for social media content strategy tasks such as converting business goals into engaging social posts. The agent generates creative, on-brand content ideas that align with specified campaign goals and target audience.

Then, I choose Add collaborator, select the engagement-predictor agent, and provide the collaborator name engagement-predictor along with the following instructions:

You can invoke this agent for social media analytics to predict post performance and optimal timing.

Note: Enable conversation history sharing allows the supervisor agent to pass the full context of a user interaction to subagents. This helps maintain coherence and avoid repeating questions, especially when routing or switching between agents. Keep in mind, it might confuse simpler subagents with complex task histories. We recommend enabling this feature when you need continuity and disabling it when you’re focusing on task simplification or using specialized agents. I keep it disabled for my demo.

Choose Save and complete the Agent builder workflow.

Let’s test it!

Test multi-agent collaboration
Prepare the social media campaign manager agent and choose Test.

I use the following input prompt:

Create a 2-week social campaign for EcoTech's new solar panel launch. Target: B2B (facility managers, sustainability directors) Key points: 30% more efficient, AI-optimized, 2-year ROI Need: 4 posts/week on LinkedIn/Twitter (40% educational, 30% product, 30% thought leadership).

After the response comes back, I choose Show trace to inspect the workflow. In the Multi-agent collaboration trace timeline, you can observe that each subagent got invoked. You can also inspect the trace steps to check the orchestration details.

You can find more examples of how to work with Amazon Bedrock Agents and the new multi-agent collaboration capability in the Amazon Bedrock Agent Samples GitHub repo.

Things to know

During preview, multi-agent collaboration supports real-time chat assistant (synchronous) use cases.
Subagents can have collaboration enabled themselves with an overall soft limit of three hierarchical agent team layers.

Join the preview
Multi-agent collaboration in Amazon Bedrock is available today in preview in all AWS Regions that support Amazon Bedrock Agents, except AWS GovCloud (US-West). Check the full Region list for future updates. To learn more, visit Amazon Bedrock Agents.

Give multi-agent collaboration a try in the Amazon Bedrock console today and let us know what you think! Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

I’m excited to see what you build with multi-agent collaboration.

— Antje

Prevent factual errors from LLM hallucinations with mathematically sound Automated Reasoning checks (preview)

2024-12-03 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/prevent-factual-errors-from-llm-hallucinations-with-mathematically-sound-automated-reasoning-checks-preview/

Today, we’re adding Automated Reasoning checks (preview) as a new safeguard in Amazon Bedrock Guardrails to help you mathematically validate the accuracy of responses generated by large language models (LLMs) and prevent factual errors from hallucinations.

Amazon Bedrock Guardrails lets you implement safeguards for generative AI applications by filtering undesirable content, redacting personal identifiable information (PII), and enhancing content safety and privacy. You can configure policies for denied topics, content filters, word filters, PII redaction, contextual grounding checks, and now Automated Reasoning checks.

Automated Reasoning checks help prevent factual errors from hallucinations using sound mathematical, logic-based algorithmic verification and reasoning processes to verify the information generated by a model, so outputs align with known facts and aren’t based on fabricated or inconsistent data.

Amazon Bedrock Guardrails is the only responsible AI capability offered by a major cloud provider that helps customers to build and customize safety, privacy, and truthfulness for their generative AI applications within a single solution.

Primer on automated reasoning
Automated reasoning is a field of computer science that uses mathematical proofs and logical deduction to verify the behavior of systems and programs. Automated reasoning differs from machine learning (ML), which makes predictions, in that it provides mathematical guarantees about a system’s behavior. Amazon Web Services (AWS) already uses automated reasoning in key service areas such as storage, networking, virtualization, identity, and cryptography. For example, automated reasoning is used to formally verify the correctness of cryptographic implementations, improving both performance and development speed. To learn more, check out Provable Security and the Automated reasoning research area in the Amazon Science Blog.

Now AWS is applying a similar approach to generative AI. The new Automated Reasoning checks (preview) in Amazon Bedrock Guardrails is the first and only generative AI safeguard that helps prevent factual errors due to hallucinations using logically accurate and verifiable reasoning that explains why generative AI responses are correct. Automated Reasoning checks are particularly useful for use cases where factual accuracy and explainability are important. For example, you could use Automated Reasoning checks to validate LLM-generated responses about human resources (HR) policies, company product information, or operational workflows.

Used alongside other techniques such as prompt engineering, Retrieval-Augmented Generation (RAG), and contextual grounding checks, Automated Reasoning checks add a more rigorous and verifiable approach to making sure that LLM-generated output is factually accurate. By encoding your domain knowledge into structured policies, you can have confidence that your conversational AI applications are providing reliable and trustworthy information to your users.

Using Automated Reasoning checks (preview) in Amazon Bedrock Guardrails
With Automated Reasoning checks in Amazon Bedrock Guardrails, you can create Automated Reasoning policies that encode your organization’s rules, procedures, and guidelines into a structured, mathematical format. These policies can then be used to verify that the content generated by your LLM-powered applications is consistent with your guidelines.

Automated Reasoning policies are composed of a set of variables, defined with a name, type, and description, and the logical rules that operate on the variables. Behind the scenes, rules are expressed in formal logic, but they’re translated to natural language to make it easier for a user without formal logic expertise to refine a model. Automated Reasoning checks uses the variable descriptions to extract their values when validating a Q&A.

Here’s how it works.

Create Automated Reasoning policies
Using the Amazon Bedrock console, you can upload documents that describe your organization’s rules and procedures. Amazon Bedrock will analyze these documents and automatically create an initial Automated Reasoning policy, which represents the key concepts and their relationships in a mathematical format.

Navigate to the new Automated Reasoning menu item in Safeguards. Create a new policy and give it a name. Upload an existing document that defines the right solution space, such as an HR guideline or an operational manual. For this demo, I’m using an example airline ticket policy document that includes the airline’s policies for ticket changes.

Then, define the policy’s intent and any processing parameters. For example, specify if it will validate airport staff inquiries and identify any elements to exclude from processing, such as internal reference numbers. Include one or more sample Q&As to help the system understand typical interactions.

Here’s my intent description:

Ignore the policy ID number, it's irrelevant. Airline employees will ask questions about whether customers are allowed to modify their tickets providing the customer details. Below is an example question:

QUESTION: I’m flying to Wonder City with Unicorn Airlines and noticed my last name is misspelled on the ticket, can modify it at the airport?
ANSWER: No. Changes to the spelling of the names on the ticket must be submitted via email within 24 hours of ticket purchase.

Then, choose Create.

The system now initiates an automated process to create your Automated Reasoning policy. This process involves analyzing your document, identifying key concepts, breaking down the document into individual units, translating these natural language units into formal logic, validating the translations, and finally combining them into a comprehensive logical model. Once complete, review the generated structure, including the rules and variables. You can edit these for accuracy through the user interface.

To test the Automated Reasoning policy, you first have to create a guardrail.

Create a guardrail and configure Automated Reasoning checks
When building your conversational AI application with Amazon Bedrock Guardrails, you can enable Automated Reasoning checks and specify which Automated Reasoning policies to use for validation.

Navigate to the Guardrails menu item in Safeguards. Create a new guardrail and give it a name. Choose Enable Automated Reasoning policy and select the policy and policy version you want to use. Then, complete your guardrail configuration.

Test Automated Reasoning checks
You can use the Test playground in the Automated Reasoning console to verify the effectiveness of your Automated Reasoning policy. Enter a test question just like a user of your application would, together with an example answer to validate.

For this demo, I enter an incorrect answer to see what will happen.

Question: I'm flying to Wonder City with Unicorn Airlines and noticed my last name is misspelled on the ticket, I'm currently in person at the airport, can I submit the change in person?

Answer: Yes. You are allowed to change names on tickets at any time, even in person at the airport.

Then, select the guardrail you’ve just created and choose Submit.

Automated Reasoning checks will analyze the content and validate it against the Automated Reasoning policies you’ve configured. The checks will identify any factual inaccuracies or inconsistencies and provide an explanation for the validation results.

In my demo, the Automated Reasoning checks correctly identified the response as Invalid. It shows which rule led to the finding, along with the extracted variables and suggestions.

When the validation result is invalid, the suggestions show a set of variable assignments that would make the conclusion valid. In my scenario, the suggestions show that the change submission method needs to be email for the validation result to be valid.

If no factual inaccuracies are detected and the validation result is Valid, suggestions show a list of assignments that are necessary for the result to hold; these are unstated assumptions in the answer. In my scenario, this might be assumptions such as that it’s the original ticket on which name corrections must be made or that the type of ticket stock is eligible for changes.

If factual inconsistencies are detected, the console will display Mixed results as the validation result. In the API response, you will see a list of findings, with some marked as valid and others as invalid. If this happens, review the system’s findings and suggestions and edit any unclear policy rules.

You can also use the validation results to enhance LLM-generated responses based on the feedback. For example, the following code snippet demonstrates how you can ask the model to regenerate its answer based on the received feedback:

for f in findings:
    if f.result == "INVALID":
        if f.rules is not None:
            for r in f.rules:
                feedback += f"<feedback>{r.description}</feedback>\n"

new_prompt = (
    "The answer you generated is inaccurate. Consider the feedback below within "
    f"<feedback> tags and rewrite your answer.\n\n{feedback}"
)

Achieving high validation accuracy is an iterative process. As a best practice, regularly review policy performance and adjust it as needed. You can edit rules in natural language and the system will automatically update the logical model.

For example, updating variable descriptions can significantly improve validation accuracy. Consider a scenario where a question states, “I’m a full-time employee…,” and the description of the is_full_time variable only states, “works more than 20 hours per week.” In this case, Automated Reasoning checks might not recognize the phrase “full-time.” To enhance accuracy, you should update the variable description to be more comprehensive, such as: “Works more than 20 hours per week. Users may refer to this as full-time or part-time. The value should be true for full-time and false for part-time.” This detailed description helps the system pick up all relevant factual claims for validation in natural language questions and answers, providing more accurate results.

Available in preview
The new Automated Reasoning checks safeguard is available today in preview in Amazon Bedrock Guardrails in the US West (Oregon) AWS Region. To request to be considered for access to the preview today, contact your AWS account team. In the next few weeks, look for a sign-up form in the Amazon Bedrock console. To learn more, visit Amazon Bedrock Guardrails.

— Antje

AWS Weekly Roundup: Agentic workflows, Amazon Transcribe, AWS Lambda insights, and more (October 21, 2024)

2024-10-21 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-agentic-workflows-amazon-transcribe-aws-lambda-insights-and-more-october-21-2024/

Agentic workflows are quickly becoming a cornerstone of AI innovation, enabling intelligent systems to autonomously handle and refine complex tasks in a way that mirrors human problem-solving. Last week, we launched Serverless Agentic Workflows with Amazon Bedrock, a new short course developed in collaboration with Dr. Andrew Ng and DeepLearning.AI.

This hands-on course, taught by my colleague Mike Chambers, teaches how to build serverless agents that can handle complex tasks without the hassle of managing infrastructure. You will learn everything you need to know about integrating tools, automating workflows, and deploying responsible agents with built-in guardrails on Amazon Web Services (AWS) with Amazon Bedrock. The hands-on labs provided with the course let you apply your knowledge directly in an AWS environment, hosted by AWS Partner Vocareum. Find more information and enroll for free on the DeepLearning.AI course page.

Now, let’s turn our attention to other exciting news in the AWS universe from last week.

Last week’s launches
Here are some launches that got my attention:

Amazon Transcribe now supports streaming transcription in 30 additional languages – Amazon Transcribe has expanded its support to include 30 additional languages, bringing the total number of supported languages to 54. This enhancement helps you reach a broader global audience and improves accessibility across various industries, including contact centers, broadcasting, and e-learning. The expanded language support allows for more efficient content moderation, improved agent productivity, and automatic subtitling for live events and meetings.

AWS Lambda console now surfaces key function insights and supports real-time log analytics – The AWS Lambda console now features a built-in Amazon CloudWatch Metrics Insights dashboard and supports CloudWatch Logs Live Tail, providing instant visibility into critical function metrics and real-time log streaming. You can now identify and troubleshoot errors or performance issues for your Lambda functions without leaving the console, as well as view and analyze logs in real time as they become available. You can reduce context switching and accelerate the development and troubleshooting processes for serverless applications. Check out the launch post for more details.

Amazon Bedrock Model Evaluation now supports evaluating custom model import models – You can now evaluate custom models you’ve imported to Amazon Bedrock using the model evaluation feature. This helps you to complete the full cycle of selecting, customizing, and evaluating models before deploying them. To evaluate an imported model, select the custom model from the list of models to evaluate in the model selector tool when creating an evaluation job.

Amazon Q in AWS Supply Chain – You can now use Amazon Q, an interactive AI assistant, to analyze your supply chain data in AWS Supply Chain and get insights to operate your supply chain more efficiently. Amazon Q can answer your supply chain questions by diving into your data. This reduces the time spent searching for information and streamlines finding answers to improve your supply chain operations.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional news items and posts that you might find interesting:

New Amazon OpenSearch Service YouTube channel – The channel offers bite-sized tutorials, curated content, and organized playlists on topics such as log analytics, semantic search, vector databases, and operational best practices. You can also provide feedback to influence future channel content and the OpenSearch Service roadmap. Check out the launch post for more details and subscribe to the Amazon OpenSearch Service YouTube channel.

Deploying Generative AI Applications with NVIDIA NIM Microservices on Amazon Elastic Kubernetes Service (Amazon EKS) – This post shows you how to use Amazon EKS to orchestrate the deployment of pods containing NVIDIA NIM microservices, to enable quick-to-setup and optimized large-scale large language model (LLM) inference on Amazon EC2 G5 instances. It also demonstrates how to scale (both pod and cluster) by monitoring for custom metrics through Prometheus, and how you can load balance using an Application Load Balancer.

Instant Well-Architected CDK Resources with Solutions Constructs Factories – You can now create well-architected AWS resources such as Amazon Simple Storage Service (Amazon S3) buckets and AWS Step Functions state machines with a single function call using the new AWS Solutions Constructs Factories. These factories handle all the best practices configuration for you while still allowing customization. Try using a Constructs factory the next time you need to deploy one of the supported resources.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS GenAI Lofts – AWS GenAI Lofts are about more than just the tech, they bring together startups, developers, investors, and industry experts. Whether you’re looking to gain deep insights, or get your questions answered by generative AI pros, our GenAI Lofts have you covered and provide everything you need to start building your next innovation. Join events in London (through October 25), Seoul (October 30–November 6), São Paulo (through November 20), and Paris (through November 25).

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Malta (November 8), Chile (November 9), and Kochi, India (December 14).

AWS re:Invent – Registration is now open for the annual tech extravaganza, taking place December 2–6 in Las Vegas. At re:Invent 2024, you’ll get a front row seat to hear real stories from customers and AWS leaders about navigating pressing topics, such as generative AI. Learn about new product launches, watch demos, and get behind-the-scenes insights during five headline-making keynotes.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Antje

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Jamba 1.5 family of models by AI21 Labs is now available in Amazon Bedrock

2024-09-23 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/jamba-1-5-family-of-models-by-ai21-labs-is-now-available-in-amazon-bedrock/

Today, we are announcing the availability of AI21 Labs’ powerful new Jamba 1.5 family of large language models (LLMs) in Amazon Bedrock. These models represent a significant advancement in long-context language capabilities, delivering speed, efficiency, and performance across a wide range of applications. The Jamba 1.5 family of models includes Jamba 1.5 Mini and Jamba 1.5 Large. Both models support a 256K token context window, structured JSON output, function calling, and are capable of digesting document objects.

AI21 Labs is a leader in building foundation models and artificial intelligence (AI) systems for the enterprise. Together, AI21 Labs and AWS are empowering customers across industries to build, deploy, and scale generative AI applications that solve real-world challenges and spark innovation through a strategic collaboration. With AI21 Labs’ advanced, production-ready models together with Amazon’s dedicated services and powerful infrastructure, customers can leverage LLMs in a secure environment to shape the future of how we process information, communicate, and learn.

What is Jamba 1.5?
Jamba 1.5 models leverage a unique hybrid architecture that combines the transformer model architecture with Structured State Space model (SSM) technology. This innovative approach allows Jamba 1.5 models to handle long context windows up to 256K tokens, while maintaining the high-performance characteristics of traditional transformer models. You can learn more about this hybrid SSM/transformer architecture in the Jamba: A Hybrid Transformer-Mamba Language Model whitepaper.

You can now use two new Jamba 1.5 models from AI21 in Amazon Bedrock:

Jamba 1.5 Large excels at complex reasoning tasks across all prompt lengths, making it ideal for applications that require high quality outputs on both long and short inputs.
Jamba 1.5 Mini is optimized for low-latency processing of long prompts, enabling fast analysis of lengthy documents and data.

Key strengths of the Jamba 1.5 models include:

Long context handling – With 256K token context length, Jamba 1.5 models can improve the quality of enterprise applications, such as lengthy document summarization and analysis, as well as agentic and RAG workflows.
Multilingual – Support for English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew.
Developer-friendly – Native support for structured JSON output, function calling, and capable of digesting document objects.
Speed and efficiency – AI21 measured the performance of Jamba 1.5 models and shared that the models demonstrate up to 2.5X faster inference on long contexts than other models of comparable sizes. For detailed performance results, visit the Jamba model family announcement on the AI21 website.

Get started with Jamba 1.5 models in Amazon Bedrock
To get started with the new Jamba 1.5 models, go to the Amazon Bedrock console, choose Model access on the bottom left pane, and request access to Jamba 1.5 Mini or Jamba 1.5 Large.

To test the Jamba 1.5 models in the Amazon Bedrock console, choose the Text or Chat playground in the left menu pane. Then, choose Select model and select AI21 as the category and Jamba 1.5 Mini or Jamba 1.5 Large as the model.

By choosing View API request, you can get a code example of how to invoke the model using the AWS Command Line Interface (AWS CLI) with the current example prompt.

You can follow the code examples in the Amazon Bedrock documentation to access available models using AWS SDKs and to build your applications using various programming languages.

The following Python code example shows how to send a text message to Jamba 1.5 models using the Amazon Bedrock Converse API for text generation.

import boto3
from botocore.exceptions import ClientError

# Create a Bedrock Runtime client.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Set the model ID.
# modelId = "ai21.jamba-1-5-mini-v1:0"
model_id = "ai21.jamba-1-5-large-v1:0"

# Start a conversation with the user message.
user_message = "What are 3 fun facts about mambas?"
conversation = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

try:
    # Send the message to the model, using a basic inference configuration.
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 256, "temperature": 0.7, "topP": 0.8},
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

The Jamba 1.5 models are perfect for use cases like paired document analysis, compliance analysis, and question answering for long documents. They can easily compare information across multiple sources, check if passages meet specific guidelines, and handle very long or complex documents. You can find example code in the AI21-on-AWS GitHub repo. To learn more about how to prompt Jamba models effectively, check out AI21’s documentation.

Now available
AI21 Labs’ Jamba 1.5 family of models is generally available today in Amazon Bedrock in the US East (N. Virginia) AWS Region. Check the full Region list for future updates. To learn more, check out the AI21 Labs in Amazon Bedrock product page and pricing page.

Give Jamba 1.5 models a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Visit our community.aws site to find deep-dive technical content and to discover how our Builder communities are using Amazon Bedrock in their solutions.

— Antje

AWS Weekly Roundup: Llama 3.1, Mistral Large 2, AWS Step Functions, AWS Certifications update, and more (July 29, 2024)

2024-07-29 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-llama-3-1-mistral-large-2-aws-step-functions-aws-certifications-update-and-more-july-29-2024/

I’m always amazed by the talent and passion of our Amazon Web Services (AWS) community members, especially in their efforts to increase diversity, equity, and inclusion in the tech community.

Last week, I had the honor of speaking at the AWS User Group Women Bay Area meetup, led by Natalie. This group is dedicated to empowering and connecting women, providing a supportive environment to explore cloud computing. In Latin America, we recently had the privilege of supporting 12 women-led AWS User Groups from 10 countries in organizing two regional AWSome Women Community Summits, reaching over 800 women builders. There’s still more work to be done, but initiatives like these highlight the power of community in fostering an inclusive and diverse tech environment.

Now, let’s turn our attention to other exciting news in the AWS universe from last week.

Last week’s launches
Here are some launches that got my attention:

Meta Llama 3.1 models – The Llama 3.1 models are Meta’s most advanced and capable models to date. The Llama 3.1 models are a collection of 8B, 70B, and 405B parameter size models that demonstrate state-of-the-art performance on a wide range of industry benchmarks and offer new capabilities for your generative artificial intelligence (generative AI) applications. Llama 3.1 models are now available in Amazon Bedrock (see Announcing Llama 3.1 405B, 70B, and 8B models from Meta in Amazon Bedrock) and Amazon SageMaker JumpStart (see Llama 3.1 models are now available in Amazon SageMaker JumpStart).

My colleagues Tiffany and Mike explored Llama 3.1 in last week’s episode of the weekly Build On Generative AI live stream. You can watch the full episode here!

Mistral Large 2 model – Mistral Large 2 is the newest version of Mistral Large, and according to Mistral AI, it offers significant improvements across multilingual capabilities, math, reasoning, coding, and much more. Mistral AI’s Mistral Large 2 foundation model (FM) is now available in Amazon Bedrock. See Mistral Large 2 is now available in Amazon Bedrock for all the details. You can find code examples in the Mistral-on-AWS repo and the Amazon Bedrock User Guide.

Faster auto scaling for generative AI models – This new capability in Amazon SageMaker inference can help you reduce the time it takes for your generative AI models to scale automatically. You can now use sub-minute metrics and significantly reduce overall scaling latency for generative AI models. With this enhancement, you can improve the responsiveness of your generative AI applications as demand fluctuates. For more details, check out Amazon SageMaker inference launches faster auto scaling for generative AI models.

AWS Step Functions now supports customer managed keys – AWS Step Functions now supports the use of customer managed keys with AWS Key Management Service (AWS KMS) to encrypt Step Functions state machine and activity resources. This new capability lets you encrypt your workflow definitions and execution data using your own encryption keys. Visit the AWS Step Functions documentation and the AWS KMS documentation to learn more.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional news items and posts that you might find interesting:

AWS Certification: Addition of new exam question types – If you are planning to take the AWS Certified AI Practitioner or AWS Certified Machine Learning Engineer – Associate exam anytime soon, check out AWS Certification: Addition of new exam question types. These exams will be the first to include three new question types: ordering, matching, and case study. The post shares insights about the new question types and offers information to help you prepare.

Amazon’s exabyte-scale migration from Apache Spark to Ray on Amazon EC2 – The Business Data Technologies (BDT) team at Amazon Retail has just flipped the switch to start quietly moving management of some of their largest production business intelligence (BI) datasets from Apache Spark over to Ray to help reduce both data processing time and cost. They’ve also contributed a critical component of their work (The Flash Compactor) back to Ray’s open source DeltaCAT project. Find the full story at Amazon’s Exabyte-Scale Migration from Apache Spark to Ray on Amazon EC2.

From community.aws
Here are my top three personal favorites posts from community.aws:

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS Summits – The 2024 AWS Summit season is almost wrapping up! Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Mexico City (August 7), São Paulo (August 15), and Jakarta (September 5).

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: New Zealand (August 15), Colombia (August 24), New York (August 28), Belfast (September 6), and Bay Area (September 13).

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Antje

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Knowledge Bases for Amazon Bedrock now supports additional data connectors (in preview)

2024-07-10 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/knowledge-bases-for-amazon-bedrock-now-supports-additional-data-connectors-in-preview/

Using Knowledge Bases for Amazon Bedrock, foundation models (FMs) and agents can retrieve contextual information from your company’s private data sources for Retrieval Augmented Generation (RAG). RAG helps FMs deliver more relevant, accurate, and customized responses.

Over the past months, we’ve continuously added choices of embedding models, vector stores, and FMs to Knowledge Bases.

Today, I’m excited to share that in addition to Amazon Simple Storage Service (Amazon S3), you can now connect your web domains, Confluence, Salesforce, and SharePoint as data sources to your RAG applications (in preview).

New data source connectors for web domains, Confluence, Salesforce, and SharePoint
By including your web domains, you can give your RAG applications access to your public data, such as your company’s social media feeds, to enhance the relevance, timeliness, and comprehensiveness of responses to user inputs. Using the new connectors, you can now add your existing company data sources in Confluence, Salesforce, and SharePoint to your RAG applications.

Let me show you how this works. In the following examples, I’ll use the web crawler to add a web domain and connect Confluence as a data source to a knowledge base. Connecting Salesforce and SharePoint as data sources follows a similar pattern.

Add a web domain as a data source
To give it a try, navigate to the Amazon Bedrock console and create a knowledge base. Provide the knowledge base details, including name and description, and create a new or use an existing service role with the relevant AWS Identity and Access Management (IAM) permissions.

Then, choose the data source you want to use. I select Web Crawler.

In the next step, I configure the web crawler. I enter a name and description for the web crawler data source. Then, I define the source URLs. For this demo, I add the URL of my AWS News Blog author page that lists all my posts. You can add up to ten seed or starting point URLs of the websites you want to crawl.

Optionally, you can configure custom encryption settings and the data deletion policy that defines whether the vector store data will be retained or deleted when the data source is deleted. I keep the default advanced settings.

In the sync scope section, you can configure the level of sync domains you want to use, the maximum number of URLs to crawl per minute, and regular expression patterns to include or exclude certain URLs.

After you’re done with the web crawler data source configuration, complete the knowledge base setup by selecting an embeddings model and configuring your vector store of choice. You can check the knowledge base details after creation to monitor the data source sync status. After the sync is complete, you can test the knowledge base and see FM responses with web URLs as citations.

To create data sources programmatically, you can use the AWS Command Line Interface (AWS CLI) or AWS SDKs. For code examples, check out the Amazon Bedrock User Guide.

Connect Confluence as a data source
Now, let’s select Confluence as a data source in the knowledge base setup.

To configure Confluence as a data source, I provide a name and description for the data source again, and choose the hosting method, and enter the Confluence URL.

To connect to Confluence, you can choose between base and OAuth 2.0 authentication. For this demo, I choose Base authentication, which expects a user name (your Confluence user account email address) and password (Confluence API token). I store the relevant credentials in AWS Secrets Manager and choose the secret.

Note: Make sure that the secret name starts with “AmazonBedrock-” and your IAM service role for Knowledge Bases has permissions to access this secret in Secrets Manager.

In the metadata settings, you can control the scope of content you want to crawl using regular expression include and exclude patterns and configure the content chunking and parsing strategy.

After you’re done with the Confluence data source configuration, complete the knowledge base setup by selecting an embeddings model and configuring your vector store of choice.

You can check the knowledge base details after creation to monitor the data source sync status. After the sync is complete, you can test the knowledge base. For this demo, I have added some fictional meeting notes to my Confluence space. Let’s ask about the action items from one of the meetings!

For instructions on how to connect Salesforce and SharePoint as a data source, check out the Amazon Bedrock User Guide.

Things to know

Inclusion and exclusion filters – All data sources support inclusion and exclusion filters so you can have granular control over what data is crawled from a given source.
Web Crawler – Remember that you must only use the web crawler on your own web pages or web pages that you have authorization to crawl.

Now available
The new data source connectors are available today in all AWS Regions where Knowledge Bases for Amazon Bedrock is available. Check the Region list for details and future updates. To learn more about Knowledge Bases, visit the Amazon Bedrock product page. For pricing details, review the Amazon Bedrock pricing page.

Give the new data source connectors a try in the Amazon Bedrock console today, send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts, and engage with the generative AI builder community at community.aws.

— Antje

AWS Weekly Roundup – LlamaIndex support for Amazon Neptune, force AWS CloudFormation stack deletion, and more (May 27, 2024)

2024-05-27 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-llamaindex-support-for-amazon-neptune-force-aws-cloudformation-stack-deletion-and-more-may-27-2024/

Last week, Dr. Matt Wood, VP for AI Products at Amazon Web Services (AWS), delivered the keynote at the AWS Summit Los Angeles. Matt and guest speakers shared the latest advancements in generative artificial intelligence (generative AI), developer tooling, and foundational infrastructure, showcasing how they come together to change what’s possible for builders. You can watch the full keynote on YouTube.

Announcements during the LA Summit included two new Amazon Q courses as part of Amazon’s AI Ready initiative to provide free AI skills training to 2 million people globally by 2025. The courses are part of the Amazon Q learning plan. But that’s not all that happened last week.

Last week’s launches
Here are some launches that got my attention:

LlamaIndex support for Amazon Neptune — You can now build Graph Retrieval Augmented Generation (GraphRAG) applications by combining knowledge graphs stored in Amazon Neptune and LlamaIndex, a popular open source framework for building applications with large language models (LLMs) such as those available in Amazon Bedrock. To learn more, check the LlamaIndex documentation for Amazon Neptune Graph Store.

AWS CloudFormation launches a new parameter called DeletionMode for the DeleteStack API — You can use the AWS CloudFormation DeleteStack API to delete your stacks and stack resources. However, certain stack resources can prevent the DeleteStack API from successfully completing, for example, when you attempt to delete non-empty Amazon Simple Storage Service (Amazon S3) buckets. The DeleteStack API can enter into the DELETE_FAILED state in such scenarios. With this launch, you can now pass FORCE_DELETE_STACK value to the new DeletionMode parameter and delete such stacks. To learn more, check the DeleteStack API documentation.

Mistral Small now available in Amazon Bedrock — The Mistral Small foundation model (FM) from Mistral AI is now generally available in Amazon Bedrock. This a fast-follow to our recent announcements of Mistral 7B and Mixtral 8x7B in March, and Mistral Large in April. Mistral Small, developed by Mistral AI, is a highly efficient large language model (LLM) optimized for high-volume, low-latency language-based tasks. To learn more, check Esra’s post.

New Amazon CloudFront edge location in Cairo, Egypt — The new AWS edge location brings the full suite of benefits provided by Amazon CloudFront, a secure, highly distributed, and scalable content delivery network (CDN) that delivers static and dynamic content, APIs, and live and on-demand video with low latency and high performance. Customers in Egypt can expect up to 30 percent improvement in latency, on average, for data delivered through the new edge location. To learn more about AWS edge locations, visit CloudFront edge locations.

Amazon OpenSearch Service zero-ETL integration with Amazon S3 — This Amazon OpenSearch Service integration offers a new efficient way to query operational logs in Amazon S3 data lakes, eliminating the need to switch between tools to analyze data. You can get started by installing out-of-the-box dashboards for AWS log types such as Amazon VPC Flow Logs, AWS WAF Logs, and Elastic Load Balancing (ELB). To learn more, check out the Amazon OpenSearch Service Integrations page and the Amazon OpenSearch Service Developer Guide.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional news items and a Twitch show that you might find interesting:

AWS Build On Generative AI Build On Generative AI — Now streaming every Thursday, 2:00 PM US PT on twitch.tv/aws, my colleagues Tiffany and Mike discuss different aspects of generative AI and invite guest speakers to demo their work. Check out show notes and the full list of episodes on community.aws.

Amazon Bedrock Studio bootstrapper script — We’ve heard your feedback! To everyone who struggled setting up the required AWS Identity and Access Management (IAM) roles and permissions to get started with Amazon Bedrock Studio: You can now use the Bedrock Studio bootstrapper script to automate the creation of the permissions boundary, service role, and provisioning role.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS Summits — It’s AWS Summit season! Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Dubai (May 29), Bangkok (May 30), Stockholm (June 4), Madrid (June 5), and Washington, DC (June 26–27).

AWS re:Inforce — Join us for AWS re:Inforce (June 10–12) in Philadelphia, PA. AWS re:Inforce is a learning conference focused on AWS security solutions, cloud security, compliance, and identity. Connect with the AWS teams that build the security tools and meet AWS customers to learn about their security journeys.

AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Midwest | Columbus (June 13), Sri Lanka (June 27), Cameroon (July 13), New Zealand (August 15), Nigeria (August 24), and New York (August 28).

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Antje

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Build RAG and agent-based generative AI applications with new Amazon Titan Text Premier model, available in Amazon Bedrock

2024-05-07 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/build-rag-and-agent-based-generative-ai-applications-with-new-amazon-titan-text-premier-model-available-in-amazon-bedrock/

Today, we’re happy to welcome a new member of the Amazon Titan family of models: Amazon Titan Text Premier, now available in Amazon Bedrock.

Following Amazon Titan Text Lite and Titan Text Express, Titan Text Premier is the latest large language model (LLM) in the Amazon Titan family of models, further increasing your model choice within Amazon Bedrock. You can now choose between the following Titan Text models in Bedrock:

Titan Text Premier is the most advanced Titan LLM for text-based enterprise applications. With a maximum context length of 32K tokens, it has been specifically optimized for enterprise use cases, such as building Retrieval Augmented Generation (RAG) and agent-based applications with Knowledge Bases and Agents for Amazon Bedrock. As with all Titan LLMs, Titan Text Premier has been pre-trained on multilingual text data but is best suited for English-language tasks. You can further custom fine-tune (preview) Titan Text Premier with your own data in Amazon Bedrock to build applications that are specific to your domain, organization, brand style, and use case. I’ll dive deeper into model highlights and performance in the following sections of this post.
Titan Text Express is ideal for a wide range of tasks, such as open-ended text generation and conversational chat. The model has a maximum context length of 8K tokens.
Titan Text Lite is optimized for speed, is highly customizable, and is ideal to be fine-tuned for tasks such as article summarization and copywriting. The model has a maximum context length of 4K tokens.

Now, let’s discuss Titan Text Premier in more detail.

Amazon Titan Text Premier model highlights
Titan Text Premier has been optimized for high-quality RAG and agent-based applications and customization through fine-tuning while incorporating responsible artificial intelligence (AI) practices.

Optimized for RAG and agent-based applications – Titan Text Premier has been specifically optimized for RAG and agent-based applications in response to customer feedback, where respondents named RAG as one of their key components in building generative AI applications. The model training data includes examples for tasks like summarization, Q&A, and conversational chat and has been optimized for integration with Knowledge Bases and Agents for Amazon Bedrock. The optimization includes training the model to handle the nuances of these features, such as their specific prompt formats.

High-quality RAG through integration with Knowledge Bases for Amazon Bedrock – With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for RAG. You can now choose Titan Text Premier with Knowledge Bases to implement question-answering and summarization tasks over your company’s proprietary data.
Automating tasks through integration with Agents for Amazon Bedrock – You can also create custom agents that can perform multistep tasks across different company systems and data sources using Titan Text Premier with Agents for Amazon Bedrock. Using agents, you can automate tasks for your internal or external customers, such as managing retail orders or processing insurance claims.

We already see customers exploring Titan Text Premier to implement interactive AI assistants that create summaries from unstructured data such as emails. They’re also exploring the model to extract relevant information across company systems and data sources to create more meaningful product summaries.

Here’s a demo video created by my colleague Brooke Jamieson that shows an example of how you can put Titan Text Premier to work for your business.

Custom fine-tuning of Amazon Titan Text Premier (preview) – You can fine-tune Titan Text Premier with your own data in Amazon Bedrock to increase model accuracy by providing your own task-specific labeled training dataset. Customizing Titan Text Premier helps to further specialize your model and create unique user experiences that reflect your company’s brand, style, voice, and services.

Built responsibly – Amazon Titan Text Premier incorporates safe, secure, and trustworthy practices. The AWS AI Service Card for Amazon Titan Text Premier documents the model’s performance across key responsible AI benchmarks from safety and fairness to veracity and robustness. The model also integrates with Guardrails for Amazon Bedrock so you can implement additional safeguards customized to your application requirements and responsible AI policies. Amazon indemnifies customers who responsibly use Amazon Titan models against claims that generally available Amazon Titan models or their outputs infringe on third-party copyrights.

Amazon Titan Text Premier model performance
Titan Text Premier has been built to deliver broad intelligence and utility relevant for enterprises. The following table shows evaluation results on public benchmarks that assess critical capabilities, such as instruction following, reading comprehension, and multistep reasoning against price-comparable models. The strong performance across these diverse and challenging benchmarks highlights that Titan Text Premier is built to handle a wide range of use cases in enterprise applications, offering great price performance. For all benchmarks listed below, a higher score is a better score.

Capability	Benchmark	Description	Amazon	Google	OpenAI
Capability	Benchmark	Description	Titan Text Premier	Gemini Pro 1.0	GPT-3.5
General	MMLU (Paper)	Representation of questions in 57 subjects	70.4% (5-shot)	71.8% (5-shot)	70.0% (5-shot)
Instruction following	IFEval (Paper)	Instruction-following evaluation for large language models	64.6% (0-shot)	not published	not published
Reading comprehension	RACE-H (Paper)	Large-scale reading comprehension	89.7% (5-shot)	not published	not published
Reasoning	HellaSwag (Paper)	Common-sense reasoning	92.6% (10-shot)	84.7% (10-shot)	85.5% (10-shot)
	DROP, F1 score (Paper)	Reasoning over text	77.9 (3-shot)	74.1 (Variable Shots)	64.1 (3-shot)
	BIG-Bench Hard (Paper)	Challenging tasks requiring multistep reasoning	73.7% (3-shot CoT)	75.0% (3-shot CoT)	not published
	ARC-Challenge (Paper)	Common-sense reasoning	85.8% (5-shot)	not published	85.2% (25-shot)

Note: Benchmarks evaluate model performance using a variation of few-shot and zero-shot prompting. With few-shot prompting, you provide the model with a number of concrete examples (three for 3-shot, five for 5-shot, etc.) of how to solve a specific task. This demonstrates the model’s ability to learn from example, called in-context learning. With zero-shot prompting on the other hand, you evaluate a model’s ability to perform tasks by relying only on its preexisting knowledge and general language understanding without providing any examples.

Get started with Amazon Titan Text Premier
To enable access to Amazon Titan Text Premier, navigate to the Amazon Bedrock console and choose Model access on the bottom left pane. On the Model access overview page, choose the Manage model access button in the upper right corner and enable access to Amazon Titan Text Premier.

To use Amazon Titan Text Premier in the Bedrock console, choose Text or Chat under Playgrounds in the left menu pane. Then choose Select model and select Amazon as the category and Titan Text Premier as the model. To explore the model, you can load examples. The following screenshot shows one of those examples that demonstrates the model’s chain of thought (CoT) and reasoning capabilities.

By choosing View API request, you can get a code example of how to invoke the model using the AWS Command Line Interface (AWS CLI) with the current example prompt. You can also access Amazon Bedrock and available models using the AWS SDKs. In the following example, I will use the AWS SDK for Python (Boto3).

Amazon Titan Text Premier in action
For this demo, I ask Amazon Titan Text Premier to summarize one of my previous AWS News Blog posts that announced the availability of Amazon Titan Image Generator and the watermark detection feature.

For summarization tasks, a recommended prompt template looks like this:

The following is text from a {{Text Category}}:
{{Text}}
Summarize the {{Text Category}} in {{length of summary}}

For more prompting best practices, check out the Amazon Titan Text Prompt Engineering Guidelines.

I adapt this template to my example and define the prompt. In preparation, I saved my News Blog post as a text file and read it into the post string variable.

prompt = """
The following is text from a AWS News Blog post:

<text>
%s
</text>

Summarize the above AWS News Blog post in a short paragraph.
""" % post

Similar to previous Amazon Titan Text models, Amazon Titan Text Premier supports temperature and topP inference parameters to control the randomness and diversity of the response, as well as maxTokenCount and stopSequences to control the length of the response.

import boto3
import json

bedrock_runtime = boto3.client(service_name="bedrock-runtime")

body = json.dumps({
    "inputText": prompt, 
    "textGenerationConfig":{  
        "maxTokenCount":256,
        "stopSequences":[],
        "temperature":0,
        "topP":0.9
    }
})

Then, I use the InvokeModel API to send the inference request.

response = bedrock_runtime.invoke_model(
    body=body,
	modelId="amazon.titan-text-premier-v1:0",
    accept="application/json", 
    contentType="application/json"
)

response_body = json.loads(response.get('body').read())
print(response_body.get('results')[0].get('outputText'))

And here’s the response:

Amazon Titan Image Generator is now generally available in Amazon Bedrock, giving you an easy way to build and scale generative AI applications with new image generation and image editing capabilities, including instant customization of images. Watermark detection for Titan Image Generator is now generally available in the Amazon Bedrock console. Today, we’re also introducing a new DetectGeneratedContent API (preview) in Amazon Bedrock that checks for the existence of this watermark and helps you confirm whether an image was generated by Titan Image Generator.

For more examples in different programming languages, check out the code examples section in the Amazon Bedrock User Guide.

More resources
Here are some additional resources that you might find helpful:

Intended use cases and more — Check out the AWS AI Service Card for Amazon Titan Text Premier to learn more about the models’ intended use cases, design, and deployment, as well as performance optimization best practices.

AWS Generative AI CDK Constructs — Amazon Titan Text Premier is supported by the AWS Generative AI CDK Constructs, an open source extension of the AWS Cloud Development Kit (AWS CDK), providing sample implementations of AWS CDK for common generative AI patterns.

Amazon Titan models — If you’re curious to learn more about Amazon Titan models in general, check out the following video. Dr. Sherry Marcus, Director of Applied Science for Amazon Bedrock, shares how the Amazon Titan family of models incorporates the 25 years of experience Amazon has innovating with AI and machine learning (ML) across its business.

Now available
Amazon Titan Text Premier is available today in the AWS US East (N. Virginia) Region. Custom fine-tuning for Amazon Titan Text Premier is available today in preview in the AWS US East (N. Virginia) Region. Check the full Region list for future updates. To learn more about the Amazon Titan family of models, visit the Amazon Titan product page. For pricing details, review the Amazon Bedrock pricing page.

Give Amazon Titan Text Premier a try in the Amazon Bedrock console today, send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts, and engage with the generative AI builder community at community.aws.

— Antje

Build generative AI applications with Amazon Bedrock Studio (preview)

2024-05-07 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/build-generative-ai-applications-with-amazon-bedrock-studio-preview/

Today, we’re introducing Amazon Bedrock Studio, a new web-based generative artificial intelligence (generative AI) development experience, in public preview. Amazon Bedrock Studio accelerates the development of generative AI applications by providing a rapid prototyping environment with key Amazon Bedrock features, including Knowledge Bases, Agents, and Guardrails.

As a developer, you can now use your company’s single sign-on credentials to sign in to Bedrock Studio and start experimenting. You can build applications using a wide array of top performing models, evaluate, and share your generative AI apps within Bedrock Studio. The user interface guides you through various steps to help improve a model’s responses. You can experiment with model settings, and securely integrate your company data sources, tools, and APIs, and set guardrails. You can collaborate with team members to ideate, experiment, and refine your generative AI applications—all without requiring advanced machine learning (ML) expertise or AWS Management Console access.

As an Amazon Web Services (AWS) administrator, you can be confident that developers will only have access to the features provided by Bedrock Studio, and won’t have broader access to AWS infrastructure and services.

Now, let me show you how to get started with Amazon Bedrock Studio.

Get started with Amazon Bedrock Studio
As an AWS administrator, you first need to create an Amazon Bedrock Studio workspace, then select and add users you want to give access to the workspace. Once the workspace is created, you can share the workspace URL with the respective users. Users with access privileges can sign in to the workspace using single sign-on, create projects within their workspace, and start building generative AI applications.

Create Amazon Bedrock Studio workspace
Navigate to the Amazon Bedrock console and choose Bedrock Studio on the bottom left pane.

Before creating a workspace, you need to configure and secure the single sign-on integration with your identity provider (IdP) using the AWS IAM Identity Center. For detailed instructions on how to configure various IdPs, such as AWS Directory Service for Microsoft Active Directory, Microsoft Entra ID, or Okta, check out the AWS IAM Identity Center User Guide. For this demo, I configured user access with the default IAM Identity Center directory.

Next, choose Create workspace, enter your workspace details, and create any required AWS Identity and Access Management (IAM) roles.

If you want, you can also select default generative AI models and embedding models for the workspace. Once you’re done, choose Create.

Next, select the created workspace.

Then, choose User management and Add users or groups to select the users you want to give access to this workspace.

Back in the Overview tab, you can now copy the Bedrock Studio URL and share it with your users.

Build generative AI applications using Amazon Bedrock Studio
As a builder, you can now navigate to the provided Bedrock Studio URL and sign in with your single sign-on user credentials. Welcome to Amazon Bedrock Studio! Let me show you how to choose from industry leading FMs, bring your own data, use functions to make API calls, and safeguard your applications using guardrails.

Choose from multiple industry leading FMs
By choosing Explore, you can start selecting available FMs and explore the models using natural language prompts.

If you choose Build, you can start building generative AI applications in a playground mode, experiment with model configurations, iterate on system prompts to define the behavior of your application, and prototype new features.

Bring your own data
With Bedrock Studio, you can securely bring your own data to customize your application by providing a single file or by selecting a knowledge base created in Amazon Bedrock.

Use functions to make API calls and make model responses more relevant
A function call allows the FM to dynamically access and incorporate external data or capabilities when responding to a prompt. The model determines which function it needs to call based on an OpenAPI schema that you provide.

Functions enable a model to include information in its response that it doesn’t have direct access to or prior knowledge of. For example, a function could allow the model to retrieve and include the current weather conditions in its response, even though the model itself doesn’t have that information stored.

Safeguard your applications using Guardrails for Amazon Bedrock
You can create guardrails to promote safe interactions between users and your generative AI applications by implementing safeguards customized to your use cases and responsible AI policies.

When you create applications in Amazon Bedrock Studio, the corresponding managed resources such as knowledge bases, agents, and guardrails are automatically deployed in your AWS account. You can use the Amazon Bedrock API to access those resources in downstream applications.

Here’s a short demo video of Amazon Bedrock Studio created by my colleague Banjo Obayomi.

Join the preview
Amazon Bedrock Studio is available today in public preview in AWS Regions US East (N. Virginia) and US West (Oregon). To learn more, visit the Amazon Bedrock Studio page and User Guide.

Give Amazon Bedrock Studio a try today and let us know what you think! Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts, and engage with the generative AI builder community at community.aws.

— Antje

Amazon Titan Image Generator and watermark detection API are now available in Amazon Bedrock

2024-04-23 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/amazon-titan-image-generator-and-watermark-detection-api-are-now-available-in-amazon-bedrock/

During AWS re:Invent 2023, we announced the preview of Amazon Titan Image Generator, a generative artificial intelligence (generative AI) foundation model (FM) that you can use to quickly create and refine realistic, studio-quality images using English natural language prompts.

I’m happy to share that Amazon Titan Image Generator is now generally available in Amazon Bedrock, giving you an easy way to build and scale generative AI applications with new image generation and image editing capabilities, including instant customization of images.

In my previous post, I also mentioned that all images generated by Titan Image Generator contain an invisible watermark, by default, which is designed to help reduce the spread of misinformation by providing a mechanism to identify AI-generated images.

I’m excited to announce that watermark detection for Titan Image Generator is now generally available in the Amazon Bedrock console. Today, we’re also introducing a new DetectGeneratedContent API (preview) in Amazon Bedrock that checks for the existence of this watermark and helps you confirm whether an image was generated by Titan Image Generator.

Let me show you how to get started with these new capabilities.

Instant image customization using Amazon Titan Image Generator
You can now generate new images of a subject by providing up to five reference images. You can create the subject in different scenes while preserving its key features, transfer the style from the reference images to new images, or mix styles from multiple reference images. All this can be done without additional prompt engineering or fine-tuning of the model.

For this demo, I prompt Titan Image Generator to create an image of a “parrot eating a banana.” In the first attempt, I use Titan Image Generator to create this new image without providing a reference image.

Note: In the following code examples, I’ll use the AWS SDK for Python (Boto3) to interact with Amazon Bedrock. You can find additional code examples for C#/.NET, Go, Java, and PHP in the Bedrock User Guide.

import boto3
import json

bedrock_runtime = boto3.client(service_name="bedrock-runtime")

body = json.dumps(
    {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            "text": "parrot eating a banana",   
        },
        "imageGenerationConfig": {
            "numberOfImages": 1,   
            "quality": "premium", 
            "height": 768,
            "width": 1280,
            "cfgScale": 10, 
            "seed": 42
        }
    }
)
response = bedrock_runtime.invoke_model(
    body=body, 
    modelId="amazon.titan-image-generator-v1",
    accept="application/json", 
    contentType="application/json"
)

You can display the generated image using the following code.

import io
import base64
from PIL import Image

response_body = json.loads(response.get("body").read())

images = [
    Image.open(io.BytesIO(base64.b64decode(base64_image)))
    for base64_image in response_body.get("images")
]

for img in images:
    display(img)

Here’s the generated image:

Then, I use the new instant image customization capability with the same prompt, but now also providing the following two reference images. For easier comparison, I’ve resized the images, added a caption, and plotted them side by side.

Here’s the code. The new instant customization is available through the IMAGE_VARIATION task:

# Import reference images
image_path_1 = "parrot-cartoon.png"
image_path_2 = "bird-sketch.png"

with open(image_path_1, "rb") as image_file:
    input_image_1 = base64.b64encode(image_file.read()).decode("utf8")

with open(image_path_2, "rb") as image_file:
    input_image_2 = base64.b64encode(image_file.read()).decode("utf8")

# ImageVariationParams options:
#   text: Prompt to guide the model on how to generate variations
#   images: Base64 string representation of a reference image, up to 5 images are supported
#   similarityStrength: Parameter you can tune to control similarity with reference image(s)

body = json.dumps(
    {
        "taskType": "IMAGE_VARIATION",
        "imageVariationParams": {
            "text": "parrot eating a banana",  # Required
            "images": [input_image_1, input_image_2],  # Required 1 to 5 images
            "similarityStrength": 0.7,  # Range: 0.2 to 1.0
        },
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "quality": "premium",
            "height": 768,
            "width": 1280,
            "cfgScale": 10,
            "seed": 42
        }
    }
)

response = bedrock_runtime.invoke_model(
    body=body, 
    modelId="amazon.titan-image-generator-v1",
    accept="application/json", 
    contentType="application/json"
)

Once again, I’ve resized the generated image, added a caption, and plotted it side by side with the originally generated image.

You can see how the parrot in the second image that has been generated using the instant image customization capability resembles in style the combination of the provided reference images.

Watermark detection for Amazon Titan Image Generator
All Amazon Titan FMs are built with responsible AI in mind. They detect and remove harmful content from data, reject inappropriate user inputs, and filter model outputs. As content creators create realistic-looking images with AI, it’s important to promote responsible development of this technology and reduce the spread of misinformation. That’s why all images generated by Titan Image Generator contain an invisible watermark, by default. Watermark detection is an innovative technology, and Amazon Web Services (AWS) is among the first major cloud providers to widely release built-in watermarks for AI image outputs.

Titan Image Generator’s new watermark detection feature is a mechanism that allows you to identify images generated by Amazon Titan. These watermarks are designed to be tamper-resistant, helping increase transparency around AI-generated content as these capabilities continue to advance.

Watermark detection using the console
Watermark detection is generally available in the Amazon Bedrock console. You can upload an image to detect watermarks embedded in images created by Titan Image Generator, including those generated by the base model and any customized versions. If you upload an image that was not created by Titan Image Generator, then the model will indicate that a watermark has not been detected.

The watermark detection feature also comes with a confidence score. The confidence score represents the confidence level in watermark detection. In some cases, the detection confidence may be low if the original image has been modified. This new capability enables content creators, news organizations, risk analysts, fraud detection teams, and others to better identify and mitigate misleading AI-generated content, promoting transparency and responsible AI deployment across organizations.

Watermark detection using the API (preview)
In addition to watermark detection using the console, we’re introducing a new DetectGeneratedContent API (preview) in Amazon Bedrock that checks for the existence of this watermark and helps you confirm whether an image was generated by Titan Image Generator. Let’s see how this works.

For this demo, let’s check if the image of the green iguana I showed in the Titan Image Generator preview post was indeed generated by the model.

I define the imports, set up the Amazon Bedrock boto3 runtime client, and base64-encode the image. Then, I call the DetectGeneratedContent API by specifying the foundation model and providing the encoded image.

import boto3
import json
import base64

bedrock_runtime = boto3.client(service_name="bedrock-runtime")

image_path = "green-iguana.png"

with open(image_path, "rb") as image_file:
    input_image_iguana = image_file.read()

response = bedrock_runtime.detect_generated_content(
    foundationModelId = "amazon.titan-image-generator-v1",
    content = {
        "imageContent": { "bytes": input_image_iguana }
    }
)

Let’s check the response.

response.get("detectionResult")
'GENERATED'
response.get("confidenceLevel")
'HIGH'

The response GENERATED with the confidence level HIGH confirms that Amazon Bedrock detected a watermark generated by Titan Image Generator.

Now, let’s check another image I generated using Stable Diffusion XL 1.0 on Amazon Bedrock. In this case, a “meerkat facing the sunset.”

I call the API again, this time with the image of the meerkat.

image_path = "meerkat.png"

with open(image_path, "rb") as image_file:
    input_image_meerkat = image_file.read()

response = bedrock_runtime.detect_generated_content(
    foundationModelId = "amazon.titan-image-generator-v1",
    content = {
        "imageContent": { "bytes": input_image_meerkat }
    }
)

response.get("detectionResult")
'NOT_GENERATED'

And indeed, the response NOT_GENERATED tells me that there was no watermark by Titan Image Generator detected, and therefore, the image most likely wasn’t generated by the model.

Using Amazon Titan Image Generator and watermark detection in the console
Here’s a short demo of how to get started with Titan Image Generator and the new watermark detection feature in the Amazon Bedrock console, put together by my colleague Nirbhay Agarwal.

Availability
Amazon Titan Image Generator, the new instant customization capabilities, and watermark detection in the Amazon Bedrock console are available today in the AWS Regions US East (N. Virginia) and US West (Oregon). Check the full Region list for future updates. The new DetectGeneratedContent API in Amazon Bedrock is available today in public preview in the AWS Regions US East (N. Virginia) and US West (Oregon).

Amazon Titan Image Generator, now also available in PartyRock
Titan Image Generator is now also available in PartyRock, an Amazon Bedrock playground. PartyRock gives you a no-code, AI-powered app-building experience that doesn’t require a credit card. You can use PartyRock to create apps that generate images in seconds by selecting from your choice of image generation models from Stability AI and Amazon.

More resources
To learn more about the Amazon Titan family of models, visit the Amazon Titan product page. For pricing details, check Amazon Bedrock Pricing.

Give Amazon Titan Image Generator a try in PartyRock or explore the model’s advanced image generation and editing capabilities in the Amazon Bedrock console. Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts.

For more deep-dive technical content and to engage with the generative AI Builder community, visit our generative AI space at community.aws.

— Antje

AWS Weekly Roundup — Claude 3 Haiku in Amazon Bedrock, AWS CloudFormation optimizations, and more — March 18, 2024

2024-03-18 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-claude-3-haiku-in-amazon-bedrock-aws-cloudformation-optimizations-and-more-march-18-2024/

Storage, storage, storage! Last week, we celebrated 18 years of innovation on Amazon Simple Storage Service (Amazon S3) at AWS Pi Day 2024. Amazon S3 mascot Buckets joined the celebrations and had a ton of fun! The 4-hour live stream was packed with puns, pie recipes powered by PartyRock, demos, code, and discussions about generative AI and Amazon S3.

AWS Pi Day 2024 — Twitch live stream on March 14, 2024

In case you missed the live stream, you can watch the recording. We’ll also update the AWS Pi Day 2024 post on community.aws this week with show notes and session clips.

Last week’s launches
Here are some launches that got my attention:

Anthropic’s Claude 3 Haiku model is now available in Amazon Bedrock — Anthropic recently introduced the Claude 3 family of foundation models (FMs), comprising Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. Claude 3 Haiku, the fastest and most compact model in the family, is now available in Amazon Bedrock. Check out Channy’s post for more details. In addition, my colleague Mike shows how to get started with Haiku in Amazon Bedrock in his video on community.aws.

Up to 40 percent faster stack creation with AWS CloudFormation — AWS CloudFormation now creates stacks up to 40 percent faster and has a new event called CONFIGURATION_COMPLETE. With this event, CloudFormation begins parallel creation of dependent resources within a stack, speeding up the whole process. The new event also gives users more control to shortcut their stack creation process in scenarios where a resource consistency check is unnecessary. To learn more, read this AWS DevOps Blog post.

Amazon SageMaker Canvas extends its model registry integration — SageMaker Canvas has extended its model registry integration to include time series forecasting models and models fine-tuned through SageMaker JumpStart. Users can now register these models to the SageMaker Model Registry with just a click. This enhancement expands the model registry integration to all problem types supported in Canvas, such as regression/classification tabular models and CV/NLP models. It streamlines the deployment of machine learning (ML) models to production environments. Check the Developer Guide for more information.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional news items, open source projects, and Twitch shows that you might find interesting:

Build On Generative AI — Season 3 of your favorite weekly Twitch show about all things generative AI is in full swing! Streaming every Monday, 9:00 US PT, my colleagues Tiffany and Darko discuss different aspects of generative AI and invite guest speakers to demo their work. In today’s episode, guest Martyn Kilbryde showed how to build a JIRA Agent powered by Amazon Bedrock. Check out show notes and the full list of episodes on community.aws.

Amazon S3 Connector for PyTorch — The Amazon S3 Connector for PyTorch now lets PyTorch Lightning users save model checkpoints directly to Amazon S3. Saving PyTorch Lightning model checkpoints is up to 40 percent faster with the Amazon S3 Connector for PyTorch than writing to Amazon Elastic Compute Cloud (Amazon EC2) instance storage. You can now also save, load, and delete checkpoints directly from PyTorch Lightning training jobs to Amazon S3. Check out the open source project on GitHub.

AWS open source news and updates — My colleague Ricardo writes this weekly open source newsletter in which he highlights new open source projects, tools, and demos from the AWS Community.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS at NVIDIA GTC 2024 — The NVIDIA GTC 2024 developer conference is taking place this week (March 18–21) in San Jose, CA. If you’re around, visit AWS at booth #708 to explore generative AI demos and get inspired by AWS, AWS Partners, and customer experts on the latest offerings in generative AI, robotics, and advanced computing at the in-booth theatre. Check out the AWS sessions and request 1:1 meetings.

AWS Summits — It’s AWS Summit season again! The first one is Paris (April 3), followed by Amsterdam (April 9), Sydney (April 10–11), London (April 24), Berlin (May 15–16), and Seoul (May 16–17). AWS Summits are a series of free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Antje

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Knowledge Bases for Amazon Bedrock now supports Amazon Aurora PostgreSQL and Cohere embedding models

2024-02-12 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/knowledge-bases-for-amazon-bedrock-now-supports-amazon-aurora-postgresql-and-cohere-embedding-models/

During AWS re:Invent 2023, we announced the general availability of Knowledge Bases for Amazon Bedrock. With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for Retrieval Augmented Generation (RAG).

In my previous post, I described how Knowledge Bases for Amazon Bedrock manages the end-to-end RAG workflow for you. You specify the location of your data, select an embedding model to convert the data into vector embeddings, and have Amazon Bedrock create a vector store in your AWS account to store the vector data, as shown in the following figure. You can also customize the RAG workflow, for example, by specifying your own custom vector store.

Since my previous post in November, there have been a number of updates to Knowledge Bases, including the availability of Amazon Aurora PostgreSQL-Compatible Edition as an additional custom vector store option next to vector engine for Amazon OpenSearch Serverless, Pinecone, and Redis Enterprise Cloud. But that’s not all. Let me give you a quick tour of what’s new.

Additional choice for embedding model
The embedding model converts your data, such as documents, into vector embeddings. Vector embeddings are numeric representations of text data within your documents. Each embedding aims to capture the semantic or contextual meaning of the data.

Cohere Embed v3 – In addition to Amazon Titan Text Embeddings, you can now also choose from two additional embedding models, Cohere Embed English and Cohere Embed Multilingual, each supporting 1,024 dimensions.

Check out the Cohere Blog to learn more about Cohere Embed v3 models.

Additional choice for vector stores
Each vector embedding is put into a vector store, often with additional metadata such as a reference to the original content the embedding was created from. The vector store indexes the stored vector embeddings, which enables quick retrieval of relevant data.

Knowledge Bases gives you a fully managed RAG experience that includes creating a vector store in your account to store the vector data. You can also select a custom vector store from the list of supported options and provide the vector database index name as well as index field and metadata field mappings.

We have made three recent updates to vector stores that I want to highlight: The addition of Amazon Aurora PostgreSQL-Compatible and Pinecone serverless to the list of supported custom vector stores, as well as an update to the existing Amazon OpenSearch Serverless integration that helps to reduce cost for development and testing workloads.

Amazon Aurora PostgreSQL – In addition to vector engine for Amazon OpenSearch Serverless, Pinecone, and Redis Enterprise Cloud, you can now also choose Amazon Aurora PostgreSQL as your vector database for Knowledge Bases.

Aurora is a relational database service that is fully compatible with MySQL and PostgreSQL. This allows existing applications and tools to run without the need for modification. Aurora PostgreSQL supports the open source pgvector extension, which allows it to store, index, and query vector embeddings.

Many of Aurora’s features for general database workloads also apply to vector embedding workloads:

Aurora offers up to 3x the database throughput when compared to open source PostgreSQL, extending to vector operations in Amazon Bedrock.
Aurora Serverless v2 provides elastic scaling of storage and compute capacity based on real-time query load from Amazon Bedrock, ensuring optimal provisioning.
Aurora global database provides low-latency global reads and disaster recovery across multiple AWS Regions.
Blue/green deployments replicate the production database in a synchronized staging environment, allowing modifications without affecting the production environment.
Aurora Optimized Reads on Amazon EC2 R6gd and R6id instances use local storage to enhance read performance and throughput for complex queries and index rebuild operations. With vector workloads that don’t fit into memory, Aurora Optimized Reads can offer up to 9x better query performance over Aurora instances of the same size.
Aurora seamlessly integrates with AWS services such as Secrets Manager, IAM, and RDS Data API, enabling secure connections from Amazon Bedrock to the database and supporting vector operations using SQL.

For a detailed walkthrough of how to configure Aurora for Knowledge Bases, check out this post on the AWS Database Blog and the User Guide for Aurora.

Pinecone serverless – Pinecone recently introduced Pinecone serverless. If you choose Pinecone as a custom vector store in Knowledge Bases, you can provide either Pinecone or Pinecone serverless configuration details. Both options are supported.

Reduce cost for development and testing workloads in Amazon OpenSearch Serverless
When you choose the option to quickly create a new vector store, Amazon Bedrock creates a vector index in Amazon OpenSearch Serverless in your account, removing the need to manage anything yourself.

Since becoming generally available in November, vector engine for Amazon OpenSearch Serverless gives you the choice to disable redundant replicas for development and testing workloads, reducing cost. You can start with just two OpenSearch Compute Units (OCUs), one for indexing and one for search, cutting the costs in half compared to using redundant replicas. Additionally, fractional OCU billing further lowers costs, starting with 0.5 OCUs and scaling up as needed. For development and testing workloads, a minimum of 1 OCU (split between indexing and search) is now sufficient, reducing cost by up to 75 percent compared to the 4 OCUs required for production workloads.

Usability improvement – Redundant replicas disabled is now the default selection when you choose the quick-create workflow in Knowledge Bases for Amazon Bedrock. Optionally, you can create a collection with redundant replicas by selecting Update to production workload.

For more details on vector engine for Amazon OpenSearch Serverless, check out Channy’s post.

Additional choice for FM
At runtime, the RAG workflow starts with a user query. Using the embedding model, you create a vector embedding representation of the user’s input prompt. This embedding is then used to query the database for similar vector embeddings to retrieve the most relevant text as the query result. The query result is then added to the original prompt, and the augmented prompt is passed to the FM. The model uses the additional context in the prompt to generate the completion, as shown in the following figure.

Anthropic Claude 2.1 – In addition to Anthropic Claude Instant 1.2 and Claude 2, you can now choose Claude 2.1 for Knowledge Bases. Compared to previous Claude models, Claude 2.1 doubles the supported context window size to 200 K tokens.

Check out the Anthropic Blog to learn more about Claude 2.1.

Now available
Knowledge Bases for Amazon Bedrock, including the additional choice in embedding models, vector stores, and FMs, is available in the AWS Regions US East (N. Virginia) and US West (Oregon).

Learn more

Read more about Knowledge Bases for Amazon Bedrock

— Antje

AWS Weekly Roundup — Amazon ECS, RDS for MySQL, EMR Studio, AWS Community, and more — January 22, 2024

2024-01-22 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-ecs-rds-for-mysql-emr-studio-aws-community-and-more-january-22-2024/

As usual, a lot has happened in the Amazon Web Services (AWS) universe this past week. I’m also excited about all the AWS Community events and initiatives that are happening around the world. Let’s take a look together!

Last week’s launches
Here are some launches that got my attention:

Amazon Elastic Container Service (Amazon ECS) now supports managed instance draining – Managed instance draining allows you to gracefully shutdown workloads deployed on Amazon Elastic Compute Cloud (Amazon EC2) instances by safely stopping and rescheduling them to other, non-terminating instances. This new capability streamlines infrastructure maintenance, such as deploying a new AMI version, eliminating the need for custom solutions to shutdown instances without disrupting their workloads. To learn more, check out Nathan’s post on the AWS Containers Blog.

Amazon Relational Database Service (Amazon RDS) for MySQL now supports multi-source replication – Using multi-source replication, you can configure multiple RDS for MySQL database instances as sources for a single target database instance. This feature facilitates tasks such as merging shards into a single target, consolidating data for analytics, or creating long-term backups within a single RDS for MySQL instance. The Amazon RDS for MySQL User Guide has all the details.

Amazon EMR Studio now comes with simplified create experience and improved start times – With the simplified console experience for creating EMR Studio, you can launch interactive and batch workloads with default settings more easily. The improved start times let you launch EMR Studio Workspaces for performing interactive analysis in notebooks in seconds. Have a look at the Amazon EMR User Guide to learn more.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional projects, programs, and news items that you might find interesting:

Summarize news using Amazon Bedrock – My colleague Danilo built this application to summarize the most recent news from an RSS or Atom feed using Amazon Bedrock. The application is deployed as an AWS Lambda function. The function downloads the most recent entries from an RSS or Atom feed, downloads the linked content, extracts text, and makes a summary.

AWS Community Builders program – Interested in joining our AWS Community Builders program? The 2024 application is open until January 28. The AWS Community Builders program offers technical resources, education, and networking opportunities to AWS technical enthusiasts who are passionate about sharing knowledge and connecting with the technical community.

AWS User Groups – The AWS User Group Yaounde Cameroon embarked on a 12-week workshop challenge. Over 12 weeks, participants explored various aspects of AWS and cloud computing, including architecture, security, storage, and more, to develop skills and share knowledge. You can read more about this amazing initiative in this LinkedIn post.

AWS open-source news and updates – My colleague Ricardo writes this weekly open source newsletter in which he highlights new open source projects, tools, and demos from the AWS Community.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS Innovate: AI/ML and Data Edition – Register now for the Asia Pacific & Japan AWS Innovate online conference on February 22, 2024, to explore, discover, and learn how to innovate with artificial intelligence (AI) and machine learning (ML). Choose from over 50 sessions in three languages and get hands-on with technical demos aimed at generative AI builders.

AWS Community re:Invent re:Caps – Join a Community re:Cap event organized by volunteers from AWS User Groups and AWS Cloud Clubs around the world to learn about the latest announcements from AWS re:Invent.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Antje

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Amazon SageMaker Studio adds web-based interface, Code Editor, flexible workspaces, and streamlines user onboarding

2023-11-30 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/amazon-sagemaker-studio-adds-web-based-interface-code-editor-flexible-workspaces-and-streamlines-user-onboarding/

Today, we are announcing an improved Amazon SageMaker Studio experience! The new SageMaker Studio web-based interface loads faster and provides consistent access to your preferred integrated development environment (IDE) and SageMaker resources and tooling, irrespective of your IDE choice. In addition to JupyterLab and RStudio, SageMaker Studio now includes a fully managed Code Editor based on Code-OSS (Visual Studio Code Open Source).

Both Code Editor and JupyterLab can be launched using a flexible workspace. With spaces, you can scale the compute and storage for your IDE up and down as you go, customize runtime environments, and pause-and-resume coding anytime from anywhere. You can spin up multiple such spaces, each configured with a different combination of compute, storage, and runtimes.

SageMaker Studio now also comes with a streamlined onboarding and administration experience to help both individual users and enterprise administrators get started in minutes. Let me give you a quick tour of some of these highlights.

New SageMaker Studio web-based interface
The new SageMaker Studio web-based interface acts as a command center for launching your preferred IDE and accessing your SageMaker tools to build, train, tune, and deploy models. You can now view SageMaker training jobs and endpoints in SageMaker Studio and access foundation models (FMs) via SageMaker JumpStart. Also, you no longer need to manually upgrade SageMaker Studio.

New Code Editor based on Code-OSS (Visual Studio Code Open Source)
As a data scientist or machine learning (ML) practitioner, you can now sign in to SageMaker Studio and launch Code Editor directly from your browser. With Code Editor, you have access to thousands of VS Code compatible extensions from Open VSX registry and the preconfigured AWS toolkit for Visual Studio Code for developing and deploying applications on AWS. You can also use the artificial intelligence (AI)-powered coding companion and security scanning tool powered by Amazon CodeWhisperer and Amazon CodeGuru.

Launch Code Editor and JupyterLab in a flexible workspace
You can launch both Code Editor and JupyterLab using private spaces that only the user creating the space has access to. This flexible workspace is designed to provide a faster and more efficient coding environment.

The spaces come preconfigured with a SageMaker distribution that contains popular ML frameworks and Python packages. With the help of the AI-powered coding companions and security tools, you can quickly generate, debug, explain, and refactor your code.

In addition, SageMaker Studio comes with an improved collaboration experience. You can use the built-in Git integration to share and version code or bring your own shared file storage using Amazon EFS to access a collaborative filesystem across different users or teams.

Streamlined user onboarding and administration
With redesigned setup and onboarding workflows, you can now set up SageMaker Studio domains within minutes. As an individual user, you can now use a one-click experience to launch SageMaker Studio using default presets and without the need to learn about domains or AWS IAM roles.

As an enterprise administrator, step-by-step instructions help you choose the right authentication method, connect to your third-party identity providers, integrate networking and security configurations, configure fine-grained access policies, and choose the right applications to enable in SageMaker Studio. You can also update settings at any time.

To get started, navigate to the SageMaker console and select either Set up for single user or Set up for organization.

The single-user setup will start deploying a SageMaker Studio domain using default presets and will be ready within a few minutes. The setup for organizations will guide you through the configuration step-by-step. Note that you can choose to keep working with the classic SageMaker Studio experience or start exploring the new experience.

Now available
The new Amazon SageMaker Studio experience is available today in all AWS Regions where SageMaker Studio is available. Starting today, new SageMaker Studio domains will default to the new web-based interface. If you have an existing setup and want to start using the new experience, check out the SageMaker Developer Guide for instructions on how to migrate your existing domains.

Give it a try, and let us know what you think. You can send feedback to AWS re:Post for Amazon SageMaker Studio or through your usual AWS contacts.

Start building your ML projects with Amazon SageMaker Studio today!

— Antje

Package and deploy models faster with new tools and guided workflows in Amazon SageMaker

2023-11-29 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/package-and-deploy-models-faster-with-new-tools-and-guided-workflows-in-amazon-sagemaker/

I’m happy to share that Amazon SageMaker now comes with an improved model deployment experience to help you deploy traditional machine learning (ML) models and foundation models (FMs) faster.

As a data scientist or ML practitioner, you can now use the new ModelBuilder class in the SageMaker Python SDK to package models, perform local inference to validate runtime errors, and deploy to SageMaker from your local IDE or SageMaker Studio notebooks.

In SageMaker Studio, new interactive model deployment workflows give you step-by-step guidance on which instance type to choose to find the most optimal endpoint configuration. SageMaker Studio also provides additional interfaces to add models, test inference, and enable auto scaling policies on the deployed endpoints.

New tools in SageMaker Python SDK
The SageMaker Python SDK has been updated with new tools, including ModelBuilder and SchemaBuilder classes that unify the experience of converting models into SageMaker deployable models across ML frameworks and model servers. Model builder automates the model deployment by selecting a compatible SageMaker container and capturing dependencies from your development environment. Schema builder helps to manage serialization and deserialization tasks of model inputs and outputs. You can use the tools to deploy the model in your local development environment to experiment with it, fix any runtime errors, and when ready, transition from local testing to deploy the model on SageMaker with a single line of code.

Let me show you how this works. In the following example, I choose the Falcon-7B model from the Hugging Face model hub. I first deploy the model locally, run a sample inference, perform local benchmarking to find the optimal configuration, and finally deploy the model with the suggested configuration to SageMaker.

First, import the updated SageMaker Python SDK and define a sample model input and output that matches the prompt format for the selected model.

import sagemaker
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve import Mode

prompt = "Falcons are"
response = "Falcons are small to medium-sized birds of prey related to hawks and eagles."

sample_input = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 32}
}

sample_output = [{"generated_text": response}]

Then, create a ModelBuilder instance with the Hugging Face model ID, a SchemaBuilder instance with the sample model input and output, define a local model path, and set the mode to LOCAL_CONTAINER to deploy the model locally. The schema builder generates the required functions for serializing and deserializing the model inputs and outputs.

model_builder = ModelBuilder(
    model="tiiuae/falcon-7b",
    schema_builder=SchemaBuilder(sample_input, sample_output),
    model_path="/path/to/falcon-7b",
    mode=Mode.LOCAL_CONTAINER,
	env_vars={"HF_TRUST_REMOTE_CODE": "True"}
)

Next, call build() to convert the PyTorch model into a SageMaker deployable model. The build function generates the required artifacts for the model server, including the inferency.py and serving.properties files.

local_mode_model = model_builder.build()

For FMs, such as Falcon, you can optionally run tune() in local container mode that performs local benchmarking to find the optimal model serving configuration. This includes the tensor parallel degree that specifies the number of GPUs to use if your environment has multiple GPUs available. Once ready, call deploy() to deploy the model in your local development environment.

tuned_model = local_mode_model.tune()
tuned_model.deploy()

Let’s test the model.

updated_sample_input = model_builder.schema_builder.sample_input
print(updated_sample_input)

{'inputs': 'Falcons are',
 'parameters': {'max_new_tokens': 32}}
 
local_tuned_predictor.predict(updated_sample_input)[0]["generated_text"]

In my demo, the model returns the following response:

a type of bird that are known for their sharp talons and powerful beaks. They are also known for their ability to fly at high speeds […]

When you’re ready to deploy the model on SageMaker, call deploy() again, set the mode to SAGEMAKLER_ENDPOINT, and provide an AWS Identity and Access Management (IAM) role with appropriate permissions.

sm_predictor = tuned_model.deploy(
    mode=Mode.SAGEMAKER_ENDPOINT, 
	role="arn:aws:iam::012345678910:role/role_name"
)

This starts deploying your model on a SageMaker endpoint. Once the endpoint is ready, you can run predictions.

new_input = {'inputs': 'Eagles are','parameters': {'max_new_tokens': 32}}
sm_predictor.predict(new_input)[0]["generated_text"])

New SageMaker Studio model deployment experience
You can start the new interactive model deployment workflows by selecting one or more models to deploy from the models landing page or SageMaker JumpStart model details page or by creating a new endpoint from the endpoints details page.

The new workflows help you quickly deploy the selected model(s) with minimal inputs. If you used SageMaker Inference Recommender to benchmark your model, the dropdown will show instance recommendations from that benchmarking.

Without benchmarking your model, the dropdown will display prospective instances that SageMaker predicts could be a good fit based on its own heuristics. For some of the most popular SageMaker JumpStart models, you’ll see an AWS pretested optimal instance type. For other models, you’ll see generally recommended instance types. For example, if I select the Falcon 40B Instruct model in SageMaker JumpStart, I can see the recommended instance types.

However, if I want to optimize the deployment for cost or performance to meet my specific use cases, I could open the Alternate configurations panel to view more options based on data from before benchmarking.

Once deployed, you can test inference or manage auto scaling policies.

Things to know
Here are a couple of important things to know:

Supported ML models and frameworks – At launch, the new SageMaker Python SDK tools support model deployment for XGBoost and PyTorch models. You can deploy FMs by specifying the Hugging Face model ID or SageMaker JumpStart model ID using the SageMaker LMI container or Hugging Face TGI-based container. You can also bring your own container (BYOC) or deploy models using the Triton model server in ONNX format.

Now available
The new set of tools is available today in all AWS Regions where Amazon SageMaker real-time inference is available. There is no cost to use the new set of tools; you pay only for any underlying SageMaker resources that get created.

Learn more

Get started
Explore the new SageMaker model deployment experience in the AWS Management Console today!

— Antje

Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency

2023-11-29 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/amazon-sagemaker-adds-new-inference-capabilities-to-help-reduce-foundation-model-deployment-costs-and-latency/

Today, we are announcing new Amazon SageMaker inference capabilities that can help you optimize deployment costs and reduce latency. With the new inference capabilities, you can deploy one or more foundation models (FMs) on the same SageMaker endpoint and control how many accelerators and how much memory is reserved for each FM. This helps to improve resource utilization, reduce model deployment costs on average by 50 percent, and lets you scale endpoints together with your use cases.

For each FM, you can define separate scaling policies to adapt to model usage patterns while further optimizing infrastructure costs. In addition, SageMaker actively monitors the instances that are processing inference requests and intelligently routes requests based on which instances are available, helping to achieve on average 20 percent lower inference latency.

Key components
The new inference capabilities build upon SageMaker real-time inference endpoints. As before, you create the SageMaker endpoint with an endpoint configuration that defines the instance type and initial instance count for the endpoint. The model is configured in a new construct, an inference component. Here, you specify the number of accelerators and amount of memory you want to allocate to each copy of a model, together with the model artifacts, container image, and number of model copies to deploy.

Let me show you how this works.

New inference capabilities in action
You can start using the new inference capabilities from SageMaker Studio, the SageMaker Python SDK, and the AWS SDKs and AWS Command Line Interface (AWS CLI). They are also supported by AWS CloudFormation.

For this demo, I use the AWS SDK for Python (Boto3) to deploy a copy of the Dolly v2 7B model and a copy of the FLAN-T5 XXL model from the Hugging Face model hub on a SageMaker real-time endpoint using the new inference capabilities.

Create a SageMaker endpoint configuration

import boto3
import sagemaker

role = sagemaker.get_execution_role()
sm_client = boto3.client(service_name="sagemaker")

sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ExecutionRoleArn=role,
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "InstanceType": "ml.g5.12xlarge",
        "InitialInstanceCount": 1,
		"RoutingConfig": {
            "RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"
        }
    }]
)

Create the SageMaker endpoint

sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

Before you can create the inference component, you need to create a SageMaker-compatible model and specify a container image to use. For both models, I use the Hugging Face LLM Inference Container for Amazon SageMaker. These deep learning containers (DLCs) include the necessary components, libraries, and drivers to host large models on SageMaker.

Prepare the Dolly v2 model

from sagemaker.huggingface import get_huggingface_llm_image_uri

# Retrieve the container image URI
hf_inference_dlc = get_huggingface_llm_image_uri(
  "huggingface",
  version="0.9.3"
)

# Configure model container
dolly7b = {
    'Image': hf_inference_dlc,
    'Environment': {
        'HF_MODEL_ID':'databricks/dolly-v2-7b',
        'HF_TASK':'text-generation',
    }
}

# Create SageMaker Model
sagemaker_client.create_model(
    ModelName        = "dolly-v2-7b",
    ExecutionRoleArn = role,
    Containers       = [dolly7b]
)

Prepare the FLAN-T5 XXL model

# Configure model container
flant5xxlmodel = {
    'Image': hf_inference_dlc,
    'Environment': {
        'HF_MODEL_ID':'google/flan-t5-xxl',
        'HF_TASK':'text-generation',
    }
}

# Create SageMaker Model
sagemaker_client.create_model(
    ModelName        = "flan-t5-xxl",
    ExecutionRoleArn = role,
    Containers       = [flant5xxlmodel]
)

Now, you’re ready to create the inference component.

Create an inference component for each model
Specify an inference component for each model you want to deploy on the endpoint. Inference components let you specify the SageMaker-compatible model and the compute and memory resources you want to allocate. For CPU workloads, define the number of cores to allocate. For accelerator workloads, define the number of accelerators. RuntimeConfig defines the number of model copies you want to deploy.

# Inference compoonent for Dolly v2 7B
sm_client.create_inference_component(
    InferenceComponentName="IC-dolly-v2-7b",
    EndpointName=endpoint_name,
    VariantName=variant_name,
    Specification={
        "ModelName": "dolly-v2-7b",
        "ComputeResourceRequirements": {
		    "NumberOfAcceleratorDevicesRequired": 2, 
			"NumberOfCpuCoresRequired": 2, 
			"MinMemoryRequiredInMb": 1024
	    }
    },
    RuntimeConfig={"CopyCount": 1},
)

# Inference component for FLAN-T5 XXL
sm_client.create_inference_component(
    InferenceComponentName="IC-flan-t5-xxl",
    EndpointName=endpoint_name,
    VariantName=variant_name,
    Specification={
        "ModelName": "flan-t5-xxl",
        "ComputeResourceRequirements": {
		    "NumberOfAcceleratorDevicesRequired": 2, 
			"NumberOfCpuCoresRequired": 1, 
			"MinMemoryRequiredInMb": 1024
	    }
    },
    RuntimeConfig={"CopyCount": 1},
)

Once the inference components have successfully deployed, you can invoke the models.

Run inference
To invoke a model on the endpoint, specify the corresponding inference component.

import json
sm_runtime_client = boto3.client(service_name="sagemaker-runtime")
payload = {"inputs": "Why is California a great place to live?"}

response_dolly = sm_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    InferenceComponentName = "IC-dolly-v2-7b",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)

response_flant5 = sm_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    InferenceComponentName = "IC-flan-t5-xxl",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)

result_dolly = json.loads(response_dolly['Body'].read().decode())
result_flant5 = json.loads(response_flant5['Body'].read().decode())

Next, you can define separate scaling policies for each model by registering the scaling target and applying the scaling policy to the inference component. Check out the SageMaker Developer Guide for detailed instructions.

The new inference capabilities provide per-model CloudWatch metrics and CloudWatch Logs and can be used with any SageMaker-compatible container image across SageMaker CPU- and GPU-based compute instances. Given support by the container image, you can also use response streaming.

Now available
The new Amazon SageMaker inference capabilities are available today in AWS Regions US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Jakarta, Mumbai, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Stockholm), Middle East (UAE), and South America (São Paulo). For pricing details, visit Amazon SageMaker Pricing. To learn more, visit Amazon SageMaker.

Get started
Log in to the AWS Management Console and deploy your FMs using the new SageMaker inference capabilities today!

— Antje

Amazon SageMaker Clarify makes it easier to evaluate and select foundation models (preview)

2023-11-29 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/amazon-sagemaker-clarify-makes-it-easier-to-evaluate-and-select-foundation-models-preview/

I’m happy to share that Amazon SageMaker Clarify now supports foundation model (FM) evaluation (preview). As a data scientist or machine learning (ML) engineer, you can now use SageMaker Clarify to evaluate, compare, and select FMs in minutes based on metrics such as accuracy, robustness, creativity, factual knowledge, bias, and toxicity. This new capability adds to SageMaker Clarify’s existing ability to detect bias in ML data and models and explain model predictions.

The new capability provides both automatic and human-in-the-loop evaluations for large language models (LLMs) anywhere, including LLMs available in SageMaker JumpStart, as well as models trained and hosted outside of AWS. This removes the heavy lifting of finding the right model evaluation tools and integrating them into your development environment. It also simplifies the complexity of trying to adopt academic benchmarks to your generative artificial intelligence (AI) use case.

Evaluate FMs with SageMaker Clarify
With SageMaker Clarify, you now have a single place to evaluate and compare any LLM based on predefined criteria during model selection and throughout the model customization workflow. In addition to automatic evaluation, you can also use the human-in-the-loop capabilities to set up human reviews for more subjective criteria, such as helpfulness, creative intent, and style, by using your own workforce or managed workforce from SageMaker Ground Truth.

To get started with model evaluations, you can use curated prompt datasets that are purpose-built for common LLM tasks, including open-ended text generation, text summarization, question answering (Q&A), and classification. You can also extend the model evaluation with your own custom prompt datasets and metrics for your specific use case. Human-in-the-loop evaluations can be used for any task and evaluation metric. After each evaluation job, you receive an evaluation report that summarizes the results in natural language and includes visualizations and examples. You can download all metrics and reports and also integrate model evaluations into SageMaker MLOps workflows.

In SageMaker Studio, you can find Model evaluation under Jobs in the left menu. You can also select Evaluate directly from the model details page of any LLM in SageMaker JumpStart.

Select Evaluate a model to set up the evaluation job. The UI wizard will guide you through the selection of automatic or human evaluation, model(s), relevant tasks, metrics, prompt datasets, and review teams.

Once the model evaluation job is complete, you can view the results in the evaluation report.

In addition to the UI, you can also start with example Jupyter notebooks that walk you through step-by-step instructions on how to programmatically run model evaluation in SageMaker.

Evaluate models anywhere with the FMEval open source library
To run model evaluation anywhere, including models trained and hosted outside of AWS, use the FMEval open source library. The following example demonstrates how to use the library to evaluate a custom model by extending the ModelRunner class.

For this demo, I choose GPT-2 from the Hugging Face model hub and define a custom HFModelConfig and HuggingFaceCausalLLMModelRunner class that works with causal decoder-only models from the Hugging Face model hub such as GPT-2. The example is also available in the FMEval GitHub repo.

!pip install fmeval

# ModelRunners invoke FMs
from amazon_fmeval.model_runners.model_runner import ModelRunner

# Additional imports for custom model
import warnings
from dataclasses import dataclass
from typing import Tuple, Optional
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@dataclass
class HFModelConfig:
    model_name: str
    max_new_tokens: int
    normalize_probabilities: bool = False
    seed: int = 0
    remove_prompt_from_generated_text: bool = True

class HuggingFaceCausalLLMModelRunner(ModelRunner):
    def __init__(self, model_config: HFModelConfig):
        self.config = model_config
        self.model = AutoModelForCausalLM.from_pretrained(self.config.model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(self.config.model_name)

    def predict(self, prompt: str) -> Tuple[Optional[str], Optional[float]]:
        input_ids = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        generations = self.model.generate(
            **input_ids,
            max_new_tokens=self.config.max_new_tokens,
            pad_token_id=self.tokenizer.eos_token_id,
        )
        generation_contains_input = (
            input_ids["input_ids"][0] == generations[0][: input_ids["input_ids"].shape[1]]
        ).all()
        if self.config.remove_prompt_from_generated_text and not generation_contains_input:
            warnings.warn(
                "Your model does not return the prompt as part of its generations. "
                "`remove_prompt_from_generated_text` does nothing."
            )
        if self.config.remove_prompt_from_generated_text and generation_contains_input:
            output = self.tokenizer.batch_decode(generations[:, input_ids["input_ids"].shape[1] :])[0]
        else:
            output = self.tokenizer.batch_decode(generations, skip_special_tokens=True)[0]

        with torch.inference_mode():
            input_ids = self.tokenizer(self.tokenizer.bos_token + prompt, return_tensors="pt")["input_ids"]
            model_output = self.model(input_ids, labels=input_ids)
            probability = -model_output[0].item()

        return output, probability

Next, create an instance of HFModelConfig and HuggingFaceCausalLLMModelRunner with the model information.

hf_config = HFModelConfig(model_name="gpt2", max_new_tokens=32)
model = HuggingFaceCausalLLMModelRunner(model_config=hf_config)

Then, select and configure the evaluation algorithm.

# Let's evaluate the FM for FactualKnowledge
from amazon_fmeval.fmeval import get_eval_algorithm
from amazon_fmeval.eval_algorithms.factual_knowledge import FactualKnowledgeConfig

eval_algorithm_config = FactualKnowledgeConfig("<OR>")
eval_algorithm = get_eval_algorithm("factual_knowledge", eval_algorithm_config)

Let’s first test with one sample. The evaluation score is the percentage of factually correct responses.

model_output = model.predict("London is the capital of")[0]
print(model_output)

eval_algo.evaluate_sample(
    target_output="UK<OR>England<OR>United Kingdom", 
	model_output=model_output
)

the UK, and the UK is the largest producer of food in the world.

The UK is the world's largest producer of food in the world.
[EvalScore(name='factual_knowledge', value=1)]

Although it’s not a perfect response, it includes “UK.”

Next, you can evaluate the FM using built-in datasets or define your custom dataset. If you want to use a custom evaluation dataset, create an instance of DataConfig:

config = DataConfig(
    dataset_name="my_custom_dataset",
    dataset_uri="dataset.jsonl",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answer",
)

eval_output = eval_algorithm.evaluate(
    model=model, 
    dataset_config=config, 
    prompt_template="$feature", #$feature is replaced by the input value in the dataset 
    save=True
)

The evaluation results will return a combined evaluation score across the dataset and detailed results for each model input stored in a local output path.

Join the preview
FM evaluation with Amazon SageMaker Clarify is available today in public preview in AWS Regions US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland). The FMEval open source library] is available on GitHub. To learn more, visit Amazon SageMaker Clarify.

Get started
Log in to the AWS Management Console and start evaluating your FMs with SageMaker Clarify today!

— Antje

Evaluate, compare, and select the best foundation models for your use case in Amazon Bedrock (preview)

2023-11-29 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/evaluate-compare-and-select-the-best-foundation-models-for-your-use-case-in-amazon-bedrock-preview/

I’m happy to share that you can now evaluate, compare, and select the best foundation models (FMs) for your use case in Amazon Bedrock. Model Evaluation on Amazon Bedrock is available today in preview.

Amazon Bedrock offers a choice of automatic evaluation and human evaluation. You can use automatic evaluation with predefined metrics such as accuracy, robustness, and toxicity. For subjective or custom metrics, such as friendliness, style, and alignment to brand voice, you can set up human evaluation workflows with just a few clicks.

Model evaluations are critical at all stages of development. As a developer, you now have evaluation tools available for building generative artificial intelligence (AI) applications. You can start by experimenting with different models in the playground environment. To iterate faster, add automatic evaluations of the models. Then, when you prepare for an initial launch or limited release, you can incorporate human reviews to help ensure quality.

Let me give you a quick tour of Model Evaluation on Amazon Bedrock.

Automatic model evaluation
With automatic model evaluation, you can bring your own data or use built-in, curated datasets and pre-defined metrics for specific tasks such as content summarization, question and answering, text classification, and text generation. This takes away the heavy lifting of designing and running your own model evaluation benchmarks.

To get started, navigate to the Amazon Bedrock console, then select Model evaluation under Assessment & deployment in the left menu. Create a new model evaluation and choose Automatic.

Next, follow the setup dialog to choose the FM you want to evaluate and the type of task, for example, text summarization. Select the evaluation metrics and specify a dataset—either built-in or your own.

If you bring your own dataset, make sure it’s in JSON Lines format, and each line contains all of the key-value pairs that you want to evaluate your model with for the model dimension that you want to evaluate. For example, if you want to evaluate the model on a question-answer task, you would format your data as follows (with category being optional):

{"referenceResponse":"Cantal","category":"Capitals","prompt":"Aurillac is the capital of"}
{"referenceResponse":"Bamiyan Province","category":"Capitals","prompt":"Bamiyan city is the capital of"}
{"referenceResponse":"Abkhazia","category":"Capitals","prompt":"Sokhumi is the capital of"}
...

Then, create and run the evaluation job to understand the model’s task-specific performance. Once the evaluation job is complete, you can review the results in the model evaluation report.

Human model evaluation
For human evaluation, you can have Amazon Bedrock set up human review workflows with a few clicks. You can bring your own datasets and define custom evaluation metrics, such as relevance, style, or alignment to brand voice. You also have the choice to either leverage your own internal teams as reviewers or engage an AWS managed team. This takes away the tedious effort of building and operating human evaluation workflows.

To get started, create a new model evaluation and select Human: Bring your own team or Human: AWS managed team.

If you choose an AWS managed team for human evaluation, describe your model evaluation needs, including task type, expertise of the work team, and the approximate number of prompts, along with your contact information. In the next step, an AWS expert will reach out to discuss your model evaluation project requirements in more detail. Upon review, the team will share a custom quote and project timeline.

If you choose to bring your own team, follow the setup dialog to choose the FMs you want to evaluate and the type of task, for example, text summarization. Then, select the evaluation metrics, upload your test dataset, and set up the work team.

For human evaluation, you would format the example data shown before again in JSON Lines format like this (with category and referenceResponse being optional):

{"prompt":"Aurillac is the capital of","referenceResponse":"Cantal","category":"Capitals"}
{"prompt":"Bamiyan city is the capital of","referenceResponse":"Bamiyan Province","category":"Capitals"}
{"prompt":"Senftenberg is the capital of","referenceResponse":"Oberspreewald-Lausitz","category":"Capitals"}

Once the human evaluation is completed, Amazon Bedrock generates an evaluation report with the model’s performance against your selected metrics.

Things to know
Here are a couple of important things to know:

Model support – During preview, you can evaluate and compare text-based large language models (LLMs) available on Amazon Bedrock. During preview, you can select one model for each automatic evaluation job and up to two models for each human evaluation job using your own team. For human evaluation using an AWS managed team, you can specify custom project requirements.

Pricing – During preview, AWS only charges for the model inference needed to perform the evaluation (processed input and output tokens for on-demand pricing). There will be no separate charges for human evaluation or automatic evaluation. Amazon Bedrock Pricing has all the details.

Join the preview
Automatic evaluation and human evaluation using your own work team are available today in public preview in AWS Regions US East (N. Virginia) and US West (Oregon). Human evaluation using an AWS managed team is available in public preview in AWS Region US East (N. Virginia). To learn more, visit the Amazon Bedrock Developer Experience web page and check out the User Guide.

Get started
Log in to the AWS Management Console and start exploring model evaluation in Amazon Bedrock today!

— Antje

Noise

All posts by Antje Barth

Amazon Bedrock Guardrails now supports multimodal toxicity detection with image support (preview)

Introducing the next generation of Amazon SageMaker: The center for all your data, analytics, and AI

Introducing multi-agent collaboration capability for Amazon Bedrock (preview)

Prevent factual errors from LLM hallucinations with mathematically sound Automated Reasoning checks (preview)

AWS Weekly Roundup: Agentic workflows, Amazon Transcribe, AWS Lambda insights, and more (October 21, 2024)

Jamba 1.5 family of models by AI21 Labs is now available in Amazon Bedrock

AWS Weekly Roundup: Llama 3.1, Mistral Large 2, AWS Step Functions, AWS Certifications update, and more (July 29, 2024)

Knowledge Bases for Amazon Bedrock now supports additional data connectors (in preview)

AWS Weekly Roundup – LlamaIndex support for Amazon Neptune, force AWS CloudFormation stack deletion, and more (May 27, 2024)

Build RAG and agent-based generative AI applications with new Amazon Titan Text Premier model, available in Amazon Bedrock

Build generative AI applications with Amazon Bedrock Studio (preview)

Amazon Titan Image Generator and watermark detection API are now available in Amazon Bedrock

AWS Weekly Roundup — Claude 3 Haiku in Amazon Bedrock, AWS CloudFormation optimizations, and more — March 18, 2024

Knowledge Bases for Amazon Bedrock now supports Amazon Aurora PostgreSQL and Cohere embedding models

AWS Weekly Roundup — Amazon ECS, RDS for MySQL, EMR Studio, AWS Community, and more — January 22, 2024

Amazon SageMaker Studio adds web-based interface, Code Editor, flexible workspaces, and streamlines user onboarding

Package and deploy models faster with new tools and guided workflows in Amazon SageMaker

Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency

Amazon SageMaker Clarify makes it easier to evaluate and select foundation models (preview)

Evaluate, compare, and select the best foundation models for your use case in Amazon Bedrock (preview)

The collective thoughts of the interwebz