Tag Archives: Amazon Bedrock

AWS Weekly Roundup: AI21 Labs’ Jamba-Instruct in Amazon Bedrock, Amazon WorkSpaces Pools, and more (July 1, 2024)

2024-07-01 Esra Kayabali

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-ai21-labs-jamba-instruct-in-amazon-bedrock-amazon-workspaces-pools-and-more-july-1-2024/

AWS Summit New York is 10 days away, and I am very excited about the new announcements and more than 170 sessions. There will be A Night Out with AWS event after the summit for professionals from the media and entertainment, gaming, and sports industries who are existing Amazon Web Services (AWS) customers or have a keen interest in using AWS Cloud services for their business. You’ll have the opportunity to relax, collaborate, and build new connections with AWS leaders and industry peers.

Let’s look at the last week’s new announcements.

Last week’s launches
Here are the launches that got my attention.

AI21 Labs’ Jamba-Instruct now available in Amazon Bedrock – AI21 Labs’ Jamba-Instruct is an instruction-following large language model (LLM) for reliable commercial use, with the ability to understand context and subtext, complete tasks from natural language instructions, and ingest information from long documents or financial filings. With strong reasoning capabilities, Jamba-Instruct can break down complex problems, gather relevant information, and provide structured outputs to enable uses like Q&A on calls, summarizing documents, building chatbots, and more. For more information, visit AI21 Labs in Amazon Bedrock and the Amazon Bedrock User Guide.

Amazon WorkSpaces Pools, a new feature of Amazon WorkSpaces – You can now create a pool of non-persistent virtual desktops using Amazon WorkSpaces and save costs by sharing them across users who receive a fresh desktop each time they sign in. WorkSpaces Pools provides the flexibility to support shared environments like training labs and contact centers, and some user settings like bookmarks and files stored in a central storage repository such as Amazon Simple Storage Service (Amazon S3) or Amazon FSx can be saved for improved personalization. You can use AWS Auto Scaling to automatically scale the pool of virtual desktops based on usage metrics or schedules. For pricing information, refer to the Amazon WorkSpaces Pricing page.

API-driven, OpenLineage-compatible data lineage visualization in Amazon DataZone (preview) – Amazon DataZone introduces a new data lineage feature that allows you to visualize how data moves from source to consumption across organizations. The service captures lineage events from OpenLineage-enabled systems or through API to trace data transformations. Data consumers can gain confidence in an asset’s origin, and producers can assess the impact of changes by understanding its consumption through the comprehensive lineage view. Additionally, Amazon DataZone versions lineage with each event to enable visualizing lineage at any point in time or comparing transformations across an asset or job’s history. To learn more, visit Amazon DataZone, read my News Blog post, and get started with data lineage documentation.

Knowledge Bases for Amazon Bedrock now offers observability logs – You can now monitor knowledge ingestion logs through Amazon CloudWatch, S3 buckets, or Amazon Data Firehose streams. This provides enhanced visibility into whether documents were successfully processed or encountered failures during ingestion. Having these comprehensive insights promptly ensures that you can efficiently determine when your documents are ready for use. For more details on these new capabilities, refer to the Knowledge Bases for Amazon Bedrock documentation.

Updates and expansion to the AWS Well-Architected Framework and Lens Catalog – We announced updates to the AWS Well-Architected Framework and Lens Catalog to provide expanded guidance and recommendations on architectural best practices for building secure and resilient cloud workloads. The updates reduce redundancies and enhance consistency in resources and framework structure. The Lens Catalog now includes the new Financial Services Industry Lens and updates to the Mergers and Acquisitions Lens. We also made important updates to the Change Enablement in the Cloud whitepaper. You can use the updated Well-Architected Framework and Lens Catalog to design cloud architectures optimized for your unique requirements by following current best practices.

Cross-account machine learning (ML) model sharing support in Amazon SageMaker Model Registry – Amazon SageMaker Model Registry now integrates with AWS Resource Access Manager (AWS RAM), allowing you to easily share ML models across AWS accounts. This helps data scientists, ML engineers, and governance officers access models in different accounts like development, staging, and production. You can share models in Amazon SageMaker Model Registry by specifying the model in the AWS RAM console and granting access to other accounts. This new feature is now available in all AWS Regions where SageMaker Model Registry is available except GovCloud Regions. To learn more, visit the Amazon SageMaker Developer Guide.

AWS CodeBuild supports Arm-based workloads using AWS Graviton3 – AWS CodeBuild now supports natively building and testing Arm workloads on AWS Graviton3 processors without additional configuration, providing up to 25% higher performance and 60% lower energy usage than previous Graviton processors. To learn more about CodeBuild’s support for Arm, visit our AWS CodeBuild User Guide.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

We launched existing services and instance types in additional Regions:

Amazon RDS now supports integration with AWS Secrets Manager in the AWS GovCloud (US) Regions. RDS integration with AWS Secrets Manager improves your database security by ensuring your RDS master user password is not visible in plaintext to administrators or engineers during your database creation workflow.
Amazon OpenSearch Serverless is now available in Canada (Central) Region. OpenSearch Serverless is a serverless deployment option for Amazon OpenSearch Service that makes it simple to run search and analytics workloads without the complexities of infrastructure management.
Amazon Redshift Concurrency Scaling is now available in the AWS Europe (Spain, Zurich) and Middle East (UAE) Regions. Amazon Redshift Concurrency Scaling elastically scales query processing power to provide consistently fast performance for hundreds of concurrent queries.
Amazon ElastiCache now supports Graviton3-based M7g and R7g node families in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon, N. California), Canada (Central), South America (São Paulo), Europe (Frankfurt, Ireland, London, Paris (M7g only), Spain, Stockholm), and Asia Pacific (Hyderabad, Mumbai, Seoul, Singapore, Sydney, Tokyo). These new nodes deliver up to 28% increased throughput, 21% improved P99 latency, and 25% higher networking bandwidth compared to Graviton2 nodes for improved price performance.
Amazon EC2 C6a instances are now available in Asia Pacific (Hong Kong) Region. C6a instances are powered by third-generation AMD EPYC processors with a maximum frequency of 3.6 GHz. C6a instances deliver up to 15% better price performance than comparable C5a instances.
Amazon Redshift Serverless with lower base capacity is now available in the Asia Pacific (Mumbai), Europe (Stockholm), and US West (N. California) Regions. Amazon Redshift Serverless now has a lower minimum base capacity of 8 RPUs, down from 32 RPUs, providing more flexibility to support a diverse range of small to large workloads based on price-performance requirements by measuring capacity in RPUs paid per second.
Amazon Athena Provisioned Capacity is now available in South America (São Paulo) and Europe (Spain) Regions. Provisioned Capacity is a feature of Athena that allows you to run SQL queries on fully managed, dedicated serverless resources for a fixed price and no long-term commitments.
Amazon CloudWatch Logs account-level subscription filter is now available in the AWS GovCloud (US-East, US-West) Regions, as well as Israel (Tel Aviv), and Canada West (Calgary) Regions. With this new capability, you can deliver real-time log events that are ingested into Amazon CloudWatch Logs to a Kinesis Data Streams data stream, an Amazon Data Firehose delivery stream, or an AWS Lambda function for custom processing, analysis, or delivery to other destinations using a single account level subscription filter.
Amazon Route 53 Application Recovery Controller (Route 53 ARC) zonal autoshift is now generally available in the AWS GovCloud (US-East, US-West) Regions. With this feature, you can safely and automatically shift an application’s traffic away from an Availability Zone when AWS identifies a potential failure affecting that Availability Zone.
AWS Backup support for Amazon S3 is now available in AWS Canada West (Calgary) Region. AWS Backup is a policy-based, fully managed, and cost-effective solution that enables you to centralize and automate data protection of Amazon S3 along with other AWS services (spanning compute, storage, and databases) and third-party applications.
Amazon EC2 High Memory instances with 3TiB of memory are now available in Asia Pacific (Hong Kong) Region. You can start using these new High Memory instances with On Demand and Savings Plan purchase options.

Other AWS news
Here are some additional news items that you might find interesting:

Top reasons to build and scale generative AI applications on Amazon Bedrock – Check out Jeff Barr’s video, where he discusses why our customers are choosing Amazon Bedrock to build and scale generative artificial intelligence (generative AI) applications that deliver fast value and business growth. Amazon Bedrock is becoming a preferred platform for building and scaling generative AI due to its features, innovation, availability, and security. Leading organizations across diverse sectors use Amazon Bedrock to speed their generative AI work, like creating intelligent virtual assistants, creative design solutions, document processing systems, and a lot more.

Four ways AWS is engineering infrastructure to power generative AI – We continue to optimize our infrastructure to support generative AI at scale through innovations like delivering low-latency, large-scale networking to enable faster model training, continuously improving data center energy efficiency, prioritizing security throughout our infrastructure design, and developing custom AI chips like AWS Trainium to increase computing performance while lowering costs and energy usage. Read the new blog post about how AWS is engineering infrastructure for generative AI.

AWS re:Inforce 2024 re:Cap – It’s been 2 weeks since AWS re:Inforce 2024, our annual cloud-security learning event. Check out the summary of the event prepared by Wojtek.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. To learn more about future AWS Summit events, visit the AWS Summit page. Register in your nearest city: New York (July 10), Bogotá (July 18), and Taipei (July 23–24).

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world. Upcoming AWS Community Days are in Cameroon (July 13), Aotearoa (August 15), and Nigeria (August 24).

Browse all upcoming AWS led in-person and virtual events and developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Esra

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Build a real-time streaming generative AI application using Amazon Bedrock, Amazon Managed Service for Apache Flink, and Amazon Kinesis Data Streams

2024-06-27 Felix John

Post Syndicated from Felix John original https://aws.amazon.com/blogs/big-data/build-a-real-time-streaming-generative-ai-application-using-amazon-bedrock-amazon-managed-service-for-apache-flink-and-amazon-kinesis-data-streams/

Generative artificial intelligence (AI) has gained a lot of traction in 2024, especially around large language models (LLMs) that enable intelligent chatbot solutions. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to help you build generative AI applications with security, privacy, and responsible AI. Use cases around generative AI are vast and go well beyond chatbot applications; for instance, generative AI can be used for analysis of input data such as sentiment analysis of reviews.

Most businesses generate data continuously in real-time. Internet of Things (IoT) sensor data, application log data from your applications, or clickstream data generated by users of your website are only some examples of continuously generated data. In many situations, the ability to process this data quickly (in real-time or near real-time) helps businesses increase the value of insights they get from their data.

One option to process data in real-time is using stream processing frameworks such as Apache Flink. Flink is a framework and distributed processing engine for processing data streams. AWS provides a fully managed service for Apache Flink through Amazon Managed Service for Apache Flink, which enables you to build and deploy sophisticated streaming applications without setting up infrastructure and managing resources.

Data streaming enables generative AI to take advantage of real-time data and provide businesses with rapid insights. This post looks at how to integrate generative AI capabilities when implementing a streaming architecture on AWS using managed services such as Managed Service for Apache Flink and Amazon Kinesis Data Streams for processing streaming data and Amazon Bedrock to utilize generative AI capabilities. We focus on the use case of deriving review sentiment in real-time from customer reviews in online shops. We include a reference architecture and a step-by-step guide on infrastructure setup and sample code for implementing the solution with the AWS Cloud Development Kit (AWS CDK). You can find the code to try it out yourself on the GitHub repo.

Solution overview

The following diagram illustrates the solution architecture. The architecture diagram depicts the real-time streaming pipeline in the upper half and the details on how you gain access to the Amazon OpenSearch Service dashboard in the lower half.

Architecture Overview

The real-time streaming pipeline consists of a producer that is simulated by running a Python script locally that is sending reviews to a Kinesis Data Stream. The reviews are from the Large Movie Review Dataset and contain positive or negative sentiment. The next step is the ingestion to the Managed Service for Apache Flink application. From within Flink, we are asynchronously calling Amazon Bedrock (using Anthropic Claude 3 Haiku) to process the review data. The results are then ingested into an OpenSearch Service cluster for visualization with OpenSearch Dashboards. We directly call the PutRecords API of Kinesis Data Streams within the Python script for the sake of simplicity and to cost-effectively run this example. You should consider using an Amazon API Gateway REST API as a proxy in front of Kinesis Data Streams when using a similar architecture in production, as described in Streaming Data Solution for Amazon Kinesis.

To gain access to the OpenSearch dashboard, we need to use a bastion host that is deployed in the same private subnet within your virtual private cloud (VPC) as your OpenSearch Service cluster. To connect with the bastion host, we use Session Manager, a capability of Amazon Systems Manager, which allows us to connect to our bastion host securely without having to open inbound ports. To access it, we use Session Manager to port forward the OpenSearch dashboard to our localhost.

The walkthrough consists of the following high-level steps:

Create the Flink application by building the JAR file.
Deploy the AWS CDK stack.
Set up and connect to OpenSearch Dashboards.
Set up the streaming producer.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account.
Java 11 or later.
Apache Maven 3.9.6 or later.
The AWS Command Line Interface (AWS CLI) installed. For instructions, refer to Get started with the AWS CLI.
The AWS CDK installed. For instructions, refer to Install the AWS CDK.
Python 3.9 or later.
The Session Manager plugin installed. The plugin is required for access to OpenSearch Dashboards using Session Manager. For instructions, refer to Install the Session Manager plugin for the AWS CLI.
Model access to Anthropic’s Claude model on Amazon Bedrock. For instructions, refer to Add model access.
The dataset used is the Large Movie Review Dataset from the following paper: Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).

Implementation details

This section focuses on the Flink application code of this solution. You can find the code on GitHub. The StreamingJob.java file inside the flink-async-bedrock directory file serves as entry point to the application. The application uses the FlinkKinesisConsumer, which is a connector for reading streaming data from a Kinesis Data Stream. It applies a map transformation to convert each input string into an instance of Review class object, resulting in DataStream<Review> to ease processing.

The Flink application uses the helper class AsyncDataStream defined in the StreamingJob.java file to incorporate an asynchronous, external operation into Flink. More specifically, the following code creates an asynchronous data stream by applying the AsyncBedrockRequest function to each element in the inputReviewStream. The application uses unorderedWait to increase throughput and reduce idle time because event ordering is not required. The timeout is set to 25,000 milliseconds to give the Amazon Bedrock API enough time to process long reviews. The maximum concurrency or capacity is limited to 1,000 requests at a time. See the following code:

DataStream<ProcessedReview> processedReviewStream = AsyncDataStream.unorderedWait(inputReviewStream, new AsyncBedrockRequest(applicationProperties), 25000, TimeUnit.MILLISECONDS, 1000).uid("processedReviewStream");

The Flink application initiates asynchronous calls to the Amazon Bedrock API, invoking the Anthropic Claude 3 Haiku foundation model for each incoming event. We use Anthropic Claude 3 Haiku on Amazon Bedrock because it is Anthropic’s fastest and most compact model for near-instant responsiveness. The following code snippet is part of the AsyncBedrockRequest.java file and illustrates how we set up the required configuration to call the Anthropic’s Claude Messages API to invoke the model:

@Override
public void asyncInvoke(Review review, final ResultFuture<ProcessedReview> resultFuture) throws Exception {

    // [..]

    JSONObject user_message = new JSONObject()
        .put("role", "user")
        .put("content", "<review>" + reviewText + "</review>");

    JSONObject assistant_message = new JSONObject()
        .put("role", "assistant")
        .put("content", "{");

    JSONArray messages = new JSONArray()
            .put(user_message)
            .put(assistant_message);

    String payload = new JSONObject()
            .put("system", systemPrompt)
            .put("anthropic_version", "bedrock-2023-05-31")
            .put("temperature", 0.0)
            .put("max_tokens", 4096)
            .put("messages", messages)
            .toString();

    InvokeModelRequest request = InvokeModelRequest.builder()
            .body(SdkBytes.fromUtf8String(payload))
            .modelId("anthropic.claude-3-haiku-20240307-v1:0")
            .build();

    CompletableFuture<InvokeModelResponse> completableFuture = client.invokeModel(request)
            .whenComplete((response, exception) -> {
                if (exception != null) {
                    LOG.error("Model invocation failed: " + exception);
                }
            })
            .orTimeout(250000, TimeUnit.MILLISECONDS);

Prompt engineering

The application uses advanced prompt engineering techniques to guide the generative AI model’s responses and provide consistent responses. The following prompt is designed to extract a summary as well as a sentiment from a single review:

String systemPrompt = 
     "Summarize the review within the <review> tags 
     into a single and concise sentence alongside the sentiment 
     that is either positive or negative. Return a valid JSON object with 
     following keys: summary, sentiment. 
     <example> {\\\"summary\\\": \\\"The reviewer strongly dislikes the movie, 
     finding it unrealistic, preachy, and extremely boring to watch.\\\", 
     \\\"sentiment\\\": \\\"negative\\\"} 
     </example>";

The prompt instructs the Anthropic Claude model to return the extracted sentiment and summary in JSON format. To maintain consistent and well-structured output by the generative AI model, the prompt uses various prompt engineering techniques to improve the output. For example, the prompt uses XML tags to provide a clearer structure for Anthropic Claude. Moreover, the prompt contains an example to enhance Anthropic Claude’s performance and guide it to produce the desired output. In addition, the prompt pre-fills Anthropic Claude’s response by pre-filling the Assistant message. This technique helps provide a consistent output format. See the following code:

JSONObject assistant_message = new JSONObject()
    .put("role", "assistant")
    .put("content", "{");

Build the Flink application

The first step is to download the repository and build the JAR file of the Flink application. Complete the following steps:

Clone the repository to your desired workspace:

git clone https://github.com/aws-samples/aws-streaming-generative-ai-application.git

Move to the correct directory inside the downloaded repository and build the Flink application:
```
cd flink-async-bedrock && mvn clean package
```

Building Jar File

Maven will compile the Java source code and package it in a distributable JAR format in the directory flink-async-bedrock/target/ named flink-async-bedrock-0.1.jar. After you deploy your AWS CDK stack, the JAR file will be uploaded to Amazon Simple Storage Service (Amazon S3) to create your Managed Service for Apache Flink application.

Deploy the AWS CDK stack

After you build the Flink application, you can deploy your AWS CDK stack and create the required resources:

Move to the correct directory cdk and deploy the stack:
```
cd cdk && npm install & cdk deploy
```

This will create the required resources in your AWS account, including the Managed Service for Apache Flink application, Kinesis Data Stream, OpenSearch Service cluster, and bastion host to quickly connect to OpenSearch Dashboards, deployed in a private subnet within your VPC.

Take note of the output values. The output will look similar to the following:

 ✅  StreamingGenerativeAIStack

✨  Deployment time: 1414.26s

Outputs:
StreamingGenerativeAIStack.BastionHostBastionHostIdC743CBD6 = i-0970816fa778f9821
StreamingGenerativeAIStack.accessOpenSearchClusterOutput = aws ssm start-session --target i-0970816fa778f9821 --document-name AWS-StartPortForwardingSessionToRemoteHost --parameters '{"portNumber":["443"],"localPortNumber":["8157"], "host":["vpc-generative-ai-opensearch-qfssmne2lwpzpzheoue7rkylmi.us-east-1.es.amazonaws.com"]}'
StreamingGenerativeAIStack.bastionHostIdOutput = i-0970816fa778f9821
StreamingGenerativeAIStack.domainEndpoint = vpc-generative-ai-opensearch-qfssmne2lwpzpzheoue7rkylmi.us-east-1.es.amazonaws.com
StreamingGenerativeAIStack.regionOutput = us-east-1
Stack ARN:
arn:aws:cloudformation:us-east-1:<AWS Account ID>:stack/StreamingGenerativeAIStack/3dec75f0-cc9e-11ee-9b16-12348a4fbf87

✨  Total time: 1418.61s

Set up and connect to OpenSearch Dashboards

Next, you can set up and connect to OpenSearch Dashboards. This is where the Flink application will write the extracted sentiment as well as the summary from the processed review stream. Complete the following steps:

Run the following command to establish connection to OpenSearch from your local workspace in a separate terminal window. The command can be found as output named accessOpenSearchClusterOutput.
- For Mac/Linux, use the following command:

aws ssm start-session --target <BastionHostId> --document-name AWS-StartPortForwardingSessionToRemoteHost --parameters '{"portNumber":["443"],"localPortNumber":["8157"], "host":["<OpenSearchDomainHost>"]}'

- For Windows, use the following command:

aws ssm start-session ^
    —target <BastionHostId> ^
    —document-name AWS-StartPortForwardingSessionToRemoteHost ^    
    —parameters host="<OpenSearchDomainHost>",portNumber="443",localPortNumber="8157"

It should look similar to the following output:

Session Manager CLI

Create the required index in OpenSearch by issuing the following command:
- For Mac/Linux, use the following command:

curl --location -k --request PUT https://localhost:8157/processed_reviews \
--header 'Content-Type: application/json' \
--data-raw '{
  "mappings": {
    "properties": {
        "reviewId": {"type": "integer"},
        "userId": {"type": "keyword"},
        "summary": {"type": "keyword"},
        "sentiment": {"type": "keyword"},
        "dateTime": {"type": "date"}}}}}'

- For Windows, use the following command:

$url = https://localhost:8157/processed_reviews
$headers = @{
    "Content-Type" = "application/json"
}
$body = @{
    "mappings" = @{
        "properties" = @{
            "reviewId" = @{ "type" = "integer" }
            "userId" = @{ "type" = "keyword" }
            "summary" = @{ "type" = "keyword" }
            "sentiment" = @{ "type" = "keyword" }
            "dateTime" = @{ "type" = "date" }
        }
    }
} | ConvertTo-Json -Depth 3
Invoke-RestMethod -Method Put -Uri $url -Headers $headers -Body $body -SkipCertificateCheck

After the session is established, you can open your browser and navigate to https://localhost:8157/_dashboards. Your browser might consider the URL not secure. You can ignore this warning.
Choose Dashboards Management under Management in the navigation pane.
Choose Saved objects in the sidebar.
Import export.ndjson, which can be found in the resources folder within the downloaded repository.

OpenSearch Dashboards Upload

After you import the saved objects, you can navigate to Dashboards under My Dashboard in the navigation pane.

At the moment, the dashboard appears blank because you haven’t uploaded any review data to OpenSearch yet.

Set up the streaming producer

Finally, you can set up the producer that will be streaming review data to the Kinesis Data Stream and ultimately to the OpenSearch Dashboards. The Large Movie Review Dataset was originally published in 2011 in the paper “Learning Word Vectors for Sentiment Analysis” by Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Complete the following steps:

Download the Large Movie Review Dataset here.
After the download is complete, extract the .tar.gz file to retrieve the folder named aclImdb 3 or similar that contains the review data. Rename the review data folder to aclImdb.
Move the extracted dataset to data/ inside the repository that you previously downloaded.

Your repository should look like the following screenshot.

Folder Overview

Modify the DATA_DIR path in producer/producer.py if the review data is named differently.
Move to the producer directory using the following command:
```
cd producer
```
Install the required dependencies and start generating the data:
```
pip install -r requirements.txt && python produce.py
```

The OpenSearch dashboard should be populated after you start generating streaming data and writing it to the Kinesis Data Stream. Refresh the dashboard to view the latest data. The dashboard shows the total number of processed reviews, the sentiment distribution of the processed reviews in a pie chart, and the summary and sentiment for the latest reviews that have been processed.

When you have a closer look at the Flink application, you will notice that the application marks the sentiment field with the value error whenever there is an error with the asynchronous call made by Flink to the Amazon Bedrock API. The Flink application simply filters the correctly processed reviews and writes them to the OpenSearch dashboard.

For robust error handling, you should write any incorrectly processed reviews to a separate output stream and not discard them completely. This separation allows you to handle failed reviews differently than successful ones for simpler reprocessing, analysis, and troubleshooting.

Clean up

When you’re done with the resources you created, complete the following steps:

Delete the Python producer using Ctrl/Command + C.
Destroy your AWS CDK stack by returning to the root folder and running the following command in your terminal:
```
cd cdk && cdk destroy
```
When asked to confirm the deletion of the stack, enter yes.

Conclusion

In this post, you learned how to incorporate generative AI capabilities in your streaming architecture using Amazon Bedrock and Managed Service for Apache Flink using asynchronous requests. We also gave guidance on prompt engineering to derive the sentiment from text data using generative AI. You can build this architecture by deploying the sample code from the GitHub repository.

For more information on how to get started with Managed Service for Apache Flink, refer to Getting started with Amazon Managed Service for Apache Flink (DataStream API). For details on how to set up Amazon Bedrock, refer to Set up Amazon Bedrock. For other posts on Managed Service for Apache Flink, browse through the AWS Big Data Blog.

About the Authors

Felix John is a Solutions Architect and data streaming expert at AWS, based in Germany. He focuses on supporting small and medium businesses on their cloud journey. Outside of his professional life, Felix enjoys playing Floorball and hiking in the mountains.

Michelle Mei-Li Pfister is a Solutions Architect at AWS. She is supporting customers in retail and consumer packaged goods (CPG) industry on their cloud journey. She is passionate about topics around data and machine learning.

Uncover social media insights in real time using Amazon Managed Service for Apache Flink and Amazon Bedrock

2024-06-25 Francisco Morillo

Post Syndicated from Francisco Morillo original https://aws.amazon.com/blogs/big-data/uncover-social-media-insights-in-real-time-using-amazon-managed-service-for-apache-flink-and-amazon-bedrock/

With over 550 million active users, X (formerly known as Twitter) has become a useful tool for understanding public opinion, identifying sentiment, and spotting emerging trends. In an environment where over 500 million tweets are sent each day, it’s crucial for brands to effectively analyze and interpret the data to maximize their return on investment (ROI), which is where real-time insights play an essential role.

Amazon Managed Service for Apache Flink helps you to transform and analyze streaming data in real time with Apache Flink. Apache Flink supports stateful computation over a large volume of data in real time with exactly-once consistency guarantees. Moreover, Apache Flink’s support for fine-grained control of time with highly customizable window logic enables the implementation of the advanced business logic required for building a streaming data platform. Stream processing and generative artificial intelligence (AI) have emerged as powerful tools to harness the potential of real time data. Amazon Bedrock, along with foundation models (FMs) such as Anthropic Claude on Amazon Bedrock, empowers a new wave of AI adoption by enabling natural language conversational experiences.

In this post, we explore how to combine real-time analytics with the capabilities of generative AI and use state-of-the-art natural language processing (NLP) models to analyze tweets through queries related to your brand, product, or topic of choice. It goes beyond basic sentiment analysis and allows companies to provide actionable insights that can be used immediately to enhance customer experience. These include:

Identifying rising trends and discussion topics related to your brand
Conducting granular sentiment analysis to truly understand customers’ opinions
Detecting nuances such as emojis, acronyms, sarcasm, and irony
Spotting and addressing concerns proactively before they spread
Guiding product development based on feature requests and feedback
Creating targeted customer segments for information campaigns

This post takes a step-by-step approach to showcase how you can use Retrieval Augmented Generation (RAG) to reference real-time tweets as a context for large language models (LLMs). RAG is the process of optimizing the output of an LLM so it references an authoritative knowledge base outside of its training data sources before generating a response. LLMs are trained on vast volumes of data and use billions of parameters to generate original output for tasks such as answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, all without the need to retrain the model. It’s a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.

Solution overview

In this section, we explain the flow and architecture of the application. We divide the flow of the application into two parts:

Data ingestion – Ingest data from streaming sources, convert it to vector embeddings, and then store them in a vector database
Insights retrieval – Invoke an LLM with the user queries to retrieve insights on tweets using the RAG technique

Data ingestion

The following diagram describes the data ingestion flow:

Process feeds from streaming sources, such as social media feeds, Amazon Kinesis Data Streams, or Amazon Managed Service for Apache Kafka (Amazon MSK).
Convert streaming data to vector embeddings in real time.
Store them in a vector database.

Data is ingested from a streaming source (for example, X) and processed using an Apache Flink application. Apache Flink is an open source stream processing framework. It provides powerful streaming capabilities, enabling real-time processing, stateful computations, fault tolerance, high throughput, and low latency. Apache Flink is used to process the streaming data, perform deduplication, and invoke an embedding model to create vector embeddings.

Vector embeddings are numerical representations that capture the relationships and meaning of words, sentences, and other data types. These vector embeddings will be used for semantic search or neural search to retrieve relevant information that will be used as context for the LLM to evaluate the response. After the text data is converted into vectors, the vectors are persisted in an Amazon OpenSearch Service domain, which will be used as a vector database. Unlike traditional relational databases with rows and columns, data points in a vector database are represented by vectors with a fixed number of dimensions, which are clustered based on similarity.

OpenSearch Service offers scalable and efficient similarity search capabilities tailored for handling large volumes of dense vector data. OpenSearch Service seamlessly integrates with other AWS services, enabling you to build robust data pipelines within AWS. As a fully managed service, OpenSearch Service alleviates the operational overhead of managing the underlying infrastructure, while providing essential features like approximate k-Nearest Neighbor (k-NN) search algorithms, dense vector support, and robust monitoring and logging tools through Amazon CloudWatch. These capabilities make OpenSearch Service a suitable solution for applications that require fast and accurate similarity-based retrieval tasks using vector embeddings.

This design enables real-time vector embedding, making it ideal for AI-driven applications.

Insights retrieval

The following diagram shows the flow from the user side, where the user places a query through the frontend and gets a response from the LLM model using the retrieved vector database documents as the context provided in the prompt.

As shown in the preceding figure, to retrieve insights from the LLM, first you need to receive a query from the user. The text query is then converted into vector embeddings using the same model that was used before for the tweets. It’s important to make sure the same embedding model is used for both ingestion and search. The vector embeddings are then used to perform a semantic search in the vector database to obtain the related vectors and associated text. This serves as the context for the prompt. Next, the previous conversation history (if any) is added to the prompt. This serves as the conversation history for the model. Finally, the user’s question is also included in the prompt and the LLM is invoked to get the response.

For the purpose of this post, we don’t take into consideration the conversation history or store it for later use.

Solution architecture

Now that you understand the overall process flow, let’s analyze the following architecture using AWS services step by step.

The first part of the preceding figure shows the data ingestion process:

A user authenticates with Amazon Cognito.
The user connects to the Streamlit frontend and configures the following parameters: query terms, API bearer token, and frequency to retrieve tweets.
Managed Service for Apache Flink is used to consume and process the tweets in real time and stores in Apache Flink’s state the parameters for making the API requests received from the frontend application.
The streaming application uses Apache Flink’s async I/O to invoke the Amazon Titan Embeddings model through the Amazon Bedrock API.
Amazon Bedrock returns a vector embedding for each tweet.
The Apache Flink application then writes the vector embedding with the original text of the tweet into an OpenSearch Service k-NN index.

The remainder of the architecture diagram shows the insights retrieval process:

A user sends a query through the Streamlit frontend application.
An AWS Lambda function is invoked by Amazon API Gateway, passing the user query as input.
The Lambda function uses LangChain to orchestrate the RAG process. As a first step, the function invokes the Amazon Titan Embeddings model on Amazon Bedrock to create a vector embedding for the question.
Amazon Bedrock returns the vector embedding for the question.
As a second step in the RAG orchestration process, the Lambda function performs a semantic search in OpenSearch Service and retrieves the relevant documents related to the question.
OpenSearch Service returns the relevant documents containing the tweet text to the Lambda function.
As a last step in the LangChain orchestration process, the Lambda function augments the prompt, adding the context and using few-shot prompting. The augmented prompt, including instructions, examples, context, and query, is sent to the Anthropic Claude model through the Amazon Bedrock API.
Amazon Bedrock returns the answer to the question in natural language to the Lambda function.
The response is sent back to the user through API Gateway.
API Gateway provides the response to the user question in the Streamlit application.

The solution is available in the GitHub repo. Follow the README file to deploy the solution.

Now that you understand the overall flow and architecture, let’s dive deeper into some of the key steps to understand how it works.

Amazon Bedrock chatbot UI

The Amazon Bedrock chatbot Streamlit application is designed to provide insights from tweets, whether they are real tweets ingested from the X API or simulated tweets or messages from the My Social Media application.

In the Streamlit application, we can provide the parameters that will be used to make the API requests to the X Developer API and pull the data from X. We developed an Apache Flink application that adjusts the API requests based on the provided parameters.

As parameters, you need to provide the following:

Bearer token for API authorization – This is obtained from the X Developer platform when you sign up to use the APIs.
Query terms to be used to filter the tweets consumed – You can use the search operators available in the X documentation.
Frequency of the request – The X basic API only allows you to make a request every 15 seconds. If a lower interval is set, the application won’t pull data.

The parameters are sent to Kinesis Data Streams through API Gateway and are consumed by the Apache Flink application.

My Social Media UI

The My Social Media application is a Streamlit application that serves as an additional UI. Through this application, users can compose and send messages, simulating the experience of posting on a social media site. These messages are then ingested into an AWS data pipeline consisting of API Gateway, Kinesis Data Streams, and an Apache Flink application. The Apache Flink application processes the incoming messages, invokes an Amazon Bedrock embedding model, and stores the data in an OpenSearch Service cluster.

To accommodate both real X data and simulated data from the My Social Media application, we’ve set up separate indexes within the OpenSearch Service cluster. This separation allows users to choose which data source they want to analyze or query. The Streamlit application features a sidebar option called Use X Index that acts as a toggle. When this option is enabled, the application queries and analyzes data from the index containing real tweets ingested from the X API. If the option is disabled, the application queries and displays data from the index containing messages sent through the My Social Media application.

Apache Flink is used because of its ability to scale with the increasing volume of tweets. The Apache Flink application is responsible for performing data ingestion as explained previously. Let’s dive into the details of the flow.

Consume data from X

We use Apache Flink to process the API parameters sent from the Streamlit UI. We store the parameters in Apache Flink’s state, which allows us to modify and update the parameters without having to restart the application. We use the ProcessFunction to use Apache Flink’s internal timers to schedule the frequency of requests to fetch tweets. In this post, we use X’s Recent search API, which allows us to access filtered public tweets posted over the last 7 days. The API response is paginated and returns a maximum of 100 tweets on each request in reverse chronological order. If there are more tweets to be consumed, the response of the previous request will return a token, which needs to be used in the next API call. After we receive the tweets from the API, we apply the following transformations:

Filter out the empty tweets (tweets without any text).
Partition the set of tweets by author ID. This helps distribute the processing to multiple subtasks in Apache Flink.
Apply a deduplication logic to only process tweets that haven’t been processed. For this, we store the already processed tweet ID in Apache Flink’s state and match and filter out the tweets that have already been processed. We store the tweets’ ID grouped by author ID, which can cause the state size of the application to increase. Because the API only provides tweets from the last 7 days when invoked, we have introduced a time-to-live (TTL) of 7 days so we don’t grow the application’s state indefinitely. You can modify this based on your requirements.
Convert tweets into JSON objects for a later Amazon Bedrock API invocation.

Create vector embeddings

The vector embeddings are created by invoking the Amazon Titan Embeddings model through the Amazon Bedrock API. Asynchronous invocations of external APIs are important performance considerations when building a stream processing architecture. Synchronous calls increase latency, reduce throughput, and can be a bottleneck for overall processing.

To invoke the Amazon Bedrock API, you will use the Amazon Bedrock Runtime dependency in Java, which provides an asynchronous client that allows us invoke Amazon Bedrock models asynchronously through the BedrockRuntimeAsyncClient. This is invoked to create the embeddings. For this we use Apache Flink’s async I/O to make asynchronous requests to external APIs. Apache Flink’s async I/O is a library within Apache Flink that allows you to write asynchronous, non-blocking operators for stream processing applications, enabling better utilization of resources and higher throughput. We provide the asynchronous function to be called, the timeout duration that determines how long an asynchronous operation can take before it’s considered failed, and the maximum number of requests that should be in progress at any point in time. Limiting the number of concurrent requests makes sure that the operator won’t accumulate an ever-growing backlog of pending requests. However, this can cause backpressure after the capacity is exhausted. Because we use the timestamp of creation when we ingest into OpenSearch Service and so order won’t affect our results, we can use Apache Flink’s async I/O unordered function, allowing us to have better throughput and performance. See the following code:

DataStream<JSONObject> resultStream = AsyncDataStream 
.unorderedWait(inputJSON, new BedRockEmbeddingModelAsyncTweetFunction(), 15000, TimeUnit.MILLISECONDS, 1000)
.uid("tweet-async-function");

Let’s have a closer look into the Apache Flink async I/O function. The following is within the CompletableFuture Java class:

First, we create the Amazon Bedrock Runtime async client:

BedrockRuntimeAsyncClient runtime = BedrockRuntimeAsyncClient.builder()
.region(Region.of(region))  // Use the specified AWS region 
.build();

We then extract the tweet for the event and build the payload that we will send to Amazon Bedrock:

String stringBody = jsonObject.getString("tweet");
 ArrayList<String> stringList = new ArrayList<>();  
stringList.add(stringBody);  
JSONObject jsonBody = new JSONObject()
.put("inputText", stringBody);  
SdkBytes body = SdkBytes.fromUtf8String(jsonBody.toString());

After we have the payload, we can call the InvokeModel API and invoke Amazon Titan to create the vector embeddings for the tweets:

InvokeModelRequest request = InvokeModelRequest.builder()         
.modelId("amazon.titan-embed-text-v1")         
.contentType("application/json")         
.accept("*/*")         
.body(body)         
.build();

CompletableFuture<InvokeModelResponse> futureResponse = runtime.invokeModel(request);

After receiving the vector, we append the following fields to the output JSONObject:
1. Cleaned tweet
2. Tweet creation timestamp
3. Number of likes of the tweet
4. Number of retweets
5. Number of views from the tweet (impressions)
6. Tweet ID

// Extract and process the response when it is available
JSONObject response = new JSONObject(
        futureResponse.join().body().asString(StandardCharsets.UTF_8)
);

// Add additional fields related to tweet data to the response
response.put("tweet", jsonObject.get("tweet"));
response.put("@timestamp", jsonObject.get("created_at"));
response.put("likes", jsonObject.get("likes"));
response.put("retweet_count", jsonObject.get("retweet_count"));
response.put("impression_count", jsonObject.get("impression_count"));
response.put("_id", jsonObject.get("_id"));

return response;

This will return the embeddings, original text, additional fields, and the number of tokens used for the embedding. In our connector, we are only consuming messages in English, as well as ignoring messages that are retweets from other tweets.

The same processing steps are replicated for messages coming from the My Social Media application (manually ingested).

Store vector embeddings in OpenSearch Service

We use OpenSearch Service as a vector database for semantic search. Before we can write the data into OpenSearch Service, we need to create an index that supports semantic search. We are using the k-NN plugin. The vector database index mapping should have the following properties for storing vectors for similarity search:

…

"embeddings": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }

…

The key parameters are as follows:

type – This specifies that the field will hold vector data for a k-NN similarity search. The value should be knn_vector.
dimension – The number of dimensions for each vector. This must match the model dimension. In this case we use 1,536 dimensions, the same as the Amazon Titan Text Embeddings v1 model.
method – Defines the algorithm and parameters for indexing and searching the vectors:
- name – The identifier for the nearest neighbor method. We use hierarchical navigable small worlds (HNSW)—a hierarchical proximity graph approach—to run a approximate k-NN (A-NN) search because standard k-NN is not a scalable approach.
- space_type – The vector space used to calculate the distance between vectors. It supports multiple space type. The default value is 12.
- engine – The approximate k-NN library to use for indexing and search. The available libraries are faiss, nmslib, and Lucene.
- ef_construction – The size of the dynamic list used during k-NN graph creation. Higher values result in a more accurate graph but slower indexing speed.
- m – The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2–100.

Standard k-NN search methods compute similarity using a brute-force approach that measures the nearest distance between a query and a number of points, which produces exact results. This works well for most applications. However, in the case of extremely large datasets with high dimensionality, this creates a scaling problem that reduces the efficiency of the search. The approximate k-NN search methods used by OpenSearch Service use approximate nearest neighbor (ANN) algorithms from the nmslib, faiss, and Lucene libraries to power k-NN search. These search methods employ ANN to improve search latency for large datasets. Of the three search methods the k-NN plugin provides, this method offers the best search scalability for large datasets. This approach is the preferred method when a dataset reaches hundreds of thousands of vectors. For more information about the different methods and their trade-offs, refer to Comprehensive Guide To Approximate Nearest Neighbors Algorithms.

To use the k-NN plugin’s approximate search functionality, we must first create a k-NN index with index.knn set to true:

    "settings" : {
      "index" : {
        "knn": true,
        "number_of_shards" : "5",
        "number_of_replicas" : "1"
      }
    }

After we have our indexes created, we can sink the data from our Apache Flink application into OpenSearch.

RetrievalQA using Lambda and LangChain implementation

For this part, we take an input question from the user and invoke a Lambda function. The Lambda function retrieves relevant tweets from OpenSearch Service as context and generates an answer using the LangChain RAG chain RetrievalQA. LangChain is a framework for developing applications powered by language models.

First, some setup. We instantiate the bedrock-runtime client that will allow the Lambda function to invoke the models:

bedrock_runtime = boto3.client("bedrock-runtime", "us-east-1")

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock_runtime)

The BedrockEmbeddings class uses the Amazon Bedrock API to generate embeddings for the user’s input question. It strips new line characters from the text. Notice that we need to pass as an argument the instantiation of the bedrock_runtime client and the model ID for the Amazon Titan Text Embeddings v1 model.

Next, we instantiate the client for the OpenSearchVectorSeach LangChain class that will allow the Lambda function to connect to the OpenSearch Service domain and perform the semantic search against the previously indexed X embeddings. For the embedding function, we’re passing the embeddings model that we defined previously. This will be used during the LangChain orchestration process:

os_client = OpenSearchVectorSearch(
        index_name=aos_index,
        embedding_function=embeddings,
        http_auth=(os.environ['aosUser'], os.environ['aosPassword']),
        opensearch_url=os.environ['aosDomain'],
        timeout=300,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
        )

We need to define the LLM model from Amazon Bedrock to use for text generation. The temperature is set to 0 to reduce hallucinations:

model_kwargs={"temperature": 0, "max_tokens": 4096}

llm = BedrockChat(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    client=bedrock_runtime,
    model_kwargs=model_kwargs
)

Next, in our Lambda function, we create the prompt to instruct the model on the specific task of analyzing hundreds of tweets in the context. To normalize the output, we use a prompt engineering technique called few-shot prompting. Few-shot prompting allows language models to learn and generate responses based on a small number of examples or demonstrations provided in the prompt itself. In this approach, instead of training the model on a large dataset, we provide a few examples of the desired task or output within the prompt. These examples serve as a guide or conditioning for the model, enabling it to understand the context and the desired format or pattern of the response. When presented with a new input after the examples, the model can then generate an appropriate response by following the patterns and context established by the few-shot demonstrations in the prompt.

As part of the prompt, we then provide examples of questions and answers, so the chatbot can follow the same pattern when used (see the Lambda function to view the complete prompt):

template = """As a helpful agent that is an expert analysing tweets, please answer the question using only the provided tweets from the context in <context></context> tags. If you don't see valuable information on the tweets provided in the context in <context></context> tags, say you don't have enough tweets related to the question. Cite the relevant context you used to build your answer. Print in a bullet point list the top most influential tweets from the context at the end of the response.
    
    Find below some examples:
    <example1>
    question: 
    What are the main challenges or concerns mentioned in tweets about using Bedrock as a generative AI service on AWS, and how can they be addressed?
    
    answer:
    Based on the tweets provided in the context, the main challenges or concerns mentioned about using Bedrock as a generative AI service on AWS are:

1.	...
2.	...
3.	...
4.	...
...
    
    To address these concerns:

1.	...
2.	...
3.	...
4.	...
...

    Top tweets from context:

    [1] ...
    [2] ...
    [3] ...
    [4] ...

    </example1>
    
    <example2>
    ...
    </example2>
    
    Human: 
    
    question: {question}
    
    <context>
    {context}
    </context>
    
    Assistant:"""

    prompt = PromptTemplate(input_variables=["context","question"], template=template)

We then create the RetrievalQA LangChain chain using the prompt template, Anthropic Claude on Amazon Bedrock, and the OpenSearch Service retriever configured previously. The RetrievalQA LangChain chain will orchestrate the following RAG steps:

Invoke the text embedding model to create a vector for the user’s question
Perform a semantic search on OpenSearch Service using the vector to retrieve the relevant tweets to the user’s question (k=200)
Invoke the LLM model using the augmented prompt containing the prompt template, context (stuffed retrieved tweets) and question

chain = RetrievalQA.from_chain_type(
    llm=llm,
    verbose=True,
    chain_type="stuff",
    retriever=os_client.as_retriever(
        search_type="similarity",
        search_kwargs={
            "k": 200, 
            "space_type": "l2", 
            "vector_field": "embeddings", 
            "text_field": text_field
        }
    ),
    chain_type_kwargs = {"prompt": prompt}
)

Finally, we run the chain:

answer = chain.invoke({"query": message})

The response from the LLM is sent back to the user application. As shown in the following screenshot:

Considerations

You can extend the solution provided in this post. When you do, consider the following suggestions:

Configure index retention and rollover in OpenSearch Service to manage index lifecycle and data retention effectively
Incorporate chat history into the chatbot to provide richer context and improve the relevance of LLM responses
Add filters and hybrid search with the possibility to modify the weight given to the keyword and semantic search to enhance search on RAG
Modify the TTL for Apache Flink’s state to match your requirements (the solution in this post uses 7 days)
Enable logging to API Gateway and in the Streamlit application.

Summary

This post demonstrates how to combine real-time analytics with generative AI capabilities to analyze tweets related to a brand, product, or topic of interest. It uses Amazon Managed Service for Apache Flink to process tweets from the X API, create vector embeddings using the Amazon Titan Embeddings model on Amazon Bedrock, and store the embeddings in an OpenSearch Service index configured for vector similarity search—all these steps happen in real time.

The post also explains how users can input queries through a Streamlit frontend application, which invokes a Lambda function. This Lambda function retrieves relevant tweets from OpenSearch Service by performing semantic search on the stored embeddings using the LangChain RetrievalQA chain. As a result, it generates insightful answers using the Anthropic Claude LLM on Amazon Bedrock.

The solution enables identifying trends, conducting sentiment analysis, detecting nuances, addressing concerns, guiding product development, and creating targeted customer segments based on real-time X data.

To get started with generative AI, visit Generative AI on AWS for information about industry use cases, tools to build and scale generative AI applications, as well as the post Exploring real-time streaming for generative AI Applications for other use cases for streaming with generative AI.

About the Authors

Francisco Morillo is a Streaming Solutions Architect at AWS, specializing in real-time analytics architectures. With over five years in the streaming data space, Francisco has worked as a data analyst for startups and as a big data engineer for consultancies, building streaming data pipelines. He has deep expertise in Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Amazon Managed Service for Apache Flink. Francisco collaborates closely with AWS customers to build scalable streaming data solutions and advanced streaming data lakes, ensuring seamless data processing and real-time insights.

Sergio Garcés Vitale is a Senior Solutions Architect at AWS, passionate about generative AI. With over 10 years of experience in the telecommunications industry, where he helped build data and observability platforms, Sergio now focuses on guiding Retail and CPG customers in their cloud adoption, as well as customers across all industries and sizes in implementing Artificial Intelligence use cases.

Subham Rakshit is a Streaming Specialist Solutions Architect for Analytics at AWS based in the UK. He works with customers to design and build search and streaming data platforms that help them achieve their business objective. Outside of work, he enjoys spending time solving jigsaw puzzles with his daughter.

AWS Weekly Roundup: Claude 3.5 Sonnet in Amazon Bedrock, CodeCatalyst updates, SageMaker with MLflow, and more (June 24, 2024)

2024-06-24 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-claude-3-5-sonnet-in-amazon-bedrock-codecatalyst-updates-sagemaker-with-mlflow-and-more-june-24-2024/

This week, I had the opportunity to try the new Anthropic Claude 3.5 Sonnet model in Amazon Bedrock just before it launched, and I was really impressed by its speed and accuracy! It was also the week of AWS Summit Japan; here’s a nice picture of the busy AWS Community stage.

Last week’s launches
With many new capabilities, from recommendations on the size of your Amazon Relational Database Services (Amazon RDS) databases to new built-in transformations in AWS Glue, here’s what got my attention:

Amazon Bedrock – Now supports Anthropic’s Claude 3.5 Sonnet and compressed embeddings from Cohere Embed.

AWS CodeArtifact – With support for Rust packages with Cargo, developers can now store and access their Rust libraries (known as crates).

Amazon CodeCatalyst – Many updates from this unified software development service. You can now assign issues in CodeCatalyst to Amazon Q and direct it to work with source code hosted in GitHub Cloud and Bitbucket Cloud and ask Amazon Q to analyze issues and recommend granular tasks. These tasks can then be individually assigned to users or to Amazon Q itself. You can now also use Amazon Q to help pick the best blueprint for your needs. You can now securely store, publish, and share Maven, Python, and NuGet packages. You can also link an issue to other issues. This allows customers to link issues in CodeCatalyst as blocked by, duplicate of, related to, or blocks another issue. You can now configure a single CodeBuild webhook at organization or enterprise level to receive events from all repositories in your organizations, instead of creating webhooks for each individual repository. Finally, you can now add a default IAM role to an environment.

Amazon EC2 – C7g and R7g instances (powered by AWS Graviton3 processors) are now available in Europe (Milan), Asia Pacific (Hong Kong), and South America (São Paulo) Regions. C7i-flex instances are now available in US East (Ohio) Region.

AWS Compute Optimizer – Now provides rightsizing recommendations for Amazon RDS MySQL, and RDS PostgreSQL. More info in this Cloud Financial Management blog post.

Amazon OpenSearch Service – With JSON Web Token (JWT) authentication and authorization, it’s now easier to integrate identity providers and isolate tenants in a multi-tenant application.

Amazon SageMaker – Now helps you manage machine learning (ML) experiments and the entire ML lifecycle with a fully managed MLflow capability.

AWS Glue – The serverless data integration service now offers 13 new built-in transforms: flag duplicates in column, format Phone Number, format case, fill with mode, flag duplicate rows, remove duplicates, month name, iIs even, cryptographic hash, decrypt, encrypt, int to IP, and IP to int.

Amazon MWAA – Amazon Managed Workflows for Apache Airflow (MWAA) now supports custom domain names for the Airflow web server, allowing to use private web servers with load balancers, custom DNS entries, or proxies to point users to a user-friendly web address.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional projects, blog posts, and news items that you might find interesting:

AWS re:Inforce 2024 re:Cap – A summary of our annual, immersive, cloud-security learning event by my colleague Wojtek.

Three ways Amazon Q Developer agent for code transformation accelerates Java upgrades – This post offers interesting details on how Amazon Q Developer handles major version upgrades of popular frameworks, replacing deprecated API calls on your behalf, and explainability on code changes.

Five ways Amazon Q simplifies AWS CloudFormation development – For template code generation, querying CloudFormation resource requirements, explaining existing template code, understanding deployment options and issues, and querying CloudFormation documentation.

Improving air quality with generative AI – A nice solution that uses artificial intelligence (AI) to standardize air quality data, addressing the air quality data integration problem of low-cost sensors.

Deploy a Slack gateway for Amazon Bedrock – A solution bringing the power of generative AI directly into your Slack workspace.

An agent-based simulation of Amazon’s inbound supply chain – Simulating the entire US inbound supply chain, including the “first-mile” of distribution and tracking the movement of hundreds of millions of individual products through the network.

AWS CloudFormation Linter (cfn-lint) v1 – This upgrade is particularly significant because it converts from using the CloudFormation spec to using CloudFormation registry resource provider schemas.

A practical approach to using generative AI in the SDLC – Learn how an AI assistant like Amazon Q Developer helps my colleague Jenna figure out what to build and how to build it.

AWS open source news and updates – My colleague Ricardo writes about open source projects, tools, and events from the AWS Community. Check out Ricardo’s page for the latest updates.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. This week, you can join the AWS Summit in Washington, DC, June 26–27. Learn here about future AWS Summit events happening in your area.

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world. This week there are AWS Community Days in Switzerland (June 27), Sri Lanka (June 27), and the Gen AI Edition in Ahmedabad, India (June 29).

Browse all upcoming AWS led in-person and virtual events and developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Danilo

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Anthropic’s Claude 3.5 Sonnet model now available in Amazon Bedrock: Even more intelligence than Claude 3 Opus at one-fifth the cost

2024-06-20 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/anthropics-claude-3-5-sonnet-model-now-available-in-amazon-bedrock-the-most-intelligent-claude-model-yet/

It’s been just 3 months since Anthropic launched Claude 3, a family of state-of-the-art artificial intelligence (AI) models that allows you to choose the right combination of intelligence, speed, and cost that suits your needs.

Today, Anthropic introduced Claude 3.5 Sonnet, its first release in the forthcoming Claude 3.5 model family. We are happy to announce that Claude 3.5 Sonnet is now available in Amazon Bedrock.

Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming other generative AI models on a wide range of evaluations, including Anthropic’s previously most intelligent model, Claude 3 Opus. Claude 3.5 Sonnet is available with the speed and cost of the original Claude 3 Sonnet model. In fact, you can now get intelligence and speed better than Claude 3 Opus at one-fifth of the price because Claude 3.5 Sonnet is 80 percent cheaper than Opus.

The frontier intelligence displayed by Claude 3.5 Sonnet combined with cost-effective pricing, makes the model ideal for complex tasks such as context-sensitive customer support, orchestrating multi-step workflows, and streamlining code translations.

Claude 3.5 Sonnet sets new industry benchmarks for undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), code (HumanEval), and more. As you can see in the following table, according to Anthropic, Claude 3.5 Sonnet outperforms OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro in nearly every benchmark.

Claude 3.5 Sonnet is also Anthropic’s strongest vision model yet, performing an average of 10 percent better than Claude 3 Opus across the majority of vision benchmarks. According to Anthropic, Claude 3.5 Sonnet also outperforms other generative AI models in nearly every category.

Anthropic’s Claude 3.5 Sonnet key improvements
The release of Claude 3.5 Sonnet brings significant improvements across multiple domains, empowering software developers and businesses with new generative AI-powered capabilities. Here are some of the key strengths of this new model:

Visual processing and understanding – Claude 3.5 Sonnet demonstrates remarkable capabilities in processing images, particularly in interpreting charts and graphs. It accurately transcribes text from imperfect images, a core capability for industries such as retail, logistics, and financial services, to gather more insights from graphics or illustrations than from text alone. Use Claude 3.5 Sonnet to automate visual data processing tasks, extract valuable information, and enhance data analysis pipelines.

Writing and content generation – Claude 3.5 Sonnet represents a significant leap in its ability to understand nuance and humor. The model produces high-quality written content with a more natural, human tone that feels more authentic and relatable. Use the model to generate engaging and compelling content, streamline your writing workflows, and enhance your storytelling capabilities.

Customer support and natural language processing – With its improved understanding of context and multistep workflow orchestration, Claude 3.5 Sonnet excels at handling intricate customer inquiries. This capability enables round-the-clock support, faster response times, and more natural-sounding interactions, ultimately leading to improved customer satisfaction. Use this model to automate and enhance customer support processes and provide a seamless experience for end users. For an example of a similar implementation, see how DoorDash built a generative AI self-service contact center solution using Anthropic’s Claude 3 models in Amazon Bedrock.

Analytics and insights – Claude 3.5 Sonnet augments human expertise in data science by effortlessly navigating unstructured data and using multiple tools to generate insights. It produces high-quality statistical visualizations and actionable predictions, ranging from business strategies to real-time product trends. Use Claude 3.5 Sonnet to simplify data analysis workflows, uncover valuable insights, and drive data-driven decision-making.

Coding and software development – Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities when provided with the relevant tools. Use Claude 3.5 Sonnet to streamline developer workflows, accelerate coding tasks, reduce manual effort, and enhance overall productivity.

Using Anthropic’s Claude 3.5 Sonnet in the Amazon Bedrock console
Because this is a new model, I go to the Amazon Bedrock console and choose Model access from the navigation pane to enable access to Claude 3.5 Sonnet.

Now that I have access to the model, I’d like to use it to extract information from the following image from the Our World in Data website:

I choose Chat in the Playgrounds section of the navigation pane and select Anthropic and Claude 3.5 Sonnet as the model. I then choose the Image upload button to upload the previous image, type the following prompt, and choose Run:

Which countries consume more than 1000 TWh from hydropower? Think step by step and look at all regions. Output in JSON.

Here’s the result and an animated image recapping what I did.

To determine which countries consume more than 1,000 TWh from hydropower in 2022, I need to carefully examine the map and color-coding. The scale at the bottom shows that dark blue represents consumption between 1,000 TWh and 2,000 TWh.

Let's look at each region systematically:

1. North America: Canada is dark blue, indicating it consumes between 1,000-2,000 TWh from hydropower.
2. South America: Brazil is dark blue, also consuming between 1,000-2,000 TWh.
3. Europe: No country is dark blue; all are light green or have no data.
4. Africa: No data available for any country.
5. Asia: China is dark blue, indicating consumption between 1,000-2,000 TWh.
6. Oceania: No country is dark blue; Australia is light green.

In summary, there are three countries that consume more than 1,000 TWh from hydropower in 2022:
1. Canada (North America)
2. Brazil (South America)
3. China (Asia)

Here's the response in JSON format:

```json
{
  "countries_consuming_over_1000TWh_hydropower": [
    {
      "name": "Canada",
      "region": "North America",
      "consumption_range": "1,000-2,000 TWh"
    },
    {
      "name": "Brazil",
      "region": "South America",
      "consumption_range": "1,000-2,000 TWh"
    },
    {
      "name": "China",
      "region": "Asia",
      "consumption_range": "1,000-2,000 TWh"
    }
  ]
}
```

These three countries stand out as the highest consumers of hydropower energy, each using between 1,000 and 2,000 terawatt-hours in 2022.

The model’s ability to reliably extract information from unstructured data, like images, opens up a world of new possibilities.

I choose the three small dots in the corner of the playground window and then View API request to see code examples using the model in the AWS Command Line Interface (AWS CLI) and AWS SDKs. Let’s have a better look at the code syntax.

Using Claude 3.5 Sonnet with AWS SDKs
You can use Claude 3.5 Sonnet with any AWS SDK using the new Amazon Bedrock Converse API or Anthropic Claude Messages API.

To update code already using a Claude 3 model, I just need to replace the model ID with:

anthropic.claude-3-5-sonnet-20240620-v1:0

Here’s a sample implementation with the AWS SDK for Python (Boto3) using the same image as before to show how to use images and text with the Converse API.

import boto3
from botocore.exceptions import ClientError

MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

IMAGE_NAME = "primary-energy-hydro.png"

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open(IMAGE_NAME, "rb") as f:
    image = f.read()

user_message = "Which countries consume more than 1000 TWh from hydropower? Think step by step and look at all regions. Output in JSON."

messages = [
    {
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image}}},
            {"text": user_message},
        ],
    }
]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
)
response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)

When I run it, I get a similar output as in the console:

Let's approach this step-by-step:

1. First, I'll examine the color scale at the bottom of the map. The darkest blue color represents consumption of 2,000 TWh or more.

2. Now, I'll scan the map region by region:

   North America: Canada is dark blue, indicating over 1,000 TWh.
   South America: Brazil is also dark blue, over 1,000 TWh.
   Europe: No country appears to be dark blue.
   Africa: No country appears to be dark blue.
   Asia: China stands out as dark blue, indicating over 1,000 TWh.
   Oceania: No country appears to be dark blue.

3. To be thorough, I'll double-check for any medium blue countries that might be close to or over 1,000 TWh, but I don't see any that appear to reach that threshold.

4. Based on this analysis, there are three countries that clearly consume more than 1,000 TWh from hydropower.

Now, I'll format the answer in JSON:

```json
{
  "countries_consuming_over_1000TWh_hydropower": [
    "Canada",
    "Brazil",
    "China"
  ]
}
```

This JSON output lists the three countries that visually appear to consume more than 1,000 TWh of primary energy from hydropower according to the 2022 data presented in the map.

Because I didn’t specify a JSON syntax, the two answers use a different format. In your applications, you can describe in the prompt the JSON properties you want or provide a sample to get a standard format in output.

For more examples, see the code samples in the Amazon Bedrock User Guide. For a more advanced use case, here’s a fully functional tool use demo illustrating how to connect a generative AI model with a custom tool or API.

Using Claude 3.5 Sonnet with the AWS CLI
There are times when nothing beats the speed of the command line. This is how you can use the AWS CLI with the new model:

aws bedrock-runtime converse \
    --model-id anthropic.claude-3-5-sonnet-20240620-v1:0 \
    --messages '{"role": "user", "content": [{"text": "Alice has N brothers and she also has M sisters. How many sisters does Alice’s brother have?"}]}' \
    --region us-east-1
    --query output.message.content

In the output, I use the query option to only get the content of the output message:

[
    {
        "text": "Let's approach this step-by-step:\n\n1. First, we need to understand the relationships:\n   - Alice has N brothers\n   - Alice has M sisters\n\n2. Now, let's consider Alice's brother:\n   - He is one of Alice's N brothers\n   - He has the same parents as Alice\n\n3. This means that Alice's brother has:\n   - The same sisters as Alice\n   - One sister more than Alice (because Alice herself is his sister)\n\n4. Therefore, the number of sisters Alice's brother has is:\n   M + 1\n\n   Where M is the number of sisters Alice has.\n\nSo, the answer is: Alice's brother has M + 1 sisters."
    }
]

I copy the text into a small Python program to see it printed on multiple lines:

print("Let's approach this step-by-step:\n\n1. First, we need to understand the relationships:\n   - Alice has N brothers\n   - Alice has M sisters\n\n2. Now, let's consider Alice's brother:\n   - He is one of Alice's N brothers\n   - He has the same parents as Alice\n\n3. This means that Alice's brother has:\n   - The same sisters as Alice\n   - One sister more than Alice (because Alice herself is his sister)\n\n4. Therefore, the number of sisters Alice's brother has is:\n   M + 1\n\n   Where M is the number of sisters Alice has.\n\nSo, the answer is: Alice's brother has M + 1 sisters.")

Let's approach this step-by-step:

1. First, we need to understand the relationships:
   - Alice has N brothers
   - Alice has M sisters

2. Now, let's consider Alice's brother:
   - He is one of Alice's N brothers
   - He has the same parents as Alice

3. This means that Alice's brother has:
   - The same sisters as Alice
   - One sister more than Alice (because Alice herself is his sister)

4. Therefore, the number of sisters Alice's brother has is:
   M + 1

   Where M is the number of sisters Alice has.

So, the answer is: Alice's brother has M + 1 sisters.

Even if this was a quite nuanced question, Claude 3.5 Sonnet got it right and described its reasoning step by step.

Things to know
Anthropic’s Claude 3.5 Sonnet is available in Amazon Bedrock today in the US East (N. Virginia) AWS Region. More information on Amazon Bedrock model support by Region is available in the documentation. View the Amazon Bedrock pricing page to determine the costs for your specific use case.

By providing access to a faster and more powerful model at a lower cost, Claude 3.5 Sonnet makes generative AI easier and more effective to use for many industries, such as:

Healthcare and life sciences – In the medical field, Claude 3.5 Sonnet shows promise in enhancing imaging analysis, acting as a diagnostic assistant for patient triage, and summarizing the latest research findings in an easy-to-digest format.

Financial services – The model can provide valuable assistance in identifying financial trends and creating personalized debt repayment plans tailored to clients’ unique situations.

Legal – Law firms can use the model to accelerate legal research by quickly surfacing relevant precedents and statutes. Additionally, the model can increase paralegal efficiency through contract analysis and assist with drafting standard legal documents.

Media and entertainment – The model can expedite research for journalists, support the creative process of scriptwriting and character development, and provide valuable audience sentiment analysis.

Technology – For software developers, Claude 3.5 Sonnet offers opportunities in rapid application prototyping, legacy code migration, innovative feature ideation, user experience optimization, and identification of friction points.

Education – Educators can use the model to streamline grant proposal writing, develop comprehensive curricula incorporating emerging trends, and receive research assistance through database queries and insight generation.

It’s an exciting time for for generative AI. To start using this new model, see the Anthropic Claude models section of the Amazon Bedrock User Guide. You can also visit our community.aws site to find deep-dive technical content and to discover how our Builder communities are using Amazon Bedrock in their solutions. Let me know what you do with these enhanced capabilities!

— Danilo

AWS Weekly Roundup: New AWS Heroes, Amazon API Gateway, Amazon Q and more (June 10, 2024)

2024-06-10 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-new-aws-heroes-amazon-api-gateway-amazon-q-and-more-june-10-2024/

In the last AWS Weekly Roundup, Channy reminded us on how life has ups and downs. It’s just how life is. But, that doesn’t mean that we should do it alone. Farouq Mousa, AWS Community Builder, is fighting brain cancer and Allen Helton, AWS Serverless Hero, his daughter is fighting leukemia.

If you have a moment, please visit their campaign pages and give your support.

Meanwhile, we’ve just finished a few AWS Summits in India, Korea and also Thailand. As always, I had so much fun working together at Developer Lounge with AWS Heroes, AWS Community Builders, and AWS User Group leaders. Here’s a photo from everyone here.

Last Week’s Launches
Here are some launches that caught my attention last week:

Welcome, new AWS Heroes! — Last week, we just announced new cohort for AWS Heroes, worldwide group of AWS experts who go above and beyond to share knowledge and empower their communities.

Amazon API Gateway increased integration timeout limit — If you’re using Regional REST APIs and private REST APIs in Amazon API Gateway, now you can increase the integration timeout limit greater than 29 seconds. This allows you to run various workloads requiring longer timeouts.

Amazon Q offers inline completion in the command line — Now, Amazon Q Developer provides real-time AI-generated code suggestions as you type in your command line. As a regular command line interface (CLI) user, I’m really excited about this.

New common control library in AWS Audit Manager — This announcement helps you to save time when mapping enterprise controls into AWS Audit Manager. Check out Danilo’s post where he elaborated how that you can simplify risk and complicance assessment with the new common control library.

Amazon Inspector container image scanning for Amazon CodeCatalyst and GitHub actions — If you need to integrate your CI/CD with software vulnerabilities checking, you can use Amazon Inspector. Now, with this native integration in GitHub actions and Amazon CodeCatalyst, it streamlines your development pipeline process.

Ingest streaming data with Amazon OpenSearch Ingestion and Amazon Managed Streaming for Apache Kafka — With this new capability, now you can build more efficient data pipelines for your complex analytics use cases. Now, you can seamlessly index the data from your Amazon MSK Serverless clusters in Amazon OpenSearch service.

Amazon Titan Text Embeddings V2 now available in Amazon Bedrock Knowledge Base — You now can embed your data into a vector database using Amazon Titan Text Embeddings V2. This will be helpful for you to retrieve relevant information for various tasks.

Max tokens	8,192
Languages	100+ in pre-training
Fine-tuning supported	No
Normalization supported	Yes
Vector size	256, 512, 1,024 (default)

From Community.aws
Here’s my 3 personal favorites posts from community.aws:

Upcoming AWS events
Check your calendars and sign up for these AWS and AWS Community events:

AWS Summits — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Japan (June 20), Washington, DC (June 26–27), and New York (July 10).
AWS re:Inforce — Join us for AWS re:Inforce (June 10–12) in Philadelphia, PA. AWS re:Inforce is a learning conference focused on AWS security solutions, cloud security, compliance, and identity. Connect with the AWS teams that build the security tools and meet AWS customers to learn about their security journeys.
AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Midwest | Columbus (June 13), Sri Lanka (June 27), Cameroon (July 13), New Zealand (August 15), Nigeria (August 24), and New York (August 28).

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Donnie

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

AWS Weekly Roundup: Amazon EC2 U7i Instances, Bedrock Converse API, AWS World IPv6 Day and more (June 3, 2024)

2024-06-03 Channy Yun

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-ec2-u7i-instances-bedrock-converse-api-aws-world-ipv6-day-and-more-june-3-2024/

Life is not always happy, there are difficult times. However, we can share our joys and sufferings with those we work with. The AWS Community is no exception.

Jeff Barr introduced two members of the AWS community who are dealing with health issues. Farouq Mousa is an AWS Community Builder and fighting brain cancer. Allen Helton is an AWS Serverless Hero and his young daughter is fighting leukemia.

Please donate to support Farauq and Olivia, Allen’s daughter to overcome their disease.

Last week’s launches
Here are some launches that got my attention:

Amazon EC2 high memory U7i Instances – These instances with up to 32 TiB of DDR5 memory and 896 vCPUs are powered by custom fourth generation Intel Xeon Scalable Processors (Sapphire Rapids). These high memory instances are designed to support large, in-memory databases including SAP HANA, Oracle, and SQL Server. To learn more, visit Jeff’s blog post.

New Amazon Connect analytics data lake – You can use a single source for contact center data including contact records, agent performance, Contact Lens insights, and more — eliminating the need to build and maintain complex data pipelines. Your organization can create your own custom reports using Amazon Connect data or combine data queried from third-party sources. To learn more, visit Donnie’s blog post.

Amazon Bedrock Converse API – This API provides developers a consistent way to invoke Amazon Bedrock models removing the complexity to adjust for model-specific differences such as inference parameters. With this API, you can write a code once and use it seamlessly with different models in Amazon Bedrock. To learn more, visit Dennis’s blog post to get started.

New Document widget for PartyRock – You can build, use, and share generative AI-powered apps for fun and for boosting personal productivity, using PartyRock. Its widgets display content, accept input, connect with other widgets, and generate outputs like text, images, and chats using foundation models. You can now use new document widget to integrate text content from files and documents directly into a PartyRock app.

30 days of alarm history in Amazon CloudWatch – You can view the history of your alarm state changes for up to 30 days prior. Previously, CloudWatch provided 2 weeks of alarm history. This extended history makes it easier to observe past behavior and review incidents over a longer period of time. To learn more, visit the CloudWatch alarms documentation section.

10x faster startup time in Amazon SageMaker Canvas – You can launch SageMaker Canvas in less than a minute and get started with your visual, no-code interface for machine learning 10x faster than before. Now, all new user profiles created in existing or new SageMaker domains can experience this accelerated startup time.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional news items and a Twitch show that you might find interesting:

Let us manage your relational database! – Jeff Barr ran a poll to better understand why some AWS customers still choose to host their own databases in the cloud. Working backwards, he highlights four issues that AWS managed database services address. Consider these before hosting your own database.

Amazon Bedrock Serverless Prompt Chaining – This repository provides examples of using AWS Step Functions and Amazon Bedrock to build complex, serverless, and highly scalable generative AI applications with prompt chaining.

AWS Merch Store Spring Sale – Do you want to buy AWS branded t-shirts, hats, bags, and so on? Get 15% off on all items now through June 7th.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS World IPv6 Day — Join us a free in-person celebration event on June 6, for technical presentations from AWS experts plus a workshop and whiteboarding session. You will learn how to get started with IPv6 and hear from customers who have started on the journey of IPv6 adoption. Check out your near city: San Francisco, Seattle, New York, London, Mumbai, Bangkok, Singapore, Kuala Lumpur, Beijing, Manila, and Sydney.

AWS Summits — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Stockholm (June 4), Madrid (June 5), and Washington, DC (June 26–27).

AWS re:Inforce — Join us for AWS re:Inforce (June 10–12) in Philadelphia, PA. AWS re:Inforce is a learning conference focused on AWS security solutions, cloud security, compliance, and identity. Connect with the AWS teams that build the security tools and meet AWS customers to learn about their security journeys.

AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Midwest | Columbus (June 13), Sri Lanka (June 27), Cameroon (July 13), New Zealand (August 15), Nigeria (August 24), and New York (August 28).

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Channy

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Build a decentralized semantic search engine on heterogeneous data stores using autonomous agents

2024-05-28 Dhaval Shah

Post Syndicated from Dhaval Shah original https://aws.amazon.com/blogs/big-data/build-a-decentralized-semantic-search-engine-on-heterogeneous-data-stores-using-autonomous-agents/

Large language models (LLMs) such as Anthropic Claude and Amazon Titan have the potential to drive automation across various business processes by processing both structured and unstructured data. For example, financial analysts currently have to manually read and summarize lengthy regulatory filings and earnings transcripts in order to respond to Q&A on investment strategies. LLMs could automate the extraction and summarization of key information from these documents, enabling analysts to query the LLM and receive reliable summaries. This would allow analysts to process the documents to develop investment recommendations faster and more efficiently. Anthropic Claude and other LLMs on Amazon Bedrock can bring new levels of automation and insight across many business functions that involve both human expertise and access to knowledge spread across an organization’s databases and content repositories.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

In this post, we show how to build a Q&A bot with RAG (Retrieval Augmented Generation). RAG uses data sources like Amazon Redshift and Amazon OpenSearch Service to retrieve documents that augment the LLM prompt. For getting data from Amazon Redshift, we use the Anthropic Claude 2.0 on Amazon Bedrock, summarizing the final response based on pre-defined prompt template libraries from LangChain. To get data from Amazon OpenSearch Service, we chunk, and convert the source data chunks to vectors using Amazon Titan Text Embeddings model.

For client interaction we use Agent Tools based on ReAct. A ReAct prompt consists of few-shot task-solving trajectories, with human-written text reasoning traces and actions, as well as environment observations in response to actions. In this example, we use ReAct for zero-shot training to generate responses to fit in a pre-defined template. The additional information is concatenated as context with the original input prompt and fed to the text generator which produces the final output. This makes RAG adaptive for situations where facts could evolve over time.

Solution overview

Our solution demonstrates how financial analysts can use generative artificial intelligence (AI) to adapt their investment recommendations based on financial reports and earnings transcripts with RAG to use LLMs to generate factual content.

The hybrid architecture uses multiple databases and LLMs, with foundation models from Amazon Bedrock for data source identification, SQL generation, and text generation with results. In the following architecture, Steps 1 and 2 represent data ingestion to be done by data engineering in batch mode. Steps 3, 4, and 5 are the queries and response formation.

The following diagram shows a more detailed view of the Q&A processing chain. The user asks a question, and LangChain queries the Redshift and OpenSearch Service data stores for relevant information to build the prompt. It sends the prompt to the Anthropic Claude on Amazon Bedrock model, and returns the response.

The details of each step are as follows:

Populate the Amazon Redshift Serverless data warehouse with company stock information stored in Amazon Simple Storage Service (Amazon S3). Redshift Serverless is a fully functional data warehouse holding data tables maintained in real time.
Load the unstructured data from your S3 data lake to OpenSearch Service to create an index to store and perform semantic search. The LangChain library loads knowledge base documents, splits the documents into smaller chunks, and uses Amazon Titan to generate embeddings for chunks.
The client submits a question via an interface like a chatbot or website.
You will create multiple steps to transform a user query passed from Amazon SageMaker Notebook to execute API calls to LLMs from Amazon Bedrock. Use LLM-based Agents to generate SQL from Text and then validate if query is relevant to data warehouse tables. If yes, run query to extract information. The LangChain library calls Amazon Titan embeddings to generate a vector for the user’s question. It calls OpenSearch vector search to get similar documents.
LangChain calls Anthropic Claude on Amazon Bedrock model with the additional, retrieved knowledge as context, to generate an answer for the question. It returns generated content to client

In this deployment, you will choose Amazon Redshift Serverless, use Anthropic Claude 2.0 model on Amazon Bedrock and Amazon Titan Text Embeddings model. Overall spend for the deployment will be directly proportional to number of input/output tokens for Amazon Bedrock models, Knowledge base volume, usage hours and so on.

To deploy the solution, you need two datasets: SEC Edgar Annual Financial Filings and Stock pricing data. To join these datasets for analysis, you need to choose Stock Symbol as the join key. The provided AWS CloudFormation template deploys the datasets required for this post, along with the SageMaker notebook.

Prerequisites

To follow along with this post, you should have an AWS account with AWS Identity and Access Management (IAM) user credentials to deploy AWS services.

Deploy the chat application using AWS CloudFormation

To deploy the resources, complete the following steps:

Deploy the following CloudFormation template to create your stack in the us-east-1 AWS Region.The stack will deploy an OpenSearch Service domain, Redshift Serverless endpoint, SageMaker notebook, and other services like VPC and IAM roles that you will use in this post. The template sets a default user name password for the OpenSearch Service domain, and sets up a Redshift Serverless admin. You can choose to modify them or use the default values.
On the AWS CloudFormation console, navigate to the stack you created.
On the Outputs tab, choose the URL for SageMakerNotebookURL to open the notebook.
In Jupyter, choose semantic-search-with-amazon-opensearch, thenblog, then the LLM-Based-Agentfolder.
Open the notebook Generative AI with LLM based autonomous agents augmented with structured and unstructured data.ipynb.
Follow the instructions in the notebook and run the code sequentially.

Run the notebook

There are six major sections in the notebook:

Prepare the unstructured data in OpenSearch Service – Download the SEC Edgar Annual Financial Filings dataset and convert the company financial filing document into vectors with Amazon Titan Text Embeddings model and store the vector in an Amazon OpenSearch Service vector database.
Prepare the structured data in a Redshift database – Ingest the structured data into your Amazon Redshift Serverless table.
Query the unstructured data in OpenSearch Service with a vector search – Create a function to implement semantic search with OpenSearch Service. In OpenSearch Service, match the relevant company financial information to be used as context information to LLM. This is unstructured data augmentation to the LLM.
Query the structured data in Amazon Redshift with SQLDatabaseChain – Use the LangChain library LLM text to SQL to query company stock information stored in Amazon Redshift. The search result will be used as context information to the LLM.
Create an LLM-based ReAct agent augmented with data in OpenSearch Service and Amazon Redshift – Use the LangChain library to define a ReAct agent to judge whether the user query is stock- or investment-related. If the query is stock related, the agent will query the structured data in Amazon Redshift to get the stock symbol and stock price to augment context to the LLM. The agent also uses semantic search to retrieve relevant financial information from OpenSearch Service to augment context to the LLM.
Use the LLM-based agent to generate a final response based on the template used for zero-shot training – The following is a sample user flow for a stock price recommendation for the query, “Is ABC a good investment choice right now.”

Example questions and responses

In this section, we show three example questions and responses to test our chatbot.

Example 1: Historical data is available

In our first test, we explore how the bot responds to a question when historical data is available. We use the question, “Is [Company Name] a good investment choice right now?” Replace [Company Name] with a company you want to query.

This is a stock-related question. The company stock information is in Amazon Redshift and the financial statement information is in OpenSearch Service. The agent will run the following process:

Determine if this is a stock-related question.
Get the company name.
Get the stock symbol from Amazon Redshift.
Get the stock price from Amazon Redshift.
Use semantic search to get related information from 10k financial filing data from OpenSearch Service.

response = zero_shot_agent("\n\nHuman: Is {company name} a good investment choice right now? \n\nAssistant:")

The output may look like the following:

Final Answer: Yes, {company name} appears to be a good investment choice right now based on the stable stock price, continued revenue and earnings growth, and dividend payments. I would recommend investing in {company name} stock at current levels.

You can view the final response from the complete chain in your notebook.

Example 2: Historical data is not available

In this next test, we see how the bot responds to a question when historical data is not available. We ask the question, “Is Amazon a good investment choice right now?”

This is a stock-related question. However, there is no Amazon stock price information in the Redshift table. Therefore, the bot will answer “I cannot provide stock analysis without stock price information.” The agent will run the following process:

Determine if this is a stock-related question.
Get the company name.
Get the stock symbol from Amazon Redshift.
Get the stock price from Amazon Redshift.

response = zero_shot_agent("\n\nHuman: Is Amazon a good investment choice right now? \n\nAssistant:")

The output looks like the following:

Final Answer: I cannot provide stock analysis without stock price information.

Example 3: Unrelated question and historical data is not available

For our third test, we see how the bot responds to an irrelevant question when historical data is not available. This is testing for hallucination. We use the question, “What is SageMaker?”

This is not a stock-related query. The agent will run the following process:

Determine if this is a stock-related question.

response = zero_shot_agent("\n\nHuman: What is SageMaker? \n\nAssistant:")

The output looks like the following:

Final Answer: What is SageMaker? is not a stock related query.

This was a simple RAG-based ReAct chat agent analyzing the corpus from different data stores. In a realistic scenario, you might choose to further enhance the response with restrictions or guardrails for input and output like filtering harsh words for robust input sanitization, output filtering, conversational flow control, and more. You may also want to explore the programmable guardrails to LLM-based conversational systems.

Clean up

To clean up your resources, delete the CloudFormation stack llm-based-agent.

Conclusion

In this post, you explored how LLMs play a part in answering user questions. You looked at a scenario for helping financial analysts. You could employ this methodology for other Q&A scenarios, like supporting insurance use cases, by quickly contextualizing claims data or customer interactions. You used a knowledge base of structured and unstructured data in a RAG approach, merging the data to create intelligent chatbots. You also learned how to use autonomous agents to help provide responses that are contextual and relevant to the customer data and limit irrelevant and inaccurate responses.

Leave your feedback and questions in the comments section.

References

About the Authors

Dhaval Shah is a Principal Solutions Architect with Amazon Web Services based out of New York, where he guides global financial services customers to build highly secure, scalable, reliable, and cost-efficient applications on the cloud. He brings over 20 years of technology experience on Software Development and Architecture, Data Engineering, and IT Management.

Soujanya Konka is a Senior Solutions Architect and Analytics specialist at AWS, focused on helping customers build their ideas on cloud. Expertise in design and implementation of Data platforms. Before joining AWS, Soujanya has had stints with companies such as HSBC & Cognizant

Jon Handler is a Senior Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon’s career as a software developer included 4 years of coding a large-scale, ecommerce search engine. Jon holds a Bachelor of the Arts from the University of Pennsylvania, and a Master of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.

Jianwei Li is a Principal Analytics Specialist TAM at Amazon Web Services. Jianwei provides consultant service for customers to help customer design and build modern data platform. Jianwei has been working in big data domain as software developer, consultant and tech leader.

Hrishikesh Karambelkar is a Principal Architect for Data and AIML with AWS Professional Services for Asia Pacific and Japan. He is proactively engaged with customers in APJ region to enable enterprises in their Digital Transformation journey on AWS Cloud in the areas of Generative AI, machine learning and Data, Analytics, Previously, Hrishikesh has authored books on enterprise search, biig data and co-authored research publications in the areas of Enterprise Search and AI-ML.

AWS Weekly Roundup – LlamaIndex support for Amazon Neptune, force AWS CloudFormation stack deletion, and more (May 27, 2024)

2024-05-27 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-llamaindex-support-for-amazon-neptune-force-aws-cloudformation-stack-deletion-and-more-may-27-2024/

Last week, Dr. Matt Wood, VP for AI Products at Amazon Web Services (AWS), delivered the keynote at the AWS Summit Los Angeles. Matt and guest speakers shared the latest advancements in generative artificial intelligence (generative AI), developer tooling, and foundational infrastructure, showcasing how they come together to change what’s possible for builders. You can watch the full keynote on YouTube.

Announcements during the LA Summit included two new Amazon Q courses as part of Amazon’s AI Ready initiative to provide free AI skills training to 2 million people globally by 2025. The courses are part of the Amazon Q learning plan. But that’s not all that happened last week.

Last week’s launches
Here are some launches that got my attention:

LlamaIndex support for Amazon Neptune — You can now build Graph Retrieval Augmented Generation (GraphRAG) applications by combining knowledge graphs stored in Amazon Neptune and LlamaIndex, a popular open source framework for building applications with large language models (LLMs) such as those available in Amazon Bedrock. To learn more, check the LlamaIndex documentation for Amazon Neptune Graph Store.

AWS CloudFormation launches a new parameter called DeletionMode for the DeleteStack API — You can use the AWS CloudFormation DeleteStack API to delete your stacks and stack resources. However, certain stack resources can prevent the DeleteStack API from successfully completing, for example, when you attempt to delete non-empty Amazon Simple Storage Service (Amazon S3) buckets. The DeleteStack API can enter into the DELETE_FAILED state in such scenarios. With this launch, you can now pass FORCE_DELETE_STACK value to the new DeletionMode parameter and delete such stacks. To learn more, check the DeleteStack API documentation.

Mistral Small now available in Amazon Bedrock — The Mistral Small foundation model (FM) from Mistral AI is now generally available in Amazon Bedrock. This a fast-follow to our recent announcements of Mistral 7B and Mixtral 8x7B in March, and Mistral Large in April. Mistral Small, developed by Mistral AI, is a highly efficient large language model (LLM) optimized for high-volume, low-latency language-based tasks. To learn more, check Esra’s post.

New Amazon CloudFront edge location in Cairo, Egypt — The new AWS edge location brings the full suite of benefits provided by Amazon CloudFront, a secure, highly distributed, and scalable content delivery network (CDN) that delivers static and dynamic content, APIs, and live and on-demand video with low latency and high performance. Customers in Egypt can expect up to 30 percent improvement in latency, on average, for data delivered through the new edge location. To learn more about AWS edge locations, visit CloudFront edge locations.

Amazon OpenSearch Service zero-ETL integration with Amazon S3 — This Amazon OpenSearch Service integration offers a new efficient way to query operational logs in Amazon S3 data lakes, eliminating the need to switch between tools to analyze data. You can get started by installing out-of-the-box dashboards for AWS log types such as Amazon VPC Flow Logs, AWS WAF Logs, and Elastic Load Balancing (ELB). To learn more, check out the Amazon OpenSearch Service Integrations page and the Amazon OpenSearch Service Developer Guide.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional news items and a Twitch show that you might find interesting:

AWS Build On Generative AI Build On Generative AI — Now streaming every Thursday, 2:00 PM US PT on twitch.tv/aws, my colleagues Tiffany and Mike discuss different aspects of generative AI and invite guest speakers to demo their work. Check out show notes and the full list of episodes on community.aws.

Amazon Bedrock Studio bootstrapper script — We’ve heard your feedback! To everyone who struggled setting up the required AWS Identity and Access Management (IAM) roles and permissions to get started with Amazon Bedrock Studio: You can now use the Bedrock Studio bootstrapper script to automate the creation of the permissions boundary, service role, and provisioning role.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS Summits — It’s AWS Summit season! Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Dubai (May 29), Bangkok (May 30), Stockholm (June 4), Madrid (June 5), and Washington, DC (June 26–27).

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Antje

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Optimized for low-latency workloads, Mistral Small now available in Amazon Bedrock

2024-05-24 Esra Kayabali

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/optimized-for-low-latency-workloads-mistral-small-now-available-in-amazon-bedrock/

Today, I am happy to announce that the Mistral Small foundation model (FM) from Mistral AI is now generally available in Amazon Bedrock. This a fast-follow to our recent announcements of Mistral 7B and Mixtral 8x7B in March, and Mistral Large in April. You can now access four high-performing models from Mistral AI in Amazon Bedrock including Mistral Small, Mistral Large, Mistral 7B, and Mixtral 8x7B, further expanding model choice.

Mistral Small, developed by Mistral AI, is a highly efficient large language model (LLM) optimized for high-volume, low-latency language-based tasks. Mistral Small is perfectly suited for straightforward tasks that can be performed in bulk, such as classification, customer support, or text generation. It provides outstanding performance at a cost-effective price point.

Some key features of Mistral Small you need to know about:

Retrieval-Augmented Generation (RAG) specialization – Mistral Small ensures that important information is retained even in long context windows, which can extend up to 32K tokens.
Coding proficiency – Mistral Small excels in code generation, review, and commenting, supporting major coding languages.
Multilingual capability – Mistral Small delivers top-tier performance in French, German, Spanish, and Italian, in addition to English. It also supports dozens of other languages.

Getting started with Mistral Small
I first need access to the model to get started with Mistral Small. I go to the Amazon Bedrock console, choose Model access, and then choose Manage model access. I expand the Mistral AI section, choose Mistral Small, and then choose Save changes.

I now have model access to Mistral Small, and I can start using it in Amazon Bedrock. I refresh the Base models table to view the current status.

I use the following template to build a prompt for the model to get sub-optimal outputs:

<s>[INST] Instruction [/INST]

Note that <s> is a special token for beginning of string (BOS) while [INST] and [/INST] are regular strings.

I try the following prompt to see a classification example:

Prompt:

<s>[INST] Classify the following email to determine if it is spam or not. Only respond with the exact words "Spam" or "Not spam". 🎉 Limited time offer! Buy 2, Get 1 Free! Offer ends today! Don't miss out the CHANCE TO SAVE, please click on the link below: https://bit.ly/buy-2-get-1-free [/INST]

Mistral 7B, Mixtral 8x7B, and Mistral Large can all correctly classify this email as “Spam.” Mistral Small is also able to classify this accurately, just as the larger models can. I also try several similar tasks, such as generating a Bash script from a text prompt and generating a recipe to prepare yoghurt, and get good results. For this reason, Mistral Small is the most cost-effective and efficient option of the Mistral AI models in Amazon Bedrock for such tasks.

Mistral Small excels in multilingual tasks. Along with English, it delivers superior performance in French, German, Spanish, and Italian. I test the model’s understanding of the German language by asking it to give me two sentences about sustainability:

Prompt:

<s>[INST] Geben Sie mir zwei Sätze zum Thema Nachhaltigkeit. [/INST]

Output:

1. Nachhaltigkeit bedeutet, die Bedürfnisse der heutigen Generation zu befriedigen, ohne die Möglichkeiten künftiger Generationen zu gefährden. 2. Ein nachhaltiger Lebensstil umfasst den bewussten Umgang mit Ressourcen, wie Wasser, Energie und Rohstoffen, sowie den Schutz von Umwelt und Klima. Diese Sätze sollten Ihnen einen guten Überblick über das Thema Nachhaltigkeit geben.

Programmatically interact with Mistral Small
I can use AWS Command Line Interface (AWS CLI) and AWS Software Development Kit (SDK) to programmatically interact with Mistral Small using Amazon Bedrock APIs. I use the following code in Python, which interacts with Amazon Bedrock Runtime APIs with AWS SDK, asking, “What is the color of the sky?”:

import argparse
import boto3
from botocore.exceptions import ClientError
import json

accept = "application/json"
content_type = "application/json"

def invoke_model(model_id, input_data, region, streaming): 
  client = boto3.client('bedrock-runtime', region_name=region)
  try:
    if streaming:
      response = client.invoke_model_with_response_stream(body=input_data, modelId=model_id, accept=accept, contentType=content_type)
    else:
      response = client.invoke_model(body=input_data, modelId=model_id, accept=accept, contentType=content_type)
    status_code = response['ResponseMetadata']['HTTPStatusCode']
    print(json.loads(response.get('body').read()))
  except ClientError as e:
    print(e)

if __name__ == "__main__":
  parser = argparse.ArgumentParser(description="Bedrock Testing Tool")
  parser.add_argument("--prompt", type=str, help="prompt to use", default="Hello")
  parser.add_argument("--max-tokens", type=int, default=64)
  parser.add_argument("--streaming", choices=["true", "false"], help="whether to stream or not", default="false")
  args = parser.parse_args()
  streaming = False
  if args.streaming == "true":
    streaming = True
  input_data = json.dumps({
    "prompt": f"<s>[INST]{args.prompt}[/INST]",
    "max_tokens": args.max_tokens
  })
  invoke_model(model_id="mistral.mistral-small-2402-v1:0", input_data=input_data, region="us-east-1", streaming=streaming)

I get the following output:

{'outputs': [{'text': ' The color of the sky can vary depending on the time of day, weather,', 'stop_reason': 'length'}]}

Now available
The Mistral Small model is now available in Amazon Bedrock in the US East (N. Virginia) Region.

To learn more, visit the Mistral AI in Amazon Bedrock product page. For pricing details, review the Amazon Bedrock pricing page.

To get started with Mistral Small in Amazon Bedrock, visit the Amazon Bedrock console and Amazon Bedrock User Guide.

— Esra

AWS Weekly Roundup: New capabilities in Amazon Bedrock, AWS Amplify Gen 2, Amazon RDS and more (May 13, 2024)

2024-05-13 Abhishek Gupta

Post Syndicated from Abhishek Gupta original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-new-capabilities-in-amazon-bedrock-aws-amplify-gen-2-amazon-rds-and-more-may-13-2024/

AWS Summit is in full swing around the world, with the most recent one being AWS Summit Singapore! Here is a sneak peek of the AWS staff and ASEAN community members at the Developer Lounge booth. It featured AWS Community speakers giving lightning talks on serverless, Amazon Elastic Kubernetes Service (Amazon EKS), security, generative AI, and more.

Last week’s launches
Here are some launches that caught my attention. Not surprisingly, a lot of interesting generative AI features!

Amazon Titan Text Premier is now available in Amazon Bedrock – This is the latest addition to the Amazon Titan family of large language models (LLMs) and offers optimized performance for key features like Retrieval Augmented Generation (RAG) on Knowledge Bases for Amazon Bedrock, and function calling on Agents for Amazon Bedrock.

Amazon Bedrock Studio is now available in public preview – Amazon Bedrock Studio offers a web-based experience to accelerate the development of generative AI applications by providing a rapid prototyping environment with key Amazon Bedrock features, including Knowledge Bases, Agents, and Guardrails.

Agents for Amazon Bedrock now supports Provisioned Throughput pricing model – As agentic applications scale, they require higher input and output model throughput compared to on-demand limits. The Provisioned Throughput pricing model makes it possible to purchase model units for the specific base model.

MongoDB Atlas is now available as a vector store in Knowledge Bases for Amazon Bedrock – With MongoDB Atlas vector store integration, you can build RAG solutions to securely connect your organization’s private data sources to foundation models (FMs) in Amazon Bedrock.

Amazon RDS for PostgreSQL supports pgvector 0.7.0 – You can use the open-source PostgreSQL extension for storing vector embeddings and add retrieval-augemented generation (RAG) capability in your generative AI applications. This release includes features that increase the number of dimensions of vectors you can index, reduce index size, and includes additional support for using CPU SIMD in distance computations. Also Amazon RDS Performance Insights now supports the Oracle Multitenant configuration on Amazon RDS for Oracle.

Amazon EC2 Inf2 instances are now available in new regions – These instances are optimized for generative AI workloads and are generally available in the Asia Pacific (Sydney), Europe (London), Europe (Paris), Europe (Stockholm), and South America (Sao Paulo) Regions.

New Generative Engine in Amazon Polly is now generally available – The generative engine in Amazon Polly is it’s most advanced text-to-speech (TTS) model and currently includes two American English voices, Ruth and Matthew, and one British English voice, Amy.

AWS Amplify Gen 2 is now generally available – AWS Amplify offers a code-first developer experience for building full-stack apps using TypeScript and enables developers to express app requirements like the data models, business logic, and authorization rules in TypeScript. AWS Amplify Gen 2 has added a number of features since the preview, including a new Amplify console with features such as custom domains, data management, and pull request (PR) previews.

Amazon EMR Serverless now includes performance monitoring of Apache Spark jobs with Amazon Managed Service for Prometheus – This lets you analyze, monitor, and optimize your jobs using job-specific engine metrics and information about Spark event timelines, stages, tasks, and executors. Also, Amazon EMR Studio is now available in the Asia Pacific (Melbourne) and Israel (Tel Aviv) Regions.

Amazon MemoryDB launched two new condition keys for IAM policies – The new condition keys let you create AWS Identity and Access Management (IAM) policies or Service Control Policies (SCPs) to enhance security and meet compliance requirements. Also, Amazon ElastiCache has updated it’s minimum TLS version to 1.2.

Amazon Lightsail now offers a larger instance bundle – This includes 16 vCPUs and 64 GB memory. You can now scale your web applications and run more compute and memory-intensive workloads in Lightsail.

Amazon Elastic Container Registry (ECR) adds pull through cache support for GitLab Container Registry – ECR customers can create a pull through cache rule that maps an upstream registry to a namespace in their private ECR registry. Once rule is configured, images can be pulled through ECR from GitLab Container Registry. ECR automatically creates new repositories for cached images and keeps them in-sync with the upstream registry.

AWS Resilience Hub expands application resilience drift detection capabilities – This new enhancement detects changes, such as the addition or deletion of resources within the application’s input sources.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional projects and blog posts that you might find interesting.

Building games with LLMs – Check out this fun experiment by Banjo Obayomi to generate Super Mario levels using different LLMs on Amazon Bedrock!

Troubleshooting with Amazon Q – Ricardo Ferreira walks us through how he solved a nasty data serialization problem while working with Apache Kafka, Go, and Protocol Buffers.

Getting started with Amazon Q in VS Code – Check out this excellent step-by-step guide by Rohini Gaonkar that covers installing the extension for features like code completion chat, and productivity-boosting capabilities powered by generative AI.

AWS open source news and updates – My colleague Ricardo writes about open source projects, tools, and events from the AWS Community. Check out Ricardo’s page for the latest updates.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Bengaluru (May 15–16), Seoul (May 16–17), Hong Kong (May 22), Milan (May 23), Stockholm (June 4), and Madrid (June 5).

AWS re:Inforce – Explore 2.5 days of immersive cloud security learning in the age of generative AI at AWS re:Inforce, June 10–12 in Pennsylvania.

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Turkey (May 18), Midwest | Columbus (June 13), Sri Lanka (June 27), Cameroon (July 13), Nigeria (August 24), and New York (August 28).

Browse all upcoming AWS led in-person and virtual events and developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Abhishek

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Build RAG and agent-based generative AI applications with new Amazon Titan Text Premier model, available in Amazon Bedrock

2024-05-07 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/build-rag-and-agent-based-generative-ai-applications-with-new-amazon-titan-text-premier-model-available-in-amazon-bedrock/

Today, we’re happy to welcome a new member of the Amazon Titan family of models: Amazon Titan Text Premier, now available in Amazon Bedrock.

Following Amazon Titan Text Lite and Titan Text Express, Titan Text Premier is the latest large language model (LLM) in the Amazon Titan family of models, further increasing your model choice within Amazon Bedrock. You can now choose between the following Titan Text models in Bedrock:

Titan Text Premier is the most advanced Titan LLM for text-based enterprise applications. With a maximum context length of 32K tokens, it has been specifically optimized for enterprise use cases, such as building Retrieval Augmented Generation (RAG) and agent-based applications with Knowledge Bases and Agents for Amazon Bedrock. As with all Titan LLMs, Titan Text Premier has been pre-trained on multilingual text data but is best suited for English-language tasks. You can further custom fine-tune (preview) Titan Text Premier with your own data in Amazon Bedrock to build applications that are specific to your domain, organization, brand style, and use case. I’ll dive deeper into model highlights and performance in the following sections of this post.
Titan Text Express is ideal for a wide range of tasks, such as open-ended text generation and conversational chat. The model has a maximum context length of 8K tokens.
Titan Text Lite is optimized for speed, is highly customizable, and is ideal to be fine-tuned for tasks such as article summarization and copywriting. The model has a maximum context length of 4K tokens.

Now, let’s discuss Titan Text Premier in more detail.

Amazon Titan Text Premier model highlights
Titan Text Premier has been optimized for high-quality RAG and agent-based applications and customization through fine-tuning while incorporating responsible artificial intelligence (AI) practices.

Optimized for RAG and agent-based applications – Titan Text Premier has been specifically optimized for RAG and agent-based applications in response to customer feedback, where respondents named RAG as one of their key components in building generative AI applications. The model training data includes examples for tasks like summarization, Q&A, and conversational chat and has been optimized for integration with Knowledge Bases and Agents for Amazon Bedrock. The optimization includes training the model to handle the nuances of these features, such as their specific prompt formats.

High-quality RAG through integration with Knowledge Bases for Amazon Bedrock – With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for RAG. You can now choose Titan Text Premier with Knowledge Bases to implement question-answering and summarization tasks over your company’s proprietary data.
Automating tasks through integration with Agents for Amazon Bedrock – You can also create custom agents that can perform multistep tasks across different company systems and data sources using Titan Text Premier with Agents for Amazon Bedrock. Using agents, you can automate tasks for your internal or external customers, such as managing retail orders or processing insurance claims.

We already see customers exploring Titan Text Premier to implement interactive AI assistants that create summaries from unstructured data such as emails. They’re also exploring the model to extract relevant information across company systems and data sources to create more meaningful product summaries.

Here’s a demo video created by my colleague Brooke Jamieson that shows an example of how you can put Titan Text Premier to work for your business.

Custom fine-tuning of Amazon Titan Text Premier (preview) – You can fine-tune Titan Text Premier with your own data in Amazon Bedrock to increase model accuracy by providing your own task-specific labeled training dataset. Customizing Titan Text Premier helps to further specialize your model and create unique user experiences that reflect your company’s brand, style, voice, and services.

Built responsibly – Amazon Titan Text Premier incorporates safe, secure, and trustworthy practices. The AWS AI Service Card for Amazon Titan Text Premier documents the model’s performance across key responsible AI benchmarks from safety and fairness to veracity and robustness. The model also integrates with Guardrails for Amazon Bedrock so you can implement additional safeguards customized to your application requirements and responsible AI policies. Amazon indemnifies customers who responsibly use Amazon Titan models against claims that generally available Amazon Titan models or their outputs infringe on third-party copyrights.

Amazon Titan Text Premier model performance
Titan Text Premier has been built to deliver broad intelligence and utility relevant for enterprises. The following table shows evaluation results on public benchmarks that assess critical capabilities, such as instruction following, reading comprehension, and multistep reasoning against price-comparable models. The strong performance across these diverse and challenging benchmarks highlights that Titan Text Premier is built to handle a wide range of use cases in enterprise applications, offering great price performance. For all benchmarks listed below, a higher score is a better score.

Capability	Benchmark	Description	Amazon	Google	OpenAI
Capability	Benchmark	Description	Titan Text Premier	Gemini Pro 1.0	GPT-3.5
General	MMLU (Paper)	Representation of questions in 57 subjects	70.4% (5-shot)	71.8% (5-shot)	70.0% (5-shot)
Instruction following	IFEval (Paper)	Instruction-following evaluation for large language models	64.6% (0-shot)	not published	not published
Reading comprehension	RACE-H (Paper)	Large-scale reading comprehension	89.7% (5-shot)	not published	not published
Reasoning	HellaSwag (Paper)	Common-sense reasoning	92.6% (10-shot)	84.7% (10-shot)	85.5% (10-shot)
	DROP, F1 score (Paper)	Reasoning over text	77.9 (3-shot)	74.1 (Variable Shots)	64.1 (3-shot)
	BIG-Bench Hard (Paper)	Challenging tasks requiring multistep reasoning	73.7% (3-shot CoT)	75.0% (3-shot CoT)	not published
	ARC-Challenge (Paper)	Common-sense reasoning	85.8% (5-shot)	not published	85.2% (25-shot)

Note: Benchmarks evaluate model performance using a variation of few-shot and zero-shot prompting. With few-shot prompting, you provide the model with a number of concrete examples (three for 3-shot, five for 5-shot, etc.) of how to solve a specific task. This demonstrates the model’s ability to learn from example, called in-context learning. With zero-shot prompting on the other hand, you evaluate a model’s ability to perform tasks by relying only on its preexisting knowledge and general language understanding without providing any examples.

Get started with Amazon Titan Text Premier
To enable access to Amazon Titan Text Premier, navigate to the Amazon Bedrock console and choose Model access on the bottom left pane. On the Model access overview page, choose the Manage model access button in the upper right corner and enable access to Amazon Titan Text Premier.

To use Amazon Titan Text Premier in the Bedrock console, choose Text or Chat under Playgrounds in the left menu pane. Then choose Select model and select Amazon as the category and Titan Text Premier as the model. To explore the model, you can load examples. The following screenshot shows one of those examples that demonstrates the model’s chain of thought (CoT) and reasoning capabilities.

By choosing View API request, you can get a code example of how to invoke the model using the AWS Command Line Interface (AWS CLI) with the current example prompt. You can also access Amazon Bedrock and available models using the AWS SDKs. In the following example, I will use the AWS SDK for Python (Boto3).

Amazon Titan Text Premier in action
For this demo, I ask Amazon Titan Text Premier to summarize one of my previous AWS News Blog posts that announced the availability of Amazon Titan Image Generator and the watermark detection feature.

For summarization tasks, a recommended prompt template looks like this:

The following is text from a {{Text Category}}:
{{Text}}
Summarize the {{Text Category}} in {{length of summary}}

For more prompting best practices, check out the Amazon Titan Text Prompt Engineering Guidelines.

I adapt this template to my example and define the prompt. In preparation, I saved my News Blog post as a text file and read it into the post string variable.

prompt = """
The following is text from a AWS News Blog post:

<text>
%s
</text>

Summarize the above AWS News Blog post in a short paragraph.
""" % post

Similar to previous Amazon Titan Text models, Amazon Titan Text Premier supports temperature and topP inference parameters to control the randomness and diversity of the response, as well as maxTokenCount and stopSequences to control the length of the response.

import boto3
import json

bedrock_runtime = boto3.client(service_name="bedrock-runtime")

body = json.dumps({
    "inputText": prompt, 
    "textGenerationConfig":{  
        "maxTokenCount":256,
        "stopSequences":[],
        "temperature":0,
        "topP":0.9
    }
})

Then, I use the InvokeModel API to send the inference request.

response = bedrock_runtime.invoke_model(
    body=body,
	modelId="amazon.titan-text-premier-v1:0",
    accept="application/json", 
    contentType="application/json"
)

response_body = json.loads(response.get('body').read())
print(response_body.get('results')[0].get('outputText'))

And here’s the response:

Amazon Titan Image Generator is now generally available in Amazon Bedrock, giving you an easy way to build and scale generative AI applications with new image generation and image editing capabilities, including instant customization of images. Watermark detection for Titan Image Generator is now generally available in the Amazon Bedrock console. Today, we’re also introducing a new DetectGeneratedContent API (preview) in Amazon Bedrock that checks for the existence of this watermark and helps you confirm whether an image was generated by Titan Image Generator.

For more examples in different programming languages, check out the code examples section in the Amazon Bedrock User Guide.

More resources
Here are some additional resources that you might find helpful:

Intended use cases and more — Check out the AWS AI Service Card for Amazon Titan Text Premier to learn more about the models’ intended use cases, design, and deployment, as well as performance optimization best practices.

AWS Generative AI CDK Constructs — Amazon Titan Text Premier is supported by the AWS Generative AI CDK Constructs, an open source extension of the AWS Cloud Development Kit (AWS CDK), providing sample implementations of AWS CDK for common generative AI patterns.

Amazon Titan models — If you’re curious to learn more about Amazon Titan models in general, check out the following video. Dr. Sherry Marcus, Director of Applied Science for Amazon Bedrock, shares how the Amazon Titan family of models incorporates the 25 years of experience Amazon has innovating with AI and machine learning (ML) across its business.

Now available
Amazon Titan Text Premier is available today in the AWS US East (N. Virginia) Region. Custom fine-tuning for Amazon Titan Text Premier is available today in preview in the AWS US East (N. Virginia) Region. Check the full Region list for future updates. To learn more about the Amazon Titan family of models, visit the Amazon Titan product page. For pricing details, review the Amazon Bedrock pricing page.

Give Amazon Titan Text Premier a try in the Amazon Bedrock console today, send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts, and engage with the generative AI builder community at community.aws.

— Antje

Build generative AI applications with Amazon Bedrock Studio (preview)

2024-05-07 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/build-generative-ai-applications-with-amazon-bedrock-studio-preview/

Today, we’re introducing Amazon Bedrock Studio, a new web-based generative artificial intelligence (generative AI) development experience, in public preview. Amazon Bedrock Studio accelerates the development of generative AI applications by providing a rapid prototyping environment with key Amazon Bedrock features, including Knowledge Bases, Agents, and Guardrails.

As a developer, you can now use your company’s single sign-on credentials to sign in to Bedrock Studio and start experimenting. You can build applications using a wide array of top performing models, evaluate, and share your generative AI apps within Bedrock Studio. The user interface guides you through various steps to help improve a model’s responses. You can experiment with model settings, and securely integrate your company data sources, tools, and APIs, and set guardrails. You can collaborate with team members to ideate, experiment, and refine your generative AI applications—all without requiring advanced machine learning (ML) expertise or AWS Management Console access.

As an Amazon Web Services (AWS) administrator, you can be confident that developers will only have access to the features provided by Bedrock Studio, and won’t have broader access to AWS infrastructure and services.

Now, let me show you how to get started with Amazon Bedrock Studio.

Get started with Amazon Bedrock Studio
As an AWS administrator, you first need to create an Amazon Bedrock Studio workspace, then select and add users you want to give access to the workspace. Once the workspace is created, you can share the workspace URL with the respective users. Users with access privileges can sign in to the workspace using single sign-on, create projects within their workspace, and start building generative AI applications.

Create Amazon Bedrock Studio workspace
Navigate to the Amazon Bedrock console and choose Bedrock Studio on the bottom left pane.

Before creating a workspace, you need to configure and secure the single sign-on integration with your identity provider (IdP) using the AWS IAM Identity Center. For detailed instructions on how to configure various IdPs, such as AWS Directory Service for Microsoft Active Directory, Microsoft Entra ID, or Okta, check out the AWS IAM Identity Center User Guide. For this demo, I configured user access with the default IAM Identity Center directory.

Next, choose Create workspace, enter your workspace details, and create any required AWS Identity and Access Management (IAM) roles.

If you want, you can also select default generative AI models and embedding models for the workspace. Once you’re done, choose Create.

Next, select the created workspace.

Then, choose User management and Add users or groups to select the users you want to give access to this workspace.

Back in the Overview tab, you can now copy the Bedrock Studio URL and share it with your users.

Build generative AI applications using Amazon Bedrock Studio
As a builder, you can now navigate to the provided Bedrock Studio URL and sign in with your single sign-on user credentials. Welcome to Amazon Bedrock Studio! Let me show you how to choose from industry leading FMs, bring your own data, use functions to make API calls, and safeguard your applications using guardrails.

Choose from multiple industry leading FMs
By choosing Explore, you can start selecting available FMs and explore the models using natural language prompts.

If you choose Build, you can start building generative AI applications in a playground mode, experiment with model configurations, iterate on system prompts to define the behavior of your application, and prototype new features.

Bring your own data
With Bedrock Studio, you can securely bring your own data to customize your application by providing a single file or by selecting a knowledge base created in Amazon Bedrock.

Use functions to make API calls and make model responses more relevant
A function call allows the FM to dynamically access and incorporate external data or capabilities when responding to a prompt. The model determines which function it needs to call based on an OpenAPI schema that you provide.

Functions enable a model to include information in its response that it doesn’t have direct access to or prior knowledge of. For example, a function could allow the model to retrieve and include the current weather conditions in its response, even though the model itself doesn’t have that information stored.

Safeguard your applications using Guardrails for Amazon Bedrock
You can create guardrails to promote safe interactions between users and your generative AI applications by implementing safeguards customized to your use cases and responsible AI policies.

When you create applications in Amazon Bedrock Studio, the corresponding managed resources such as knowledge bases, agents, and guardrails are automatically deployed in your AWS account. You can use the Amazon Bedrock API to access those resources in downstream applications.

Here’s a short demo video of Amazon Bedrock Studio created by my colleague Banjo Obayomi.

Join the preview
Amazon Bedrock Studio is available today in public preview in AWS Regions US East (N. Virginia) and US West (Oregon). To learn more, visit the Amazon Bedrock Studio page and User Guide.

Give Amazon Bedrock Studio a try today and let us know what you think! Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts, and engage with the generative AI builder community at community.aws.

— Antje

Build RAG applications with MongoDB Atlas, now available in Knowledge Bases for Amazon Bedrock

2024-05-02 Abhishek Gupta

Post Syndicated from Abhishek Gupta original https://aws.amazon.com/blogs/aws/build-rag-applications-with-mongodb-atlas-now-available-in-knowledge-bases-for-amazon-bedrock/

Foundational models (FMs) are trained on large volumes of data and use billions of parameters. However, in order to answer customers’ questions related to domain-specific private data, they need to reference an authoritative knowledge base outside of the model’s training data sources. This is commonly achieved using a technique known as Retrieval Augmented Generation (RAG). By fetching data from the organization’s internal or proprietary sources, RAG extends the capabilities of FMs to specific domains, without needing to retrain the model. It is a cost-effective approach to improving model output so it remains relevant, accurate, and useful in various contexts.

Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow from ingestion to retrieval and prompt augmentation without having to build custom integrations to data sources and manage data flows.

Today, we are announcing the availability of MongoDB Atlas as a vector store in Knowledge Bases for Amazon Bedrock. With MongoDB Atlas vector store integration, you can build RAG solutions to securely connect your organization’s private data sources to FMs in Amazon Bedrock. This integration adds to the list of vector stores supported by Knowledge Bases for Amazon Bedrock, including Amazon Aurora PostgreSQL-Compatible Edition, vector engine for Amazon OpenSearch Serverless, Pinecone, and Redis Enterprise Cloud.

Build RAG applications with MongoDB Atlas and Knowledge Bases for Amazon Bedrock
Vector Search in MongoDB Atlas is powered by the vectorSearch index type. In the index definition, you must specify the field that contains the vector data as the vector type. Before using MongoDB Atlas vector search in your application, you will need to create an index, ingest source data, create vector embeddings and store them in a MongoDB Atlas collection. To perform queries, you will need to convert the input text into a vector embedding, and then use an aggregation pipeline stage to perform vector search queries against fields indexed as the vector type in a vectorSearch type index.

Thanks to the MongoDB Atlas integration with Knowledge Bases for Amazon Bedrock, most of the heavy lifting is taken care of. Once the vector search index and knowledge base are configured, you can incorporate RAG into your applications. Behind the scenes, Amazon Bedrock will convert your input (prompt) into embeddings, query the knowledge base, augment the FM prompt with the search results as contextual information and return the generated response.

Let me walk you through the process of setting up MongoDB Atlas as a vector store in Knowledge Bases for Amazon Bedrock.

Configure MongoDB Atlas
Start by creating a MongoDB Atlas cluster on AWS. Choose an M10 dedicated cluster tier. Once the cluster is provisioned, create a database and collection. Next, create a database user and grant it the Read and write to any database role. Select Password as the Authentication Method. Finally, configure network access to modify the IP Access List – add IP address 0.0.0.0/0 to allow access from anywhere.

Use the following index definition to create the Vector Search index:

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "AMAZON_BEDROCK_CHUNK_VECTOR",
      "similarity": "cosine",
      "type": "vector"
    },
    {
      "path": "AMAZON_BEDROCK_METADATA",
      "type": "filter"
    },
    {
      "path": "AMAZON_BEDROCK_TEXT_CHUNK",
      "type": "filter"
    }
  ]
}

Configure the knowledge base
Create an AWS Secrets Manager secret to securely store the MongoDB Atlas database user credentials. Choose Other as the Secret type. Create an Amazon Simple Storage Service (Amazon S3) storage bucket and upload the Amazon Bedrock documentation user guide PDF. Later, you will use the knowledge base to ask questions about Amazon Bedrock.

You can also use another document of your choice because Knowledge Base supports multiple file formats (including text, HTML, and CSV).

Navigate to the Amazon Bedrock console and refer to the Amzaon Bedrock User Guide to configure the knowledge base. In the Select embeddings model and configure vector store, choose Titan Embeddings G1 – Text as the embedding model. From the list of databases, choose MongoDB Atlas.

Enter the basic information for the MongoDB Atlas cluster (Hostname, Database name, etc.) as well as the ARN of the AWS Secrets Manager secret you had created earlier. In the Metadata field mapping attributes, enter the vector store specific details. They should match the vector search index definition you used earlier.

Initiate the knowledge base creation. Once complete, synchronise the data source (S3 bucket data) with the MongoDB Atlas vector search index.

Once the synchronization is complete, navigate to MongoDB Atlas to confirm that the data has been ingested into the collection you created.

Notice the following attributes in each of the MongoDB Atlas documents:

AMAZON_BEDROCK_TEXT_CHUNK – Contains the raw text for each data chunk.
AMAZON_BEDROCK_CHUNK_VECTOR – Contains the vector embedding for the data chunk.
AMAZON_BEDROCK_METADATA – Contains additional data for source attribution and rich query capabilities.

Test the knowledge base
It’s time to ask questions about Amazon Bedrock by querying the knowledge base. You will need to choose a foundation model. I picked Claude v2 in this case and used “What is Amazon Bedrock” as my input (query).

If you are using a different source document, adjust the questions accordingly.

You can also change the foundation model. For example, I switched to Claude 3 Sonnet. Notice the difference in the output and select Show source details to see the chunks cited for each footnote.

Integrate knowledge base with applications
To build RAG applications on top of Knowledge Bases for Amazon Bedrock, you can use the RetrieveAndGenerate API which allows you to query the knowledge base and get a response.

Here is an example using the AWS SDK for Python (Boto3):

import boto3

bedrock_agent_runtime = boto3.client(
    service_name = "bedrock-agent-runtime"
)

def retrieveAndGenerate(input, kbId):
    return bedrock_agent_runtime.retrieve_and_generate(
        input={
            'text': input
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
                }
            }
        )

response = retrieveAndGenerate("What is Amazon Bedrock?", "BFT0P4NR1U")["output"]["text"]

If you want to further customize your RAG solutions, consider using the Retrieve API, which returns the semantic search responses that you can use for the remaining part of the RAG workflow.

import boto3

bedrock_agent_runtime = boto3.client(
    service_name = "bedrock-agent-runtime"
)

def retrieve(query, kbId, numberOfResults=5):
    return bedrock_agent_runtime.retrieve(
        retrievalQuery= {
            'text': query
        },
        knowledgeBaseId=kbId,
        retrievalConfiguration= {
            'vectorSearchConfiguration': {
                'numberOfResults': numberOfResults
            }
        }
    )

response = retrieve("What is Amazon Bedrock?", "BGU0Q4NU0U")["retrievalResults"]

Things to know

MongoDB Atlas cluster tier – This integration requires requires an Atlas cluster tier of at least M10.
AWS PrivateLink – For the purposes of this demo, MongoDB Atlas database IP Access List was configured to allow access from anywhere. For production deployments, AWS PrivateLink is the recommended way to have Amazon Bedrock establish a secure connection to your MongoDB Atlas cluster. Refer to the Amazon Bedrock User guide (under MongoDB Atlas) for details.
Vector embedding size – The dimension size of the vector index and the embedding model should be the same. For example, if you plan to use Cohere Embed (which has a dimension size of 1024) as the embedding model for the knowledge base, make sure to configure the vector search index accordingly.
Metadata filters – You can add metadata for your source files to retrieve a well-defined subset of the semantically relevant chunks based on applied metadata filters. Refer to the documentation to learn more about how to use metadata filters.

Now available
MongoDB Atlas vector store in Knowledge Bases for Amazon Bedrock is available in the US East (N. Virginia) and US West (Oregon) Regions. Be sure to check the full Region list for future updates.

Learn more

Try out the MongoDB Atlas integration with Knowledge Bases for Amazon Bedrock! Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts and engage with the generative AI builder community at community.aws.

— Abhishek

Amazon Titan Text V2 now available in Amazon Bedrock, optimized for improving RAG

2024-04-30 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amazon-titan-text-v2-now-available-in-amazon-bedrock-optimized-for-improving-rag/

The Amazon Titan family of models, available exclusively in Amazon Bedrock, is built on top of 25 years of Amazon expertise in artificial intelligence (AI) and machine learning (ML) advancements. Amazon Titan foundation models (FMs) offer a comprehensive suite of pre-trained image, multimodal, and text models accessible through a fully managed API. Trained on extensive datasets, Amazon Titan models are powerful and versatile, designed for a range of applications while adhering to responsible AI practices.

The latest addition to the Amazon Titan family is Amazon Titan Text Embeddings V2, the second-generation text embeddings model from Amazon now available within Amazon Bedrock. This new text embeddings model is optimized for Retrieval-Augmented Generation (RAG). It is pre-trained on 100+ languages and on code.

Amazon Titan Text Embeddings V2 now lets you choose the size of of the output vector (either 256, 512, or 1024). Larger vector sizes create more detailed responses, but will also increase the computational time. Shorter vector lengths are less detailed but will improve the response time. Using smaller vectors helps to reduce your storage costs and the latency to search and retrieve document extracts from a vector database. We measured the accuracy of the vectors generated by Amazon Titan Text Embeddings V2 and we observed that vectors with 512 dimensions keep approximately 99 percent of the accuracy provided by vectors with 1024 dimensions. Vectors with 256 dimensions keep 97 percent of the accuracy. This means that you can save 75 percent in vector storage (from 1024 down to 256 dimensions) and keep approximately 97 percent of the accuracy provided by larger vectors.

Amazon Titan Text Embeddings V2 also proposes an improved unit vector normalization that helps improve the accuracy when measuring vector similarity. You can choose between normalized or unnormalized versions of the embeddings based on your use case (normalized is more accurate for RAG use cases). Normalization of a vector is the process of scaling it to have a unit length or magnitude of 1. It is useful to ensure that all vectors have the same scale and contribute equally during vector operations, preventing some vectors from dominating others due to their larger magnitudes.

This new text embeddings model is well-suited for a variety of use cases. It can help you perform semantic searches on documents, for example, to detect plagiarism. It can classify labels into data-based learned representations, for example, to categorize movies into genres. It can also improve the quality and relevance of retrieved or generated search results, for example, recommending content based on interest using RAG.

How embeddings help to improve accuracy of RAG
Imagine you’re a superpowered research assistant for a large language model (LLM). LLMs are like those brainiacs who can write different creative text formats, but their knowledge comes from the massive datasets they were trained on. This training data might be a bit outdated or lack specific details for your needs.

This is where RAG comes in. RAG acts like your assistant, fetching relevant information from a custom source, like a company knowledge base. When the LLM needs to answer a question, RAG provides the most up-to-date information to help it generate the best possible response.

To find the most up-to-date information, RAG uses embeddings. Imagine these embeddings (or vectors) as super-condensed summaries that capture the key idea of a piece of text. A high-quality embeddings model, such as Amazon Titan Text Embeddings V2, can create these summaries accurately, like a great assistant who can quickly grasp the important points of each document. This ensures RAG retrieves the most relevant information for the LLM, leading to more accurate and on-point answers.

Think of it like searching a library. Each page of the book is indexed and represented by a vector. With a bad search system, you might end up with a pile of books that aren’t quite what you need. But with a great search system that understands the content (like a high-quality embeddings model), you’ll get exactly what you’re looking for, making the LLM’s job of generating the answer much easier.

Amazon Titan Text Embeddings V2 overview
Amazon Titan Text Embeddings V2 is optimized for high accuracy and retrieval performance at smaller dimensions for reduced storage and latency. We measured that vectors with 512 dimensions maintain approximately 99 percent of the accuracy provided by vectors with 1024 dimensions. Those with 256 dimensions offer 97 percent of the accuracy.

Max tokens	8,192
Languages	100+ in pre-training
Fine-tuning supported	No
Normalization supported	Yes
Vector size	256, 512, 1,024 (default)

How to use Amazon Titan Text Embeddings V2
It’s very likely you will interact with Amazon Titan Text Embeddings V2 indirectly through Knowledge Bases for Amazon Bedrock. Knowledge Bases takes care of the heavy lifting to create a RAG-based application. However, you can also use the Amazon Bedrock Runtime API to directly invoke the model from your code. Here is a simple example in the Swift programming language (just to show you you can use any programming language, not just Python):

import Foundation
import AWSBedrockRuntime 

let text = "This is the text to transform in a vector"

// create an API client
let client = try BedrockRuntimeClient(region: "us-east-1")

// create the request 
let request = InvokeModelInput(
   accept: "application/json",
   body: """
   {
      "inputText": "\(text)",
      "dimensions": 256,
      "normalize": true
   }
   """.data(using: .utf8), 
   contentType: "application/json",
   modelId: "amazon.titan-embed-text-v2:0")

// send the request 
let response = try await client.invokeModel(input: request)

// decode the response
let response = String(data: (response.body!), encoding: .utf8)

print(response ?? "")

The model takes three parameters in its payload:

inputText – The text to convert to embeddings.
normalize – A flag indicating whether or not to normalize the output embeddings. It defaults to true, which is optimal for RAG use cases.
dimensions – The number of dimensions the output embeddings should have. Three values are accepted: 256, 512, and 1024 (the default value).

I added the dependency on the AWS SDK for Swift in my Package.swift. I type swift run to build and run this code. It prints the following output (truncated to keep it brief):

{"embedding":[-0.26757812,0.15332031,-0.015991211...-0.8203125,0.94921875],
"inputTextTokenCount":9}

As usual, do not forget to enable access to the new model in the Amazon Bedrock console before using the API.

Amazon Titan Text Embeddings V2 will soon be the default LLM proposed by Knowledge Bases for Amazon Bedrock. Your existing knowledge bases created with the original Amazon Titan Text Embeddings model will continue to work without changes.

To learn more about the Amazon Titan family of models, view the following video:

The new Amazon Titan Text Embeddings V2 model is available today in Amazon Bedrock in the US East (N. Virginia) and US West (Oregon) AWS Regions. Check the full Region list for future updates.

To learn more, check out the Amazon Titan in Amazon Bedrock product page and pricing page. Also, do not miss this blog post to learn how to use Amazon Titan Text Embeddings models. You can also visit our community.aws site to find deep-dive technical content and to discover how our Builder communities are using Amazon Bedrock in their solutions.

Give Amazon Titan Text Embeddings V2 a try in the Amazon Bedrock console today, and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— seb

Run scalable, enterprise-grade generative AI workloads with Cohere Command R & R+, now available in Amazon Bedrock

2024-04-29 Veliswa Boya

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/run-scalable-enterprise-grade-generative-ai-workloads-with-cohere-r-r-now-available-in-amazon-bedrock/

In November 2023, we made two new Cohere models available in Amazon Bedrock (Cohere Command Light and Cohere Embed English). Today, we’re announcing the addition of two more Cohere models in Amazon Bedrock; Cohere Command R and Command R+.

Organizations need generative artificial intelligence (generative AI) models to securely interact with information stored in their enterprise data sources. Both Command R and Command R+ are powerful, scalable large language models (LLMs), purpose-built for real-world, enterprise-grade workloads. These models are multilingual and are focused on balancing high efficiency with strong accuracy to excel at capabilities such as Retrieval-Augmented Generation (RAG), and tool use to enable enterprises to move beyond proof-of-concept (POC), and into production using artificial intelligence (AI).

Command R is a scalable multilingual generative model targeting RAG and tool use to enable production-scale AI for enterprises. Command R+ is a state-of-the-art RAG-optimized model designed to tackle enterprise-grade workloads and optimize business AI applications. Command R+ is optimized for advanced RAG to provide enterprise-ready, highly reliable, and veriﬁable responses due to in-line citations which come standard with this model. With these new Cohere models in Bedrock, you can scale with AI to quickly ﬁnd the most relevant information to support tasks across business functions like ﬁnance, human resources (HR), sales, marketing, and customer support, among others, in a range of business sectors. Tool use is also available in Command R+. Command R+ is a strong multilingual model, and similarly to Command R, features a tokenizer that compresses non-English text much better than the tokenizer used for other models in the market.

Getting started with Command R and Command R+
To get started with both models in Amazon Bedrock, you first need to get access to the models. In the Amazon Bedrock console, choose Model access, and then choose Manage model access. Next, choose your preferred model(s), and then choose Save changes. As you can see, you now have six Cohere models to choose from in Amazon Bedrock – including Command R and Command R+ – giving you greater choice and flexibility to use the optimal models for your specific business needs.

Once you have access to your preferred model, you can use the model in Amazon Bedrock. Refresh the base models table to view the updated status.

The models have been trained to respond in the language of the user such as English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. Here’s an example:

Prompt

<s>"Écris une description de produit pour une voiture électrique en 50 à 75 mots"

Output

Découvrez la voiture électrique qui va révolutionner votre façon de conduire.
Avec son design élégant, cette voiture offre une expérience de conduite unique
avec une accélération puissante et une autonomie impressionnante. Sa
technologie avancée vous garantit une charge rapide et une fiabilité inégalée.
Avec sa conception innovante et durable, cette voiture est parfaite pour les 
trajets urbains et les longues distances. Profitez d'une conduite silencieuse
et vivez l'expérience de la voiture électrique!

Programmatically interact with Command R and Command R+
You can also use AWS Command Line Interface (CLI) and AWS Software Development Kit (SDK) to make various calls using Amazon Bedrock APIs. Following, is a sample code in Python that interacts with Amazon Bedrock Runtime APIs with AWS SDK. Taking the same text generation prompt I used earlier, here is how it looks when used programmatically. In this example I’m interacting with the Command R model. Back to Python, I first run the ListFoundationModels API call to discover the modelId for Command R.

import boto3
import json
import numpy

bedrock = boto3.client(service_name='bedrock', region_name='us-east-1')

listModels = bedrock.list_foundation_models(byProvider='cohere')
print("\n".join(list(map(lambda x: f"{x['modelName']} : { x['modelId'] }", listModels['modelSummaries']))))

Running this code gives the list:

Command : cohere.command-text-v14
Command Light : cohere.command-light-text-v14
Embed English : cohere.embed-english-v3
Embed Multilingual : cohere.embed-multilingual-v3
Command R: cohere.command-r-v1:0
Command R+: cohere.command-r-plus-v1:0

From this list, I select cohere.command-r-v1:0 model ID and write the code to generate the text as shown earlier in this post.

import boto3
import json

bedrock = boto3.client(service_name="bedrock-runtime", region_name='us-east-1')

prompt = """
<s>Écris une description de produit pour une voiture électrique en 50 à 75 mots

body = json.dumps({
    "prompt": prompt,
    "max_tokens": 512,
    "top_p": 0.8,
    "temperature": 0.5,
})

modelId = "cohere.command-r-v1:0"

accept = "application/json"
contentType = "application/json"

response = bedrock.invoke_model(
    body=body,
    modelId=modelId,
    accept=accept,
    contentType=contentType
)

print(json.loads(response.get('body').read()))

You can get JSON formatted output as like:

Découvrez la voiture électrique qui va révolutionner votre façon de conduire.
Avec son design élégant, cette voiture offre une expérience de conduite unique
avec une accélération puissante et une autonomie impressionnante. Sa
technologie avancée vous garantit une charge rapide et une fiabilité inégalée.
Avec sa conception innovante et durable, cette voiture est parfaite pour les 
trajets urbains et les longues distances. Profitez d'une conduite silencieuse
et vivez l'expérience de la voiture électrique!

Now Available

Command R and Command R+ models, along with other Cohere models, are available today in Amazon Bedrock in the US East (N. Virginia) and US West (Oregon) Regions; check the full Region list for future updates.

Visit our community.aws site to find deep-dive technical content and to discover how our Builder communities are using Amazon Bedrock in their solutions. Give Command R and Command R+ a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

– Veliswa.

AWS Weekly Roundup: Amazon Bedrock, AWS CodeBuild, Amazon CodeCatalyst, and more (April 29, 2024)

2024-04-29 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-bedrock-aws-codebuild-amazon-codecatalyst-and-more-april-29-2024/

This was a busy week for Amazon Bedrock with many new features! Using GitHub Actions with AWS CodeBuild is much easier. Also, Amazon Q in Amazon CodeCatalyst can now manage more complex issues.

I was amazed to meet so many new and old friends at the AWS Summit London. To give you a quick glimpse, here’s AWS Hero Yan Cui starting his presentation at the AWS Community stage.

Last week’s launches
With so many interesting new features, I start with generative artificial intelligence (generative AI) and then move to the other topics. Here’s what got my attention:

Amazon Bedrock – For supported architectures such as Llama, Mistral, or Flan T5, you can now import custom models and access them on demand. Model evaluation is now generally available to help you evaluate, compare, and select the best foundation models (FMs) for your specific use case. You can now access Meta’s Llama 3 models.

Agents for Amazon Bedrock – A simplified agent creation and return of control, so that you can define an action schema and get the control back to perform those action without needing to create a specific AWS Lambda function. Agents also added support for Anthropic Claude 3 Haiku and Sonnet to help build faster and more intelligent agents.

Knowledge Bases for Amazon Bedrock – You can now ingest data from up to five data sources and provide more complete answers. In the console, you can now chat with one of your documents without needing to set up a vector database (read more in this Machine Learning blog post).

Guardrails for Amazon Bedrock – The capability to implement safeguards based on your use cases and responsible AI policies is now available with new safety filters and privacy controls.

Amazon Titan – The new watermark detection feature is now generally available in Amazon Bedrock. In this way, you can identify images generated by Amazon Titan Image Generator using an invisible watermark present in all images generated by Amazon Titan.

Amazon CodeCatalyst – Amazon Q can now split complex issues into separate, simpler tasks that can then be assigned to a user or back to Amazon Q. CodeCatalyst now also supports approval gates within a workflow. Approval gates pause a workflow that is building, testing, and deploying code so that a user can validate whether it should be allowed to proceed.

Amazon EC2 – You can now remove an automatically assigned public IPv4 address from an EC2 instance. If you no longer need the automatically assigned public IPv4 (for example, because you are migrating to using a private IPv4 address for SSH with EC2 instance connect), you can use this option to quickly remove the automatically assigned public IPv4 address and reduce your public IPv4 costs.

Network Load Balancer – Now supports Resource Map in AWS Management Console, a tool that displays all your NLB resources and their relationships in a visual format on a single page. Note that Application Load Balancer already supports Resource Map in the console.

AWS CodeBuild – Now supports managed GitHub Action self-hosted runners. You can configure CodeBuild projects to receive GitHub Actions workflow job events and run them on CodeBuild ephemeral hosts.

Amazon Route 53 – You can now define a standard DNS configuration in the form of a Profile, apply this configuration to multiple VPCs, and share it across AWS accounts.

AWS Direct Connect – Hosted connections now support capacities up to 25 Gbps. Before, the maximum was 10 Gbps. Higher bandwidths simplify deployments of applications such as advanced driver assistance systems (ADAS), media and entertainment (M&E), artificial intelligence (AI), and machine learning (ML).

NoSQL Workbench for Amazon DynamoDB – A revamped operation builder user interface to help you better navigate, run operations, and browse your DynamoDB tables.

Amazon GameLift – Now supports in preview end-to-end development of containerized workloads, including deployment and scaling on premises, in the cloud, or for hybrid configurations. You can use containers for building, deploying, and running game server packages.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional projects, blog posts, and news items that you might find interesting:

GQL, the new ISO standard for graphs, has arrived – GQL, which stands for Graph Query Language, is the first new ISO database language since the introduction of SQL in 1987.

Authorize API Gateway APIs using Amazon Verified Permissions and Amazon Cognito – Externalizing authorization logic for application APIs can yield multiple benefits. Here’s an example of how to use Cedar policies to secure a REST API.

Build and deploy a 1 TB/s file system in under an hour – Very nice walkthrough for something that used to be not so easy to do in the recent past.

Let’s Architect! Discovering Generative AI on AWS – A new episode in this amazing series of posts that provides a broad introduction to the domain and then shares a mix of videos, blog posts, and hands-on workshops.

Building scalable, secure, and reliable RAG applications using Knowledge Bases for Amazon Bedrock – This post explores the new features (including AWS CloudFormation support) and how they align with the AWS Well-Architected Framework.

Using the unified CloudWatch Agent to send traces to AWS X-Ray – With added support for the collection of AWS X-Ray and OpenTelemetry traces, you can now provision a single agent to capture metrics, logs, and traces.

The executive’s guide to generative AI for sustainability – A guide for implementing a generative AI roadmap within sustainability strategies.

AWS open source news and updates – My colleague Ricardo writes about open source projects, tools, and events from the AWS Community. Check out Ricardo’s page for the latest updates.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Singapore (May 7), Seoul (May 16–17), Hong Kong (May 22), Milan (May 23), Stockholm (June 4), and Madrid (June 5).

AWS re:Inforce – Explore 2.5 days of immersive cloud security learning in the age of generative AI at AWS re:Inforce, June 10–12 in Pennsylvania.

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Turkey (May 18), Midwest | Columbus (June 13), Sri Lanka (June 27), Cameroon (July 13), Nigeria (August 24), and New York (August 28).

GOTO EDA Day London – Join us in London on May 14 to learn about event-driven architectures (EDA) for building highly scalable, fault tolerant, and extensible applications. This conference is organized by GOTO, AWS, and partners.

Browse all upcoming AWS led in-person and virtual events and developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Danilo

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Let’s Architect! Discovering Generative AI on AWS

2024-04-24 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-generative-ai/

Generative artificial intelligence (generative AI) is a type of AI used to generate content, including conversations, images, videos, and music. Generative AI can be used directly to build customer-facing features (a chatbot or an image generator), or it can serve as an underlying component in a more complex system. For example, it can generate embeddings (or compressed representations) or any other artifact necessary to improve downstream machine learning (ML) models or back-end services.

With the advent of generative AI, it’s fundamental to understand what it is, how it works under the hood, and which options are available for putting it into production. In some cases, it can also be helpful to move closer to the underlying model in order to fine tune or drive domain-specific improvements. With this edition of Let’s Architect!, we’ll cover these topics and share an initial set of methodologies to put generative AI into production. We’ll start with a broad introduction to the domain and then share a mix of videos, blogs, and hands-on workshops.

Navigating the future of AI

Many teams are turning to open source tools running on Kubernetes to help accelerate their ML and generative AI journeys. In this video session, experts discuss why Kubernetes is ideal for ML, then tackle challenges like dependency management and security. You will learn how tools like Ray, JupyterHub, Argo Workflows, and Karpenter can accelerate your path to building and deploying generative AI applications on Amazon Elastic Kubernetes Service (Amazon EKS). A real-world example showcases how Adobe leveraged Amazon EKS to achieve faster time-to-market and reduced costs. You will be also introduced to Data on EKS, a new AWS project offering best practices for deploying various data workloads on Amazon EKS.

Take me to this video!

Figure 1. Containers are a powerful tool for creating reproducible research and production environments for ML.

Generative AI: Architectures and applications in depth

This video session aims to provide an in-depth exploration of the emerging concepts in generative AI. By delving into practical applications and detailing best practices for implementation, the session offers a concrete understanding that empowers businesses to harness the full potential of these technologies. You can gain valuable insights into navigating the complexities of generative AI, equipping you with the knowledge and strategies necessary to stay ahead of the curve and capitalize on the transformative power of these new methods. If you want to dive even deeper, check this generative AI best practices post.

Take me to this video!

Figure 2. Models are growing exponentially: improved capabilities come with higher costs for productionizing them.

SaaS meets AI/ML & generative AI: Multi-tenant patterns & strategies

Working with AI/ML workloads and generative AI in a production environment requires appropriate system design and careful considerations for tenant separation in the context of SaaS. You’ll need to think about how the different tenants are mapped to models, how inferencing is scaled, how solutions are integrated with other upstream/downstream services, and how large language models (LLMs) can be fine-tuned to meet tenant-specific needs.

This video drills down into the concept of multi-tenancy for AI/ML workloads, including the common design, performance, isolation, and experience challenges that you can find during your journey. You will also become familiar with concepts like RAG (used to enrich the LLMs with contextual information) and fine tuning through practical examples.

Take me to this video!

Supporting different tenants might need fetching different context information with RAGs or offering different options for fine-tuning.

Figure 3. Supporting different tenants might need fetching different context information with RAGs or offering different options for fine-tuning.

Achieve DevOps maturity with BMC AMI zAdviser Enterprise and Amazon Bedrock

DevOps Research and Assessment (DORA) metrics, which measure critical DevOps performance indicators like lead time, are essential to engineering practices, as shown in the Accelerate book‘s research. By leveraging generative AI technology, the zAdviser Enterprise platform can now offer in-depth insights and actionable recommendations to help organizations optimize their DevOps practices and drive continuous improvement. This blog demonstrates how generative AI can go beyond language or image generation, applying to a wide spectrum of domains.

Take me to this blog post!

Figure 4. Generative AI is used to provide summarization, analysis, and recommendations for improvement based on the DORA metrics.

Hands-on Generative AI: AWS workshops

Getting hands on is often the best way to understand how everything works in practice and create the mental model to connect theoretical foundations with some real-world applications.

Generative AI on Amazon SageMaker shows how you can build, train, and deploy generative AI models. You can learn about options to fine-tune, use out-of-the-box existing models, or even customize the existing open source models based on your needs.

Building with Amazon Bedrock and LangChain demonstrates how an existing fully-managed service provided by AWS can be used when you work with foundational models, covering a wide variety of use cases. Also, if you want a quick guide for prompt engineering, you can check out the PartyRock lab in the workshop.

Figure 5. An image replacement example that you can find in the workshop.

See you next time!

Thanks for reading! We hope you got some insight into the applications of generative AI and discovered new strategies for using it. In the next blog, we will dive deeper into machine learning.

To revisit any of our previous posts or explore the entire series, visit the Let’s Architect! page.

Meta’s Llama 3 models are now available in Amazon Bedrock

2024-04-23 Channy Yun

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/metas-llama-3-models-are-now-available-in-amazon-bedrock/

Today, we are announcing the general availability of Meta’s Llama 3 models in Amazon Bedrock. Meta Llama 3 is designed for you to build, experiment, and responsibly scale your generative artificial intelligence (AI) applications. New Llama 3 models are the most capable to support a broad range of use cases with improvements in reasoning, code generation, and instruction.

According to Meta’s Llama 3 announcement, the Llama 3 model family is a collection of pre-trained and instruction-tuned large language models (LLMs) in 8B and 70B parameter sizes. These models have been trained on over 15 trillion tokens of data—a training dataset seven times larger than that used for Llama 2 models, including four times more code, which supports an 8K context length that doubles the capacity of Llama 2.

You can now use two new Llama 3 models in Amazon Bedrock, further increasing model choice within Amazon Bedrock. These models provide the ability for you to easily experiment with and evaluate even more top foundation models (FMs) for your use case:

Llama 3 8B is ideal for limited computational power and resources, and edge devices. The model excels at text summarization, text classification, sentiment analysis, and language translation.
Llama 3 70B is ideal for content creation, conversational AI, language understanding, research development, and enterprise applications. The model excels at text summarization and accuracy, text classification and nuance, sentiment analysis and nuance reasoning, language modeling, dialogue systems, code generation, and following instructions.

Meta is also currently training additional Llama 3 models over 400B parameters in size. These 400B models will have new capabilities, including multimodality, multiple languages support, and a much longer context window. When released, these models will be ideal for content creation, conversational AI, language understanding, research and development (R&D), and enterprise applications.

Llama 3 models in action
If you are new to using Meta models, go to the Amazon Bedrock console and choose Model access on the bottom left pane. To access the latest Llama 3 models from Meta, request access separately for Llama 3 8B Instruct or Llama 3 70B Instruct.

To test the Meta Llama 3 models in the Amazon Bedrock console, choose Text or Chat under Playgrounds in the left menu pane. Then choose Select model and select Meta as the category and Llama 8B Instruct or Llama 3 70B Instruct as the model.

By choosing View API request, you can also access the model using code examples in the AWS Command Line Interface (AWS CLI) and AWS SDKs. You can use model IDs such as meta.llama3-8b-instruct-v1 or meta.llama3-70b-instruct-v1.

Here is a sample of the AWS CLI command:

$ aws bedrock-runtime invoke-model \
  --model-id meta.llama3-8b-instruct-v1:0 \
  --body "{\"prompt\":\"Simply put, the theory of relativity states that\\n the laws of physics are the same everywhere in the universe, and that the passage of time and the length of objects can vary depending on their speed and position in a gravitational field \",\"max_gen_len\":512,\"temperature\":0.5,\"top_p\":0.9}" \
  --cli-binary-format raw-in-base64-out \
  --region us-east-1 \
  invoke-model-output.txt

You can use code examples for Amazon Bedrock using AWS SDKs to build your applications with various programming languages. The following Python code examples show how to invoke the Llama 3 Chat model in Amazon Bedrock for text generation.

def invoke_llama3(self, prompt):
        try:
            body = {
                "prompt": prompt,
                "temperature": 0.5,
                "top_p": 0.9,
                "max_gen_len": 512,
            }

            response = self.bedrock_runtime_client.invoke_model(
                modelId="meta.llama3-8b-instruct-v1:0", body=json.dumps(body)
            )

            response_body = json.loads(response["body"].read())
            completion = response_body["generation"]

            return completion

        except ClientError:
            logger.error("Couldn't invoke Llama 3")
            raise

You can use these Llama 3 models for a variety of tasks, such as question answering, language translation, sentiment analysis, and more. Here is an example of Llama 3 models with few-shot in-context learning where we provide training samples available to the model.

Input: Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
plush girafe => girafe peluche
cheese =>
Output:  fromage
panda => panda
penguin => pingouin
polar bear => ours polaire
puffin => macareux
sloth => paresseux
walrus => morse
whale => baleine
zebra => z\u00e8bre

Translate English to German:
sea otter => Seeotter

You can also use Llama 3 instruct models optimized for dialogue use cases. The input to the instruct model endpoints is the previous history between the chat assistant and the user. So, you can ask questions contextual to the conversation that has happened so far and provide the system configuration, such as personas, which define the chat assistant’s behavior in the following format:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
[SYSTEM_TEXT]<|eot_id|><|start_header_id|>user<|end_header_id|>
[USER_TEXT]<|eot_id|><|start_header_id|>assistant<|end_header_id|>

In the following example, the user has a conversation with the assistant about tourist sites in Paris. Then, the user inquires about the first option recommended by the chat assistant.

Input: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n
I am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n
1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n
2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n
3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\n
These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n
What is so great about #1?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: The Eiffel Tower! It's an iconic symbol of Paris and one of the most recognizable landmarks in the world. Here are some reasons why it's so great:\n\n
1. **Engineering marvel**: When it was built for the 1889 World's Fair, the Eiffel Tower was a groundbreaking feat of engineering. It was the tallest structure in the world at the time, standing at 324 meters (1,063 feet).\n
2. **Panoramic views**: The Eiffel Tower offers breathtaking 360-degree views of the city from its observation decks. You can see many of Paris' famous landmarks, such as the Seine River, the Louvre, and Montmartre.\n
3. **Romantic atmosphere**: The Eiffel Tower is often associated with romance and is a popular spot for proposals, weddings, and romantic dinners. The tower's iron latticework and twinkling lights at night add to its romantic ambiance.\n
4. **Historical significance**: The Eiffel Tower has played a significant role in French history, serving as a communication tower during World War I and a symbol of French culture and engineering.\n
5. **Accessibility**: The Eiffel Tower is easily accessible by public transportation or on foot, and there are various ways to experience it, including taking the stairs, elevator, or a guided tour.\n
6. **Nighttime spectacle**: The Eiffel Tower is even more magical at night, when it's illuminated with thousands of twinkling lights. The light show, known as the \"Eiffel Tower Sparkle,\" takes place every hour on the hour from dusk to midnight.\n
7. **Iconic status**: The Eiffel Tower has been featured in countless films, literature, and art, making it an instantly recognizable symbol of Paris and France.\n\n
Overall, the Eiffel Tower is a must-visit attraction in Paris, offering a unique combination of history, engineering, romance, and stunning views.

To learn more about the new prompt template and special tokens of Llama 3, check out Meta’s model cards and prompt formats or Llama Recipes in the GitHub repository.

Now available
Meta’s Llama 3 models are available today in Amazon Bedrock in the US East (N. Virginia) and US West (Oregon) Regions. Check the full Region list for future updates. To learn more, check out the Llama in Amazon Bedrock product page and pricing page.

Give Llama 3 a try in the Amazon Bedrock console today, and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Visit our community.aws site to find deep-dive technical content and to discover how our Builder communities are using Amazon Bedrock in their solutions.

— Channy

Guardrails for Amazon Bedrock now available with new safety filters and privacy controls

2024-04-23 Esra Kayabali

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/guardrails-for-amazon-bedrock-now-available-with-new-safety-filters-and-privacy-controls/

Today, I am happy to announce the general availability of Guardrails for Amazon Bedrock, first released in preview at re:Invent 2023. With Guardrails for Amazon Bedrock, you can implement safeguards in your generative artificial intelligence (generative AI) applications that are customized to your use cases and responsible AI policies. You can create multiple guardrails tailored to diﬀerent use cases and apply them across multiple foundation models (FMs), improving end-user experiences and standardizing safety controls across generative AI applications. You can use Guardrails for Amazon Bedrock with all large language models (LLMs) in Amazon Bedrock, including fine-tuned models.

Guardrails for Bedrock offers industry-leading safety protection on top of the native capabilities of FMs, helping customers block as much as 85% more harmful content than protection natively provided by some foundation models on Amazon Bedrock today. Guardrails for Amazon Bedrock is the only responsible AI capability offered by top cloud providers that enables customers to build and customize safety and privacy protections for their generative AI applications in a single solution, and it works with all large language models (LLMs) in Amazon Bedrock, as well as fine-tuned models.

Aha! is a software company that helps more than 1 million people bring their product strategy to life. “Our customers depend on us every day to set goals, collect customer feedback, and create visual roadmaps,” said Dr. Chris Waters, co-founder and Chief Technology Officer at Aha!. “That is why we use Amazon Bedrock to power many of our generative AI capabilities. Amazon Bedrock provides responsible AI features, which enable us to have full control over our information through its data protection and privacy policies, and block harmful content through Guardrails for Bedrock. We just built on it to help product managers discover insights by analyzing feedback submitted by their customers. This is just the beginning. We will continue to build on advanced AWS technology to help product development teams everywhere prioritize what to build next with confidence.”

In the preview post, Antje showed you how to use guardrails to configure thresholds to filter content across harmful categories and define a set of topics that need to be avoided in the context of your application. The Content filters feature now has two additional safety categories: Misconduct for detecting criminal activities and Prompt Attack for detecting prompt injection and jailbreak attempts. We also added important new features, including sensitive information filters to detect and redact personally identifiable information (PII) and word filters to block inputs containing profane and custom words (for example, harmful words, competitor names, and products).

Guardrails for Amazon Bedrock sits in between the application and the model. Guardrails automatically evaluates everything going into the model from the application and coming out of the model to the application to detect and help prevent content that falls into restricted categories.

You can recap the steps in the preview release blog to learn how to configure Denied topics and Content filters. Let me show you how the new features work.

New features
To start using Guardrails for Amazon Bedrock, I go to the AWS Management Console for Amazon Bedrock, where I can create guardrails and configure the new capabilities. In the navigation pane in the Amazon Bedrock console, I choose Guardrails, and then I choose Create guardrail.

I enter the guardrail Name and Description. I choose Next to move to the Add sensitive information filters step.

I use Sensitive information filters to detect sensitive and private information in user inputs and FM outputs. Based on the use cases, I can select a set of entities to be either blocked in inputs (for example, a FAQ-based chatbot that doesn’t require user-specific information) or redacted in outputs (for example, conversation summarization based on chat transcripts). The sensitive information filter supports a set of predefined PII types. I can also define custom regex-based entities specific to my use case and needs.

I add two PII types (Name, Email) from the list and add a regular expression pattern using Booking ID as Name and [0-9a-fA-F]{8} as the Regex pattern.

I choose Next and enter custom messages that will be displayed if my guardrail blocks the input or the model response in the Define blocked messaging step. I review the configuration at the last step and choose Create guardrail.

I navigate to the Guardrails Overview page and choose the Anthropic Claude Instant 1.2 model using the Test section. I enter the following call center transcript in the Prompt field and choose Run.

Please summarize the below call center transcript. Put the name, email and the booking ID to the top: Agent: Welcome to ABC company. How can I help you today? Customer: I want to cancel my hotel booking. Agent: Sure, I can help you with the cancellation. Can you please provide your booking ID? Customer: Yes, my booking ID is 550e8408. Agent: Thank you. Can I have your name and email for confirmation? Customer: My name is Jane Doe and my email is [email protected] Agent: Thank you for confirming. I will go ahead and cancel your reservation.

Guardrail action shows there are three instances where the guardrails came in to effect. I use View trace to check the details. I notice that the guardrail detected the Name, Email and Booking ID and masked them in the final response.

I use Word filters to block inputs containing profane and custom words (for example, competitor names or offensive words). I check the Filter profanity box. The profanity list of words is based on the global definition of profanity. Additionally, I can specify up to 10,000 phrases (with a maximum of three words per phrase) to be blocked by the guardrail. A blocked message will show if my input or model response contain these words or phrases.

Now, I choose Custom words and phrases under Word filters and choose Edit. I use Add words and phrases manually to add a custom word CompetitorY. Alternatively, I can use Upload from a local file or Upload from S3 object if I need to upload a list of phrases. I choose Save and exit to return to my guardrail page.

I enter a prompt containing information about a fictional company and its competitor and add the question What are the extra features offered by CompetitorY?. I choose Run.

I use View trace to check the details. I notice that the guardrail intervened according to the policies I configured.

Now available
Guardrails for Amazon Bedrock is now available in US East (N. Virginia) and US West (Oregon) Regions.

For pricing information, visit the Amazon Bedrock pricing page.

To get started with this feature, visit the Guardrails for Amazon Bedrock web page.

For deep-dive technical content and to learn how our Builder communities are using Amazon Bedrock in their solutions, visit our community.aws website.

— Esra

Solution overview

Prerequisites

Implementation details

Prompt engineering

Build the Flink application

Deploy the AWS CDK stack

Set up and connect to OpenSearch Dashboards

Set up the streaming producer

Clean up

Conclusion

About the Authors

Solution overview

Data ingestion

Insights retrieval

Solution architecture

Amazon Bedrock chatbot UI

My Social Media UI

Consume data from X

Create vector embeddings

Store vector embeddings in OpenSearch Service

RetrievalQA using Lambda and LangChain implementation

Considerations

Summary

About the Authors

Solution overview

Prerequisites

Deploy the chat application using AWS CloudFormation

Run the notebook

Example questions and responses

Example 1: Historical data is available

Example 2: Historical data is not available

Example 3: Unrelated question and historical data is not available

Clean up

Conclusion

References

About the Authors

Hands-on Generative AI: AWS workshops

See you next time!

The collective thoughts of the interwebz