Tag Archives: generative AI

Introducing Amazon Nova: Frontier intelligence and industry leading price performance

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/

Today, we’re thrilled to announce Amazon Nova, a new generation of state-of-the-art foundation models (FMs) that deliver frontier intelligence and industry leading price performance, available exclusively in Amazon Bedrock.

You can use Amazon Nova to lower costs and latency for almost any generative AI task. You can build on Amazon Nova to analyze complex documents and videos, understand charts and diagrams, generate engaging video content, and build sophisticated AI agents, from across a range of intelligence classes optimized for enterprise workloads.

Whether you’re developing document processing applications that need to process images and text, creating marketing content at scale, or building AI assistants that can understand and act on visual information, Amazon Nova provides the intelligence and flexibility you need with two categories of models: understanding and creative content generation.

Amazon Nova understanding models accept text, image, or video inputs to generate text output. Amazon creative content generation models accept text and image inputs to generate image or video output.

Understanding models: Text and visual intelligence
The Amazon Nova models include three understanding models (with a fourth one coming soon) designed to meet different needs:

Amazon Nova Micro – A text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat and brainstorming, and simple mathematical reasoning and coding. Amazon Nova Micro also supports customization on proprietary data using fine-tuning and model distillation to boost accuracy.

Amazon Nova Lite – A very low-cost multimodal model that is lightning fast for processing image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy. The model processes inputs up to 300K tokens in length and can analyze multiple images or up to 30 minutes of video in a single request. Amazon Nova Lite also supports text and multimodal fine-tuning and can be optimized to deliver the best quality and costs for your use case with techniques such as model distillation.

Amazon Nova Pro – A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Pro is capable of processing up to 300K input tokens and sets new standards in multimodal intelligence and agentic workflows that require calling APIs and tools to complete complex workflows. It achieves state-of-the-art performance on key benchmarks including visual question answering (TextVQA) and video understanding (VATEX). Amazon Nova Pro demonstrates strong capabilities in processing both visual and textual information and excels at analyzing financial documents. With an input context of 300K tokens, it can process code bases with over fifteen thousand lines of code. Amazon Nova Pro also serves as a teacher model to distill custom variants of Amazon Nova Micro and Lite.

Amazon Nova Premier – Our most capable multimodal model for complex reasoning tasks and for use as the best teacher for distilling custom models. Amazon Nova Premier is still in training. We’re targeting availability in early 2025.

Amazon Nova understanding models excel in Retrieval-Augmented Generation (RAG), function calling, and agentic applications. This is reflected in Amazon Nova model scores in the Comprehensive RAG Benchmark (CRAG) evaluation, Berkeley Function Calling Leaderboard (BFCL), VisualWebBench, and Mind2Web.

What makes Amazon Nova particularly powerful for enterprises is its customization capabilities. Think of it as tailoring a suit: you start with a high-quality foundation and adjust it to fit your exact needs. You can fine-tune the models with text, image, and video to understand your industry’s terminology, align with your brand voice, and optimize for your specific use cases. For instance, a legal firm might customize Amazon Nova to better understand legal terminology and document structures.

You can see the latest benchmark scores for these models on the Amazon Nova product page.

Creative content generation: Bringing concepts to life
The Amazon Nova models also include two creative content generation models:

Amazon Nova Canvas – A state-of-the-art image generation model producing studio-quality images with precise control over style and content, including rich editing features such as inpainting, outpainting, and background removal. Amazon Nova Canvas excels on human evaluations and key benchmarks such as text-to-image faithfulness evaluation with question answering (TIFA) and ImageReward.

Amazon Nova Reel – A state-of-the-art video generation model. Using Amazon Nova Reel, you can produce short videos through text prompts and images, control visual style and pacing, and generate professional-quality video content for marketing, advertising, and entertainment. Amazon Nova Reel outperforms existing models on human evaluations of video quality and video consistency.

All Amazon Nova models include built-in safety controls and creative content generation models include watermarking capabilities to promote responsible AI use.

Let’s see how these models work in practice for a few use cases.

Using Amazon Nova Pro for document analysis
To demonstrate the capabilities of document analysis, I downloaded the Choosing a generative AI service decision guide in PDF format from the AWS documentation.

First, I choose Model access in the Amazon Bedrock console navigation pane and request access to the new Amazon Nova models. Then, I choose Chat/text in the Playground section of the navigation pane and select the Amazon Nova Pro model. In the chat, I upload the decision guide PDF and ask:

Write a summary of this doc in 100 words. Then, build a decision tree.

The output follows my instructions producing a structured decision tree that gives me a glimpse of the document before reading it.

Console screenshot.

Using Amazon Nova Pro for video analysis
To demonstrate video analysis, I prepared a video by joining two short clips (more on this in the next section):

This time, I use the AWS SDK for Python (Boto3) to invoke the Amazon Nova Pro model using the Amazon Bedrock Converse API and analyze the video:

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-pro-v1:0"
VIDEO_FILE = "the-sea.mp4"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)
with open(VIDEO_FILE, "rb") as f:
    video = f.read()

user_message = "Describe this video."

messages = [ { "role": "user", "content": [
    {"video": {"format": "mp4", "source": {"bytes": video}}},
    {"text": user_message}
] } ]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
    inferenceConfig={"temperature": 0.0}
 )

response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)

Amazon Nova Pro can analyze videos that are uploaded with the API (as in the previous code) or that are stored in an Amazon Simple Storage Service (Amazon S3) bucket.

In the script, I ask to describe the video. I run the script from the command line. Here’s the result:

The video begins with a view of a rocky shore on the ocean, and then transitions to a close-up of a large seashell resting on a sandy beach.

I can use a more detailed prompt to extract specific information from the video such as objects or text. Note that Amazon Nova currently does not process audio in a video.

Using Amazon Nova for video creation
Now, let’s create a video using Amazon Nova Reel, starting from a text-only prompt and then providing a reference image.

Because generating a video takes a few minutes, the Amazon Bedrock API introduced three new operations:

StartAsyncInvoke – To start an asynchronous invocation

GetAsyncInvoke – To get the current status of a specific asynchronous invocation

ListAsyncInvokes – To list the status of all asynchronous invocations with optional filters such as status or date

Amazon Nova Reel supports camera control actions such as zooming or moving the camera. This Python script creates a video from this text prompt:

Closeup of a large seashell in the sand. Gentle waves flow all around the shell. Sunset light. Camera zoom in very close.

After the first invocation, the script periodically checks the status until the creation of the video has been completed. I pass a random seed to get a different result each time the code runs.

import random
import time

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
S3_DESTINATION_BUCKET = "<BUCKET>"

video_prompt = "Closeup of a large seashell in the sand. Gentle waves flow all around the shell. Sunset light. Camera zoom in very close."

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)
model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {"text": video_prompt},
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        "seed": random.randint(0, 2147483648)
    }
}

invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}}
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split('/')[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"
print(f"\nS3 URI: {s3_location}")

while True:
    response = bedrock_runtime.get_async_invoke(
        invocationArn=invocation_arn
    )
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)

if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")

I run the script:

Status: InProgress
. . .
Status: Completed

Video is ready at s3://BUCKET/PREFIX/output.mp4

After a few minutes, the script completes and prints the output Amazon Simple Storage Service (Amazon S3) location. I download the output video using the AWS Command Line Interface (AWS CLI):

aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-text.mp4

This is the resulting video. As requested, the camera zooms in on the subject.

Using Amazon Nova Reel with a reference image
To have better control over the creation of the video, I can provide Amazon Nova Reel a reference image such as the following:

A seascape image.

This script uses the reference image and a text prompt with a camera action (drone view flying over a coastal landscape) to create a video:

import base64
import random
import time

import boto3

S3_DESTINATION_BUCKET = "<BUCKET>"
AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
input_image_path = "seascape.png"
video_prompt = "drone view flying over a coastal landscape"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Load the input image as a Base64 string.
with open(input_image_path, "rb") as f:
    input_image_bytes = f.read()
    input_image_base64 = base64.b64encode(input_image_bytes).decode("utf-8")

model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {
        "text": video_prompt,
        "images": [{ "format": "png", "source": { "bytes": input_image_base64 } }]
        },
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        "seed": random.randint(0, 2147483648)
    }
}

invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}}
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split('/')[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"

print(f"\nS3 URI: {s3_location}")

while True:
    response = bedrock_runtime.get_async_invoke(
        invocationArn=invocation_arn
    )
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)
if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")

Again, I download the output using the AWS CLI:

aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-image.mp4

This is the resulting video. The camera starts from the reference image and moves forward.

Building AI responsibly
Amazon Nova models are built with a focus on customer safety, security, and trust throughout the model development stages, offering you peace of mind as well as an adequate level of control to enable your unique use cases.

We’ve built in comprehensive safety features and content moderation capabilities, giving you the controls you need to use AI responsibly. Every generated image and video include digital watermarking.

The Amazon Nova foundation models are built with protections that match its increased capabilities. Amazon Nova extends our safety measures to combat the spread of misinformation, child sexual abuse material (CSAM), and chemical, biological, radiological, or nuclear (CBRN) risks.

Things to know
Amazon Nova models are available in Amazon Bedrock in the US East (N. Virginia) AWS region. Amazon Nova Micro, Lite, and Pro are also available in the US West (Oregon), and US East (Ohio) regions via cross-Region inference. As usual with Amazon Bedrock, the pricing follows a pay-as-you-go model. For more information, see Amazon Bedrock pricing.

The new generation of Amazon Nova understanding models speaks your language. These models understand and generate content in over 200 languages, with particularly strong capabilities in English, German, Spanish, French, Italian, Japanese, Korean, Arabic, Simplified Chinese, Russian, Hindi, Portuguese, Dutch, Turkish, and Hebrew. This means you can build truly global applications without worrying about language barriers or maintaining separate models for different regions. Amazon Nova models for creative content generation support English prompts.

As you explore Amazon Nova, you’ll discover its ability to handle increasingly complex tasks. You can use these models to process lengthy documents up to 300K tokens, analyze multiple images in a single request, understand up to 30 minutes of video content, and generate images and videos at scale from natural language. This makes these models suitable for a variety of business use cases, from quick customer service interactions to deep analysis of corporate documentation and asset creation for advertising, ecommerce, and social media applications.

Integration with Amazon Bedrock makes deployment and scaling straightforward. You can leverage features like Amazon Bedrock Knowledge Bases to enhance your model with proprietary information, use Amazon Bedrock Agents to automate complex workflows, and implement Amazon Bedrock Guardrails to promote responsible AI use. The platform supports real-time streaming for interactive applications, batch processing for high-volume workloads, and detailed monitoring to help you optimize performance.

Ready to start building with Amazon Nova? Give the new models a try in the Amazon Bedrock console today, visit the Amazon Nova models section of the Amazon Bedrock documentation, and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with these new models!

Danilo

Preparing for take-off: Regulatory perspectives on generative AI adoption within Australian financial services

Post Syndicated from Julian Busic original https://aws.amazon.com/blogs/security/preparing-for-take-off-regulatory-perspectives-on-generative-ai-adoption-within-australian-financial-services/

The Australian financial services regulator, the Australian Prudential Regulation Authority (APRA), has provided its most substantial guidance on generative AI to date in Member Therese McCarthy Hockey’s remarks to the AFIA Risk Summit 2024. The guidance gives a green light for banks, insurance companies, and superannuation funds to accelerate their adoption of this transformative technology, but reminded the financial services industry of the need for adequate guardrails to make sure that the benefits of generative AI don’t come at an unacceptable cost to the community.

Amazon Web Services (AWS) is committed to developing AI responsibly and strongly supports APRA’s message to proceed with generative AI adoption with appropriate guardrails implemented. AWS is at the forefront of generative AI research and innovation, and many of our financial services customers are already harnessing the benefits of our artificial intelligence (AI), machine learning (ML), and generative AI services. AWS is committed to the responsible development and use of AI so that we can help our customers achieve their business goals while meeting—and aiming to exceed—their regulators’ expectations.

A green light for AI, ML, and generative AI

APRA’s guidance, as outlined in APRA Member Therese McCarthy Hockey’s remarks to the AFIA Risk Summit 2024, offers a clear pathway for adoption of AI, ML, and generative AI technologies by APRA-regulated entities. Ms. McCarthy Hockey says that there is “keen support” within APRA and across government for companies to realize the benefits of technology-led innovation, and she highlights the significant advantages that effective use of generative AI can deliver, such as improved productivity, cost efficiencies, more personalized customer experiences, and the ability to divert valuable resources to higher-level areas of need.

“Within APRA and across governments and regulators there is keen support for the realisation of tangible improvements through innovation.” — APRA Member Therese McCarthy Hockey’s remarks to AFIA Risk Summit May 2024

AWS financial services customers are starting to use more advanced AI for a variety of purposes, such as customer service, marketing, application development, fraud detection, and regulatory compliance. Specific use cases cited by APRA were the use of generative AI to rapidly review long documents against criteria such as policy requirements, use of generative AI-powered coding tools to produce better code faster, and creating generative AI bots to simulate customer testing of products and services. This is an extension of less sophisticated forms of AI which have been in operation for some time, with APRA citing internet chat bots and natural language processing as examples where businesses have already realized efficiencies by automating and speeding up manual or time-consuming processes.

APRA and other financial services regulators are experimenting internally with AI themselves. In Ms. McCarthy Hockey’s speech, she noted that APRA itself is using text analysis tools on an ongoing basis to review responses to APRA risk culture surveys, with the results helping APRA risk specialists direct focus to where it’s most required. APRA is also experimenting with natural language processing tools to review incident reporting data from regulated entities and to highlight incidents that are worthy of further investigation. This helps to reduce the human effort required by APRA staff and increase regulatory efficiency. Finally, APRA is collaborating with the Australian Securities and Investments Commission (ASIC) and the Reserve Bank of Australia (RBA) on a proof of concept to reduce the effort required to compare, analyze, and summarize the reams of documentation the three agencies must review as part of their regular entity supervision duties.

Risks must be understood and managed

APRA advocates for a prudent approach to experimentation with these technologies. As was the case with cloud adoption, organizations with more mature risk and data management capabilities will be able to move faster than those without.

“APRA’s message to the entities we regulate is that firm board oversight, robust technology platforms and strong risk management are essential for companies that want to begin experimenting with new ways of harnessing AI.” — APRA Member Therese McCarthy Hockey’s remarks to AFIA Risk Summit May 2024

APRA’s current regulatory framework is fit-for-purpose

APRA also made the specific point that its existing prudential framework remains fit-for-purpose for the increased uptake of AI, ML, and generative AI.

APRA’s primary focus is on governance, citing three key areas:

  1. Do boards have sufficient capability to determine an appropriate AI strategy and make sound risk management decisions? Are they able to effectively challenge management? What sort of learning and development programs are in train, and do the boards have access to external skills and advice if required?
  2. How mature is the risk culture? Is a risk management mindset embedded and functioning effectively across all three lines of defense? What controls and monitoring are in place to help prevent employees making unauthorized use of AI, ML, and generative AI tools?
  3. Is there adequate data quality and reliability? AI outputs depend directly on the quality of the inputs. APRA states that data management is an area where many regulated entities have a long way to go.

APRA also focuses on accountability, reminding regulated entities that as with any form of outsourcing or use of third-party services, the regulated entity retains accountability for the outputs of the AI, ML, and generative AI programs they deploy. There must always be a human in the loop: a person accountable for verifying that AI operates as intended. The level of human involvement can vary—for example, APRA does not suggest that a human should be involved in every AI decision made by a fraud detection service, but there should be a human who is accountable for the algorithm it runs, its operations, and the outcomes it drives.

How AWS is helping customers locally and globally use AI responsibly

From the outset, AWS has prioritized responsible AI innovation by embedding safety, fairness, robustness, security, and privacy into our development processes, and continuously educating our employees. We extend this commitment through to our customers by designing services that help customers derive business value from AI in a safe and responsible way.

AWS collaborates with organizations such as the OECD AI working groups, the Partnership on AI, the Responsible AI Institute, and strategic partnerships with universities worldwide. In Australia, AWS collaborates with key institutions like the National AI Centre, CSIRO, the Australian Information Industry Association, and the Tech Council of Australia to provide insights on responsible AI adoption and to maximize the benefits of AI technology for the country. The recent Voluntary AI Safety Standard developed by the National AI Centre is the start of clear guidance for Australian organizations to follow, and AWS is engaging with Australia and other governments on the responsible use adoption and use of generative AI.

Recently, AWS has supported global financial services customers in critical areas such as risk management, financial crime prevention, and cybersecurity by using generative AI to analyze and respond to large data volumes in real-time. Verafin (a Nasdaq company) used Amazon Bedrock to improve anti-money laundering and fraud prevention processes. This application of AI enhances the effectiveness of financial crime management programs. Mastercard employs AWS AI and machine learning services to detect and prevent fraud while providing the most seamless customer experience possible.

Generative AI’s role in modernizing legacy systems is increasingly recognized, especially among Australian financial services customers who are undertaking transformation programs to reduce technology debt and enhance process resilience. CommBank, PEXA, and National Australia Bank (NAB) employ generative AI technology to improve speed, quality, and security when building and modifying applications.

How to implement responsible AI within your organization

The core dimensions of responsible AI at AWS align to the key regulatory considerations of both APRA and regulators globally:

  • Fairness – Considering impacts on different groups of stakeholders
  • Explainability – Understanding and evaluating system outputs
  • Privacy and security – Appropriately obtaining, using, and protecting data and models
  • Safety – Working to prevent harmful system output and misuse
  • Controllability – Having mechanisms to monitor and steer AI system behaviour
  • Veracity and robustness – Achieving correct system outputs, even with unexpected or adversarial inputs
  • Governance – Incorporating best practices into the AI supply chain, including providers and deployers
  • Transparency – Enabling stakeholders to make informed choices about their engagement with an AI system

Note that responsible AI is a continually evolving field. Customers can keep updated with developments in this area on our Responsible AI webpage.

The Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI provides extensive guidance, and serves as both a starting point and a guide to help customers meet, and in many cases exceed, regulatory expectations.

We have integrated features into our generative AI services to facilitate the application of responsible AI policies for organizations. For example, Amazon Bedrock Guardrails can help financial services organizations comply with APRA guidance on AI use in several key ways:

  1. Content filtering – Guardrails allows organizations to configure content filters to block harmful or inappropriate content in AI model inputs and outputs. This helps AI applications to adhere to with APRA’s expectations for responsible AI use.
  2. Topic restrictions – Organizations can define specific topics to be avoided in AI interactions. For example, a banking chatbot could be configured so it won’t provide investment advice, aligning with regulatory restrictions.
  3. Sensitive information protection – Guardrails can detect and redact personally identifiable information (PII) in AI inputs and outputs. This helps protect customer privacy and aids in compliance with data protection requirements.
  4. Custom word filters – Companies can set up lists of words or phrases to block, helping maintain appropriate communication.
  5. Contextual grounding checks – This feature helps detect and filter AI hallucinations in model responses where a reference source and a user query are provided, improving the accuracy and reliability of AI-generated responses. This aligns with APRA’s focus on making sure that AI systems provide accurate and trustworthy information.
  6. Customizable policies – Guardrails allows organizations to tailor AI safeguards to their specific needs and regulatory requirements, helping them align with APRA’s principles-based approach.
  7. Consistent safeguards – Guardrails can be applied across multiple AI models and applications, enabling a standardized approach to responsible AI use across the organization.
  8. Transparency and testing – The ability to test guardrails and iterate on configurations supports APRA’s expectations for due diligence and appropriate monitoring of AI systems.

We have a comprehensive user guide detailing how to implement, configure, and test Amazon Bedrock Guardrails.

AWS AI Service Cards also provide detailed information on AWS AI services, including intended use cases, limitations, and responsible AI design choices. This transparency helps financial institutions understand and responsibly use AI technologies.

APRA’s existing prudential standards do not set specific rules for managing AI/ML and generative AI risks. Instead, APRA outlines desired risk management outcomes, leaving it to each regulated entity to assess AI deployment risks and implement appropriate controls. AWS offers the User Guide to Financial Services Regulations and Guidelines in Australia to help customers meet APRA’s requirements.

Ultimately, the rate of AI, ML, and generative AI adoption amongst APRA-regulated entities will be determined by the risk appetite and risk management capability of individual entities. APRA openly encourages its regulated entities—our financial services customers—who are considering AI, ML, and generative AI experimentation and adoption to reach out to APRA directly and initiate dialogue. APRA is a highly experienced, knowledgeable, and approachable regulator, and will be able to provide valuable insights and guidance to regulated entities.

Conclusion and next steps

APRA’s messaging to industry is a significant milestone for AI, ML, and generative AI adoption in the Australian financial services industry. Boards, executives, and technology decision-makers should review APRA’s Risk Summit speech and consider APRA’s support for the adoption of these technologies when refining their strategies and plans.

AWS, and our AWS Partner Network, are experienced in working with financial services customers, and there are already a number of examples both internationally and locally where generative AI has been implemented to create value for our customers. AWS is ready to help our customers meet and exceed APRA’s risk management expectations.

Contact your AWS representative to discuss how the AWS solution architects, AWS Professional Services teams, AWS Training and Certification, and the AWS Partner Network can assist with your AI, ML, and generative AI adoption journey. If you don’t have an AWS representative, please contact us at https://aws.amazon.com/contact-us.
 

Julian Busic
Julian Busic

Julian is a Security Solutions Architect with a focus on regulatory engagement. He works with our customers, their regulators, and AWS teams to help customers raise the bar on secure cloud adoption and usage. Julian has over 15 years of experience working in risk and technology across the financial services industry in Australia and New Zealand.
Jamie Simon
Jamie Simon

Jamie leads AWS business within the banking and financial services industry across Australia and New Zealand, supporting financial services customers as they make use of the cloud to transform their business for a digital and AI-enabled future.
Warren Cammack
Warren Cammack

Warren supports AWS customers in applying the value of the AWS Cloud at scale, focusing on identifying and overcoming blockers to adoption. Currently he is leading the rollout of generative AI services to enable enterprises to benefit from the new technology in a safe, responsible, and effective manner.
Krish De
Krish De

Krish is a Principal Solutions Architect with a focus on financial services. He works with AWS customers, their regulators, and AWS teams to safely accelerate customers’ cloud adoption, with prescriptive guidance on governance, risk, and compliance. Krish has over 20 years of experience working in governance, risk, and technology across the financial services industry in Australia, New Zealand, and the United States.

Enhance your productivity with new extensions and integrations in Amazon Q Business

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/enhance-your-productivity-with-new-extensions-and-integrations-in-amazon-q-business/

Today, we’re announcing a new capability from Amazon Q Business to seamlessly access your assistant within popular web browsers and productivity tools. This helps you save time and complete your work and tasks more efficiently without having to leave your preferred applications.

Now, you can use Amazon Q Business directly from your web browser and other supported messaging and collaboration applications. You can quickly gather insights, review information, and ask questions. For example, you can effortlessly analyze and summarize content, get explanations on complex topics, or create meeting summaries without switching between applications.

Let’s get started
Let me walk you through how to get started with the new browser extensions and integrations. First, let’s look at the browser extensions. The following screenshot shows how it looks.

As an administrator, I need to enable the browser extensions for users of my Amazon Q Business application. To do that, I navigate to my Amazon Q Business application dashboard and select Integrations under the Enhancements section in the left navigation pane.

Then, on the Integrations page, select Edit in the Browser extensions section.

I select the available options in the Browsers section and choose Save. After I’ve enabled these options, my users will receive notification emails prompting them to install the extension.

Now, I’m switching to a user perspective of the Amazon Q Business application. I’ve received an email with a link to the Amazon Q Business web application. I visit the link and sign in to the Amazon Q Business web application. Here, I see a banner with information and a link to install the extension for my browser. I select the Install extension button.

Then, I navigate to the Chrome Web Store and install the browser extension.

After I have installed the browser extension, I sign in to my Amazon Q Business application using the same URL and credentials I use to access the web application.

Now, I can chat with Amazon Q Business apps whenever I visit any webpage. For example, I can ask it to summarize the current website for me.

The following image shows the result.

Application integration with Amazon Q Business
With Amazon Q Business, you can get AI-powered assistance and information not only when browsing, but also when collaborating with your teams. Now, you can integrate Amazon Q Business with supported third-party applications, making it an always-ready productivity and creativity teammate in your conversations.

To add third-party applications to Amazon Q Business, I need to navigate to the Integrations page and choose Add integration.

Here, I find all available integrations that I can use. For this demo, I select Slack.

I fill in all the required details, including the Slack workspace team ID, which you can obtain by following the steps outlined on the Slack documentation page.

After the integration is successfully created, I need to deploy this integration as a Slack bot. From the Integrations page, I select the integration and complete the integration process in the Slack platform. With all the required steps completed, now I can now add the app into my Slack workspace.

Here’s a quick video showing how I use this integration to interact with Amazon Q Business on Slack.

As someone who juggles multiple tools and platforms daily, this new capability unlocks various possibilities for me to improve my productivity. The ability to access AI assistance and perform cross-application tasks without leaving my current workspace helps me save time and maintain focus.

Additional things to know

  • Supported browser extensions – At launch, the Amazon Q Business browser extension supports Chromium-based web browsers such as Google Chrome and Microsoft Edge. It also supports the Mozilla Firefox web browser.
  • Application integration support – For third-party applications, at launch, Amazon Q Business integrations support Slack and Microsoft Teams.
  • Availability – This new capability is available in AWS Regions where Amazon Q Business is available.

Get started today and experience an exciting opportunity to enhance your productivity and streamline cross-application workflows. Learn more on the Amazon Q Business page.

Happy building,
Donnie

New RAG evaluation and LLM-as-a-judge capabilities in Amazon Bedrock

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-rag-evaluation-and-llm-as-a-judge-capabilities-in-amazon-bedrock/

Today, we’re announcing two new evaluation capabilities in Amazon Bedrock that can help you streamline testing and improve generative AI applications:

Amazon Bedrock Knowledge Bases now supports RAG evaluation (preview) – You can now run an automatic knowledge base evaluation to assess and optimize Retrieval Augmented Generation (RAG) applications using Amazon Bedrock Knowledge Bases. The evaluation process uses a large language model (LLM) to compute the metrics for the evaluation. With RAG evaluations, you can compare different configurations and tune your settings to get the results you need for your use case.

Amazon Bedrock Model Evaluation now includes LLM-as-a-judge (preview) – You can now perform tests and evaluate other models with humanlike quality at a fraction of the cost and time of running human evaluations.

These new capabilities make it easier to go into production by providing fast, automated evaluation of AI-powered applications, shortening feedback loops and speeding up improvements. These evaluations assess multiple quality dimensions including correctness, helpfulness, and responsible AI criteria such as answer refusal and harmfulness.

To make it easy and intuitive, the evaluation results provide natural language explanations for each score in the output and on console, and the scores are normalized from 0 to 1 for ease of interpretability. Rubrics are published in full with the judge prompts in the documentation so non-scientists can understand how scores are derived.

Let’s see how they work in practice.

Using RAG evaluations in Amazon Bedrock Knowledge Bases
In the Amazon Bedrock console, I choose Evaluations in the Inference and Assessment section. There, I see the new Knowledge Bases tab.

Console screenshot.

I choose Create, enter a name and a description for the evaluation, and select the Evaluator model that will compute the metrics. In this case, I use Anthropic’s Claude 3.5 Sonnet.

Console screenshot.

I select the knowledge base to evaluate. I previously created a knowledge base containing only the AWS Lambda Developer Guide PDF file. In this way, for the evaluation, I can ask questions about the AWS Lambda service.

I can evaluate either the retrieval function alone or the complete retrieve-and-generate workflow. This choice affects the metrics that are available in the next step. I choose to evaluate both retrieval and response generation and select the model to use. In this case, I use Anthropic’s Claude 3 Haiku. I can also use Amazon Bedrock Guardrails and adjust runtime inference settings by choosing the configurations link after the response generator model.

Console screenshot.

Now, I can choose which metrics to evaluate. I select Helpfulness and Correctness in the Quality section and Harmfulness in the Responsible AI metrics section.

Console screenshot.

Now, I select the dataset that will be used for evaluation. This is the JSONL file I prepared and uploaded to Amazon Simple Storage Service (Amazon S3) for this evaluation. Each line provides a conversation, and for each message there is a reference response.

{"conversationTurns":[{"referenceResponses":[{"content":[{"text":"A trigger is a resource or configuration that invokes a Lambda function such as an AWS service."}]}],"prompt":{"content":[{"text":"What is an AWS Lambda trigger?"}]}}]}
{"conversationTurns":[{"referenceResponses":[{"content":[{"text":"An event is a JSON document defined by the AWS service or the application invoking a Lambda function that is provided in input to the Lambda function."}]}],"prompt":{"content":[{"text":"What is an AWS Lambda event?"}]}}]}

I specify the S3 location in which to store the results of the evaluation. The evaluation job requires that the S3 bucket is configured with the cross-origin resource sharing (CORS) permissions described in the Amazon Bedrock User Guide.

For service access, I need to create or provide an AWS Identity and Access Management (IAM) service role that Amazon Bedrock can assume and that allows access to the Amazon Bedrock and Amazon S3 resources used by the evaluation.

After a few minutes, the evaluation has completed, and I browse the results. The actual duration of an evaluation depends on the size of the prompt dataset and on the generator and the evaluator models used.

At the top, the Metric summary evaluates the overall performance using the average score across all conversations.

Console screenshot.

After that, the Generation metrics breakdown gives me details about each of the selected evaluation metrics. My evaluation dataset was small (two lines), so there isn’t a large distribution to look at.

From here, I can also see example conversations and how they were rated. To view all conversations, I can visit the full output in the S3 bucket.

I’m curious why Helpfulness is slightly below one. I expand and zoom Example conversations for Helpfulness. There, I see the generated output, the ground truth that I provided with the evaluation dataset, and the score. I choose the score to see the model reasoning. According to the model, it would have helped to have more in-depth information. Models really are strict judges.

Console screenshot.

Comparing RAG evaluations
The result of a knowledge base evaluation can be difficult to interpret by itself. For this reason, the console allows comparing results from multiple evaluations to understand the differences. In this way, you can understand if you’re improving or not for the metrics you care about.

For example, I previously ran two other knowledge base evaluations. They’re related to knowledge bases with the same data sources but different chunking and parsing configurations and different embedding models.

I select the two evaluations and choose Compare. To be comparable in the console, the evaluations need to cover the same metrics.

Console screenshot.

In the At a glance tab, I see a visual comparison of the metrics using a spider chart. In this case, the results are not much different. The main difference is the Faithfulness score.

Console screenshot.

In the Evaluation details tab, I find a detailed comparison of the results for each metric, including the difference in scores.

Console screenshot.

Using LLM-as-a-judge in Amazon Bedrock Model Evaluation (preview)
In the Amazon Bedrock console, I choose Evaluations in the Inference and Assessment section of the navigation pane. After I choose Create, I select the new Automatic: Model as a judge option.

I enter a name and a description for the evaluation and select the Evaluator model that is used to generate evaluation metrics. I use Anthropic’s Claude 3.5 Sonnet.

Console screenshot.

Then, I select the Generator model, which is the model I want to evaluate. Model evaluation can help me understand if a smaller and more cost-effective model meets the needs of my use case. I use Anthropic’s Claude 3 Haiku.

Console screenshot.

In the next section I select the Metrics to evaluate. I select Helpfulness and Correctness in the Quality section and Harmfulness in the Responsible AI metrics section.

Console screenshot.

In the Datasets section I specify the Amazon S3 location where my evaluation dataset is stored and the folder in an S3 bucket where the results of the model evaluation job are stored.

For the evaluation dataset, I prepared another JSONL file. Each line provides a prompt and a reference answer. Note that the format is different compared to knowledge base evaluations.

{"prompt":"Write a 15 words summary of this text:\n\nAWS Fargate is a technology that you can use to run containers without having to manage servers or clusters. With AWS Fargate, you no longer have to provision, configure, or scale clusters of virtual machines to run containers. This removes the need to choose server types, decide when to scale your clusters, or optimize cluster packing.","referenceResponse":"AWS Fargate allows running containers without managing servers or clusters, simplifying container deployment and scaling."}
{"prompt":"Give me a list of the top 3 benefits from this text:\n\nAWS Fargate is a technology that you can use to run containers without having to manage servers or clusters. With AWS Fargate, you no longer have to provision, configure, or scale clusters of virtual machines to run containers. This removes the need to choose server types, decide when to scale your clusters, or optimize cluster packing.","referenceResponse":"- No need to manage servers or clusters.\n- Simplified infrastructure management.\n- Improved focus on application development."}

Finally, I can choose an IAM service role that gives Amazon Bedrock access to the resources used by this evaluation job.

I complete the creation of the evaluation. After a few minutes, the evaluation is complete. Similar to the knowledge base evaluation, the result starts with a Metrics Summary.

The Generation metrics breakdown details each metric, and I can look at details for a few sample prompts. I look at Helpfulness to better understand the evaluation score.

Console screenshot.

The prompts in the evaluation have been correctly processed by the model, and I can apply the results for my use case. If my application needs to manage prompts similar to the ones used in this evaluation, the evaluated model is a good choice.

Things to know
These new evaluation capabilities are available in preview in the following AWS Regions:

  • RAG evaluation in US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris), and South America (São Paulo)
  • LLM-as-a-judge in US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Seoul, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris, Zurich), and South America (São Paulo)

Note that the available evaluator models depend on the Region.

Pricing is based on the standard Amazon Bedrock pricing for model inference. There are no additional charges for evaluation jobs themselves. The evaluator models and models being evaluated are billed according to their normal on-demand or provisioned pricing. The judge prompt templates are part of the input tokens, and those judge prompts can be found in the AWS documentation for transparency.

The evaluation service is optimized for English language content at launch, though the underlying models can work with content in other languages they support.

To get started, visit the Amazon Bedrock console. To learn more, you can access the Amazon Bedrock documentation and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with these new capabilities!

Danilo

Introducing new PartyRock capabilities and free daily usage

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-new-partyrock-capabilities-and-free-daily-usage/

PartyRock is an Amazon Bedrock playground that anyone can use to create generative AI-powered applications by simply describing the app you want to build without the need to write any code.

Since its launch in November 2023, over half a million apps have been built by users worldwide. These apps range from simple text generators to sophisticated productivity tools that combine multiple AI capabilities.

Throughout this year, we observed that as PartyRock users build skills and intuition by using the playground, they find interesting and useful ways to build apps for improving their daily lives. PartyRock apps increased their individual productivity, and they returned to PartyRock to use them regularly.

Today, we’re introducing improvements meeting the most requested customer needs:

Free daily usage – Previously, PartyRock offered a free trial period for a limited time. Starting in 2025, all users will have a recurring free daily usage granted, with no credit card required.

Search the app catalog – You can now explore hundreds of thousands of apps in the PartyRock catalog and find the right app for your use case by category or functionality. Relevant and popular apps are highlighted to showcase the creativity of the community. Results include app previews and last modified date to help you pick what’s best for you.

Do more with docs – You can upload and process multiple documents simultaneously, making it easier to build apps that handle batch processing, document comparison, or content aggregation.

Let’s see these some new features in action.

Searching and remixing a PartyRock app
I open PartyRock and sign in with my social credentials. In the Home section, I can use the search box to look for apps for a specific use case. I love traveling, and I’d like to improve the way I share my trips with family and friends. I enter travel and vlog in the search box. In the search results, I see an app that gets my attention.

Console screenshot.

I choose the Travel vlog script writer app and open it in a browser tab. The app generates a travel log script starting from a few inputs: the destination, the itinerary, and the tone.

I like to prepare some travel notes before a trip so that I know what the options are and what I want to visit. What if I can upload my notes and other documents to better personalize the vlog?

One of the key capabilities of PartyRock is that I can start with an existing app and “remix” it to tailor it to my needs. The resulting app can then be shared for others to use.

I choose Remix and then Edit to customize this app. I add a Document widget and edit it:

  • For Widget title, I use Notes.
  • For Instruction, I enter Upload your notes and documents with travel tips.

I save the new widget and move just after the other input fields.

To use these images in the app, I edit the Your Vlog script widget. I want the script to include the content of those images. In the prompt generating the script, I add a sentence to analyze and consider the image of the destination:

Get inspiration from what you see in @Notes.

I also update the Vlog cover widget prompt to consider the whole script when generating the cover image:

A portrait of a trip to @Destination considering the @Your Vlog script.

I save and leave edit. The remixed app is now ready to be tested.

Using the remixed PartyRock app
Let’s try the customized version of the app. I enter:

  • Rome, Italy as Destination
  • A walk in the old city center as Itinerary
  • Peaceful and relaxing as Tone.

Then, I upload my travel notes.

Console screenshot.

I choose the Play button to start the app. The app takes a few seconds to generate its output.

Console screenshot.

I like the result. The script is quite detailed, and the image cover a nice addition. I can further extend the app to use the image cover in a social media post generator for posting about the vlog to different platforms with different tones and styles. The possibilities are endless!

Things to know
PartyRock with these new capabilities is available at https://partyrock.aws.

No credit card or AWS account is required to use PartyRock, and you can explore hundreds of thousands of published apps even without signing in.

With PartyRock, everyone can become a builder. Apps can be generated from a textual description and then customized and extended with additional capabilities using the visual editor. All apps are automatically optimized for mobile devices and can be shared with others. To make it easier for others to view and use your apps, you can create your personalized playlist page.

For examples of how PartyRock can help you be more productive, refer to How 3 small businesses use PartyRock to help customers. And don’t forget to share your best apps with me!

Danilo

Exploring the benefits of artificial intelligence while maintaining digital sovereignty

Post Syndicated from Max Peterson original https://aws.amazon.com/blogs/security/exploring-benefits-of-artificial-intelligence-while-maintaining-digital-sovereignty/

Around the world, organizations are evaluating and embracing artificial intelligence (AI) and machine learning (ML) to drive innovation and efficiency. From accelerating research and enhancing customer experiences to optimizing business processes, improving patient outcomes, and enriching public services, the transformative potential of AI is being realized across sectors. Although using emerging technologies helps drive positive outcomes, leaders worldwide must balance these benefits with the need to maintain security, compliance, and resilience. Many organizations, including those in the public sector and regulated industries, are investing in generative AI applications powered by large language models (LLMs) and other foundation models (FMs) because these applications can transform and scale their work and provide better experiences for customers. Beyond computing power, unlocking this AI potential resides in the AI applications that organizations can create based on a variety of AI/ML development services, models, and data sources. Organizations must navigate the complexity of building AI applications in light of existing and emerging regulatory regimes while verifying that their AI applications and related data are secure, protected, and resilient to risks and threats.

AWS offers a wide range of AI/ML services and capabilities, built on our sovereign-by-design foundation, that are making it simpler for our customers to meet their digital sovereignty needs while getting the security, control, compliance, and resilience that they need. For example, Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon SageMaker provides tools and infrastructure to build, train, and deploy ML models at scale while supporting responsible AI with governance controls and access to pretrained models.

Innovating securely across the AI lifecycle

Security is and always has been our top priority at AWS. AWS customers benefit from our ongoing investment in data centers, networks, custom hardware, and secure software services, built to satisfy the requirements of the most security-sensitive organizations, including the government, healthcare, and financial services. We have always believed that it is essential that customers have control over their data and its location. That’s why we architected the AWS Cloud to be secure and sovereign-by-design from day one. We remain committed to giving our customers more control and choice so that they can use the full power of AWS while meeting their unique digital sovereignty needs.

As organizations develop and implement generative AI, they want to make sure that their data and applications are secured across the AI lifecycle, including data preparation, training, and inferencing. To help ensure the confidentiality and integrity of customer data, all of our Nitro-based Amazon Elastic Compute Cloud (Amazon EC2) instances that run ML accelerators such as AWS Inferentia and AWS Trainium, and graphics processing units (GPUs) such as P4, P5, G5, and G6, are backed by the industry-leading security capabilities of the AWS Nitro System. By design, there is no mechanism for anyone at AWS to access Nitro EC2 instances that customers use to run their workloads. The NCC Group, an independent cybersecurity firm, has validated the design of the Nitro System.

We take a secure approach to generative AI and make it practical for our customers to secure their generative AI workloads across the generative AI stack so that they can focus on building and scaling. All AWS services—including generative AI services—support encryption, and we continue to innovate and invest in controls and encryption features that allow our customers to encrypt everything everywhere.

For example, Amazon Bedrock uses encryption to protect data in transit and at rest, and data remains in the AWS Region where Amazon Bedrock is being used. Customer data, such as prompts, completions, custom models, and data used for fine-tuning or continued pre-training, is not used for Amazon Bedrock service improvement and is never shared with third-party model providers. When customers fine-tune a model in Amazon Bedrock, the data is never exposed to the public internet, never leaves the AWS network, is securely transferred through a customer’s virtual private cloud (VPN), and is encrypted in transit and at rest.

SageMaker protects ML model artifacts and other system artifacts by encrypting data in transit and at rest. Amazon Bedrock and SageMaker integrate with AWS Key Management Service (AWS KMS) so that customers can securely manage cryptographic keys. AWS KMS is designed so that no one—not even AWS employees—can retrieve plaintext keys from the service.

Developing responsibly

The responsible development and use of AI is a priority for AWS. We believe that AI should take a people-centric approach that makes AI safe, fair, secure, and robust. We are committed to supporting customers with responsible AI and helping them build fairer and more transparent AI applications to foster trust, meet regulatory requirements, and use AI to benefit their business and stakeholders. AWS is the first major cloud service provider to announce ISO/IEC 42001 accredited certification for AI services, covering Amazon Bedrock, Amazon Q Business, Amazon Textract, and Amazon Transcribe. ISO/IEC 42001 is an international management system standard that outlines requirements and controls for organizations to promote the responsible development and use of AI systems.

We take responsible AI from theory into practice by providing the necessary tools, guidance, and resources, including Amazon Bedrock Guardrails to help implement safeguards tailored to customer generative AI applications and aligned with their responsible AI policies, or Model Evaluation on Amazon Bedrock to evaluate, compare, and select the best FMs for specific use cases based on custom metrics, such as accuracy, robustness, and toxicity. Additionally, Amazon SageMaker Model Monitor automatically detects and alerts customers of inaccurate predictions from deployed models. We continue to publish AI Service Cards to enhance transparency by providing a single place to find information on the intended use cases and limitations, responsible AI design choices, and performance optimization best practices for our AI services and models.

Building resilience

Resilience plays a pivotal role in the development of any workload, and AI/ML workloads are no different. Customers need to know that their workloads in the cloud will continue to operate in the face of natural disasters, network disruptions, or disruptions due to geopolitical crises. AWS delivers the highest network availability of any cloud provider and is the only cloud provider to offer three or more Availability Zones (AZs) in all Regions, providing more redundancy. Understanding and prioritizing resilience is crucial for generative AI workloads to meet organizational availability and business continuity requirements. We have published guidance on designing generative AI workloads for resilience. To enable higher throughput and enhanced resilience during periods of peak demands in Amazon Bedrock, customers can use cross-region inference to distribute traffic across multiple Regions. For customers with specific European Union data sovereignty requirements, we are launching the AWS European Sovereign Cloud in 2025 to offer an additional layer of control and resilience.

Supporting choice and flexibility

It’s important that customers have access to diverse AI technologies, while having the freedom to choose the right solutions to meet their needs. AWS provides more diversity, choice, and flexibility so that customers can select the AI solution that best aligns with their specific requirements, whether that’s using open-source models, proprietary solutions, or their own custom AI models. For example, we understand the importance of open-source AI in fostering transparency, collaboration, and rapid innovation. Open-source models enable scrutiny of vulnerabilities, drive security improvements, and support research on AI safety. Amazon SageMaker JumpStart provides pretrained, open-source models for a wide range of common use cases. To provide practitioners and developers with the guidance and tools that they need to create secure-by-design AI systems, we are a founding member of the open-source initiative Coalition for Secure AI (CoSAI).

Also, our commitment to portability and interoperability helps ensure that customers can move easily between environments. For customers changing IT providers, we’ve taken concrete steps to lower costs, and AWS is actively engaged in efforts to facilitate switching between cloud providers, including through our support of the Cloud Infrastructure Service Providers in Europe (CISPE) Cloud Switching Framework, which lays out guidance to assist providers and customers in the switching process. This gives organizations the flexibility to adapt their cloud and AI strategies as their needs evolve.

We remain committed to providing customers with a choice of diverse AI technologies, along with secure and compliant ways to build their AI applications throughout the development lifecycle. Through this approach, customers can enhance the security, compliance, and resilience of their systems.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
 

Max Peterson
Max Peterson

Max is the Vice President of AWS Sovereign Cloud. He leads efforts to ensure that AWS customers around the world have the most advanced set of sovereignty controls, privacy safeguards, and security features available in the cloud. Previously, Max served as the VP of AWS Worldwide Public Sector (WWPS) and created and led the WWPS International Sales division, with a focus on empowering government, education, healthcare, aerospace and satellite, and nonprofit organizations to drive rapid innovation while meeting evolving compliance, security, and policy requirements. Max has over 30 years of public sector experience and served in other technology leadership roles before joining Amazon. Max has earned both a Bachelor of Arts in Finance and Master of Business Administration in Management Information Systems from the University of Maryland.

Supercharging LLM Application Development with LLM-Kit

Post Syndicated from Grab Tech original https://engineering.grab.com/supercharging-llm-application-development-with-llm-kit

Introduction

At Grab, we are committed to leveraging the power of technology to deliver the best services to our users and partners. As part of this commitment, we have developed the LLM-Kit, a comprehensive framework designed to supercharge the setup of production-ready Generative AI applications. This blog post will delve into the features of the LLM-Kit, the problems it solves, and the value it brings to our organisation.

Challenges

The introduction of the LLM-Kit has significantly addressed the challenges encountered in LLM application development. The involvement of sensitive data in AI applications necessitates that security remains a top priority, ensuring data safety is not compromised during AI application development.

Concerns such as scalability, integration, monitoring, and standardisation are common issues that any organisation will face in their LLM and AI development efforts.

The LLM-Kit has empowered Grab to pursue LLM application development and the rollout of Generative AI efficiently and effectively in the long term.

Introducing the LLM-Kit

The LLM-Kit is our solution to these challenges. Since the introduction of the LLM Kit, it has helped onboard hundreds of GenAI applications at Grab and has become the de facto choice for developers. It is a comprehensive framework designed to supercharge the setup of production-ready LLM applications. The LLM-Kit provides:

  • Pre-configured structure: The LLM-Kit comes with a pre-configured structure containing an API server, configuration management, a sample LLM Agent, and tests.
  • Integrated tech stack: The LLM-Kit integrates with Poetry, Gunicorn, FastAPI, LangChain, LangSmith, Hashicorp Vault, Amazon EKS, and Gitlab CI pipelines to provide a robust and end-to-end tech stack for LLM application development.
  • Observability: The LLM-Kit features built-in observability with Datadog integration and LangSmith, enabling real-time monitoring of LLM applications.
  • Config & secret management: The LLM-Kit utilises Python’s configparser and Vault for efficient configuration and secret management.
  • Authentication: The LLM-Kit provides built-in OpenID Connect (OIDC) auth helpers for authentication to Grab’s internal services.
  • API documentation: The LLM-Kit features comprehensive API documentation using Swagger and Redoc.
  • Redis & vector databases integration: The LLM-Kit integrates with Redis and Vector databases for efficient data storage and retrieval.
  • Deployment pipeline: The LLM-Kit provides a deployment pipeline for staging and production environments.
  • Evaluations: The LLM-Kit seamlessly integrates with LangSmith, utilising its robust evaluations framework to ensure the quality and performance of the LLM applications.

In addition to these features, the team has also included a cookbook with many commonly used examples within the organisation providing a valuable resource for developers. Our cookbook includes a diverse range of examples, such as persistent memory agents, Slackbot LLM agents, image analysers and full-stack chatbots with user interfaces, showcasing the versatility of the LLM-Kit.

The value of the LLM-Kit

The LLM-Kit brings significant value to our teams at Grab:

  • Increased development velocity: By providing a pre-configured structure and integrated tech stack, the LLM-Kit accelerates the development of LLM applications.
  • Improved observability: With built-in LangSmith and Datadog integration, teams can monitor their LLM applications in real-time, enabling faster issue detection and resolution.
  • Enhanced security: The LLM-Kit’s built-in OIDC auth helpers and secret management using Vault ensure the secure development and deployment of LLM applications.
  • Efficient data management: The integration with Vector databases facilitates efficient data storage and retrieval, crucial for the performance of LLM applications.
  • Standardisation: The LLM-Kit provides a paved-road framework for building LLM applications, promoting best practices and standardisation across teams.

Through the LLM-Kit, we can save an estimate of 1.5 weeks before teams start working on their first feature.

Figure 1. Project development process before LLM-Kit
Figure 2. Project development process after LLM-Kit

Architecture design and technical implementation

The LLM-Kit is designed with a modular architecture that promotes scalability, flexibility, and ease of use.

Figure 3. LLM-Kit modules

Automated steps

To better illustrate the technical implementation of the LLM-Kit, let’s take a look at figure 4 which outlines the step-by-step process of how an LLM application is generated with the LLM-Kit:

Figure 4. Process of generating LLM apps using LLM-Kit

The process begins when an engineer submits a form with the application name and other relevant details. This triggers the creation of a GitLab project, followed by the generation of a code scaffold specifically designed for the LLM application. GitLab CI files are then generated within the same repository to handle continuous integration and deployment tasks. The process continues with the creation of staging infrastructure, including components like Elastic Container Registry (ECR) and Elastic Kubernetes Service (EKS). Additionally, a Terraform folder is created to provision the necessary infrastructure, eventually leading to the deployment of production infrastructure. At the end of the pipeline, a GPT token is pushed to a secure Vault path, and the engineer is notified upon the successful completion of the pipeline.

Scaffold code structure

The scaffolded code is broken down into multiple folders:

  1. Agents: Contains the code to initialise an agent. We have gone ahead with LangChain as the agent framework; essentially the entry point for the endpoint defined in the Routes folder.
  2. Auth: Authentication and authorisation module for executing some of the APIs within Grab.
  3. Core: Includes extracting all configurations (i.e. GPT token) and secret decryption for running the LLM application.
  4. Models: Used to define the structure for the core LLM APIs within Grab.
  5. Routes: REST API endpoint definitions for the LLM Applications. It comes with health check, authentication, authorisation, and a simple agent by default.
  6. Storage: Includes connectivity with PGVector, our managed vector database within Grab and database schemas.
  7. Tools: Functions which are used as tools for the LLM Agent.
  8. Tracing: Integration with our tracing and monitoring tools to monitor various metrics for a production application.
  9. Utils: Default folder for utility functions.
Figure 5. Scaffold code structure

Infrastructure provisioning and deployment

Within the same codebase, we have integrated a comprehensive pipeline that automatically scaffolds the necessary code for infrastructure provisioning, deployment, and build processes. Using Terraform, the pipeline provisions the required infrastructure seamlessly. The deployment pipelines are defined in the .gitlab-ci.yml file, ensuring smooth and automated deployments. Additionally, the build process is specified in the Dockerfile, allowing for consistent builds. This automated scaffolding streamlines the development workflow, enabling developers to focus on writing business logic without worrying about the underlying infrastructure and deployment complexities.

Figure 6. Pipeline infrastructure

RAG scaffolding

At Grab, we’ve established a streamlined process for setting up a vector database (PGVector) and whitelisting the service using the LLM-Kit. Once the form (figure 7) is submitted, you can access the credentials and database host path. The secrets will be automatically added to the Vault path. Engineers will then only need to include the DB host path in the configuration file of the scaffolded LLM-Kit application.

Figure 7. Form submitted to access credentials and database host path

Conclusion

The LLM-Kit is a testament to Grab’s commitment to fostering innovation and growth in AI and ML. By addressing the challenges faced by our teams and providing a comprehensive, scalable, and flexible framework for LLM application development, the LLM-Kit is paving the way for the next generation of AI applications at Grab.

Growth and future plans

Looking ahead, the LLM-Kit team aims to significantly enhance the web server’s concurrency and scalability while providing reliable and easy-to-use SDKs. The team plans to offer reusable and composable LLM SDKs, including evaluation and guardrails frameworks, to enable service owners to build feature-rich Generative AI programs with ease. Key initiatives also include the development of a CLI for version updates and dev tooling, as well as a polling-based agent serving function. These advancements are designed to drive innovation and efficiency within the organisation, ultimately providing a more seamless and efficient development experience for engineers.

We would like to acknowledge and thank Pak Zan Tan, Han Su, and Jonathan Ku from the Yoshi team and Chen Fei Lee from the MEKS team for their contribution to this project under the leadership of Padarn George Wilson.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 700 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

How SmugMug Increased Data Modeling Productivity with Amazon Q Developer

Post Syndicated from Will Matos original https://aws.amazon.com/blogs/devops/how-smugmug-increased-data-modeling-productivity-with-amazon-q-developer/

This post is co-written with Dr. Geoff Ryder, Manager, at SmugMug.

Introduction

SmugMug operates two very large online photo platforms: SmugMug and Flickr. These platforms enable more than 100 million customers to safely store, search, share, and sell tens of billions of photos every day. However, the data science and engineering team at SmugMug and Flickr often faces complex data modeling challenges that require significant time to resolve.

These challenges arise due to several factors. First, the team has to contend with diverse datasets from different sources. Additionally, the database schema and tables are highly complex, and the team needs to quickly understand application (PHP) code and database table structures in order to generate the necessary complex database queries. Specifically, SmugMug uses Amazon Redshift as its cloud data warehouse to analyze patterns in petabyte-scale data stored in Amazon S3, as well as transactional data in Amazon Aurora and Amazon DynamoDB. This allows them to generate dozens of business reports daily.

However, the complexity increases further as many database tables also need to be imported from third-party organizations into Amazon Redshift, where they are joined with SmugMug and Flickr’s internal tables. In extreme cases, properly modeling all these database tables and handling issues like granularity, cardinality, timestamps and missing data could take years – an impractical timeline for the business. We are excited to walk through SmugMug’s data modeling use cases and how SmugMug uses Amazon Q Developer to improve the data science and engineering team’s productivity.

Discovering Amazon Q Developer

SmugMug was one of the first customers to pilot Amazon Q Developer (previously Amazon CodeWhisperer), the most capable AI-powered assistant for software development that re-imagines the experience across the entire software development lifecycle, making it easier and faster to build, secure, manage, optimize, operate, and transform applications on AWS. There are multiple Amazon Q Developer use cases at SmugMug and Flickr, such as using Amazon Q Developer agent (/dev) for software development (i.e. generating implementation plans and the accompanying code), generating inline code suggestions, asking Amazon Q Developer in chat about AWS services and best practices, and analyzing AWS usage and costs for Cloud Financial Management (CFM) needs. For the data science and engineering team specifically, the key feature is chatting with Amazon Q Developer in integrated development environments (IDEs) like Intellij DataGrip. The data analysts and data scientists at SmugMug and Flickr ask questions in Amazon Q Developer chat to analyze database schemas, generate data model diagrams from DDL (Data Definition Language) statements, convert queries between languages, automatically generate complex database queries for data analysis, generate code to validate table contents, and predict trends using ML (Machine Learning).

Implementing Amazon Q Developer

To solve the data modeling challenges SmugMug faced, the team collaborated closely with their AWS Account Team, AWS Professional Services, and the Amazon Q Developer service team to create and test a data modeling assistant solution using Amazon Q Developer.

As a first step, the data modeler needs to bring the right metadata to bear. For simpler cases, the commands “show view myschema.v” or “show table myschema.t“ retrieve DDL schema information about the specified view or table from Amazon Redshift into the IDE console.

Here’s an example using simulated data for a hypothetical company. For this typical company that handles orders for products, the result of typing “show table sample.orderinfo” and “show table sample.skuinfo”might be:

Image of SQL statement generated by the show table statement. "CREATE TABLE sample.skuinfo ( sku_id bigint ENCODE raw, sku_vendor bigint ENCODE az64, sku_category character varying(18) ENCODE lzo, sku_description character varying(255) ENCODE lzo, date_sku_created timestamp without time zone ENCODE az64, date_sku_updated timestamp without time zone ENCODE az64, pipeline_inserted_at timestamp without time zone ENCODE az64 ) DISTSTYLE KEY SORTKEY ( sku_id );"

Image of SQL statement generated by the show table statement. "CREATE TABLE sample.orderinfo ( order_id bigint ENCODE raw, shipper_id bigint ENCODE az64 distkey, product_id bigint ENCODE az64, quantity_ordered integer ENCODE az64, date_order_placed timestamp without time zone ENCODE az64 ) DISTSTYLE KEY SORTKEY ( order_id );"

This DDL text is now in the open tab. By selecting the text to highlight it, that DDL text becomes part of the context that Amazon Q Developer sees. The modeler can start asking questions about them in the Amazon Q Developer chat window in the IDE.

Diagram showing what is considered part of the context included in a request including the RAG query result, related documents when using the at-workspace key word, the highlighted text in the IDE open tab,the chat history, and the prompt.

In complex scenarios, establishing the correct modeling context requires a combination of schema information, legacy SQL, application source code in various programming languages, sample values, and natural language documentation. Amazon Q Developer addresses this by creating a local index of relevant files and content. When a question is asked using @workspace, this index is consulted to identify and include pertinent sections of code and information in the request. (See this article for additional details on workspace). The prompt plays a crucial role in measuring similarity, so providing comprehensive context within it is essential. To optimize this process, the IDE settings feature a tunable workspace index function, allowing for enhanced performance in identifying and incorporating relevant context.

Image showing the Amazon Q Settings window where you enable the Workspace feature by checking the "Workspace index" box. You can also change the number of worker threads used, and the maximum workspace index size in MB.

Workspace Index Settings

By adopting Amazon Q Developer as a team, we are able to jointly develop and share proprietary prompt text to address the four steps in our modeling process, as follows.

Step 1. Define the goal for the data modeling project

From prior knowledge, sketch a high-level goal for a data model. Gather the data for it manually, or by e.g. querying a vector database and adding its documents to the project.

For this example, we choose as the goal to compute aggregated metrics from a new table or view composed of two existing tables, sample.orderinfo and sample.skuinfo. These contain simulated data about product sales that are common to many companies. The order table is in the style of a fact table that logs customer orders, and the stock keeping unit (SKU) table is a dimension table that provides additional data points of interest about each order. The order and SKU information need to be combined by a join operation before we can compute the metrics. We would like Amazon Q Developer to tell us how to write that SQL join statement.

Step 2. Conduct an exploratory analysis and generate candidates

Next, prompt Amazon Q Developer for candidate foreign keys to join the tables, and for SQL code to execute those joins. Generate an entity-relationship diagram (ERD) as a visual aid. Prompts do not have to be complicated. For example:

@workspace What columns of database tables sample.orderinfo and sample.skuinfo 
would be best to join the two tables? Provide SQL code for the join. Draw an 
entity relationship diagram that shows the joins between the two tables, and 
includes only the fields involved in the join. Add a crow's foot cardinality 
marker to indicate a 1:many relationship, and add it next to the high 
cardinality table.

Image with the first part of the response to the prompt with the following text: "Based on the table schemas, sku_id is the appropriate column to join these tables. The relationship is likely one-to-many (1:M) where one SKU can appear in multiple orders. Here's the SQL join: SELECT o.order_id, o.sku_id, s.sku_description FROM sample.orderinfo o JOIN sample.skuinfo s ON o.sku_id = s.sku_id;

Image with the second part of the response to the prompt with the ASCII relationship diagram showing the join relationship.

Each time tables are joined together, new aggregated metrics become available to drive business insights. Now, for instance, we can find the top selling SKUs in October thanks to our results:

Image shows the top 5 results from the prior query showing the top skus in October.

Sometimes we need to look at code written in languages other than SQL to complete the data model. For example, the names of some vendors this company works with happen to appear in application PHP code as human readable strings, but are saved in the application database as numbers. The analytics data staged in Redshift only contain the numbers. So, we pull a copy of the PHP text file into @workspace, and ask Amazon Q Developer to translate the relevant string-integer mappings into a SQL case statement.

Image shows the selected PHP code with a switch statement mapping Vendor Ids to Vendor Names.

PHP Switch statement showing the mapping of Vendor Ids to String Names.

I am a Redshift database administrator and I am working on a data modeling 
problem. I would like to write SQL statements to join tables sample.orderinfo 
and sample.skuinfo. Please write that SQL to join the two tables. Also, I 
would like to write a SQL case statement to recover all string values defined 
in PHP that are represented as integer values in the database table.

The output of that prompt is shown below.

Image showing the updated SQL query that maps the Vendor Id to the Vendor Name.

Amazon Q Developer automatically detected the PHP switch case statement, converted to SQL, and added it to the final query. Many other programming languages are supported, and modelers should try this technique with other kinds of source code. Note that data scientists and analysts may not know where to look in complex application code for these details, so this discovery-plus-code translation step is a net new benefit to our company that is only possible thanks to Amazon Q Developer.

Step 3. Create code to test the analysis

Now we request SQL source code for a battery of small test queries. These can return cardinality, grain, arithmetic, and null count results.

Please write a short SQL test to compute counts of the key fields that are used 
in the joins, which will verify the cardinality assignments indicated in the 
entity relationship diagram above. The SQL test should compare distinct counts 
to total counts and null counts when it verifies the cardinality.

Image of resulting SQL queries to check cardinality.

Step 4. Validate the results of the analysis

Run the test queries to see if the candidate solution from step 2 meets our goals. The “Insert at cursor” button at the bottom of the response is handy for this. The data modeler can easily spot an error in the join logic and ERD from inspecting the output of the test query. (Or, if it’s hard to interpret the results, keep making the test queries simpler.) If errors arise from the AI misinterpreting or miscalculating a result, or from a vaguely worded prompt, simply adjust the prompt in step 2 to fix the known errors, and repeat steps 2 – 4.

Image showing the query results from the cardinality query.

After a few iterations, taking from seconds to at most tens of minutes each, the modeling errors have been worked out and we arrive at a valid production query.

Key Benefits and Results

With this Amazon Q Developer powered solution and iterative approach, SmugMug has achieved highly accurate data modeling results across numerous database tables. Once the correct modeling configuration is established, various useful outputs may become available.

We already described production SQL, unit tests, and ERDs for documentation. By the end of the process, because Amazon Q Developer has a good understanding of the data it just modeled in its chat history, it will also generate useful Python machine learning programs to predict business trends. Here is a prompt for that, and a partial screenshot of the Python output:

Please write Python code to implement a linear regression that predicts the 
quantity_ordered value based on other fields in the data set. Choose predictor 
variables that are less likely to cause multi-collinearity problems.

Image showing the python code generated to predict quantity_ordered value.

This only shows the model training step, but the full response included all library imports, a Redshift query, feature engineering steps, ML performance metrics, and code for plotting the metrics. And the AI can produce other types of predictive models. For example, you can try:

Please write Python code to implement an XGBoost model that predicts the 
quantity_ordered value based on other fields in the data set.

Ultimately, the solution has improved team productivity for both existing and new team members, while maintaining legacy knowledge needed to onboard new team members more efficiently. Key benefits include:

  1. Reducing SmugMug data analyst and scientist’s time spent on data modeling tasks from days to hours, allowing them to reallocate this time to other high-priority projects.
  2. Automating the generation of BI documentation and predictive ML, also saving crucial time.
  3. Providing net new value by translating application code constant definitions into SQL. Due to organizational boundaries, we would not have achieved this without an assist from the AI.

Future Plans and Expansion

SmugMug conducted the initial data modeling use case testing with over a dozen data science team members and analysts. We are moving on to analyze more complex tables and data schemas, and generating Python code in Amazon SageMaker for ML tasks like data preparation, training, inference, and MLOps. From our experience, Amazon Q Developer has become a preferred internal tool for development that has a data modeling component, and its use continues to expand to different groups around the company.

For SmugMug’s data modeling projects, we continue to enhance the four-step process described above. In order to gather the most relevant context to solve a problem, we build vector database collections to pull from schemas, older SQL code, application source code, BI tool content, and curated documentation. The vector search operation surfaces the right content, and spares data modelers from manually searching in different code archives. We use ChromaDB to do the searches, and bring the results from ChromaDB into the workspace as additional files.

Conclusion

Using Amazon Q Developer for data modeling use cases, SmugMug has managed to increase data science and engineering team productivity by up to 100% when compared to prior workflows. To explore how Amazon Q Developer can benefit your organization, get started here. If you have questions or suggestions, please leave a comment below.

About the Authors

Image of Dr. Geoffrey Ryder

Dr. Geoffrey Ryder

Dr. Geoff Ryder serves as the Manager of Data Science and Engineering at SmugMug, where he leads Team Prophecy in managing the company’s cloud-based data warehouse and analytics platforms. With a focus on leveraging the best AI tools, his team empowers photography clients to enhance their sales of both physical and digital photographic products. Geoff brings over two decades of experience in technical and business roles across Silicon Valley companies, and holds a PhD in Computer Engineering from UC-Santa Cruz.

Will Matos

Will Matos is a Principal Specialist Solutions Architect at AWS, revolutionizing developer productivity through Generative AI, AI-powered chat interfaces, and code generation. With 25 years of tech experience, and over 9 years with AWS, he collaborates with product teams to create intelligent solutions that streamline workflows and accelerate software development cycles. A thought leader engaging early adopters, Will bridges innovation and real-world needs.

Sreenivas Adiki

Sreenivas Adiki is a Sr. Customer Delivery Architect in ProServe, with a focus on data and analytics. He ensures success in designing, building, optimizing, and transforming in the area of Big Data/Analytics. Ensuring solutions are well-designed for successful deployment, Sreenivas participates in deep architectural discussions and design exercises. He has also published several AWS assets, such as whitepapers and proof-of-concept papers.

Kevin Bell

Kevin Bell is a Sr. Solutions Architect at AWS based in Seattle. He has been building things in the cloud for about 10 years. You can find him online as @bellkev on GitHub.

Corey Keane

Corey Keane is a Media and Entertainment (M&E) Sr. Account Manager at AWS. Corey has held a number of positions at Amazon and AWS throughout his 8 years with the company across M&E—including technical business development for strategic partnerships with international game developers, in addition to his current role managing AWS customers in the Media vertical. He leans on his pan-Amazon experience from working on other teams to identify new partnerships between our customers and other Amazon businesses to bring disruptive products to market.

Dissecting the Performance Gains in Amazon Q Developer agent for code transformation

Post Syndicated from Jonathan Vogel original https://aws.amazon.com/blogs/devops/dissecting-the-performance-gains-in-amazon-q-developer-agent-for-code-transformation/

Amazon Q Developer Agent for code transformation is an AI-powered tool which modernizes code bases from Java 8 and Java 11 to Java 17. Integrated into VS Code and IntelliJ, Amazon Q simplifies the migration process and reduce the time and effort compared to manual process. It proposes and verifies code changes, using AI to debug compilation errors. In this blog post, we’ll explore recent improvements to our code transformation agent, particularly its enhanced debugging capabilities. The enhanced debugger agent significantly improves transformation efficiency and quality compared to the existing debugger.

How Amazon Q transforms Java applications

To upgrade Java codebases, the code transformation agent takes the source code input and verify the build and test in source Java version. It then uses deterministic tools to apply code changes, followed by building and testing the changed code in the target Java version. If errors occur in this stage, a generative AI-based system debugs and resolves the compilation errors. Until today, the debugger resolves each error one by one, locating the code file with the error in the codebase, and fixing it. This debug step iterates until all compilation errors are solved or the maximum number of iterations is reached.

A flowchart diagram illustrating Amazon Q's code transformation process for accelerating Java upgrades to version 17. The workflow begins with source code input, flowing through a transformation engine that applies deterministic tools and generative AI, followed by build/test verification cycles and AI-powered debugging to resolve any compilation errors.

As an example, if, as the result of a library upgrade, an import statement is missing or wrong, the AI debugger will re-build, iterate to find all the references in multiple files one by one, and update each reference to resolve the error. Refer to this blog “Three ways Amazon Q Developer agent for code transformation accelerates Java upgrades” for detailed explanation of each transformation step. This approach has helped Q Developer customers achieve accelerations of migration effort by over 40%.

Improving the debugging capabilities of code transformations

To further improve the ability of Q Developer to generate error-free code, we’ve just released multiple foundational improvements to the AI debugger.

  • Multi-error context: the debug AI can now take multiple build errors into consideration, which provides more context, leading to better solution discovery.
  • More tools available for the AI: compared to simply localizing error to a single file and fixing the error previously, the agent can now execute multi-file solutions by exploring the codebase and operating on multiple files.
  • Inter-iteration memory: the debugger AI now remembers previous errors, which contributes to debugging new errors.
  • Intelligent backtracking: the debugger AI can now recognize if the current solution path leads to a dead end, in which case the agent can roll back to the previous state.

To implement these capabilities, the debugger AI is re-architected as a multi-agent system. A memory management agent is responsible to analyze last iteration results and append the relevant portions to the inter-iteration memory. A critic agent is responsible to analyze progress and provide additional information to the debugger agent and, if a dead end is detected, rollback the progress to a previous state. A debugger agent, analyzes the memory and the critique from the previous agents and modifies or updates the plan to fix the remaining errors in the codebase. The debugger agent has its disposal a set of generic and specialized tools to browse and explore the codebase, edit source files, trigger builds, add dependencies, and so on. It is important to note that the agent only has access to the files and tools related to the transformation task, which limits hallucinations and drive towards progress.

Let’s examine how the agent handles recurring issues across multiple files with these improvements. Consider a scenario where several Java files are missing the same import statement after upgrading from Java 8 to Java 17. This happens when you upgrade from older Java collections (like Vector and Enumeration) to modern streaming operations. The system is capable of helping you update these patterns automatically. The agent is now able to intelligently detect this pattern and implement a comprehensive solution across all affected files. Suppose we have three Java files that use the java.util.stream.Collectors class, but the import is missing in each:

File1.java:

public class File1 {
    public List<String> process(List<String> input) {
        return input.stream()
            .filter(s → s.length() > 5)
            .collect(Collectors.toList()); // Error: Cannot resolve symbol 'Collectors'
    }
}

File2.java:

public class File2 {
    public Map<String, Long> countWords(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(
                word -> word.toLowerCase(),
                Collectors.counting()
            )); // Error: Cannot resolve symbol 'Collectors'
    }
}

File3.java:

public class File3 {
    public String concatenate(List<String> strings) {
        return strings.stream()
            .collect(Collectors.joining(", "));
            // Error: Cannot resolve symbol 'Collectors'
    }
}

After the agent detects the common issue and applies the fix, all three files would be updated as follows:

File1.java (after fix):

import java.util.stream.Collectors;

public class File1 {
    public List<String> process(List<String> input) {
        return input.stream()
            .filter(s -> s.length() > 5)
            .collect(Collectors.toList());
    }
}    

File2.java (after fix):

import java.util.stream.Collectors;

public class File2 {
    public Map<String, Long> countWords(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(
                word -> word.toLowerCase(),
                Collectors.counting()));
    }
}

File3.java (after fix):

import java.util.stream.Collectors;

public class File3 {
    public String concatenate(List<String> strings) {
        return strings.stream()
            .collect(Collectors.joining(", "));
    }
}

In this example, the agent has identified that the same import statement (import java.util.stream.Collectors;) was missing in all three files. It then applied the fix consistently across all affected files, demonstrating its ability to recognize patterns and implement solutions efficiently across the entire codebase, avoiding different solutions attempts for each individual error, and saving iteration budget to solve different errors, if present.

The contrast between existing debugger and enhanced Agent is more clear when handling complex, interconnected changes. For instance, in updating Springfox Swagger from 2.0 to 3.0 (OpenAPI), both systems initially made similar changes. However, when faced with subsequent errors, their approaches diverged significantly. Consider this scenario:
Initially, both systems removed Springfox dependencies:

<!-- Removed by both systems -->
<dependency>
    <groupId>io.springfox</groupId>
    <artifactId>springfox-swagger2</artifactId>
    <version>2.9.2</version>
</dependency>

Later, when encountering a “missing symbol: Docket” error, existing debugger attempted to reintroduce Springfox:

<!-- existing debugger trying to add back Springfox -->
<dependency>
    <groupId>io.springfox</groupId>
    <artifactId>springfox-boot-starter</artifactId>
    <version>3.0.0</version>
</dependency>

In contrast, our Agent recognized this as consistent with the previous removal and rewrote the file using SpringDoc OpenAPI:

import org.springdoc.core.GroupedOpenApi;

@Configuration
public class SwaggerConfig {
    @Bean
    public GroupedOpenApi publicApi() {
        return GroupedOpenApi.builder()
                .group("springshop-public")
                .pathsToMatch("/public/**")
                .build();
    }
}   

These latest improvements in our debug AI have yielded positive results. By incorporating multi-error context analysis, additional tooling of multi-file solution, and inter-iteration memory, the agent now delivers more comprehensive and consistent codebase upgrades. We tested our new approach on 62 large open-source applications, some containing over 100,000 lines of code, incorporating more than 100 open-source libraries. The results showed an 85% higher success rate compared to the previous approach. These enhancements significantly boost both the quality and efficiency of code transformation, marking a substantial leap forward in automated application modernization for Java.

Conclusion

With the latest improvements, Q Developer continues to accelerate the journey to modernize Java applications across your organization. For more context, please refer to the blog “Accelerate application upgrades with Amazon Q Developer agent for code transformation.”

As we continue to innovate in code transformation use cases, this release creates the foundation to expand language support, further enhance AI-driven problem-solving algorithms, and streamlining the integration with development workflows. Our goal remains to provide developers and organizations with cutting-edge tools that simplify complex maintenance and modernization processes and foster the adoption of modern, cloud-native architectures. Stay tuned for future updates as we push the boundaries of AI-assisted code transformation.

About the authors

Omer Tripp

Omer heads the Q Code Transformation science team. His research work is at the intersection of programming languages and AI/ML, emphasizing developer productivity and acceleration as well as software security and reliability. Outside of work, Omer likes to stay physically active (through tennis, basketball, skiing, and various other activities), as well as tour the US and the world with his family.

Jonathan Vogel

Jonathan is a Developer Advocate at AWS. He was a DevOps Specialist Solutions Architect at AWS for two years prior to taking on the Developer Advocate role. Prior to AWS, he practiced professional software development for over a decade. Jonathan enjoys music, birding and climbing rocks.

Yiyi Guo

Yiyi is a Senior Product Manager at AWS working on Amazon Q developer agent for code transformation, she focuses on leveraging generative AI to accelerate enterprise application modernization.

Elio Damaggio

Elio Damaggio is the product lead for the transformation capabilities of Amazon Q Developer. With more than 15 years in tech, 11 patents, and a PhD in Computer Science, he is now looking for exciting ways to empower developers through AI.

Special thanks to the scientists on the Q Developer team who helped to provide input to this blog: Talha Oz and Zeren Shui.

Introducing generative AI troubleshooting for Apache Spark in AWS Glue (preview)

Post Syndicated from Noritaka Sekiyama original https://aws.amazon.com/blogs/big-data/introducing-generative-ai-troubleshooting-for-apache-spark-in-aws-glue-preview/

Organizations run millions of Apache Spark applications each month to prepare, move, and process their data for analytics and machine learning (ML). Building and maintaining these Spark applications is an iterative process, where developers spend significant time testing and troubleshooting their code. During development, data engineers often spend hours sifting through log files, analyzing execution plans, and making configuration changes to resolve issues. This process becomes even more challenging in production environments due to the distributed nature of Spark, its in-memory processing model, and the multitude of configuration options available. Troubleshooting these production issues requires extensive analysis of logs and metrics, often leading to extended downtimes and delayed insights from critical data pipelines.

Today, we are excited to announce the preview of generative AI troubleshooting for Spark in AWS Glue. This is a new capability that enables data engineers and scientists to quickly identify and resolve issues in their Spark applications. This feature uses ML and generative AI technologies to provide automated root cause analysis for failed Spark applications, along with actionable recommendations and remediation steps. This post demonstrates how you can debug your Spark applications with generative AI troubleshooting.

How generative AI troubleshooting for Spark works

For Spark jobs, the troubleshooting feature analyzes job metadata, metrics and logs associated with the error signature of your job to generates a comprehensive root cause analysis. You can initiate the troubleshooting and optimization process with a single click on the AWS Glue console. With this feature, you can reduce your mean time to resolution from days to minutes, optimize your Spark applications for cost and performance, and focus more on deriving value from your data.

Manually debugging Spark applications can get challenging for data engineers and ETL developers due to a few different reasons:

  • Extensive connectivity and configuration options to a variety of resources with Spark while makes it a popular data processing platform, often makes it challenging to root cause issues when configurations are not correct, especially related to resource setup (S3 bucket, databases, partitions, resolved columns) and access permissions (roles and keys).
  • Spark’s in-memory processing model and distributed partitioning of datasets across its workers while good for parallelism, often make it difficult for users to identify root cause of failures resulting from resource exhaustion issues like out of memory and disk exceptions.
  • Lazy evaluation of Spark transformations while good for performance, makes it challenging to accurately and quickly identify the application code and logic which caused the failure from the distributed logs and metrics emitted from different executors.

Let’s look at a few common and complex Spark troubleshooting scenarios where Generative AI Troubleshooting for Spark can save hours of manual debugging time required to deep dive and come up with the exact root cause.

Resource setup or access errors

Spark applications allows to integrate data from a variety of resources like datasets with several partitions and columns on S3 buckets and Data Catalog tables, use the associated job IAM roles and KMS keys for correct permissions to access these resources, and require these resources to exist and be available in the right regions and locations referenced by their identifiers. Users can mis-configure their applications that result in errors requiring deep dive into the logs to understand the root cause being a resource setup or permission issue.

Manual RCA: Failure reason and Spark application Logs

Following example shows the failure reason for such a common setup issue for S3 buckets in a production job run. The failure reason coming from Spark does not help understand the root cause or the line of code that needs to be inspected for fixing it.

Exception in User Class: org.apache.spark.SparkException : Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (172.36.245.14 executor 1): com.amazonaws.services.glue.util.NonFatalException: Error opening file:

After deep diving into the logs of one of the many distributed Spark executors, it becomes clear that the error was caused due to a S3 bucket not existing, however the error stack trace is usually quite long and truncated to understand the precise root cause and location within Spark application where the fix is needed.

Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: 80MTEVF2RM7ZYAN9; S3 Extended Request ID: AzRz5f/Amtcs/QatfTvDqU0vgSu5+v7zNIZwcjUn4um5iX3JzExd3a3BkAXGwn/5oYl7hOXRBeo=; Proxy: null), S3 Extended Request ID: AzRz5f/Amtcs/QatfTvDqU0vgSu5+v7zNIZwcjUn4um5iX3JzExd3a3BkAXGwn/5oYl7hOXRBeo=
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.list(Jets3tNativeFileSystemStore.java:423)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.isFolderUsingFolderObject(Jets3tNativeFileSystemStore.java:249)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.isFolder(Jets3tNativeFileSystemStore.java:212)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:518)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:935)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:927)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:983)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:197)
at com.amazonaws.services.glue.hadoop.TapeHadoopRecordReaderSplittable.initialize(TapeHadoopRecordReaderSplittable.scala:168)
... 29 more

With Generative AI Spark Troubleshooting: RCA and Recommendations

With Spark Troubleshooting, you simply click the Troubleshooting analysis button on your failed job run, and the service analyzes the debug artifacts of your failed job to identify the root cause analysis along with the line number in your Spark application that you can inspect to further resolve the issue.

Spark Out of Memory Errors

Let’s take a common but relatively complex error that requires significant manual analysis to conclude its because of a Spark job running out of memory on Spark driver (master node) or one of the distributed Spark executors. Usually, troubleshooting requires an experienced data engineer to manually go over the following steps to identify the root cause.

  • Search through Spark driver logs to find the exact error message
  • Navigate to the Spark UI to analyze memory usage patterns
  • Review executor metrics to understand memory pressure
  • Analyze the code to identify memory-intensive operations

This process often takes hours because the failure reason from Spark is usually not challenging to understand that it was a out of memory issue on the Spark driver and what is the remedy to fix it.

Manual RCA: Failure reason and Spark application Logs

Following example shows the failure reason for the error.

Py4JJavaError: An error occurred while calling o4138.collectToPython. java.lang.StackOverflowError

Spark driver logs require extensive search to find the exact error message. In this case, the error stack trace consisted of more than hundred function calls and is challenging to understand the precise root cause as the Spark application terminated abruptly.

py4j.protocol.Py4JJavaError: An error occurred while calling o4138.collectToPython.
: java.lang.StackOverflowError
 at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$1942/131413145.get$Lambda(Unknown Source)
 at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:798)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:459)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:781)
 at org.apache.spark.sql.catalyst.trees.TreeNode.clone(TreeNode.scala:881)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$clone(LogicalPlan.scala:30)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.clone(AnalysisHelper.scala:295)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.clone$(AnalysisHelper.scala:294)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.clone(LogicalPlan.scala:30)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.clone(LogicalPlan.scala:30)
 at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$clone$1(TreeNode.scala:881)
 at org.apache.spark.sql.catalyst.trees.TreeNode.applyFunctionIfChanged$1(TreeNode.scala:747)
 at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:783)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:459)
 ... repeated several times with hundreds of function calls

With Generative AI Spark Troubleshooting: RCA and Recommendations

With Spark Troubleshooting, you can click the Troubleshooting analysis button on your failed job run and get a detailed root cause analysis with the line of code which you can inspect, and also recommendations on best practices to optimize your Spark application for fixing the problem.

Spark Out of Disk Errors

Another complex error pattern with Spark is when it runs out of disk storage on one of the many Spark executors in the Spark application. Similar to Spark OOM exceptions, manual troubleshooting requires extensive deep dive into distributed executor logs and metrics to understand the root cause and identify the application logic or code causing the error due to Spark’s lazy execution of its transformations.

Manual RCA: Failure Reason and Spark application Logs

The associated failure reason and error stack trace in the application logs is again quiet long requiring the user to gather more insights from Spark UI and Spark metrics to identify the root cause and identify the resolution.

An error occurred while calling o115.parquet. No space left on device
py4j.protocol.Py4JJavaError: An error occurred while calling o115.parquet.
: org.apache.spark.SparkException: Job aborted.
 at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:638)
 at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:279)
 at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:193)
 at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
 at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
 at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
 ....

With Generative AI Spark Troubleshooting: RCA and Recommendations

With Spark Troubleshooting, it provides the RCA and the line number of code in the script where the data shuffle operation was lazily evaluated by Spark. It also points to best practices guide for optimizing the shuffle or wide transforms or using S3 shuffle plugin on AWS Glue.

Debug AWS Glue for Spark jobs

To use this troubleshooting feature for your failed job runs, complete following:

  1. On the AWS Glue console, choose ETL jobs in the navigation pane.
  2. Choose your job.
  3. On the Runs tab, choose your failed job run.
  4. Choose Troubleshoot with AI to start the analysis.
  5. You will be redirected to the Troubleshooting analysis tab with generated analysis.

You will see Root Cause Analysis and Recommendations sections.

The service analyzes your job’s debug artifacts and provide the results. Let’s look at a real example of how this works in practice.

We show below an end-to-end example where Spark Troubleshooting helps a user with identification of the root cause for a resource setup issue and help fix the job to resolve the error.

Considerations

During preview, the service focuses on common Spark errors like resource setup and access issues, out of memory exceptions on Spark driver and executors, out of disk exceptions on Spark executors, and will clearly indicate when an error type is not yet supported. Your jobs must run on AWS Glue version 4.0.

The preview is available at no additional charge in all AWS commercial Regions where AWS Glue is available. When you use this capability, any validation runs triggered by you to test proposed solutions will be charged according to the standard AWS Glue pricing.

Conclusion

This post demonstrated how generative AI troubleshooting for Spark in AWS Glue helps your day-to-day Spark application debugging. It simplifies the debugging process for your Spark applications by using generative AI to automatically identify the root cause of failures and provides actionable recommendations to resolve the issues.

To learn more about this new troubleshooting feature for Spark, please visit Troubleshooting Spark jobs with AI.

A special thanks to everyone who contributed to the launch of generative AI troubleshooting for Apache Spark in AWS Glue: Japson Jeyasekaran, Rahul Sharma, Mukul Prasad, Weijing Cai, Jeremy Samuel, Hirva Patel, Martin Ma, Layth Yassin, Kartik Panjabi, Maya Patwardhan, Anshi Shrivastava, Henry Caballero Corzo, Rohit Das, Peter Tsai, Daniel Greenberg, McCall Peltier, Takashi Onikura, Tomohiro Tanaka, Sotaro Hikita, Chiho Sugimoto, Yukiko Iwazumi, Gyan Radhakrishnan, Victor Pleikis, Sriram Ramarathnam, Matt Sampson, Brian Ross, Alexandra Tello, Andrew King, Joseph Barlan, Daiyan Alamgir, Ranu Shah, Adam Rohrscheib, Nitin Bahadur, Santosh Chandrachood, Matt Su, Kinshuk Pahare, and William Vambenepe.


About the Authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his road bike.

Vishal Kajjam is a Software Development Engineer on the AWS Glue team. He is passionate about distributed computing and using ML/AI for designing and building end-to-end solutions to address customers’ data integration needs. In his spare time, he enjoys spending time with family and friends.

Shubham Mehta is a Senior Product Manager at AWS Analytics. He leads generative AI feature development across services such as AWS Glue, Amazon EMR, and Amazon MWAA, using AI/ML to simplify and enhance the experience of data practitioners building data applications on AWS.

Wei Tang is a Software Development Engineer on the AWS Glue team. She is strong developer with deep interests in solving recurring customer problems with distributed systems and AI/ML.

XiaoRun Yu is a Software Development Engineer on the AWS Glue team. He is working on building new features for AWS Glue to help customers. Outside of work, Xiaorun enjoys exploring new places in the Bay Area.

Jake Zych is a Software Development Engineer on the AWS Glue team. He has deep interest in distributed systems and machine learning. In his spare time, Jake likes to create video content and play board games.

Savio Dsouza is a Software Development Manager on the AWS Glue team. His team works on distributed systems & new interfaces for data integration and efficiently managing data lakes on AWS.

Mohit Saxena is a Senior Software Development Manager on the AWS Glue team. His team focuses on building distributed systems to enable customers with interactive and simple-to-use interfaces to efficiently manage and transform petabytes of data across data lakes on Amazon S3, and databases and data warehouses on the cloud.

Introducing generative AI upgrades for Apache Spark in AWS Glue (preview)

Post Syndicated from Noritaka Sekiyama original https://aws.amazon.com/blogs/big-data/introducing-generative-ai-upgrades-for-apache-spark-in-aws-glue-preview/

Organizations run millions of Apache Spark applications each month on AWS, moving, processing, and preparing data for analytics and machine learning. As these applications age, keeping them secure and efficient becomes increasingly challenging. Data practitioners need to upgrade to the latest Spark releases to benefit from performance improvements, new features, bug fixes, and security enhancements. However, these upgrades are often complex, costly, and time-consuming.

Today, we are excited to announce the preview of generative AI upgrades for Spark, a new capability that enables data practitioners to quickly upgrade and modernize their Spark applications running on AWS. Starting with Spark jobs in AWS Glue, this feature allows you to upgrade from an older AWS Glue version to AWS Glue version 4.0. This new capability reduces the time data engineers spend on modernizing their Spark applications, allowing them to focus on building new data pipelines and getting valuable analytics faster.

Understanding the Spark upgrade challenge

The traditional process of upgrading Spark applications requires significant manual effort and expertise. Data practitioners must carefully review incremental Spark release notes to understand the intricacies and nuances of breaking changes, some of which may be undocumented. They then need to modify their Spark scripts and configurations, updating features, connectors, and library dependencies as needed.

Testing these upgrades involves running the application and addressing issues as they arise. Each test run may reveal new problems, resulting in multiple iterations of changes. After the upgraded application runs successfully, practitioners must validate the new output against the expected results in production. This process often turns into year-long projects that cost millions of dollars and consume tens of thousands of engineering hours.

How generative AI upgrades for Spark works

The Spark upgrades feature uses AI to automate both the identification and validation of required changes to your AWS Glue Spark applications. Let’s explore how these capabilities work together to simplify your upgrade process.

AI-driven upgrade plan generation

When you initiate an upgrade, the service analyzes your application using AI to identify necessary changes across both PySpark code and Spark configurations. During preview, Spark Upgrades supports upgrading from Glue 2.0 (Spark 2.4.3, Python 3.7) to Glue 4.0 (Spark 3.3.0, Python 3.10), automatically handling changes that would typically require extensive manual review of public Spark, Python and Glue version migration guides, followed by development, testing, and verification. Spark Upgrades addresses four key areas of changes:

  • Spark SQL API methods and functions
  • Spark DataFrame API methods and operations
  • Python language updates (including module deprecations and syntax changes)
  • Spark SQL and Core configuration settings

The complexity of these upgrades becomes evident when you consider migrating from Spark 2.4.3 to Spark 3.3.0 involves over a hundred version-specific changes. Several factors contribute to the challenges of performing manual upgrades:

  • Highly expressive language with a mix of imperative and declarative programming styles, allows users to easily develop Spark applications. However, this increases the complexity of identifying impacted code during upgrades.
  • Lazy execution of transformations in a distributed Spark application improves performance but makes runtime verification of application upgrades challenging for users.
  • Spark configurations changes in default values or the introduction of new configurations across versions can impact application behavior in different ways, making it difficult for users to identify issues during upgrades.

For example, in Spark 3.2, Spark SQL TRANSFORM operator can’t support alias in inputs. In Spark 3.1 and earlier, you could write a script transform like SELECT TRANSFORM(a AS c1, b AS c2) USING 'cat' FROM TBL.

# Original code (Glue 2.0)
query = """
SELECT TRANSFORM(item as product_name, price as product_price, number as product_number)
   USING 'cat'
FROM goods
WHERE goods.price > 5
"""
spark.sql(query)

# Updated code (Glue 4.0)
query = """
SELECT TRANSFORM(item, price, number)
   USING 'cat' AS (product_name, product_price, product_number)
FROM goods
WHERE goods.price > 5
"""
spark.sql(query)

In Spark 3.1, loading and saving timestamps before 1900-01-01 00:00:00Z as INT96 in Parquet files causes errors. In Spark 3.0, this wouldn’t fail but could result in timestamp shifts due to calendar rebasing. To restore the old behavior in Spark 3.1, you would need to configure the Spark SQL configurations for spark.sql.legacy.parquet.int96RebaseModeInRead and spark.sql.legacy.parquet.int96RebaseModeInWrite to LEGACY.

# Original code (Glue 2.0)
data = [(1, "1899-12-31 23:59:59"), (2, "1900-01-01 00:00:00")]
schema = StructType([ StructField("id", IntegerType(), True), StructField("timestamp", TimestampType(), True) ])
df = spark.createDataFrame(data, schema=schema)
df.write.mode("overwrite").parquet("path/to/parquet_file") 

# Updated code (Glue 4.0)
qspark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY") 
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")

data = [(1, "1899-12-31 23:59:59"), (2, "1900-01-01 00:00:00")]
schema = StructType([ StructField("id", IntegerType(), True), StructField("timestamp", TimestampType(), True) ])
df = spark.createDataFrame(data, schema=schema)
df.write.mode("overwrite").parquet("path/to/parquet_file")

Automated validation in your environment

After identifying the necessary changes, Spark Upgrades validates the upgraded application by running it as an AWS Glue job in your AWS account. The service iterates through multiple validation runs, up to 10, reviewing any errors encountered in each iteration and refining the upgrade plan until it achieves a successful run. You can run a Spark Upgrade Analysis in your development account using mock datasets supplied through Glue job parameters used for validation runs.

After Spark Upgrades has successfully validated the changes, it presents an upgrade plan for you to review. You can then accept and apply the changes to your job in the development account, before replicating them to your job in the production account. The Spark Upgrade plan includes the following:

  • An upgrade summary with an explanation of code updates made during the process
  • The final script that you can use in place of your current script
  • Logs from validation runs showing how issues were identified and resolved

You can review all aspects of the upgrade, including intermediate validation attempts and any error resolutions, before deciding to apply the changes to your production job. This approach ensures you have full visibility into and control over the upgrade process while benefiting from AI-driven automation.

Get started with generative AI Spark upgrades

Let’s walk through the process of upgrading an AWS Glue 2.0 job to AWS Glue 4.0. Complete the following steps:

  1. On the AWS Glue console, choose ETL jobs in the navigation pane.
  2. Select your AWS Glue 2.0 job, and choose Run upgrade analysis with AI.
  3. For Result path, enter s3://aws-glue-assets-<account-id>-<region>/scripts/upgraded/ (provide your own account ID and AWS Region).
  4. Choose Run.
  5. On the Upgrade analysis tab, wait for the analysis to be completed.

    While an analysis is running, you can view the intermediate job analysis attempts (up to 10) for validation under the Runs tab. Additionally, the Upgraded summary in S3 documents the upgrades made by the Spark Upgrade service so far, refining the upgrade plan with each attempt. Each attempt will display a different failure reason, which the service tries to address in the subsequent attempt through code or configuration updates.
    After a successful analysis, the upgraded script and a summary of changes will be uploaded to Amazon Simple Storage Service (Amazon S3).
  6. Review the changes to make sure they meet your requirements, then choose Apply upgraded script.

Your job has now been successfully upgraded to AWS Glue version 4.0. You can check the Script tab to verify the updated script and the Job details tab to review the modified configuration.

Understanding the upgrade process through an example

We now show a production Glue 2.0 job that we would like to upgrade to Glue 4.0 using the Spark Upgrade feature. This Glue 2.0 job reads a dataset, updated daily in an S3 bucket under different partitions, containing new book reviews from an online marketplace and runs SparkSQL to gather insights into the user votes for the book reviews.

Original code (Glue 2.0) – before upgrade

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
from collections import Sequence
from pyspark.sql.types import DecimalType
from pyspark.sql.functions import lit, to_timestamp, col

def is_data_type_sequence(coming_dict):
    return True if isinstance(coming_dict, Sequence) else False

def dataframe_to_dict_list(df):
    return [row.asDict() for row in df.collect()]

books_input_path = (
    "s3://aws-bigdata-blog/generated_synthetic_reviews/data/product_category=Books/"
)
view_name = "books_temp_view"
static_date = "2010-01-01"
books_source_df = (
    spark.read.option("header", "true")
    .option("recursiveFileLookup", "true")
    .option("path", books_input_path)
    .parquet(books_input_path)
)
books_source_df.createOrReplaceTempView(view_name)
books_with_new_review_dates_df = spark.sql(
    f"""
        SELECT 
        {view_name}.*,
            DATE_ADD(to_date(review_date), "180.8") AS next_review_date,
            CASE 
                WHEN DATE_ADD(to_date(review_date), "365") < to_date('{static_date}') THEN 'Yes' 
                ELSE 'No' 
            END AS Actionable
        FROM {view_name}
    """
)
books_with_new_review_dates_df.createOrReplaceTempView(view_name)
aggregate_books_by_marketplace_df = spark.sql(
    f"SELECT marketplace, count({view_name}.*) as total_count, avg(star_rating) as average_star_ratings, avg(helpful_votes) as average_helpful_votes, avg(total_votes) as average_total_votes  FROM {view_name} group by marketplace"
)
aggregate_books_by_marketplace_df.show()
data = dataframe_to_dict_list(aggregate_books_by_marketplace_df)
if is_data_type_sequence(data):
    print("data is valid")
else:
    raise ValueError("Data is invalid")

aggregated_target_books_df = aggregate_books_by_marketplace_df.withColumn(
    "average_total_votes_decimal", col("average_total_votes").cast(DecimalType(3, -2))
)
aggregated_target_books_df.show()

New code (Glue 4.0) – after upgrade

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from collections.abc import Sequence
from pyspark.sql.types import DecimalType
from pyspark.sql.functions import lit, to_timestamp, col

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
spark.conf.set("spark.sql.adaptive.enabled", "false")
spark.conf.set("spark.sql.legacy.allowStarWithSingleTableIdentifierInCount", "true")
spark.conf.set("spark.sql.legacy.allowNegativeScaleOfDecimal", "true")
job = Job(glueContext)

def is_data_type_sequence(coming_dict):
    return True if isinstance(coming_dict, Sequence) else False

def dataframe_to_dict_list(df):
    return [row.asDict() for row in df.collect()]

books_input_path = (
    "s3://aws-bigdata-blog/generated_synthetic_reviews/data/product_category=Books/"
)
view_name = "books_temp_view"
static_date = "2010-01-01"
books_source_df = (
    spark.read.option("header", "true")
    .option("recursiveFileLookup", "true")
    .load(books_input_path)
)
books_source_df.createOrReplaceTempView(view_name)
books_with_new_review_dates_df = spark.sql(
    f"""
        SELECT 
        {view_name}.*,
            DATE_ADD(to_date(review_date), 180) AS next_review_date,
            CASE 
                WHEN DATE_ADD(to_date(review_date), 365) < to_date('{static_date}') THEN 'Yes' 
                ELSE 'No' 
            END AS Actionable
        FROM {view_name}
    """
)
books_with_new_review_dates_df.createOrReplaceTempView(view_name)
aggregate_books_by_marketplace_df = spark.sql(
    f"SELECT marketplace, count({view_name}.*) as total_count, avg(star_rating) as average_star_ratings, avg(helpful_votes) as average_helpful_votes, avg(total_votes) as average_total_votes  FROM {view_name} group by marketplace"
)
aggregate_books_by_marketplace_df.show()
data = dataframe_to_dict_list(aggregate_books_by_marketplace_df)
if is_data_type_sequence(data):
    print("data is valid")
else:
    raise ValueError("Data is invalid")

aggregated_target_books_df = aggregate_books_by_marketplace_df.withColumn(
    "average_total_votes_decimal", col("average_total_votes").cast(DecimalType(3, -2))
)
aggregated_target_books_df.show()

Upgrade summary

In Spark 3.2, spark.sql.adaptive.enabled is enabled by default. To restore the behavior before Spark 3.2, 
you can set spark.sql.adaptive.enabled to false.

No suitable migration rule was found in the provided context for this specific error. The change was made based on the error message, which indicated that Sequence could not be imported from collections module. In Python 3.10, Sequence has been moved to the collections.abc module.

In Spark 3.1, path option cannot coexist when the following methods are called with path parameter(s): DataFrameReader.load(), DataFrameWriter.save(), DataStreamReader.load(), or DataStreamWriter.start(). In addition, paths option cannot coexist for DataFrameReader.load(). For example, spark.read.format(csv).option(path, /tmp).load(/tmp2) or spark.read.option(path, /tmp).csv(/tmp2) will throw org.apache.spark.sql.AnalysisException. In Spark version 3.0 and below, path option is overwritten if one path parameter is passed to above methods; path option is added to the overall paths if multiple path parameters are passed to DataFrameReader.load(). To restore the behavior before Spark 3.1, you can set spark.sql.legacy.pathOptionBehavior.enabled to true.

In Spark 3.0, the `date_add` and `date_sub` functions accepts only int, smallint, tinyint as the 2nd argument; fractional and non-literal strings are not valid anymore, for example: `date_add(cast('1964-05-23' as date), '12.34')` causes `AnalysisException`. Note that, string literals are still allowed, but Spark will throw `AnalysisException` if the string content is not a valid integer. In Spark version 2.4 and below, if the 2nd argument is fractional or string value, it is coerced to int value, and the result is a date value of `1964-06-04`.

In Spark 3.2, the usage of count(tblName.*) is blocked to avoid producing ambiguous results. Because count(*) and count(tblName.*) will output differently if there is any null values. To restore the behavior before Spark 3.2, you can set spark.sql.legacy.allowStarWithSingleTableIdentifierInCount to true.

In Spark 3.0, negative scale of decimal is not allowed by default, for example, data type of literal like 1E10BD is DecimalType(11, 0). In Spark version 2.4 and below, it was DecimalType(2, -9). To restore the behavior before Spark 3.0, you can set spark.sql.legacy.allowNegativeScaleOfDecimal to true.

As seen in the updated Glue 4.0 (Spark 3.3.0) script diff compared to the Glue 2.0 (Spark 2.4.3) script and the resulting upgrade summary, a total of six different code and configuration updates were applied across the six attempts of the Spark Upgrade Analysis.

  • Attempt #1 included a Spark SQL configuration (spark.sql.adaptive.enabled) to restore the application behavior as a new feature for Spark SQL adaptive query execution is introduced starting Spark 3.2. Users can inspect this configuration change and can further enable or disable it as per their preference.
  • Attempt #2 resolved a Python language change between Python 3.7 and 3.10 with the introduction of a new abstract base class (abc) under the Python collections module for importing Sequence.
  • Attempt #3 resolved an error encountered due to a change in behavior of DataFrame API starting Spark 3.1 where path option cannot exist with other DataFrameReader operations.
  • Attempt #4 resolved an error caused by a change in the Spark SQL function API signature for DATE_ADD which now only accepts integers as the second argument starting from Spark 3.0.
  • Attempt #5 resolved an error encountered due to the change in behavior Spark SQL function API for count(tblName.*) starting Spark 3.2. The behavior was restored with the introduction of a new Spark SQL configuration spark.sql.legacy.allowStarWithSingleTableIdentifierInCount
  • Attempt #6 successfully completed the analysis and ran the new script on Glue 4.0 without any new errors. The final attempt resolved an error encountered due to the prohibited use of negative scale for cast(DecimalType(3, -6) in Spark DataFrame API starting Spark 3.0. The issue was addressed by enabling the new Spark SQL configuration spark.sql.legacy.allowNegativeScaleOfDecimal.

Important considerations for preview

As you begin using automated Spark upgrades during the preview period, there are several important aspects to consider for optimal usage of the service:

  • Service scope and limitations – The preview release focuses on PySpark code upgrades from AWS Glue versions 2.0 to version 4.0. At the time of writing, the service handles PySpark code that doesn’t rely on additional library dependencies. You can run automated upgrades for up to 10 jobs concurrently in an AWS account, allowing you to efficiently modernize multiple jobs while maintaining system stability.
  • Optimizing costs during the upgrade process – Because the service uses generative AI to validate the upgrade plan through multiple iterations, with each iteration running as an AWS Glue job in your account, it’s essential to optimize the validation job run configurations for cost-efficiency. To achieve this, we recommend specifying a run configuration when starting an upgrade analysis as follows:
    • Using non-production developer accounts and selecting sample mock datasets that represent your production data but are smaller in size for validation with Spark Upgrades.
    • Using right-sized compute resources, such as G.1X workers, and selecting an appropriate number of workers for processing your sample data.
    • Enabling Glue auto scaling when applicable to automatically adjust resources based on workload.

    For example, if your production job processes terabytes of data with 20 G.2X workers, you might configure the upgrade job to process a few gigabytes of representative data with 2 G.2X workers and auto scaling enabled for validation.

  • Preview best practices – During the preview period, we strongly recommend starting your upgrade journey with non-production jobs. This approach allows you to familiarize yourself with the upgrade workflow, and understand how the service handles different types of Spark code patterns.

Your experience and feedback are crucial in helping us enhance and improve this feature. We encourage you to share your insights, suggestions, and any challenges you encounter through AWS Support or your account team. This feedback will help us improve the service and add capabilities that matter most to you during preview.

Conclusion

This post demonstrates how automated Spark upgrades can assist with migrating your Spark applications in AWS Glue. It simplifies the migration process by using generative AI to automatically identify the necessary script changes across different Spark versions.

To learn more about this feature in AWS Glue, see Generative AI upgrades for Apache Spark in AWS Glue.

A special thanks to everyone who contributed to the launch of generative AI upgrades for Apache Spark in AWS Glue: Shuai Zhang, Mukul Prasad, Liyuan Lin, Rishabh Nair, Raghavendhar Thiruvoipadi Vidyasagar, Tina Shao, Chris Kha, Neha Poonia, Xiaoxi Liu, Japson Jeyasekaran, Suthan Phillips, Raja Jaya Chandra Mannem, Yu-Ting Su, Neil Jonkers, Boyko Radulov, Sujatha Rudra, Mohammad Sabeel, Mingmei Yang, Matt Su, Daniel Greenberg, Charlie Sim, McCall Petier, Adam Rohrscheib, Andrew King, Ranu Shah, Aleksei Ivanov, Bernie Wang, Karthik Seshadri, Sriram Ramarathnam, Asterios Katsifodimos, Brody Bowman, Sunny Konoplev, Bijay Bisht, Saroj Yadav, Carlos Orozco, Nitin Bahadur, Kinshuk Pahare, Santosh Chandrachood, and William Vambenepe.


About the Authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his new road bike.

Keerthi Chadalavada is a Senior Software Development Engineer at AWS Glue, focusing on combining generative AI and data integration technologies to design and build comprehensive solutions for customers’ data and analytics needs.

Shubham Mehta is a Senior Product Manager at AWS Analytics. He leads generative AI feature development across services such as AWS Glue, Amazon EMR, and Amazon MWAA, using AI/ML to simplify and enhance the experience of data practitioners building data applications on AWS.

Pradeep Patel is a Software Development Manager on the AWS Glue team. He is passionate about helping customers solve their problems by using the power of the AWS Cloud to deliver highly scalable and robust solutions. In his spare time, he loves to hike and play with web applications.

Chuhan LiuChuhan Liu is a Software Engineer at AWS Glue. He is passionate about building scalable distributed systems for big data processing, analytics, and management. He is also keen on using generative AI technologies to provide brand-new experience to customers. In his spare time, he likes sports and enjoys playing tennis.

Vaibhav Naik is a software engineer at AWS Glue, passionate about building robust, scalable solutions to tackle complex customer problems. With a keen interest in generative AI, he likes to explore innovative ways to develop enterprise-level solutions that harness the power of cutting-edge AI technologies.

Mohit Saxena is a Senior Software Development Manager on the AWS Glue and Amazon EMR team. His team focuses on building distributed systems to enable customers with simple-to-use interfaces and AI-driven capabilities to efficiently transform petabytes of data across data lakes on Amazon S3, and databases and data warehouses on the cloud.

Expanded resource awareness in Amazon Q Developer

Post Syndicated from Brendan Jenkins original https://aws.amazon.com/blogs/devops/expanded-resource-awareness-in-amazon-q-developer/

Recently, Amazon Q Developer announced expanded support for account resource awareness with Amazon Q in the AWS Management Console along with the general availability of Amazon Q Developer in AWS Chatbot, enabling you to ask questions from Microsoft Teams or Slack. Additionally, Amazon Q will now provide context-aware assistance for your questions about resources in your account depending on where you are in the console. Amazon Q in the console gives you the ability to use natural language with the Amazon Q Developer chat capability to list resources in your AWS account, get specific resource details, and ask about related resources, launched in preview on April 30, 2024.

In this blog, I will highlight the new expanded functionality of this feature in Amazon Q Developer including understanding relationships between account resources, context-awareness, and the general availability of the AWS Chatbot integration with Microsoft Teams and Slack.

Expanded account resource awareness with Amazon Q Developer

Prior to the launch of the expanded support, you could ask Amazon Q Developer to list resources in your AWS Account with prompts such as “List all my EC2 instances in us-east-1” and the service would list all your Amazon Elastic Compute Cloud (Amazon EC2) instances. Now, with the expanded support, you can ask more complex questions about your AWS account resources. I will show a few examples in this section of this post.

For our first example, imagine that you’re a developer who is responsible for maintaining code as a part of the software development lifecycle (SDLC) and you frequently use AWS Lambda for development and Amazon Relational Database Service (RDS) in the backend as a part of your development process. With this new update, a developer could open a new Q chat in the AWS Management Console, and enter a prompt such as: “Which RDS clusters are due for an update?”

User entering prompt Amazon Q Developer chat in the AWS management console about listing all RDS clusters that need updates in their account and Amazon Q listing those Databases.

Figure 1: Amazon Q Developer listing RDS clusters needing an update

As a result, the Amazon Q Developer console chat will return a list of all your Amazon RDS clusters that have available updates as shown in Figure 1 above.

Now, for another example, you want to update any Lambda functions in your AWS account that had a Simple Notification Service (SNS) topic as a trigger due to moving to a new SNS topic you recently created. To identify which SNS topics are still being used, you could enter a prompt such as “List all the SNS topics that trigger a lambda function.”

User entering prompt Amazon Q Developer chat in the AWS management console about listing all SNS topics that trigger a lambda function and Amazon Q listing the SNS topics as an output.

Figure 2: Amazon Q listing SNS topics that are lambda triggers

As shown in the prior example, Amazon Q Developer was able to identify any SNS topics in the form of Amazon resource name (ARN) that was set to trigger a lambda function in the AWS account as intended.

Additionally, you can ask a follow up question in the same chat to investigate more. You can send a prompt such as “Which lambda function uses the arn:aws:sns:us-east-1:76859XXXX:FailoverHealthcheck SNS topic?”

User entering prompting Amazon Q Developer chat with a follow up question in the AWS management console about which Lambda is associated with an SNS topic.

Figure 3: Asking Q Developer a follow up question about a resource

From Figure 3 above, you can see that there is a Lambda function/endpoint associated with the SNS topic resource that Amazon Q Developer was able to identify.

Outside of the examples above, here are some other prompts/examples that can be explored for the expanded support:

– “Do I have any ECS clusters with pending tasks?”

– “Are there any ECS clusters in my account with services in DRAINING status?”

Amazon Q Developer understands where you are in the console

Amazon Q Developer in the AWS Management Console now provides context-aware assistance for your questions about resources in your account. This feature allows you to ask questions directly related to the console page you’re viewing, eliminating the need to specify the service or resource in your query. Q Developer uses the current page as additional context to provide more accurate and relevant responses, streamlining your interaction with AWS services and resources.

Prior to the update, a user would have to prompt, “What is the public IPv4 address of my instance i-08ccXXXXXX?”  Now, if you are viewing an EC2 instance in the console and prompt Amazon Q, “What is the public IPv4 address of my instance?” you will not need to specify the instance you are referring to.

User entering prompt Amazon Q Developer chat in the AWS management console about what the IP address is of the instance on the page.

Figure 4: Asking Amazon Q about an EC2 instance being viewed

In figure 4 above, Amazon Q’s console chat was able to use its context-awareness to pick up on what the IPv4 address was on the console page where I was currently working, despite me not specifying which instance I was referring to.

AWS ChatBot can now answer questions about AWS resources in Microsoft Teams and Slack

Recently, we announced the general availability of Amazon Q Developer in AWS Chatbot, which provides answers to customers’ AWS resource related queries in Microsoft Teams and Slack. This gives teams the ability to quickly find relevant resources to troubleshoot issues using natural language queries in the chat channels of Microsoft Teams or Slack.

For example, you could integrate the AWS Chatbot Service with Amazon Q Developer to allow you to enter a prompt in Slack such as “@aws show EC2 instances in running state in us-east-1”.

User entering prompt in slack to ask the AWS Chatbot about EC2 resources and Amazon Q responding

Figure 5: Amazon Q listing all EC2 resources in Slack

As shown in figure 5 above, Amazon Q was able to list all the EC2 resources and place them into a slack channel showing an example of the functionality in action.

Conclusion

Amazon Q Developer has enhanced its cloud resource management capabilities, enabling more intuitive and intelligent interactions with AWS resources. The new features allow developers to ask complex, context-aware questions about their cloud infrastructure directly through the AWS Management Console, Microsoft Teams, and Slack. Users can now easily discover new details about specific resources with natural language queries that provide precise, contextual information. These improvements represent a significant step forward in simplifying cloud resource management, making it faster and more user-friendly for development teams to understand, track, and maintain their AWS environments. To learn more about chatting with your AWS resources, check out Console documentation and AWS Chatbot documentation.

About the authors

Brendan Jenkins

Brendan Jenkins is a Tech Lead Solutions Architect at Amazon Web Services (AWS) working with Enterprise AWS customers providing them with technical guidance and helping achieve their business goals. He has an area of specialization in DevOps and Machine Learning technology.

Securing the RAG ingestion pipeline: Filtering mechanisms

Post Syndicated from Laura Verghote original https://aws.amazon.com/blogs/security/securing-the-rag-ingestion-pipeline-filtering-mechanisms/

Retrieval-Augmented Generative (RAG) applications enhance the responses retrieved from large language models (LLMs) by integrating external data such as downloaded files, web scrapings, and user-contributed data pools. This integration improves the models’ performance by adding relevant context to the prompt.

While RAG applications are a powerful way to dynamically add additional context to an LLM’s prompt and make model responses more relevant, incorporating data from external sources can pose security risks.

For example, let’s assume you crawl a public website and ingest the data into your knowledge base. Because it’s public data, you risk also ingesting malicious content that was injected into that website by threat actors with the goal of exploiting the knowledge base component of the RAG application. Through this mechanism, threat actors can intentionally change the model’s behavior.

Risks like these emphasize the need for security measures in the design and deployment of RAG systems in general. Security measures should be applied not only at inference time (that is, filtering model outputs), but also when ingesting external data into the knowledge base of the RAG application.

In this post, we explore some of the potential security risks of ingesting external data or documents into the knowledge base of your RAG application. We propose practical steps and architecture patterns that you can implement to help mitigate these risks.

Overview of security of the RAG ingestion workflow

Before diving into specifics of mitigating risk in the ingestion pipeline, let’s have a look at a generic RAG workflow and which aspects you should keep in mind when it comes to securing a RAG application. For this post, let’s assume that you’re using Amazon Bedrock Knowledge Bases to build a RAG application. Amazon Bedrock Knowledge Bases offers built-in, robust security controls for data protection, access control, network security, logging and monitoring, and input/output validation that help mitigate many of the security risks.

In a RAG workflow with Amazon Bedrock Knowledge Bases, you have the following environments:

  • An Amazon Bedrock service account, which is managed by the Amazon Bedrock service team.
  • An AWS account where you can store your RAG data (if you’re using an AWS service as your vector store).
  • A possible external environment, depending on the vector database you’ve chosen to store vector embeddings of your ingested content. If you choose Pinecone or Redis Enterprise Cloud for your vector database, you will use an environment external to AWS.
Figure 1: Visual representation of the knowledge base data ingestion flow

Figure 1: Visual representation of the knowledge base data ingestion flow

Looking at the workflow shown in Figure 1 for the ingestion of data into a knowledge base, an ingestion request is started by invoking the StartIngestionJob Bedrock API. From that point:

  1. If this request has the correct IAM permissions associated with it, it’s sent to the Bedrock API endpoint.
  2. This request is then passed to the knowledge base service component.
  3. The metadata collected related to the request is stored in the metadata Amazon DynamoDB database. This database is used solely to enumerate and characterize the data sources and their sync status. The API call includes metadata for the Amazon Simple Storage Service (Amazon S3) source location of the data to ingest, in addition to the vector store that will be used to store the embeddings.
  4. The process will begin to ingest customer-provided data from Amazon S3. If this data was encrypted using customer managed KMS keys, then these keys will be used to decrypt the data.
  5. As data is read from Amazon S3, chunks will be sent internally to invoke the chosen embedding model in Amazon Bedrock. A chunk refers to an excerpt from a data source that’s returned when the vector store that it’s stored in is queried. Using knowledge bases, you can chunk either with a fixed size (standard chunking), hierarchical chunking, semantic chunking, advanced parsing options for parsing non-textual information, or custom transformations. More information about chunking for knowledge bases can be found in How content chunking and parsing works for knowledge bases.
  6. The embeddings model in Amazon Bedrock will create the embeddings, which are then sent to your chosen vector store. Amazon Bedrock Knowledge Bases supports popular databases for vector storage, including the vector engine for Amazon OpenSearch Serverless, Pinecone, Redis Enterprise Cloud, Amazon Aurora, and MongoDB. If you don’t have an existing vector database, Amazon Bedrock creates an OpenSearch Serverless vector store for you. This option is only available through the console, not through the SDK or CLI.
  7. If credentials or secrets are required to access the vector store, they can be stored in AWS Secrets Manager where they will be automatically retrieved and used. Afterwards, the embeddings will be inserted into (or updated in) the configured vector store.
  8. Checkpoints for the in-progress ingestion jobs will be temporarily stored in a transient S3 bucket, encrypted with customer managed AWS Key Management Service (AWS KMS) keys. These checkpoints allow you to resume interrupted ingestion jobs from a previous successful checkpoint. Both the Aurora database and the Amazon OpenSearch Serverless database can be configured as public or private, and of course we recommend private databases. Changes in your ingestion data bucket (for example, uploading new files or new versions of files) will be reflected after the data source is synchronized; this synchronization is done incrementally. After the completion of an ingestion job, the data is automatically purged and deleted after a maximum of 8 days.
  9. The ingestion DynamoDB table stores information required for syncing the vector store. It stores metadata related to the chunks needed to keep track of data in the underlying vector database. The table is used so that the service can identify which chunks need to be inserted, updated, or deleted between one ingestion job and another.

When it comes to encryption at rest for the different environments:

  • Customer AWS accounts – The resources in these can be encrypted using customer managed KMS keys
  • External environmentsRedis Enterprise Cloud and Pinecone have their own encryption features
  • Amazon Bedrock service accounts – The S3 bucket (step 8) can be encrypted using customer managed KMS keys, but in the context of Amazon Bedrock, the DynamoDB tables of steps 3 and 9 can only be encrypted with AWS owned keys. However, the tables managed by Amazon Bedrock don’t contain personally identifiable information (PII) or customer-identifiable data.

Throughout the RAG ingestion workflow, data is encrypted in transit. Amazon Bedrock Knowledge Bases uses TLS encryption for communication with third-party vector stores where the provider permits and supports TLS encryption in transit. Customer data is not persistently stored in the Amazon Bedrock service accounts.

For identity and access management, it’s important to follow the principle of least privilege while creating the custom service role for Amazon Bedrock Knowledge Bases. As part of the role’s permissions, you create a trust relationship that allows Amazon Bedrock to assume this role and create and manage knowledge bases. For more information about the necessary permissions, see Providing secure access, usage, and implementation to generative AI RAG techniques.

Security risks of the RAG data ingestion pipeline and the need for ingest time filtering

RAG applications inherently rely on foundation models, introducing additional security considerations beyond the traditional application safeguards. Foundation models can analyze complex linguistic patterns and provide responses depending on the input context, and can be subject to malicious events such as jailbreaking, data poisoning, and inversion. Some of these LLM-specific risks are mapped out in documents such as the OWASP Top 10 for LLM Applications and MITRE ATLAS.

A risk that’s particularly relevant for the RAG ingestion pipeline, and one of the most common risks we see nowadays, is prompt injection. In prompt injection attacks, threat actors manipulate generative AI applications by feeding them malicious inputs disguised as legitimate user prompts. There are two forms of prompt injection: direct and indirect.

Direct prompt injections occur when a threat actor overwrites the underlying system prompt. This might allow them to probe backend systems by interacting with insecure functions and data stores accessible through the LLM. When it comes to securing generative AI applications against prompt injection, this type tends to be the one that customers focus on the most. To mitigate risks, you can use tools such as Amazon Bedrock Guardrails to set up inference-time filtering of the LLM’s completions.

Indirect prompt injections occur when an LLM accepts input from external sources that can be controlled by a threat actor, such as websites or files. This injection type is particularly important when you consider the ingestion pipeline of RAG applications, where a threat actor might embed a prompt injection in external content which is ingested into the database. This can enable the threat actor to manipulate additional systems that the LLM can access or return a different answer to the user. Additionally, indirect prompt injections might not be recognizable by humans. Security issues can result not only from the LLM’s responses based on its training data, but also from the data sources the RAG application has access to from its knowledge base. To mitigate these risks, you should focus on the intersection of the LLM, knowledge base, and external content ingested into the RAG application.

To give you a better idea of indirect prompt ingestion, let’s first discuss an example.

External data source ingestion risk: Examples of indirect prompt injection

Let’s say a threat actor crafts a document or injects content into a website. This content is designed to manipulate an LLM to generate incorrect responses. To a human, such a document could be indistinguishable from legitimate ones. However, the document could contain an invisible sequence, which, when used as a reference source for RAG, could manipulate the LLM into generating an undesirable response.

For example, let’s assume you have a file describing the process for downloading a company’s software. This file is ingested into a knowledge base for an LLM-powered chatbot. A user can ask the chatbot where to find the correct link to download software packages and then download the package by clicking on the link.

A threat actor could include a second link in the document using white text on a white background. This text is invisible to the reader and the company downloading the document to store in their knowledge base. However, it’s visible when parsed by the document parser and saved in the knowledge base. This could result in the LLM returning the hidden link, which could lead the user to download malware hosted by the threat actor on a site they manage, rather than legitimate software from the expected site.

If your application is connected to plugins or agents so that it can call APIs or execute code, the model could be manipulated to run code, open URLs chosen by the threat actor, and more.

If you look at Figure 2 that follows, you can see what the typical RAG workflow is and how an indirect prompt injection attack can happen (this example uses Amazon Bedrock Knowledge Bases).

Figure 2: Visual representation of the RAG workflow with both a generic file and a malicious file that looks identical to the generic one

Figure 2: Visual representation of the RAG workflow with both a generic file and a malicious file that looks identical to the generic one

As shown in Figure 2, for data ingestion (starting at the bottom right), File 1, the legitimate and unmodified file, is saved in the data source (typically an S3 bucket). During ingestion, the document is parsed by a document parser, split into chunks, converted into embeddings, and then saved in the vector store. When a user (top left) asks a question about the file, information from this file will be added as context to the user prompt. However, you might have a malicious File 2 instead, that looks exactly the same to a human reader but contains an invisible character sequence. After this sequence is inserted into the prompt sent to the LLM, it can influence the overall response of the environment.

Threat actors might analyze the following three aspects in the RAG workflow to create and place a malicious sequence:

  • The document parser is software designed to read and interpret the contents of a document. It analyzes the text and extracts relevant information based on predefined rules or patterns. By analyzing the document parser, threat actors can determine how they might inject invisible content into different document formats.
  • The text splitter (or chunker) splits text based on the subject matter of the content. Threat actors will analyze the text splitters to locate a proper injection position for their invisible sequence. Section-based splitters divide content according to tags that label different sections, which threat actors can use to place their invisible sequences within these delineated chunks. Length-based splitters split the content into fixed-length chunks with overlap (to help keep context between chunks).
  • The prompt template is a predefined structure that is used to generate specific outputs or guide interactions with LLMs. Prompt templates determine how the content retrieved from the vector database is organized alongside the user’s original prompt to form the augmented prompt. The template is crucial, because it impacts the overall performance of RAG-based applications. If threat actors are aware of the prompt template used in your application, they can take that into account when constructing their threat sequence.

Potential mitigations

Threat actors can release documents containing well-constructed and well-placed invisible sequences onto the internet, thereby posing a threat to RAG applications that ingest this external content. Therefore, whenever possible, only ingest data from trusted sources. However, if your application requires you to use and ingest data from untrusted sources, it’s recommended to process them carefully to mitigate risks such as indirect prompt injection. To harden your RAG ingestion pipeline, you can use the following mitigation techniques to place additional security measures on your RAG ingestion pipeline. These can be implemented individually or together.

  1. Configure your application to display the source content underlying its responses, allowing users to cross-reference the content with the response. This is possible using Amazon Bedrock Knowledge Bases by using citations. However, this method isn’t a prevention technique. Also, it might be less effective with complex content because it can require that users invest a lot of time in verification to be effective.
  2. Establish trust boundaries between the LLM, external sources, and extensible functionality (for example, plugins, agents, or downstream functions). Treat the LLM as an untrusted actor and maintain final user control on decision-making processes. This comes back to the principle of least privilege. Make sure your LLM has access only to data sources that it needs to have access to and be especially careful when connecting it to external plugins or APIs.
  3. Continuous evaluation plays a vital role in maintaining the accuracy and reliability of your RAG system. When evaluating RAG applications, you can use labeled datasets containing prompts and target answers. However, frameworks such as RAGAS propose automated metrics that enable reference-free evaluation, alleviating the need for human-annotated ground truth answers. Implementing a mechanism for RAG evaluation can help you discover irregularities in your model responses and in the data retrieved from your knowledge base. If you want to explore how to evaluate your RAG application in greater depth, see Evaluate the reliability of Retrieval Augmented Generation applications using Amazon, which provides further insights and guidance on this topic.
  4. You can manually monitor content that you intend to ingest into your vector database—especially when the data includes external content such as websites and files. A human in the loop could potentially protect against less sophisticated, visible threat sequences.

For more advice on mitigating risks in generative AI applications, see the mitigations listed in the OWASP Top 10 for LLMs and MITRE ATLAS.

Architectural pattern 1: Using format breakers and Amazon Textract as document filters

Figure 3: Visual representation of a potential workflow to remove threat sequences from your files is using a format breaker and Amazon Textract

Figure 3: Visual representation of a potential workflow to remove threat sequences from your files is using a format breaker and Amazon Textract

One potential workflow to remove potential threat sequences from your ingest files is to use a format breaker and Amazon Textract. This workflow specifically focuses on invisible threat vectors. The preceding Figure 3 shows a potential setup using AWS services that allows you to automate this.

  1. Let’s say you use an S3 bucket to ingest your files. Whichever file you want to upload into your knowledge base is initially uploaded in this bucket. The upload action in Amazon S3 automatically starts a workflow that will take care of the format break.
  2. A format break is a process used to sanitize and secure documents, by transforming them in a way that strips out potentially harmful elements such as macros, scripts, embedded objects, and other non-text content that could carry security risks. The format break in the ingest-time filter involves converting text content into PDF format and then to OCR format. To start, convert the text to PDF format. One of the options is to use an AWS Lambda function to convert text to PDF format. As an example, you can create such a function by putting the file renderers and PDF generator from LibreOffice into a Lambda function. This step is necessary to process the file using Amazon Textract because the service currently supports only PNG, JPEG, TIFF, and PDF formats.
  3. After the data is put into PDF format, you can save it into an S3 bucket. This upload to S3 can, in turn, trigger the next step in the format break: converting the PDF content to OCR format.
  4. You can process the PDF content using Amazon Textract, which will convert the text content to OCR format. Amazon Textract will render the PDF as an image. This involves extracting the text from the PDF, essentially creating a plain text version of the document. The OCR format makes sure that non-text elements, such as images or embedded files, aren’t carried over to the final document. Only the readable text is extracted, which significantly reduces the risk of hidden malicious content. This also removes white text on white backgrounds because that text is invisible when the PDF is rendered as an image before OCR conversion is performed. To use Amazon Textract to convert text to OCR format, create a Lambda function that will trigger Amazon Textract and input your PDF that was saved in Amazon S3.
  5. You can use Amazon Textract to process multipage documents in PDF format and detect printed and handwritten text from the Standard English alphabet and ASCII symbols. The service will extract printed text, forms, and tables in English, German, French, Spanish, Italian and Portuguese. This means that non-visible threat vectors won’t be detected or recognized by Amazon Textract and are automatically removed from the input. Amazon Textract operations return a Block object in the API response to the Lambda function.
  6. To ingest the information into a knowledge base, you need to transform the Amazon Textract output into a format that’s supported by your knowledge base. In this case, you would use code in your Lambda function to transform the Amazon Textract output into a plain text (.txt) file.
  7. The plain text file is then saved into an S3 bucket. This S3 bucket can then be used as a source for your knowledge base.
  8. You can automate the reflection of changes in your S3 bucket to your knowledge base by either having your Lambda function that created the Amazon S3 file run a start_ingestion_job() API call or use an Amazon S3 event trigger on the destination bucket to configure a new Lambda function to run when a file is uploaded to this S3 bucket. Synchronization is incremental, so changes from the previous synchronization are incorporated. More info on managing your data sources can be found in Connect to your data repository for your knowledge base.

In addition to invisible sequences, threat actors can add sophisticated threat sequences that are difficult to classify or filter. Manually checking each document for unusual content isn’t feasible at scale, and creating a filter or model that accurately detects misleading information in such documents is challenging.

One powerful characteristic of LLMs is that they can analyze complex linguistic patterns. An optional pathway is to add a filtering LLM to your knowledge base ingest pipeline to detect malicious or misleading content, susceptible code, or unrelated context that might mislead your model.

Again, it’s important to note that threat actors might deliberately choose content that’s difficult to classify or filter and that resembles normal content. More capable, general-purpose LLMs provide a larger surface for threat actors, because they aren’t tuned to detect these specific attempts. The question is: can we train models to be robust against a wide variety of threats? Currently, there’s no definitive answer, and it remains a highly researched topic. However, some models address specific use cases. For example, LLamaGuard, a fine-tuned version of Meta’s Llama model, predicts safety labels in 14 categories such as elections, privacy, and defamation. It can classify content in both LLM inputs (prompt classification) and LLM responses (response classification).

For document classification, relevant for filtering ingest data, even a small model like BERT can be used. BERT is an encoder-only language model with a bi-directional attention mechanism, making it strong in tasks requiring deep contextual understanding, such as text classification, named entity recognition (NER), and question answering (QA). It’s open source and can be fine-tuned for various applications. This includes use cases in cybersecurity, such as phishing detection in email messages or detecting prompt injection attacks. If you have the resources in-house and work on critical applications that need advanced filtering for specific threats, consider fine-tuning a model like BERT to classify documents that might contain undesirable material.

In addition to natural-language text, threat actors might use data encoding techniques to obfuscate or conceal undesirable payloads within documents. These techniques include encoded scripts, malware, or other harmful content disguised using methods like base64 encoding, hexadecimal encoding, morse code, uucode, ASCII art, and more.

An effective way to detect such sequences is by using the Amazon Comprehend DetectDominantLanguage API. If a document is written entirely in a supported language, DetectDominantLanguage will return a high confidence score, indicating the absence of encoded data. Conversely, if a document contains encoded strings, such as base64, the API will struggle to categorize this text, resulting in a low confidence score. To automate the detection process, you can route documents to a human review stage if the confidence score falls below a certain threshold (for example, 85 percent). This reduces the need for manual checks for potentially malicious encoded data.

Additionally, the encoding and decoding capabilities of LLMs can assist in decoding encoded data. Various LLMs understand encoding schemes and can interpret encoded data within documents or files. For example, Anthropic’s Claude 3 Haiku can decode a base64 encoded string such as TGVhcm5pbmcgaG93IHRvIGNhbGwgU2FnZU1ha2VyIGVuZHBvaW50cyBmcm9tIExhbWJkYSBpcyB2ZXJ5IHVzZWZ1bC4 into its original plaintext form: “Learning how to call Amazon SageMaker endpoints from Lambda is very useful.” While this example is benign, it demonstrates the ability of LLMs to detect and decode encoded data, which can then be stripped before ingestion into your vector store.

Figure 4: Visual representation of a potential workflow to trigger a human in the loop review in case threat sequences are detected in your ingest files

Figure 4: Visual representation of a potential workflow to trigger a human in the loop review in case threat sequences are detected in your ingest files

In the preceding Figure 4, you can see a workflow that shows how you can integrate the above features into your document processing workflow to detect malicious content in ingest documents:

  1. As your ingestion point, you can use an S3 bucket. Files that you want to upload into your knowledge base are first uploaded into this bucket. In this diagram, the files are assumed to be .txt files.
  2. The upload action in Amazon S3 automatically starts an AWS Step Functions workflow.
  3. Amazon EventBridge is used to trigger the Step Functions workflow.
  4. The first Lambda function in the workflow calls the Amazon Comprehend DetectDominantLanguage API, which flags documents if the confidence score of the language is below a certain threshold, indicating that the text might contain encoded data or data in other formats (such as a language Amazon Comprehend doesn’t recognize) that might be malicious.
  5. If this is the case, the document is sent to a foundation model in Amazon Bedrock that can translate or decode the data.
  6. Next, another Lambda function is triggered. This function invokes a SageMaker endpoint, where you can deploy a model, such as a fine-tuned version of BERT, to classify documents as suspicious or not.
  7. If no suspicious content is detected, nothing is done and the content in the bucket remains the same (no need to override content, to prevent unnecessary costs) and the workflow ends. If undesirable content is detected, the document is stored in a second S3 bucket for human review.
  8. If not, the workflow ends.

Additional considerations for RAG data ingestion pipeline security

In previous sections, we focused on filtering patterns and current recommendations to secure the RAG ingestion pipeline. However, content filters that address indirect prompt injection aren’t the only mitigation to keep in mind when building a secure RAG application. To effectively secure generative AI-powered applications, responsible AI considerations and traditional security recommendations are still crucial.

To moderate content in your ingest pipeline, you might want to remove toxic language and PII data from your ingest documents. Amazon Comprehend offers built-in features for toxic content detection and PII detection in text documents. The Toxicity Detection API can identify content in categories such as hate speech, insults, and sexual content. This feature is particularly useful for making sure that harmful or inappropriate content isn’t ingested into your system. You can use the Toxicity Detection API to analyze up to 10 text segments at a time, each with a size limit of 1 KB. You might need to split larger documents into smaller segments before processing. For detailed guidance on using Amazon Comprehend toxicity detection, see Amazon Comprehend Toxicity Detection. For more information on PII detection and redaction with Amazon Comprehend, we recommend Detecting and redacting PII using Amazon Comprehend.

Keep the principle of least privilege in mind for your RAG application. Think about which permissions your application has, and give it only the permissions it needs to successfully function. Your application sends data in the context or orchestrates tools on behalf of the LLM, so it’s important that these permissions are limited. If you want to dive deep into achieving least privilege at scale, we recommend Strategies for achieving least privilege at scale. This is especially important when your RAG applications involves agents that might call APIs or databases. Make sure you carefully grant permissions to prevent potential security issues such as an SQL injection attack on your database.

Develop a threat model for your RAG application. It’s recommended that you document potential security risks in your application and have mitigation strategies for each risk. This session from Re:Invent 2023 gives an overview of how to approach threat modeling a generative AI workload. In addition, you can use the Threat Composer tool, which comes with a sample generative AI application, to help you in threat modeling your applications.

Lastly, when deciding what data to ingest into your RAG application, make sure to ask the right questions about the origin of the content, such as who has access and edit rights to this content?” For example, anyone can edit a Wikipedia page. In addition, assess what the scope of your application is. Can the RAG application run code? Can it query a database? If so, this poses additional risks, so external data in your vector database should be carefully filtered.

Conclusion

In this blog post, you read about some of the security risks of RAG applications, with a specific focus on the RAG ingestion pipeline. Threat actors might engineer sophisticated methods to embed invisible content within websites or files. Without filtering or an evaluation mechanism, these might result in the LLM generating incorrect information, or worse, depending on the capabilities of the application (such as execute code, query a database, and so on). This makes it challenging to spot these threats when reviewing content.

You learned about some strategies and architectural patterns with filtering mechanisms to mitigate these risks. It’s important to note that the filtering mechanisms might not catch all undesirable content that should be removed from a file (for example, PII, base64 encoded data, and other undesirable sequences). Therefore, an evaluation mechanism and a human in the loop are crucial because there’s no model trained to detect such sequences for techniques like indirect prompt injection at this time (although there are models trained specifically to detect impolite language, but this doesn’t cover all possible cases).

Although there is currently no way to completely mitigate threats like injection attacks, these strategies and architectural patterns are a first step and form part of a layered approach to securing your application. In addition to these, make sure to evaluate your data regularly, consider having a human in the loop, and stay up to date on advancements in this space such as OWASP top 10 for LLM Applications or MITRE ATLAS

If you have feedback about this post, submit comments in the Comments section below.

Laura Verghote

Laura Verghote

Laura is a Senior Solutions Architect for public sector customers in EMEA. She works with customers to design and build solutions in the AWS Cloud, bridging the gap between complex business requirements and technical solutions. She joined AWS as a technical trainer and has wide experience delivering training content to developers, administrators, architects, and partners across EMEA.

Dave Walker

Dave Walker

Dave is a Principal Specialist Solutions Architect for Security and Compliance at AWS. Formerly a Security SME at Sun Microsystems, he has been helping companies and public sector organizations meet industry-specific and Critical National Infrastructure security requirements since 1993. He enjoys inventing security technologies and bending things to unexpected security purposes.

Isabelle Mos

Isabelle Mos

Isabelle is a Solutions Architect at Amazon Web Services (AWS), based in the United Kingdom. In her role, she collaborates closely with customers, offering guidance on AWS Cloud architecture and best practices. Her primary focus is on Machine Learning workloads.

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

Post Syndicated from Karim Akhnoukh original https://aws.amazon.com/blogs/big-data/manage-access-controls-in-generative-ai-powered-search-applications-using-amazon-opensearch-service-and-aws-cognito/

Organizations of all sizes and types are using generative AI to create products and solutions. A common adoption pattern is to introduce document search tools to internal teams, especially advanced document searches based on semantic search. In semantic search, documents are stored as vectors, a numeric representation of the document content, in a vector database such as Amazon OpenSearch Service, and are retrieved by performing similarity search with a vector representation of the search query.

In a real-world scenario, organizations want to make sure their users access only documents they are entitled to access. They are looking for a reliable and scalable solution to implement robust access controls to make sure these documents are only accessible to individuals who have a legitimate business need and the appropriate level of authorization. The permission mechanism has to be secure, built on top of built-in security features, and scalable for manageability when the user base scales out. Maintaining proper access controls for these sensitive assets is paramount, because unauthorized access could lead to severe consequences, such as data breaches, compliance violations, and reputational damage.

In this post, we show you how to manage user access to enterprise documents in generative AI-powered tools according to the access you assign to each persona.

Common use cases

The following are industry-specific use cases for document access management across different departments:

  • In R&D and engineering, access to product design documents evolves from restricted to broader as development progresses
  • HR maintains open access to general policies while limiting access to sensitive employee information
  • Finance and accounting documents require varying levels of access for auditing and executive decision-making
  • Sales and marketing teams carefully manage customer data and strategies, implementing tiered access for different roles and departments

These examples demonstrate the need for dynamic, role-based access control to balance information sharing with confidentiality in various business contexts.

Solution overview

By combining the powerful vector search capabilities of OpenSearch Service with the access control features provided by Amazon Cognito, this solution enables organizations to manage access controls based on custom user attributes and document metadata.

This approach simplifies the management of access rights, making sure only authorized users can access and interact with specific documents based on their roles, departments, and other relevant attributes. Following this approach, you can manage the access to your organization’s documents at scale. The following diagram depicts the solution architecture.

Solution diagram

The solution workflow consists of the following steps:

  1. The user accesses a smart search portal and lands on a web interface deployed on AWS Amplify.
  2. The user authenticates through an Amazon Cognito user pool and an access token is returned to the client. This access token will be used to retrieve the key pair custom attributes assigned to the user. In our case, we created two custom attributes (custom:department and custom:access_level).
  3. For each user query, an API is invoked on Amazon API Gateway to process the request. Each invocation includes the user access token in the header.
  4. The API is integrated with AWS Lambda, which processes the user query and generates the answers based on available documents and user access using retrieval augmented generation (RAG). The process starts by creating a vector based on the question (embedding) by invoking the embedding model.
  5. A query is sent to OpenSearch Service that includes the following:
    1. The embedding vector generated.
    2. User custom attributes retrieved by Lambda based on their access token, by calling the Amazon Cognito GetUser API.
    3. The query relies on the support of an efficient k-NN filter in OpenSearch Service to perform the search.
  6. Pre-filtered documents that relate to the user query are included in the prompt of the large language model (LLM) that summarizes the answer. Then, Lambda replies back to the web interface with the LLM completion (reply).
  7. If the user’s access needs to be modified (assigned attributes), an API call is made through API Gateway to a Lambda function that processes the request to add or update the custom attributes’ value for a specific user.
  8. New attributes are reflected in the user’s profile in Amazon Cognito.

Our solution is implemented and wrapped within AWS Cloud Development Kit (AWS CDK) stacks, which are available in the GitHub repo.

Our sample documents assume a fictional manufacturing company called Unicorn Robotics Factory, which develops robotic unicorns. The dataset contains over 900 documents that are a mix of engineering, roadmap, and business reporting documents. The following is an example of a document’s content:

**CONFIDENTIAL - UNICORNS ROBOTICS INTERNAL DOCUMENT**

**Project: "Galactic Unicorn"**

Unicorns Robotics is proud to announce the development of our latest project, the "Galactic Unicorn". 
This top-secret project aims to create a robotic unicorn that can travel through space and time, bringing magic and joy to children and adults alike.....

The associated metadata file for this document consists of the following:

{ "department": "research", "access_level": "confidential" }

Our solution in the GitHub repo takes care of loading the documents with associated metadata tags. For illustration purposes, we used the following mapping for the users and document access.

user access mapping

This solution is meant to delegate access management to the application tier, to simplify the implementation of use cases like generative AI-powered document search tools. However, if your use case requires a stricter approach to control document access, like multi-tenant environments or field-level security, you might want to use the fine-grained access control feature in OpenSearch Service. In our solution, we manage the access on the document level according to the assigned metadata.

Prerequisites

To deploy the solution, you need the following prerequisites:

Deploy the solution

To deploy the solution to your AWS account, refer to the Readme file in our GitHub repo.

Query documents with different personas

Now let’s test the application using different personas. In this example, we use the same users with their corresponding custom attributes as illustrated in the solution overview.

To start, let’s log in using the researcher account and run the search around a confidential document.

We ask, “What is the projected profit margin of the Galactic Unicorn project?” and get the result as shown in the following screenshot.

search using researcher access

The question invokes a query to OpenSearch Service using the custom attributes assigned to the researcher. The following code illustrates how the query is structured:

for attr, values in user_attributes.items():
        must_conditions.append(
            {
                "bool": {
                    "should": [{"term": {attr: value}} for value in values],
                    "minimum_should_match": 1,
                }
            }
        )

query = {
        "size": 5,
        "query": {
            "knn": {
                "doc_embedding": {
                    "vector": query_vector,
                    "k": 10,
                    "filter": {"bool": {"must": must_conditions}},
                }
            }
        },
    }

Let’s sign out and log in again with an engineer profile to test the same query. Based on the assigned attributes and document metadata, the result should look like that in the following screenshot.

search using engineer access

If you tried to query some support documents, you will get the desired answer, as shown in the following screenshot.

tech question by engineer

Modify user access

As depicted in the solution diagram, we’ve added a feature in the web interface to allow you to modify user access, which you could use to perform further tests. To do so, log in as a tool admin and choose Manage Attributes. Then modify the custom attribute value for a given user, as shown in the following screenshot.

access modification

Clean up

When deleting a stack, most resources will be deleted upon stack deletion, but that’s not the case for all resources. The Amazon Simple Storage Service (Amazon S3) bucket, Amazon Cognito user pool, and OpenSearch Service domain will be retained by default. However, our AWS CDK code altered this default behavior by setting the RemovalPolicy to DESTROY for the mentioned resources. If you want to retain them, you can adjust the RemovalPolicy in the AWS CDK code for the different resources.

You can use the following command to clean up the resources deployed to your AWS account:

make destroy

Conclusion

This post illustrated how to build a document search RAG solution that makes sure only authorized users can access and interact with specific documents based on their roles, departments, and other relevant attributes. It combines OpenSearch Service and Amazon Cognito custom attributes to make a tag-based access control mechanism that makes it straightforward to manage at scale.

For demonstration purposes, the following points weren’t included in the AWS CDK code. However, they’re still applicable and you might want to work on them before deploying for production purposes:


About the Authors

Karim Akhnoukh is a Solutions Architect at AWS working with manufacturing customers in Germany. He is passionate about applying machine learning and generative AI to solve customers’ business challenges. Besides work, he enjoys playing sports, aimless walks, and good quality coffee.

Ahmed Ewis is a Senior Solutions Architect at AWS GenAI Labs. He helps customers build generative AI-based solutions to solve business problems. When not collaborating with customers, he enjoys playing with his kids and cooking.

Fortune Hui is a Solutions Architect at AWS Hong Kong, working with conglomerate customers. He helps customers and partners build big data platform and generative AI applications. In his free time, he plays badminton and enjoys whisky.

Threat modeling your generative AI workload to evaluate security risk

Post Syndicated from Danny Cortegaca original https://aws.amazon.com/blogs/security/threat-modeling-your-generative-ai-workload-to-evaluate-security-risk/

As generative AI models become increasingly integrated into business applications, it’s crucial to evaluate the potential security risks they introduce. At AWS re:Invent 2023, we presented on this topic, helping hundreds of customers maintain high-velocity decision-making for adopting new technologies securely. Customers who attended this session were able to better understand our recommended approach for qualifying security risk and maintaining a high security bar for the applications they build. In this blog post, we’ll revisit the key steps for conducting effective threat modeling on generative AI workloads, along with additional best practices and examples, including some typical deliverables and outcomes you should look for across each stage. Throughout this post we will link to specific examples that we created with the AWS Threat Composer tool. Threat Composer is an open source AWS tool you can use to document your threat model, available at no additional cost.

This post covers a practical approach for threat modeling a generative AI workload and assumes you know the basics of threat modeling. If you want to get an overview on threat modeling, we recommend that you check out this blog post. In addition, this post is part of a larger series on the security and compliance considerations of generative AI.

Why use threat modeling for generative AI?

Each new technology comes with its own learning curve when it comes to identifying and mitigating the unique security risks it presents. The adoption of generative AI into workloads is no different. These workloads, specifically the use of large language models (LLMs), introduce new security challenges because they can generate highly customized and non-deterministic outputs based on user prompts, which introduces the possibility for potential misuse or abuse. In addition, relies on access to large and customized data sets, often internal data sources which might contain sensitive information.

Although working with LLMs is a relatively new practice and has some unique and nuanced security risks and impacts, it’s crucial to remember that LLMs are only one portion of a larger workload. It’s important to apply the threat modeling approach to parts of the system, taking into account well-known threats such as injections or the compromise of credentials. Part 1 of the Securing generative AI AWS blog series, An introduction to the Generative AI Security Scoping Matrix, provides a great overview of what those nuances are, and how the risks differ depending on how you make use of LLMs in your organization.

The four stages of threat modeling for generative AI

As a quick refresher, threat modeling is a structured approach to identifying, understanding, addressing, and communicating the security risks in a given system or application. It is a fundamental element of the design phase that allows you to identify and implement appropriate mitigations and make fundamental security decisions as early as possible.

At AWS, threat modeling is a required input to initiating our Application Security (AppSec) process for the builder teams at AWS, and our builder teams get support from a Security Guardian to build threat models for their features or services.

A useful way of structuring the approach to threat modeling, created by expert Adam Shostack, involves answering four key questions. We’ll look into each one and how to apply them to your generative AI workload.

  1. What are we working on?
  2. What can go wrong?
  3. What are we going to do about it?
  4. Did we do a good enough job?

What are we working on?

This question aims to get a detailed understanding of your business context and application architecture. The detail that you’re looking for should already be captured as part of the comprehensive system documentation created by the builders of your generative AI solution. By starting from this documentation, you can streamline the threat modeling process and focus on identifying potential threats and vulnerabilities, rather than on re-creating foundational system knowledge.

Example outcomes or deliverables

At a minimum, builders should capture the key components of the solution, including data flows, assumptions, and design decisions. This lays the groundwork for identifying potential threats. Key elements to document are the following:

  • Data flow diagrams (DFDs) that clearly illustrate the critical data flows of the application, from request to response, detailing what happens at each component or “hop”
  • Well-articulated assumptions about how users are expected to interact with and ask questions of the system, or how the model will interact with other parts of the system. For example, in a RAG scenario where the model needs to retrieve data that is stored in other systems, how it will authenticate and translate that data into an appropriate response for the user
  • Documentation of key design decisions made by the business, including the rationale behind these decisions
  • Detailed business context about the application, such as whether it is considered a critical system, what types of data it handles (for example, confidential, high-integrity, high-availability), and the primary business concerns for the application (for example, data confidentiality, data integrity, system availability)

Figure 1 shows how Threat Composer allows you to input information about the application in the Application Information, Architecture, Dataflow, and Assumptions sections.

Figure 1: Threat composer dataflow diagram view for a generative AI chatbot example

Figure 1: Threat composer dataflow diagram view for a generative AI chatbot example

What can go wrong?

For this question, you identify possible threats to your application using the context and information you gathered for the previous question. To help you identify possible threats, make use of existing repositories of knowledge, especially those related to the new technologies you are adopting. These often have tangible examples that you can apply to your application. Useful resources are the OWASP top 10 for LLMs, MITRE ATLAS framework, and the AI Risk Repository. You can also use a structured framework such as STRIDE to aid you in your thinking. Use the information you received from the “What are we building?” question and apply the most relevant STRIDE categories to your thinking. For example, if your application hosts critical data that the business has no risk appetite for losing, then you might think about the various Information Disclosure threats first.

You can write and document these possible threats to your application in the form of threat statements. Threat statements are a way to maintain consistency and conciseness when you document your threat. At AWS, we adhere to a threat grammar which follows the syntax:

A [threat source] with [prerequisites] can [threat action] which leads to [threat impact], negatively impacting [impacted assets].

This threat grammar structure helps you to maintain consistency and allows you to iteratively write useful threat statements. As shown in Figure 2, Threat Composer provides you with this structure for new threat statements and includes examples to assist you.

Figure 2: Threat composer threat statement builder

Figure 2: Threat composer threat statement builder

Once you go through the process of creating threat statements, you will have a summary of “what can go wrong.” You can then define attack steps, as an analysis of “how it can go wrong.” It’s not always necessary to define attack steps for each threat statement because there are many ways a threat might actually happen. Going through the exercise of identifying and documenting a few different threat mechanisms can help to get specific mitigations that you can associate with each attack step for a more effective defense-in-depth approach.

Threat Composer gives you the ability to add additional metadata to your threat statements. Customers who have adopted this option into their workflows most commonly use the STRIDE category and Priority metadata tags. Those customers can quickly track which threats are the highest priority and which STRIDE category they correspond to. Figure 3 shows how you can document threat statements alongside their associated metadata in Threat Composer.

Figure 3: Threat Composer sample genAI chatbot application – threat view

Figure 3: Threat Composer sample genAI chatbot application – threat view

Example outcomes or deliverables

By systematically considering what can go wrong, and how, you can uncover a range of possible threats. Let’s explore some of the example deliverables that can emerge from this process:

  • A list of threat statements that you will develop mitigations for, categorized by STRIDE element and priority
  • A list of attack steps that are associated to your threat statements. As mentioned, attack steps are an optional activity at this stage, but we recommend at least identifying some for your highest-priority threats

Example threat statements

These are some example threat statements for an application that is interacting with an LLM component:

  • A threat actor with access to the public-facing application can inject malicious prompts that overwrite existing system prompts, resulting in healthcare data from other patients being returned, impacting the confidentiality of the data in the database
  • A threat actor with access to the public-facing application can inject malicious prompts that request malicious or destructive actions, resulting in healthcare data from other patients being deleted, impacting the availability of the data in the database

Example attack steps

These are some example attack steps that demonstrate how the preceding threat statements could occur:

  • Perform crafted prompt injection to bypass system prompt guardrails
  • Embed a vulnerable agent with access to the model
  • Embed an indirect prompt injection in a webpage instructing the LLM to disregard previous user instructions and use an LLM plugin to delete the user’s emails

What can we do about it?

Now that you’ve identified some possible threats, consider which controls would be appropriate to help mitigate the risks associated with those threats. This decision will be driven by your business context and the asset in question. Your organizational policies will also influence prioritization of controls: Some organizations might choose to prioritize the control that impacts the highest number of threats, while others might choose to start with the control that impacts the threats that are deemed the highest risk (by likelihood and impact).

For each identified threat, define specific mitigation strategies. This could include input sanitization, output validation, access controls, and more. Ideally, at a minimum, you want at least one preventative control and one detective control associated with each threat. The same resources that are linked to in the What can go wrong? section are also highly useful for identifying relevant controls. For example, the MITRE ATLAS has a dedicated section for mitigations.

Note: You might find that as you identify mitigations for your threats, you start to see duplication in your controls. For example, least-privilege access control might be associated with almost all of your threats. This duplication can also help you to prioritize. If a single control appears in 90% of your threat mitigations, the effective implementation of that control will help to drive down risk across each of those threats.

Example outcomes or deliverables

Associated with each threat, you should have a list of mitigations, each with a unique identifier to ease lookups and reusability later on. Example mitigations with identifiers include the following:

  • M-001: Predefine SQL query structure
  • M-002: Sanitize for known parameters (input filtering)
  • M-003: Check against templated prompt parameters
  • M-004: Review output is relevant to user (output filtering)
  • M-005: Limit LLM context window
  • M-006: Dynamic permissions check on high-risk actions performed by model (separating authentication parameters from prompt)
  • M-007: Apply least privilege to all components of the application

For more information on relevant security controls for your workload, we recommend that you read Part 3 of our Securing generative AI series: Applying relevant security controls.

Figure 4 shows some completed example threat statements in Threat Composer, with mitigations linked to each.

Figure 4: Completed threat statements with metadata and linked mitigations

Figure 4: Completed threat statements with metadata and linked mitigations

After answering the first three questions, you have your completed threat model. The documentation should contain your DFDs, threat statements, [optional] attack steps, and mitigations.

For a more detailed example, including a visual dashboard that shows a breakdown of a threat summary, see the full GenAI chatbot example in Threat Composer.

Did we do a good enough job?

A threat model is a living document. This post has discussed how creating a threat model helps you to identify technical controls for threats, but it’s also important to consider the non-technical benefits that the process of threat modeling provides.

For your final activity, you should validate both elements of the threat modeling activity.

Validate the effectiveness of the identified mitigation: Some of the mitigations you identify might be new, and some you might already have had in place. Regardless, it’s important to continuously test and verify that your security measures are working as intended. This could involve penetration testing or automated security scans. At AWS, threat models serve as inputs to automated test cases to be embedded in the pipeline. The threats defined are also used to define the scope of the penetration testing, to confirm whether those threats have been mitigated sufficiently.

Validate the effectiveness of the process: Threat modeling is fundamentally a human activity. It requires interaction across your business, builder teams, and security functions. Those closest to the creation and operations of the application should own the threat model document and revisit it often, with support from their security team (or Security Guardian equivalent). How often this is done will depend on your organizational policies and the criticality of the workload, though it is important to define triggers that will initiate a review of the threat model. Example triggers can include threat intelligence updates, new features that significantly change data flows, or new features that impact security-related aspects of the system (such as authentication or authorization, or logging). Validating your process periodically is especially important when you adopt new technologies because the threat landscape for these evolves faster than usual.

Performing a retrospective on the threat modeling process is also a good way to work through and discuss what worked well, what didn’t work well, and what changes you will commit to the next time the threat model is revisited.

Example outputs

These are some example outputs for this step of the process:

  • Automated test case definitions based on mitigations
  • A defined scope for penetration testing, and test cases based on threats
  • A living document for the threat model that is stored alongside application documentation (including a data flow diagram)
  • A retrospective overview, including lessons learned and feedback from the threat modeling participants, and what will be done next time to improve

Conclusion

In this blog post, we explored a practical and proactive approach to threat modeling for generative AI workloads. The key steps we covered provide a structured framework for conducting effective threat modeling, from understanding the business context and application architecture to identifying potential threats, defining mitigation strategies, and validating the overall effectiveness of the process.

By following this approach, organizations can better equip themselves to maintain a high security bar as they adopt generative AI technologies. The threat modeling process not only helps to mitigate known risks, but also fosters a culture of security-mindedness that is crucial for organizations to adopt. This can help your organization to unlock the full potential of these powerful technologies while maintaining the security and privacy of your systems and data.

Want to look deeper into additional areas of generative AI security? Check out the other posts in the Securing Generative AI series:

 

Danny Cortegaca

Danny Cortegaca

Danny is a Security Specialist Solutions Architect and is the Telco lead for AWS Industries. He joined AWS in 2021 and partners with some of the largest organizations in the world to help them navigate complex security and regulatory environments. He loves talking about application security with customers and has helped many adopt threat modeling into their practices.

Ana Malhotra

Ana Malhotra

Ana previously worked as a Security Specialist Solutions Architect at AWS and was the Healthcare and Life Sciences (HCLS) Security Lead for AWS Industry. She is no longer with AWS. As a former AWS Application Security Engineer, Ana loved talking all things AppSec, including people, process, and technology. In her free time, she enjoys tapping into her creative side with music and dance.

Kareem Abdol-Hamid

Kareem Abdol-Hamid

Kareem is a Senior Accelerated Compute Specialist for Startups. As an Accelerated Compute specialist, Kareem experiences novel challenges every day involving generative AI, High Performance Compute, and massively scaled workloads. In his free time, he plays piano and competes in the video game Street Fighter.

Enrich your AWS Glue Data Catalog with generative AI metadata using Amazon Bedrock

Post Syndicated from Manos Samatas original https://aws.amazon.com/blogs/big-data/enrich-your-aws-glue-data-catalog-with-generative-ai-metadata-using-amazon-bedrock/

Metadata can play a very important role in using data assets to make data driven decisions. Generating metadata for your data assets is often a time-consuming and manual task. By harnessing the capabilities of generative AI, you can automate the generation of comprehensive metadata descriptions for your data assets based on their documentation, enhancing discoverability, understanding, and the overall data governance within your AWS Cloud environment. This post shows you how to enrich your AWS Glue Data Catalog with dynamic metadata using foundation models (FMs) on Amazon Bedrock and your data documentation.

AWS Glue is a serverless data integration service that makes it straightforward for analytics users to discover, prepare, move, and integrate data from multiple sources. Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API.

Solution overview

In this solution, we automatically generate metadata for table definitions in the Data Catalog by using large language models (LLMs) through Amazon Bedrock. First, we explore the option of in-context learning, where the LLM generates the requested metadata without documentation. Then we improve the metadata generation by adding the data documentation to the LLM prompt using Retrieval Augmented Generation (RAG).

AWS Glue Data Catalog

This post uses the Data Catalog, a centralized metadata repository for your data assets across various data sources. The Data Catalog provides a unified interface to store and query information about data formats, schemas, and sources. It acts as an index to the location, schema, and runtime metrics of your data sources.

The most common method to populate the Data Catalog is to use an AWS Glue crawler, which automatically discovers and catalogs data sources. When you run the crawler, it creates metadata tables that are added to a database you specify or the default database. Each table represents a single data store.

Generative AI models

LLMs are trained on vast volumes of data and use billions of parameters to generate outputs for common tasks like answering questions, translating languages, and completing sentences. To use an LLM for a specific task like metadata generation, you need an approach to guide the model to produce the outputs you expect.

This post shows you how to generate descriptive metadata for your data with two different approaches:

  • In-context learning
  • Retrieval Augmented Generation (RAG)

The solutions uses two generative AI models available in Amazon Bedrock: for text generation and Amazon Titan Embeddings V2 for text retrieval tasks.

The following sections describe the implementation details of each approach using the Python programming language. You can find the accompanying code in the GitHub repository. You can implement it step by step in Amazon SageMaker Studio and JupyterLab or your own environment. If you’re new to SageMaker Studio, check out the Quick setup experience, which allows you to launch it with default settings in minutes. You can also use the code in an AWS Lambda function or your own application.

Approach 1: In-context learning

In this approach, you use an LLM to generate the metadata descriptions. You employ prompt engineering techniques to guide the LLM on the outputs you want it to generate. This approach is ideal for AWS Glue databases with a small number of tables. You can send the table information from the Data Catalog as context in your prompt without exceeding the context window (the number of input tokens that most Amazon Bedrock models accept). The following diagram illustrates this architecture.

Approach 2: RAG architecture

If you have hundreds of tables, adding all of the Data Catalog information as context to the prompt may lead to a prompt that exceeds the LLM’s context window. In some cases, you may also have additional content such as business requirements documents or technical documentation you want the FM to reference before generating the output. Such documents can be several pages that typically exceed the maximum number of input tokens most LLMs will accept. As a result, they can’t be included in the prompt as they are.

The solution is to use a RAG approach. With RAG, you can optimize the output of an LLM so it references an authoritative knowledge base outside of its training data sources before generating a response. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, without the need to fine-tune the model. It is a cost-effective approach to improving LLM output, so it remains relevant, accurate, and useful in various contexts.

With RAG, the LLM can reference technical documents and other information about your data before generating the metadata. As a result, the generated descriptions are expected to be richer and more accurate.

The example in this post ingests data from a public Amazon Simple Storage Service (Amazon S3): s3://awsglue-datasets/examples/us-legislators/all. The dataset contains data in JSON format about US legislators and the seats that they have held in the U.S. House of Representatives and U.S. Senate. The data documentation was retrieved from and the Popolo specification http://www.popoloproject.com/.

The following architecture diagram illustrates the RAG approach.

 

The steps are as follows:

  1. Ingest the information from the data documentation. The documentation can be in a variety of formats. For this post, the documentation is a website.
  2. Chunk the contents of the HTML page of the data documentation. Generate and store vector embeddings for the data documentation.
  3. Fetch information for the database tables from the Data Catalog.
  4. Perform a similarity search in the vector store and retrieve the most relevant information from the vector store.
  5. Build the prompt. Provide instructions on how to create metadata and add the retrieved information and the Data Catalog table information as context. Because this is a rather small database, containing six tables, all of the information about the database is included.
  6. Send the prompt to the LLM, get the response, and update the Data Catalog.

Prerequisites

To follow the steps in this post and deploy the solution in your own AWS account, refer to the GitHub repository.

You need the following prerequisite resources:

 {
   "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Action": [
              "s3:GetObject",
              "s3:PutObject"
          ],
          "Resource": [
              "arn:aws:s3:::aws-gen-ai-glue-metadata-*/*"
          ]
        }
    ]
}
  • An IAM role for your notebook environment. The IAM role should have the appropriate permissions for AWS Glue, Amazon Bedrock, and Amazon S3. The following is an example policy. You can apply additional conditions to restrict it further for your own environment.
{
      "Version": "2012-10-17",
      "Statement": [
           {
                 "Sid": "GluePermissions",
                 "Effect": "Allow",
                 "Action": [
                      "glue:GetCrawler",
                      "glue:DeleteDatabase",
                      "glue:GetTables",
                      "glue:DeleteCrawler",
                      "glue:StartCrawler",
                      "glue:CreateDatabase",
                      "glue:UpdateTable",
                      "glue:DeleteTable",
                      "glue:UpdateCrawler",
                      "glue:GetTable",
                      "glue:CreateCrawler"
                 ],
                 "Resource": "*"
           },
           {
                 "Sid": "S3Permissions",
                 "Effect": "Allow",
                 "Action": [
                      "s3:PutObject",
                      "s3:GetObject",
                      "s3:CreateBucket",
                      "s3:ListBucket",
                      "s3:DeleteObject",
                      "s3:DeleteBucket"
                 ],
                 "Resource": "arn:aws:s3:::<bucket_name>"
           },
           {
                 "Sid": "IAMPermissions",
                 "Effect": "Allow",
                 "Action": "iam:PassRole",
                 "Resource": "arn:aws:iam::<account_ID>:role/GlueCrawlerRoleBlog"

           },
           {
                 "Sid": "BedrockPermissions",
                 "Effect": "Allow",
                 "Action": "bedrock:InvokeModel",
                 "Resource": [
                      "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
                      "arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v2:0"
                 ]
           }
      ]
}
  • Model access for Anthropic’s Claude 3 and Amazon Titan Text Embeddings V2 on Amazon Bedrock.
  • The notebook glue-catalog-genai_claude.ipynb.

Set up the resources and environment

Now that you have completed the prerequisites, you can switch to the notebook environment to run the next steps. First, the notebook will create the required resources:

  • S3 bucket
  • AWS Glue database
  • AWS Glue crawler, which will run and automatically generate the database tables

After you finish the setup steps, you will have an AWS Glue database called legislators.

The crawler creates the following metadata tables:

  • persons
  • memberships
  • organizations
  • events
  • areas
  • countries

This is a semi-normalized collection of tables containing legislators and their histories.

Follow the rest of the steps in the notebook to complete the environment setup. It should only take a few minutes.

Inspect the Data Catalog

Now that you have completed the setup, you can inspect the Data Catalog to familiarize yourself with it and the metadata it captured. On the AWS Glue console, choose Databases in the navigation pane, then open the newly created legislators database. It should contain six tables, as shown in the following screenshot:

You can open any table to inspect the details. The table description and comment for each column is empty because they aren’t completed automatically by the AWS Glue crawlers.

You can use the AWS Glue API to programmatically access the technical metadata for each table. The following code snippet uses the AWS Glue API through the AWS SDK for Python (Boto3) to retrieve tables for a chosen database and then prints them on the screen for validation. The following code, found in the notebook of this post, is used to get the data catalog information programmatically.

def get_alltables(database):
    tables = []
    get_tables_paginator = glue_client.get_paginator('get_tables')
    for page in get_tables_paginator.paginate(DatabaseName=database):
        tables.extend(page['TableList'])
    return tables

def json_serial(obj):
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError ("Type %s not serializable" % type(obj))

database_tables =  get_alltables(database)

for table in database_tables:
    print(f"Table: {table['Name']}")
    print(f"Columns: {[col['Name'] for col in table['StorageDescriptor']['Columns']]}")

Now that you’re familiar with the AWS Glue database and tables, you can move to the next step to generate table metadata descriptions with generative AI.

Generate table metadata descriptions with Anthropic’s Claude 3 using Amazon Bedrock and LangChain

In this step, we generate technical metadata for a selected table that belongs to an AWS Glue database. This post uses the persons table. First, we get all the tables from the Data Catalog and include it as part of the prompt. Even though our code aims to generate metadata for a single table, giving the LLM wider information is useful because you want the LLM to detect foreign keys. In our notebook environment we install LangChain v0.2.1. See the following code:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from botocore.config import Config
from langchain_aws import ChatBedrock

glue_data_catalog = json.dumps(get_alltables(database),default=json_serial)


model_kwargs ={
    "temperature": 0.5, # You can increase or decrease this value depending on the amount of randomness you want injected into the response. A value closer to 1 increases the amount of randomness.
    "top_p": 0.999
}

model = ChatBedrock(
    client = bedrock_client,
    model_id=model_id,
    model_kwargs=model_kwargs
)

table = "persons"
response_get_table = glue_client.get_table( DatabaseName = database, Name = table )
pprint.pp(response_get_table)

user_msg_template_table="""
I'd like you to create metadata descriptions for the table called {table} in your AWS Glue data catalog. Please follow these steps:
1. Review the data catalog carefully
2. Use all the data catalog information to generate the table description
3. If a column is a primary key or foreign key to another table mention it in the description.
4. In your response, reply with the entire JSON object for the table {table}
5. Remove the DatabaseName, CreatedBy, IsRegisteredWithLakeFormation, CatalogId,VersionId,IsMultiDialectView,CreateTime, UpdateTime.
6. Write the table description in the Description attribute
7. List all the table columns under the attribute "StorageDescriptor" and then the attribute Columns. Add Location, InputFormat, and SerdeInfo
8. For each column in the StorageDescriptor, add the attribute "Comment". If a table uses a composite primary key, then the order of a given column in a table’s primary key is listed in parentheses following the column name.
9. Your response must be a valid JSON object.
10. Ensure that the data is accurately represented and properly formatted within the JSON structure. The resulting JSON table should provide a clear, structured overview of the information presented in the original text.
11. If you cannot think of an accurate description of a column, say 'not available'
Here is the data catalog json in <glue_data_catalog></glue_data_catalog> tags.
<glue_data_catalog>
{data_catalog}
</glue_data_catalog>
Here is some additional information about the database in <notes></notes> tags.
<notes>
Typically foreign key columns consist of the name of the table plus the id suffix
<notes>
"""
messages = [
    ("system", "You are a helpful assistant"),
    ("user", user_msg_template_table),
]

prompt = ChatPromptTemplate.from_messages(messages)

chain = prompt | model | StrOutputParser()

# Chain Invoke

TableInputFromLLM = chain.invoke({"data_catalog": {glue_data_catalog}, "table":table})
print(TableInputFromLLM)

In the preceding code, you instructed the LLM to provide a JSON response that fits the TableInput object expected by the Data Catalog update API action. The following is an example response:

{
  "Name": "persons",
  "Description": "This table contains information about individual persons, including their names, identifiers, contact details, and other relevant personal data.",
  "StorageDescriptor": {
    "Columns": [
      {
        "Name": "family_name",
        "Type": "string",
        "Comment": "The family name or surname of the person."
      },
      {
        "Name": "name",
        "Type": "string",
        "Comment": "The full name of the person."
      },
      {
        "Name": "links",
        "Type": "array<struct<note:string,url:string>>",
        "Comment": "An array of links related to the person, containing a note and URL."
      },
      {
        "Name": "gender",
        "Type": "string",
        "Comment": "The gender of the person."
      },
      {
        "Name": "image",
        "Type": "string",
        "Comment": "A URL or path to an image of the person."
      },
      {
        "Name": "identifiers",
        "Type": "array<struct<scheme:string,identifier:string>>",
        "Comment": "An array of identifiers for the person, each with a scheme and identifier value."
      },
      {
        "Name": "other_names",
        "Type": "array<struct<lang:string,note:string,name:string>>",
        "Comment": "An array of other names the person may be known by, including the language, a note, and the name itself."
      },

      {
        "Name": "sort_name",
        "Type": "string",
        "Comment": "The name to be used for sorting or alphabetical ordering."
      },
      {
        "Name": "images",
        "Type": "array<struct<url:string>>",
        "Comment": "An array of URLs or paths to additional images of the person."
      },
      {
        "Name": "given_name",
        "Type": "string",
        "Comment": "The given name or first name of the person."
      },
      {
        "Name": "birth_date",
        "Type": "string",
        "Comment": "The date of birth of the person."
      },
      {
        "Name": "id",
        "Type": "string",
        "Comment": "The unique identifier for the person (likely a primary key)."
      },
      {
        "Name": "contact_details",
        "Type": "array<struct<type:string,value:string>>",
        "Comment": "An array of contact details for the person, including the type (e.g., email, phone) and the value."
      },
      {
        "Name": "death_date",
        "Type": "string",
        "Comment": "The date of death of the person, if applicable."
      }
    ],
    "Location": "s3://<your-s3-bucket>/persons/",
    "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
    "SerdeInfo": {
      "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe",
      "Parameters": {
        "paths": "birth_date,contact_details,death_date,family_name,gender,given_name,id,identifiers,image,images,links,name,other_names,sort_name"
      }
    }
  },
  "PartitionKeys": [],
  "TableType": "EXTERNAL_TABLE"
}

You can also validate the JSON generated to make sure it conforms to the format expected by the AWS Glue API:

from jsonschema import validate

schema_table_input = {
    "type": "object",
    "properties" : {
            "Name" : {"type" : "string"},
            "Description" : {"type" : "string"},
            "StorageDescriptor" : {
            "Columns" : {"type" : "array"},
            "Location" : {"type" : "string"} ,
            "InputFormat": {"type" : "string"} ,
            "SerdeInfo": {"type" : "object"}
        }
    }
}
validate(instance=json.loads(TableInputFromLLM), schema=schema_table_input)

Now that you have generated table and column descriptions, you can update the Data Catalog.

Update the Data Catalog with metadata

In this step, use the AWS Glue API to update the Data Catalog:

response = glue_client.update_table(DatabaseName=database, TableInput= json.loads(TableInputFromLLM) )
print(f"Table {table} metadata updated!")

The following screenshot shows the persons table metadata with a description.

The following screenshot shows the table metadata with column descriptions.

Now that you have enriched the technical metadata stored in Data Catalog, you can improve the descriptions by adding external documentation.

Improve metadata descriptions by adding external documentation with RAG

In this step, we add external documentation to generate more accurate metadata. The documentation for our dataset can be found online as an HTML. We use the LangChain HTML community loader to load the HTML content:

from langchain_community.document_loaders import AsyncHtmlLoader

# We will use an HTML Community loader to load the external documentation stored on HTLM
urls = ["http://www.popoloproject.com/specs/person.html", "http://docs.everypolitician.org/data_structure.html",'http://www.popoloproject.com/specs/organization.html','http://www.popoloproject.com/specs/membership.html','http://www.popoloproject.com/specs/area.html']
loader = AsyncHtmlLoader(urls)
docs = loader.load()

After you download the documents, split the documents into chunks:

text_splitter = CharacterTextSplitter(
    separator='\n',
    chunk_size=1000,
    chunk_overlap=200,

)
split_docs = text_splitter.split_documents(docs)

embedding_model = BedrockEmbeddings(
    client=bedrock_client,
    model_id=embeddings_model_id
)

Next, vectorize and store the documents locally and perform a similarity search. For production workloads, you can use a managed service for your vector store such as Amazon OpenSearch Service or a fully managed solution for implementing the RAG architecture such as Amazon Bedrock Knowledge Bases.

vs = FAISS.from_documents(split_docs, embedding_model)
search_results = vs.similarity_search(
    'What standards are used in the dataset?', k=2
)
print(search_results[0].page_content)

Next, include the catalog information along with the documentation to generate more accurate metadata:

from operator import itemgetter
from langchain_core.callbacks import BaseCallbackHandler
from typing import Dict, List, Any


class PromptHandler(BaseCallbackHandler):
    def on_llm_start( self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> Any:
        output = "\n".join(prompts)
        print(output)

system = "You are a helpful assistant. You do not generate any harmful content."
# specify a user message
user_msg_rag = """
Here is the guidance document you should reference when answering the user:

<documentation>{context}</documentation>
I'd like to you create metadata descriptions for the table called {table} in your AWS Glue data catalog. Please follow these steps:

1. Review the data catalog carefully.
2. Use all the data catalog information and the documentation to generate the table description.
3. If a column is a primary key or foreign key to another table mention it in the description.
4. In your response, reply with the entire JSON object for the table {table}
5. Remove the DatabaseName, CreatedBy, IsRegisteredWithLakeFormation, CatalogId,VersionId,IsMultiDialectView,CreateTime, UpdateTime.
6. Write the table description in the Description attribute. Ensure you use any relevant information from the <documentation>
7. List all the table columns under the attribute "StorageDescriptor" and then the attribute Columns. Add Location, InputFormat, and SerdeInfo
8. For each column in the StorageDescriptor, add the attribute "Comment". If a table uses a composite primary key, then the order of a given column in a table’s primary key is listed in parentheses following the column name.
9. Your response must be a valid JSON object.
10. Ensure that the data is accurately represented and properly formatted within the JSON structure. The resulting JSON table should provide a clear, structured overview of the information presented in the original text.
11. If you cannot think of an accurate description of a column, say 'not available'
<glue_data_catalog>
{data_catalog}
</glue_data_catalog>
Here is some additional information about the database in <notes></notes> tags.
<notes>
Typically foreign key columns consist of the name of the table plus the id suffix
<notes>
"""
messages = [
    ("system", system),
    ("user", user_msg_rag),
]
prompt = ChatPromptTemplate.from_messages(messages)

# Retrieve and Generate
retriever = vs.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3},
)

chain = (  
    {"context": itemgetter("table")| retriever, "data_catalog": itemgetter("data_catalog"), "table": itemgetter("table")}
    | prompt
    | model
    | StrOutputParser()
)

TableInputFromLLM = chain.invoke({"data_catalog":glue_data_catalog, "table":table})
print(TableInputFromLLM)

The following is the response from the LLM:

{
  "Name": "persons",
  "Description": "This table contains information about individual persons, including their names, identifiers, contact details, and other personal information. It follows the Popolo data specification for representing persons involved in government and organizations. The 'person_id' column relates a person to an organization through the 'memberships' table.",
  "StorageDescriptor": {
    "Columns": [
      {
        "Name": "family_name",
        "Type": "string",
        "Comment": "The family or last name of the person."
      },
      {
        "Name": "name",
        "Type": "string",
        "Comment": "The full name of the person."
      },
      {
        "Name": "links",
        "Type": "array<struct<note:string,url:string>>",
        "Comment": "An array of links related to the person, with a note and URL for each link."
      },
      {
        "Name": "gender",
        "Type": "string",
        "Comment": "The gender of the person."
      },
      {
        "Name": "image",
        "Type": "string",
        "Comment": "A URL or path to an image representing the person."
      },
      {
        "Name": "identifiers",
        "Type": "array<struct<scheme:string,identifier:string>>",
        "Comment": "An array of identifiers for the person, with a scheme and identifier value for each."
      },
      {
        "Name": "other_names",
        "Type": "array<struct<lang:string,note:string,name:string>>",
        "Comment": "An array of other names the person may be known by, with language, note, and name for each."
      },
      {
        "Name": "sort_name",
        "Type": "string",
        "Comment": "The name to be used for sorting or alphabetical ordering of the person."
      },
      {
        "Name": "images",
        "Type": "array<struct<url:string>>",
        "Comment": "An array of URLs or paths to additional images representing the person."
      },
      {
        "Name": "given_name",
        "Type": "string",
        "Comment": "The given or first name of the person."
      },
      {
        "Name": "birth_date",
        "Type": "string",
        "Comment": "The date of birth of the person."
      },
      {
        "Name": "id",
        "Type": "string",
        "Comment": "The unique identifier for the person. This is likely a primary key."
      },
      {
        "Name": "contact_details",
        "Type": "array<struct<type:string,value:string>>",
        "Comment": "An array of contact details for the person, with a type and value for each."
      },
      {
        "Name": "death_date",
        "Type": "string",
        "Comment": "The date of death of the person, if applicable."
      }
    ],
    "Location": "s3:<your-s3-bucket>/persons/",
    "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
    "SerdeInfo": {
      "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
    }
  }
}

Similar to the first approach, you can validate the output to make sure it conforms to the AWS Glue API.

Update the Data Catalog with new metadata

Now that you have generated the metadata, you can update the Data Catalog:

response = glue_client.update_table(DatabaseName=database, TableInput= json.loads(TableInputFromLLM) )
print(f"Table {table} metadata updated!")

Let’s inspect the technical metadata generated. You should now see a newer version in the Data Catalog for the persons table. You can access schema versions on the AWS Glue console.

Note the persons table description this time. It should differ slightly from the descriptions provided earlier:

  • In-context learning table description – “This table contains information about persons, including their names, identifiers, contact details, birth and death dates, and associated images and links. The ‘id’ column is the primary key for this table.”
  • RAG table description – “This table contains information about individual persons, including their names, identifiers, contact details, and other personal information. It follows the Popolo data specification for representing persons involved in government and organizations. The ‘person_id’ column relates a person to an organization through the ‘memberships’ table.”

The LLM demonstrated knowledge around the Popolo specification, which was part of the documentation provided to the LLM.

Clean up

Now that you have completed the steps described in the post, don’t forget to clean up the resources with the code provided in the notebook so you don’t incur unnecessary costs.

Conclusion

In this post, we explored how you can use generative AI, specifically Amazon Bedrock FMs, to enrich the Data Catalog with dynamic metadata to improve the discoverability and understanding of existing data assets. The two approaches we demonstrated, in-context learning and RAG, showcase the flexibility and versatility of this solution. In-context learning works well for AWS Glue databases with a small number of tables, whereas the RAG approach uses external documentation to generate more accurate and detailed metadata, making it suitable for larger and more complex data landscapes. By implementing this solution, you can unlock new levels of data intelligence, empowering your organization to make more informed decisions, drive data-driven innovation, and unlock the full value of your data. We encourage you to explore the resources and recommendations provided in this post to further enhance your data management practices.


About the Authors

Manos Samatas is a Principal Solutions Architect in Data and AI with Amazon Web Services. He works with government, non-profit, education and healthcare customers in the UK on data and AI projects, helping build solutions using AWS. Manos lives and works in London. In his spare time, he enjoys reading, watching sports, playing video games and socialising with friends.

Anastasia Tzeveleka is a Senior GenAI/ML Specialist Solutions Architect at AWS. As part of her work, she helps customers across EMEA build foundation models and create scalable generative AI and machine learning solutions using AWS services.

Amazon Q Developer plugins now generally available for the AWS Management Console

Post Syndicated from Shardul Vaidya original https://aws.amazon.com/blogs/devops/amazon-q-developer-plugins-now-generally-available/

Today, Amazon Web Services (AWS) announced the launch and general availability of Amazon Q Developer plugins for Datadog and Wiz in the AWS Management Console. When chatting with Amazon Q in the console, customers can access a subset of information from Datadog and Wiz services using natural language. Ask questions like @datadog do I have any active alerts? or @wiz what are my top 3 security issues today? to swiftly identify and fix problems without leaving the console.

Engineers and IT professionals can struggle with tool sprawl throughout an application’s operational lifecycle. Amazon Q Developer’s third-party plugin system works towards creating a single pane of glass for all your SaaS solutions.

In this post, we’ll explore:

  • How Q Developer plugins work
  • How to use these plugins to:
    • Understand the state of your infrastructure
    • Query and brainstorm on present issues
    • Generate code and CLI commands to use third-party systems
  • How to get started

Our goal is for you to gain a comprehensive understanding of how the third-party plugins will improve your operational productivity.

How do Q Developer plugins work

Amazon Q in the console uses the prefix you provide to select which plugin to query. This provides additional context on your request and the state of your infrastructure. Key processes include:

  • Intent recognition: Amazon Q Developer interprets your chat request’s intent. It searches through relevant APIs it can invoke and selects the correct workflow to get more context.
  • API invocation: Amazon Q Developer then calls the appropriate third-party APIs to gather relevant information. Neither the AWS context included in the chat nor any information from your prompt is passed to the third-party.
  • Response Generation: After obtaining the enriched context and original prompt, Amazon Q Developer composes a complete prompt. Amazon Q uses this to generate the best response.
  • Guardrails: The system checks the response against Amazon Q Developer guardrails to ensure it follows best practices.

This system enables Amazon Q Developer to, understand intent, request additional information, and provide rich assistance across your infrastructure and application operations.

Let’s see how each of the third-party plugins can help in a set of real-world use-cases.

Amazon Q Developer plugin for Datadog

Datadog, an AWS Advanced Technology Partner and observability and security platform for cloud applications, provides AWS customers with unified, real-time observability and security across their entire technology stack. Datadog unifies all of your telemetry in one place, so teams can troubleshoot, optimize, and secure resources at scale. If you use Datadog to
monitor your AWS infrastructure and applications, you can query a subset of information from Datadog without leaving the AWS console by prefixing your Amazon Q queries with @datadog.

Learn to use Datadog in your workloads

You can ask about how Datadog features work with certain AWS services, by asking questions like @datadog how do I use APM on my EC2 instance?
Gif of Q Developer plugin for Datadog answering a question about how to use APM on EC2

Retrieve and summarize cases and monitors

You can ask about specific cases, monitors, or specify properties of a case to get more information about it and include it in your conversation by asking questions like @datadog list my cases. With a follow up to quickly get a summary of your top cases, @datadog summarize my top cases

Gif of Q Developer plugin for Datadog answering a question about all the current cases in the connected instance of datadogGif of Q Developer plugin for Datadog answering a question summarizing the top cases in the connected instance of datadog

Check and list monitors in alarm

You can ask about specific application monitors as well, including which monitors are in alarm, Amazon Q Developer also allows follow-up questions about which alarmed monitors. You can start with a question like, @datadog list my current monitors

Gif of Q Developer plugin for Datadog listing out all the monitors in the connected instance of datadogGif of Q Developer plugin for Datadog stating that there are currently no monitors in an alarmed state in the connected instance of Datadog

And then follow it up with a question like, @datadog List some of the resources that are triggering the alarm

Amazon Q Developer plugin for Wiz

With Wiz, organizations can democratize security across the development lifecycle, empowering them to build fast and securely. As an AWS Security Competency Partner, Wiz is committed to effectively reducing risk for AWS customers by seamlessly integrating into AWS services. If you use Wiz to monitor your AWS infrastructure and applications, then you can query Wiz without leaving the console by prefixing your queries with @wiz.

View issues with critical severity

You can ask Q Developer to retrieve the specifics of your issues in Wiz, the plugin can currently return up to 10 issues and you can focus on a specific severity with a question like, @wiz list the issues with critical severity
With that response, we can also ask it to find the top issues, with a follow-up question like, @wiz can you specify the top 5?
Gif of Q Developer plugin for Wiz showing how many critical severity issues detected by the connected instance of Wiz

Find your critical resources

Wiz defines the security posture of your AWS resources based on their configuration and how many critical issues that are associated with them. Amazon Q Developer can ask Wiz which are the least secure resources with a question like, @wiz what are the critical resources in my AWS environment?
Gif of Q Developer plugin for Wiz listing out all the critical resources noted by the connected Wiz instance

List issues based on certain properties

Wiz tracks security issues that exist in your AWS account and you can ask Amazon Q Developer to list issues based on date, status, severity or type, with questions like, @wiz what issues are due next?
Gif of Q Developer plugin for Wiz listing the next few issues listed in the connected Wiz instance

Assess issues with security vulnerabilities

Wiz tracks external vulnerabilities and exposures that can potentially pose a security threat associated with your current resources and issues. Amazon Q Developer can ask Wiz which are the pertinent vulnerabilities with a question like, @wiz what are my issues that have been created in the last 7 days?
Gif of Q Developer plugin for Wiz listing the issues that Wiz lists are newest

Getting Started

To enable third-party Plugin capability in the Amazon Q Developer console:

  1. To use third-party plugins, subscribe to Amazon Q Developer Pro Tier if you don’t already have it. This activates plugins at an organizational level.
  2. If you don’t already have a Amazon Q Administrator Role/User, create one using either the AmazonQFullAccess / AmazonQDeveloperAccess managed policies, or follow the instructions in the Q Developer user guide for security and IAM permissions.
  3. Configure the plugins – To activate the plugins, you must configure their credentials to authenticate into the third-party system. This is possible through a new tab called “Plugins” in the Amazon Q Developer dashboard. The plugins require credentials from the third parties to authenticate and call APIs specific to your accounts. They’re stored in your AWS account in Secrets Manager.
    Image of the Amazon Q Developer dashboard in the AWS Management Console showing the new Plugin sidebar item

    1. Datadog – Follow the instructions in the Datadog API documentation to create a Datadog API key and copy over the Site URL, API Key, and application key to authorize Q Developer with your instance of Datadog.
      Image of the Amazon Q Developer dashboard showing the configuration screen for the Datadog plugin requesting the Site URL, API key, and application key for the instance of Datadog you wish to connect Image of the Datadog settings UI showing where to get the Site URL, API key, and application key
    2. Wiz – Follow the instructions in the Wiz Service account documentation to create a client ID, the client secret generated by wiz, and then retrieve the Wiz API endpoint URL to connect Amazon Q Developer to Wiz.
      Image of the Amazon Q Developer dashboard showing the configuration screen for the Wiz plugin; requesting the client ID, client secret, and the Wiz API endpoint URL for the instance you wish to connectImage of the Wiz UI settings UI showing where to get the client ID, client secret, and the API endpoint URL
  4. Query the new plugins – With the @datadog and @wiz prefixes, you can ask a wide variety of questions and get operational assistance leveraging from third-party SaaS products. This allows you to integrate data from all sources with lower overhead and friction.
  5. Iterate and refine – Try rephrasing or explicitly including more context about the request by mentioning dates or issue severity. Providing more relevant information helps Amazon Q Developer better understand your request.

For best results with third-party plugins, understand what you’re looking for and use terminology specfic to the third-party. Avoid overly broad queries to guide Amazon Q Developer effectively.

Conclusion

In this post, we introduced Amazon Q Developer’s third-party plugins in chat via the @datadog and @wiz prefixes highlighting the benefits of using plugins when trying to leverage generative AI across multiple services. By allowing Q Developer to understand and analyze the state of your infrastructure across services, third-party plugins unlock new boundaries for operational productivity gains.

Shardul Vaidya is a Worldwide Partner Solutions Architect with AWS, focused on helping partners and customers build and effectively use Generative AI powered developer experiences. Shardul joined AWS in 2020 as part of their early career talent Solutions Architect team and worked with over a hundred modernization and DevOps partners across the world. Outside of work, he’s a music lover and collects records.

Metasense V2: Enhancing, improving and productionisation of LLM powered data governance

Post Syndicated from Grab Tech original https://engineering.grab.com/metasense-v2

Introduction

In the initial article, LLM Powered Data Classification, we addressed how we integrated Large Language Models (LLM) to automate governance-related metadata generation. The LLM integration enabled us to resolve challenges in Gemini, such as restrictions on the customisation of machine learning classifiers and limitations of resources to train a customised model. Gemini is a metadata generation service built internally to automate the tag generation process using a third-party data classification service. We also focused on LLM-powered column-level tag classifications. The classified tags, combined with Grab’s data privacy rules, allowed us to determine sensitivity tiers of data entities. The affordability of the model also enables us to scale it to cover more data entities in the company. The initial model scanned more than 20,000 data entries, at an average of 300-400 entities per day. Despite its remarkable performance, we were aware that there was room for improvement in the areas of data classification and prompt evaluation.

Improving the model post-rollout

Since its launch in early 2024, our model has gradually grown to cover the entire data lake. To date, the vast majority of our data lake tables have undergone analysis and classification by our model. This has significantly reduced the workload for Grabbers. Instead of manually classifying all new or existing tables, Grabbers can now rely on our model to assign the appropriate classification tier accurately.

Despite table classification being automated, the data pipeline still requires owners to manually perform verification to prevent any misclassifications. While it is impossible to entirely eliminate human oversight from critical machine learning workflows, the team has dedicated substantial time post-launch to refining the model, thereby safely minimising the need for human intervention.

Utilising post-rollout data

Following the deployment of our model and receipt of extensive feedback from table owners, we have accumulated a large dataset to further enhance the model. This data, coupled with the dataset of manual classifications from the Data Governance Office to ensure compliance with information classification protocols, serves as the training and testing datasets for the second iteration of our model.

Model improvements with prompt engineering

Expanding the evaluation and testing data allowed us to uncover weaknesses in the previous model. For instance, we discovered that seemingly innocuous table columns like “business email” could contain entries with Personal Identifiable Information (PII) data.

An example of this would be a business that uses a personal email address containing a legal name—a discrepancy that would be challenging for even human reviewers to detect. Additionally, we discovered nested JSON structures occasionally included personal names, phone numbers, and email addresses hidden among other non-PII metadata. Lastly, we identified passenger communications with Grab occasionally mentioning legal names, phone numbers, and other PII, despite most of the content being non-PII.

Ultimately, we hypothesised the model’s main issue was model capacity. The model displayed difficulty focusing on large data samples containing a mixture of PII and non-PII data despite having a good understanding of what constitutes PII. Just like humans, when given high volumes of tasks to work on simultaneously, the model’s effectiveness is reduced. In the original model, 13 out of 21 tags were aimed at distinguishing different types of non-PII data. This took up significant model capacity and distracted the model from its actual task: identifying PII data.

To prevent the model from being overwhelmed, large tasks are divided into smaller, more manageable tasks, allowing the model to dedicate more attention to each task. The following measures were taken to free up model capacity:

  1. Splitting the model into two parts to make problem solving more manageable.
    • One part for adding PII tags.
    • Another part for adding all other types of tags.
  2. Reducing the number of tags for the first part from 21 to 8 by removing all non-PII tags. This simplifies the task of differentiating types of data.

  3. Using clear and concise language, removing unnecessary detail. This was done by reducing word count in prompt from 1,254 to 737 words for better data analysis.

  4. Splitting tables with more than 150 columns into smaller tables. Fewer table rows means that the LLM has sufficient capacity to focus on each column.

Enabling rapid prompt experimentation and deployment

In our quest to facilitate swift experimentation with various prompt versions, we have empowered a diverse team of data scientists and engineers to work together effectively on the prompts and service. This has been made possible by upgrading our model architecture to incorporate the LangChain and LangSmith frameworks.

LangChain introduces a novel framework that streamlines the process from raw input to the desired outcome by chaining interoperable components. LangSmith, on the other hand, is a unified DevOps platform that fosters collaboration among various team members and developers, including product managers, data scientists, and software engineers. It simplifies the processes of development, collaboration, testing, deployment, and monitoring for all involved.

Our new backend leverages LangChain to construct an updated model that supports classification tasks for both non-PII and PII tagging. Integration with LangSmith enables data scientists to directly develop prompt templates and conduct experiments via the LangSmith user interface. In addition, managing the evaluation dataset on LangSmith provides a clear view of the performance of prompts across multiple custom metrics.

The integration of LangChain and LangSmith has significantly improved our model architecture, fostering collaboration and continuous improvement. This has not only streamlined our processes but also enhanced the transparency of our performance metrics. By harnessing the power of these innovative tools, we are better equipped to deliver high-quality, efficient solutions.

The benefits of the LangChain and LangSmith framework enhancements in Metasense are summarised as follows:

Streamlined prompt optimisation process.

Data scientists can create, update, and evaluate prompts directly on the LangSmith user interface and save them in commit mode. For rapid deployment, the prompt identifier in service configurations can be easily adjusted.

Figure 1: Streamlined prompt optimisation process.

Transparent prompt performance metrics.

LangSmith’s capabilities allow us to effortlessly run evaluations on a dataset and obtain performance metrics across multiple dimensions, such as accuracy, latency, and error rate.

Assuring quality in perpetuity

With exceptionally low misclassification rates recorded, table owners can place greater trust in the model’s outputs and spend less time reviewing them. Nevertheless, as a prudent safety measure, we have set up alerts to monitor misclassification rates periodically, sounding an internal alarm if the rate crosses a defined threshold. A model improvement protocol has also been set in place for such alarms.

Conclusion

The integration of LLM into our metadata generation process has significantly improved our data classification capabilities, reducing manual workloads and increasing accuracy. Continuous improvements, including the adoption of LangChain and LangSmith frameworks, have streamlined prompt optimisation and enhanced collaboration among our team. With low misclassification rates and robust safety measures, our system is both reliable and scalable, fostering trust and efficiency. In conclusion, these advancements ensure we remain at the forefront of data governance, delivering high-quality solutions and valuable insights to our stakeholders.

We would like to express our sincere gratitude to Infocomm Media Development Authority (IMDA) for supporting this initative.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 700 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Post Syndicated from Raghu Kuppala original https://aws.amazon.com/blogs/big-data/write-queries-faster-with-amazon-q-generative-sql-for-amazon-redshift/

Amazon Redshift is a fully managed, AI-powered cloud data warehouse that delivers the best price-performance for your analytics workloads at any scale. Amazon Q generative SQL brings the capabilities of generative AI directly into the Amazon Redshift query editor. Amazon Q generative SQL for Amazon Redshift was launched in preview during AWS re:Invent 2023. With over 85,000 queries executed in preview, Amazon Redshift announced the general availability in September 2024.

Amazon Q generative SQL for Amazon Redshift uses generative AI to analyze user intent, query patterns, and schema metadata to identify common SQL query patterns directly within Amazon Redshift, accelerating the query authoring process for users and reducing the time required to derive actionable data insights. It provides a conversational interface where users can submit queries in natural language within the scope of their current data permissions. Generative SQL uses query history for better accuracy, and you can further improve accuracy through custom context, such as table descriptions, column descriptions, foreign key and primary key definitions, and sample queries. Custom context enhances the AI model’s understanding of your specific data model, business logic, and query patterns, allowing it to generate more relevant and accurate SQL recommendations. It enables you to get insights faster without extensive knowledge of your organization’s complex database schema and metadata.

Within this feature, user data is secure and private. Your data is not shared across accounts. Your queries, data and database schemas are not used to train a generative AI foundational model (FM). Your input is used as contextual prompts to the FM to answer only your queries.

In this post, we show you how to enable the Amazon Q generative SQL feature in the Redshift query editor and use the feature to get tailored SQL commands based on your natural language queries. With Amazon Q, you can spend less time worrying about the nuances of SQL syntax and optimizations, allowing you to concentrate your efforts on extracting invaluable business insights from your data.

Solution overview

At a high level, the feature works as follows:

  1. For generating the SQL code, you can write your query request in plain English within the conversational interface in the Redshift query editor.
  2. The query editor sends the query context to the underlying Amazon Q generative SQL platform, which uses generative AI to generate SQL code recommendations based on your Redshift metadata.
  3. You receive the generated SQL code suggestions within the same chat interface.

The following diagram illustrates this workflow.

Your content processed by generative SQL is not stored or used by AWS for service improvement.

Amazon Q generative SQL uses a large language model (LLM) and Amazon Bedrock to generate the SQL query. AWS uses different techniques, such as prompt engineering and Retrieval Augmented Generation (RAG), to query the model based on your context:

  • The database you’re connected to
  • The schema you’re working on
  • Your query history
  • Optionally, the query history of other users connected to the same endpoint

Amazon Q generative SQL is conversational, and you can ask it to refine a previously generated query.

In the following sections, we demonstrate how to enable the generative SQL feature in the Redshift query editor and use it to generate SQL queries using natural language.

Prerequisites

To get started, you need an Amazon Redshift Serverless endpoint or an Amazon Redshift provisioned cluster. For this post, we use Redshift Serverless. Refer to Easy analytics and cost-optimization with Amazon Redshift Serverless to get started.

Enable the Amazon Q generative SQL feature in the Redshift query editor

If you’re using the feature for the first time, you need to enable the Amazon Q generative SQL feature in the Redshift query editor.

To enable the feature, complete the following steps:

  1. On the Amazon Redshift console, open the Redshift Serverless dashboard.
  2. Choose Query data.

You can also choose Query Editor V2 in the navigation pane of the Amazon Redshift console.

When you open the Redshift query editor, you will see the new icon for Amazon Q next to the database dropdown menu on the top of the query editor console.

If you choose the Amazon Q icon, you will see the message “Amazon Redshift query editor V2 now supports generative SQL functionality. Contact your administrator to activate this feature in Settings.” If you’re not the administrator, you need to work with the account administrator to enable this feature.

  1. If you’re the administrator, choose the hyperlink in the message, or go to the settings icon and choose Generative SQL settings.
  2. In the Generative SQL settings section, select Q generative SQL, which will turn on Amazon Q generative SQL for all users of the account.

Amazon Q generative SQL is personalized to your database and, based on the updates or conversations you have had with the feature, will apply those learnings to other user conversations who connect to the same database with their own credentials. In the generative SQL settings, you can see the instructions to grant the sys:monitor role to a user or role.

  1. Choose Save.

You will receive a confirmation that the Amazon Q generative SQL settings have been successfully updated.

Load notebooks with sample TPC-DS data

The Redshift query editor comes with sample data and SQL notebooks that you can load into a sample database and corresponding schema. For this post, we use TPC-DS for a decision support benchmark.

We start by loading the TPC-DS data into the Redshift database. When you load this data, the schema tpcds is updated with sample data. We also use the provided notebooks with the tpcds schema to run queries to build a query history.

Complete the following steps:

  1. Connect to your Redshift Serverless workgroup or Redshift provisioned cluster.
  2. Navigate to the sample_data_dev database to view the sample databases available for running the generative SQL feature.
  3. Hover over the tpcds schema and choose Open sample notebooks.
  4. In the Create sample database pop-up message, choose Create.

In a few seconds, you will see the notification that the database sample_data_dev is created successfully and tpcds sample data is loaded successfully. Two sample notebooks for the schema are also generated.

  1. Choose Run all on each notebook tab.

This will take a few minutes to run and will establish a query history for the tpcds data.

This step is not mandatory for using the feature for your organization’s data warehouse.

Use Amazon Q to generate SQL queries from natural language

Now that the Amazon Q generative SQL feature is enabled and ready for use, open a new notebook and choose the Amazon Q icon to open a chat pane in the Redshift query editor.

Amazon Q generative SQL is personalized to your schema. It uses metadata from database schemas to improve the SQL query suggestions. Optionally, administrators can allow the use of the account’s query history to further improve the generated SQL. This can be enabled by running the following GRANT commands to provide access to your query history to other roles or users:

GRANT ROLE SYS:MONITOR to "IAMR:role-name";
GRANT ROLE SYS:MONITOR to "IAM:user-name";
GRANT ROLE SYS:MONITOR to "database-username";

This optional step allows users to make query monitoring history available to other users connected to the same database.

Let’s get started with some query examples.

  1. First, make sure you’re connected to sample_data_dev
  2. Let’s ask the query “What are the top 10 stores in sales in 1998?”

This generates a SQL query. Amazon Q generative SQL is also personalized to your data domain. You will notice that it joins to the Store table to retrieve store_name.

  1. Choose Add to notebook under the query to add the generated SQL.

Our query runs successfully and shows that the store able has the most sales.

  1. Amazon Q is personalized to your conversation. Suppose you want to know what the top selling item was for store able. You can ask this question “What was the unique identifier of the top selling item for the store ‘able’?”

The results show the top selling item. However, the query didn’t filter on the year.

  1. Let’s ask Amazon Q to give us the top selling item for store able in 1998. Instead of repeating the whole question again, you can simply ask “Can you filter by the year 1998?”

Now we have the top selling item for store able for 1998.

  1. To display the item description, you can ask the query “Can you modify the query to include its name and description?”

Amazon Q added the join to the item table and the query ran successfully.

Now that we have done some basic queries, let’s do some deeper analysis.

  1. Let’s ask Amazon Q “Can you give me aggregated store sales, for each county by quarter for all years?”

The answer is correct, but let’s ask a follow-up to include the state.

  1. Ask the follow-up question: “Can you include state?”

This answer looks good; you can also add an ORDER BY clause if you want the data sorted or ask Amazon Q to add that.

So far, we have only been looking at store_sales data. The TPC-DS data contains data for other sales channels, including web_sales and catalog_sales.

  1. Let’s ask Amazon Q “Can you give me the total sales for 1998, from different sales channels, using a union of the sales data from different channels?”

Let’s dive deeper into some other capabilities of Amazon Q generative SQL.

  1. Let’s try logging in with a different user and see how Amazon Q generative SQL interacts with that user. We have created User3 and granted the sys:monitor
  2. Logged in as User3, let’s ask the original question of “What are the top 10 stores in sales in 1998?”

Amazon Q generative SQL is able to use the query history and provide SQL recommendations for User3’s prompts because they have access to the system metadata provided through the role sys:monitor.

Safety features

Amazon Q generative SQL has built-in safety features to warn if a generated SQL statement will modify data and will only run based on user permissions. To test this, let’s ask Amazon Q to “delete data from web_sales table.”

Amazon Q gives a message “I detected that this query changes your database. Only run this SQL command if that is appropriate.”

Now, still logged in as User3, choose Run to try to delete the web_sales data.

As expected, User3 gets a permission denied error, because they don’t have the necessary privileges to delete the web_sales table.

Custom context

Custom context is a feature that allows you to provide domain-specific knowledge and preferences, giving you fine-grained control over the SQL generation process.

The custom context is defined in a JSON file, which can be uploaded by the query editor administrator or can be added directly in the Custom context section in Amazon Q generative SQL settings.

This JSON file contains information that helps Amazon Q generative SQL better understand the specific requirements and constraints of your domain, enabling it to generate more targeted and relevant SQL queries.

By providing a custom context, you can influence factors such as:

  • The terminology and vocabulary used in the generated SQL
  • The level of complexity and optimization of the SQL queries
  • The formatting and structure of the SQL statements
  • The data sources and tables that should be considered

The custom context feature empowers you to take a more active role in shaping the SQL generation process, leading to SQL queries that are better suited to your data and business requirements.

In this post, we use the BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) sample dataset, consisting of three tables. BIRD represents a pioneering, cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing.

You can load the following BIRD sample dataset into your Redshift data warehouse to experiment with using custom contexts.

For this post, we demonstrate with three custom contexts.

TablesToInclude

TablesToInclude specifies a set of tables that are considered for SQL generation. This field is crucial when you want to limit the scope of SQL queries to a defined subset of available tables. It can help optimize the generation process by reducing unnecessary table references.

Let’s ask Amazon Q “List the distinct translated title and the set code of all cards translated into Spanish.”

This SQL unnecessarily uses the public.cards table. The public.set_translations table contains the data sufficient to answer the question.

We can add the following TablesToInclude custom context JSON:

{
  "resources": [
    {
      "ResourceId":"Serverless:Serverless-workgroup-name",
      "ResourceType": "REDSHIFT_WAREHOUSE",
      "TablesToInclude": [
        "bird.public.set_translations"
      ]
    }
  ]
}

After adding the custom context, the unwanted joins are eliminated and the correct SQL is generated.

ColumnAnnotations

ColumnAnnotations allows you to provide metadata or annotations specific to individual columns in your data tables. These annotations can offer valuable insights into the definitions and characteristics of the columns, which can be beneficial in guiding the SQL generation process.

Let’s ask Amazon Q to “Show me the unconverted mana cost and name for all the cards created by Rob Alexander.”

The generated SQL points to the column convertedmanacost, which doesn’t give a value for unconverted mana cost. The manacost column gives the unconverted mana cost.

Let’s add this using ColumnAnnotations in the custom context JSON:

{
  "resources": [
    {
      "ResourceId": "Serverless: Serverless-workgroup-name",
      "ResourceType": "REDSHIFT_WAREHOUSE",
      "ColumnAnnotations":
         {"bird.public.cards": { "manaCost": "manaCost is the unconverted mana"} }
    }
  ]
}

After the custom context is added, the correct SQL gets generated.

CuratedQueries

CuratedQueries provides a set of predefined question and answer pairs. In this set, the questions are written in natural language and the corresponding answers are the SQL queries that should be generated to address those questions.

These examples serve as a valuable reference point for Amazon Q generative SQL, helping it understand the types of queries it is expected to generate. You can guide Amazon Q generative SQL with the desired format, structure, and content of the SQL queries it should produce.

Let’s ask Amazon Q “List down the name of artists for cards in Chinese Simplified.”

Although the join key multiverseid exists, it is not correct.

Let’s add the following using CuratedQueries in the custom context JSON:

{
  "resources": [
    {
      "ResourceId": "Serverless: Serverless-workgroup-name",
      "ResourceType": "REDSHIFT_WAREHOUSE",
      "CuratedQueries": [
        {
          "Question": "List down the name of artists for cards in Spanish.",
          "Answer": "SELECT artist FROM public.cards c JOIN public.foreign_data f ON c.uuid = f.uuid WHERE f.language = 'Spanish';"
        }
      ]
    }
  ]
}

After the custom context is added, the correct SQL gets generated.

Additional features

In this section, we discuss the supporting features available with Amazon Q generative SQL feature for Redshift query editor:

Provide feedback

Amazon Q generative SQL allows you to provide feedback on the SQL queries it generates, helping improve the quality and relevance of the SQL over time. This feedback mechanism is accessible through the Amazon Q generative SQL interface, where you can indicate whether the generated SQL was helpful or not.

If you find the generated SQL to not be helpful, you can categorize the feedback into the following areas:

  • Incorrect Tables/Columns – This indicates that the SQL references the wrong tables or columns, or is missing essential tables or columns
  • Incorrect Predicates/Literals/Group By – This category covers issues with the SQL’s filter conditions, literal values, or grouping logic
  • Incorrect SQL Structure – This feedback suggests that the overall structure or syntax of the generated SQL is not correct
  • Other – This option allows you to provide feedback that doesn’t fit into the preceding categories

In addition to selecting the appropriate feedback category, you can also provide free text comments to elaborate on the specific issues or inaccuracies you found in the generated SQL. This additional information can be valuable for Amazon Q to better understand the problems and make improvements.

By actively providing this feedback, you play a crucial role in refining the generation capabilities of Amazon Q generative SQL. The feedback you provide helps the service learn from its mistakes, leading to more accurate and relevant SQL queries that better meet your needs over time.

This feedback loop is an important part of Amazon Q generative SQL’s continuous improvement, because it allows the service to adapt and evolve based on your specific requirements and use cases.

Regenerate SQL

The Regenerate SQL option will prompt Amazon Q to generate a new SQL query based on the same natural language prompt, using its learning and improvement capabilities to provide a potentially better-suited response.

Refresh database

By choosing Refresh database, you can instruct Amazon Q generative SQL to re-fetch and update the metadata information about the connected database.

This metadata includes:

  • Schema definitions – The structure and organization of your database schemas
  • Table definitions – The names, columns, and other properties of the tables in your database
  • Column definitions – The data types, names, and other characteristics of the columns within your database tables

Tips and techniques

To get more accurate SQL recommendations from Amazon Q generative SQL, keep in mind the following best practices:

  • Be as specific as possible. Instead of asking for total store sales, ask for total sales across all sales channels if that is what you need.
  • Add your schema to the path. For example:
    set search_path to tpcds;

  • Iterate when you have complex requests and verify the results. For example, ask which county has the most sales in 2000 and follow up with which item had the most sales.
  • Ask follow-up questions to make queries more specific.
  • If an incomplete response is generated, instead of rephrasing the entire request, provide specific instructions to Amazon Q as a continuation to the prior question.

Clean up

To avoid incurring future charges, delete the Redshift cluster you provisioned as part of this post.

Conclusion

Amazon Q generative SQL for Amazon Redshift simplifies query authoring and increases productivity by allowing you to express queries in natural language and receive SQL code recommendations. This post demonstrated how the Amazon Q generative SQL feature can accelerate data analysis by reducing the time required to write SQL queries. By using natural language processing and seamlessly converting it into SQL, you can boost productivity without requiring an in-depth understanding of your organization’s database structures. Importantly, the robust security measures of Amazon Redshift remain fully enforced, and the quality of the generated SQL continues to improve over time by enabling query history sharing across users.

Get started on your Amazon Q generative SQL journey with Amazon Redshift today by implementing the solution in this post or by referring to Interacting with Amazon Q generative SQL. For pricing information, refer to Amazon Q generative SQL pricing. Also, please try other Redshift generative AI features such as Amazon Redshift Integration with Amazon Bedrock and Amazon Redshift Serverless AI-driven scaling and optimization.


About the authors

Raghu Kuppala is an Analytics Specialist Solutions Architect experienced working in the databases, data warehousing, and analytics space. Outside of work, he enjoys trying different cuisines and spending time with his family and friends.

Sushmita Barthakur is a Senior Data Solutions Architect at Amazon Web Services (AWS), supporting Enterprise customers architect their data workloads on AWS. With a strong background in data analytics, she has extensive experience helping customers architect and build enterprise data lakes, ETL workloads, data warehouses and data analytics solutions, both on-premises and the cloud. Sushmita is based out of Tampa, FL and enjoys traveling, reading and playing tennis.

Xiao Qin is a senior applied scientist with the Learned Systems Group (LSG) at Amazon Web Services (AWS). He studies and applies machine learning techniques to solve data management problems. He is one of the developers that build the Amazon Q generative SQL capability.

Erol MurtezaogluErol Murtezaoglu, a Technical Product Manager at AWS, is an inquisitive and enthusiastic thinker with a drive for self-improvement and learning. He has a strong and proven technical background in software development and architecture, balanced with a drive to deliver commercially successful products. Erol highly values the process of understanding customer needs and problems, in order to deliver solutions that exceed expectations.

Phil Bates was a Senior Analytics Specialist Solutions Architect at AWS, before retiring, with over 25 years of data warehouse experience.

Build up-to-date generative AI applications with real-time vector embedding blueprints for Amazon MSK

Post Syndicated from Francisco Morillo original https://aws.amazon.com/blogs/big-data/build-up-to-date-generative-ai-applications-with-real-time-vector-embedding-blueprints-for-amazon-msk/

Businesses today heavily rely on advanced technology to boost customer engagement and streamline operations. Generative AI, particularly through the use of large language models (LLMs), has become a focal point for creating intelligent applications that deliver personalized experiences. However, static pre-trained models often struggle to provide accurate and up-to-date responses without real-time data.

To help address this, we’re introducing a real-time vector embedding blueprint, which simplifies building real-time AI applications by automatically generating vector embeddings using Amazon Bedrock from streaming data in Amazon Managed Streaming for Apache Kafka (Amazon MSK) and indexing them in Amazon OpenSearch Service.

In this post, we discuss the importance of real-time data for generative AI applications, typical architectural patterns for building Retrieval Augmented Generation (RAG) capabilities, and how to use real-time vector embedding blueprints for Amazon MSK to simplify your RAG architecture. We cover the key components required to ingest streaming data, generate vector embeddings, and store them in a vector database. This will enable RAG capabilities for your generative AI models.

The importance of real-time data with generative AI

The potential applications of generative AI extend well beyond chatbots, encompassing various scenarios such as content generation, personalized marketing, and data analysis. For example, businesses can use generative AI for sentiment analysis of customer reviews, transforming vast amounts of feedback into actionable insights. In a world where businesses continuously generate data—from Internet of Things (IoT) devices to application logs—the ability to process this data swiftly and accurately is paramount.

Traditional large language models (LLMs) are trained on vast datasets but are often limited by their reliance on static information. As a result, they can generate outdated or irrelevant responses, leading to user frustration. This limitation highlights the importance of integrating real-time data streams into AI applications. Generative AI applications need contextually rich, up-to-date information to make sure they provide accurate, reliable, and meaningful responses to end users. Without access to the latest data, these models risk delivering suboptimal outputs that fail to meet user needs. Using real-time data streams is crucial for powering next-generation generative AI applications.

Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) is the process of optimizing the output of an LLM so it references an authoritative knowledge base outside of its training data sources before generating a response. LLMs are trained on vast volumes of data and use billions of parameters to generate original output for tasks such as answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, all without the need to retrain the model. It’s a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.

At the core of RAG is the ability to fetch the most relevant information from a continuously updated vector database. Vector embeddings are numerical representations that capture the relationships and meanings of words, sentences, and other data types. They enable more nuanced and effective semantic searches than traditional keyword-based systems. By converting data into vector embeddings, organizations can build robust retrieval mechanisms that enhance the output of LLMs.

At the time of writing, many processes for creating and managing vector embeddings occur in batch mode. This approach can lead to stale data in the vector database, diminishing the effectiveness of RAG applications and the responses that AI applications generate. A streaming engine capable of invoking embedding models and writing directly to a vector database can help maintain an up-to-date RAG vector database. This helps make sure generative AI models can fetch the more relevant information in real time, providing timely and more contextually accurate outputs.

Solution overview

To build an efficient real-time generative AI application, we can divide the flow of the application into two main parts:

  • Data ingestion – This involves ingesting data from streaming sources, converting it to vector embeddings, and storing them in a vector database
  • Insights retrieval – This involves invoking an LLM with user queries to retrieve insights, employing the RAG technique

Data ingestion

The following diagram outlines the data ingestion flow.

The workflow includes the following steps:

  1. The application processes feeds from streaming sources such as social media platforms, Amazon Kinesis Data Streams, or Amazon MSK.
  2. The incoming data is converted to vector embeddings in real time.
  3. The vector embeddings are stored in a vector database for subsequent retrieval.

Data is ingested from a streaming source (for example, social media feeds) and processed using an Amazon Managed Service for Apache Flink application. Apache Flink is an open source stream processing framework that provides powerful streaming capabilities, enabling real-time processing, stateful computations, fault tolerance, high throughput, and low latency. It processes the streaming data, performs deduplication, and invokes an embedding model to create vector embeddings.

After the text data is converted into vectors, these embeddings are persisted in an OpenSearch Service domain, serving as a vector database. Unlike traditional relational databases, where data is organized in rows and columns, vector databases represent data points as vectors with a fixed number of dimensions. These vectors are clustered based on similarity, allowing for efficient retrieval.

OpenSearch Service offers scalable and efficient similarity search capabilities tailored for handling large volumes of dense vector data. With features like approximate k-Nearest Neighbor (k-NN) search algorithms, dense vector support, and robust monitoring through Amazon CloudWatch, OpenSearch Service alleviates the operational overhead of managing infrastructure. This makes it a suitable solution for applications requiring fast and accurate similarity-based retrieval tasks using vector embeddings.

Insights retrieval

The following diagram illustrates the flow from the user side, where the user submits a query through the frontend and receives a response from the LLM model using the retrieved vector database documents as context.

The workflow includes the following steps:

  1. A user submits a text query.
  2. The text query is converted into vector embeddings using the same model used for data ingestion.
  3. The vector embeddings are used to perform a semantic search in the vector database, retrieving related vectors and associated text.
  4. The retrieved information, along with any previous conversation history, and the user prompt are compiled into a single prompt for the LLM.
  5. The LLM is invoked to generate a response based on the enriched prompt.

This process helps make sure the generative AI application can use the most up-to-date context when responding to user queries, providing relevant and timely insights.

Real-time vector embedding blueprints for generative applications

To facilitate the adoption of real-time generative AI applications, we are excited to introduce real-time vector embedding blueprints. This new blueprint includes a Managed Service for Apache Flink application that receives events from an MSK cluster, processes the events, and calls Amazon Bedrock using your embedding model of choice, while storing the vectors in an OpenSearch Service cluster. This new blueprint simplifies the data ingestion piece of the architecture with a low-code approach to integrate MSK streams with OpenSearch Service and Amazon Bedrock.

Implement the solution

To use real-time data from Amazon MSK as an input for generative AI applications, you need to set up several components:

  • An MSK stream to provide the real-time data source
  • An Amazon Bedrock vector embedding model to generate embeddings from the data
  • An OpenSearch Service vector data store to store the generated embeddings
  • An application to orchestrate the data flow between these components

The real-time vector embedding blueprint packages all these components into a preconfigured solution that’s straightforward to deploy. This blueprint will generate embeddings for your real-time data, store the embeddings in an OpenSearch Service vector index, and make the data available for your generative AI applications to query and process. You can access this blueprint using either the Managed Service for Apache Flink or Amazon MSK console. To get started with this blueprint, complete the following steps:

  1. Use an existing MSK cluster or create a new one.
  2. Choose your preferred Amazon Bedrock embedding model and make sure you have access to the model.
  3. Choose an existing OpenSearch Service vector index to store all embeddings or create a new vector index.
  4. Choose Deploy blueprint.

After the Managed Service for Apache Flink blueprint is up and running, all real-time data is automatically vectorized and available for generative AI applications to process.

For the detailed setup steps, see real-time vector embedding blueprint documentation

If you want to include additional data processing steps before the creation of vector embeddings, you can use the GitHub source code for this blueprint.

The real-time vector embedding blueprint reduces the time required and the level of expertise needed to set up this data integration, so you can focus on building and improving your generative AI application.

Conclusion

By integrating streaming data ingestion, vector embeddings, and RAG techniques, organizations can enhance the capabilities of their generative AI applications. Using Amazon MSK, Managed Service for Apache Flink, and Amazon Bedrock provides a solid foundation for building applications that deliver real-time insights. The introduction of the real-time vector embedding blueprint further simplifies the development process, allowing teams to focus on innovation rather than writing custom code for integration. With just a few clicks, you can configure the blueprint to continuously generate vector embeddings using Amazon Bedrock embedding models, then index those embeddings in OpenSearch Service for your MSK data streams. This allows you to combine the context from real-time data with the powerful LLMs on Amazon Bedrock to generate accurate, up-to-date AI responses without writing custom code. You can also improve the efficiency of data retrieval using built-in support for data chunking techniques from LangChain, an open source library, supporting high-quality inputs for model ingestion.

As businesses continue to generate vast amounts of data, the ability to process this information in real time will be a crucial differentiator in today’s competitive landscape. Embracing this technology allows organizations to stay agile, responsive, and innovative, ultimately driving better customer engagement and operational efficiency. Real-time vector embedding blueprint is generally available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Paris), Europe (London), Europe (Ireland) and South America (Sao Paulo) AWS Regions. Visit the Amazon MSK documentation for the list of additional Regions, which will be supported over the next few weeks.


About the authors

Francisco MorilloFrancisco Morillo is a Streaming Solutions Architect at AWS. Francisco works with AWS customers, helping them design real-time analytics architectures using AWS services, supporting Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Amazon Managed Service for Apache Flink.

Anusha Dasarakothapalli is a Principal Software Engineer for Amazon Managed Streaming for Apache Kafka (Amazon MSK) at AWS. She started her software engineering career with Amazon in 2015 and worked on products such as S3-Glacier and S3 Glacier Deep Archive, before transitioning to MSK in 2022. Her primary areas of focus lie in streaming technology, distributed systems, and storage.

Shakhi Hali is a Principal Product Manager for Amazon Managed Streaming for Apache Kafka (Amazon MSK) at AWS. She is passionate about helping customers generate business value from real-time data. Before joining MSK, Shakhi was a PM with Amazon S3. In her free time, Shakhi enjoys traveling, cooking, and spending time with family.

Digish Reshamwala is a Software Development Manager for Amazon Managed Streaming for Apache Kafka (Amazon MSK) at AWS. He started his career with Amazon in 2022 and worked on product such as AWS Fargate, before transitioning to MSK in 2024. Before joining AWS, Digish worked at NortonLifelLock and Symantec in engineering roles. He holds an MS degree from University of Southern California. His primary areas of focus lie in streaming technology and distributed computing.