Tag Archives: artificial intelligence

Announcing AWS Security Reference Architecture Code Examples for Generative AI

2025-04-17 Ievgeniia Ieromenko

Post Syndicated from Ievgeniia Ieromenko original https://aws.amazon.com/blogs/security/announcing-aws-security-reference-architecture-code-examples-for-generative-ai/

Amazon Web Services (AWS) is pleased to announce the release of new Security Reference Architecture (SRA) code examples for securing generative AI workloads. The examples include two comprehensive capabilities focusing on secure model inference and RAG implementations, covering a wide range of security controls and best practices for AWS generative AI services.

These new code examples are available in the AWS SRA Examples Repository and include ready-to-deploy CloudFormation templates for implementing detective security controls such as network segmentation, identity management, encryption, prompt injection detection, and logging and monitoring. The solutions align with the AWS SRA Design Guidance page and demonstrate our commitment to helping customers secure their generative AI implementations.

Customers can get started with these examples by following the implementation instructions for each solution in the AWS SRA Examples Repository Solutions GenAI page. Additional documentation and implementation guidance is available in the AWS SRA Design Guidance Generative AI Architecture Deep Dive.

AWS strives to continuously provide security solutions that help customers meet their security architecture needs. Customers can reach out to the team by submitting an issue in the code repository.

If you have feedback about this post, submit comments in the Comments section below.

Introducing Amazon Nova Sonic: Human-like voice conversations for generative AI applications

2025-04-08 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-nova-sonic-human-like-voice-conversations-for-generative-ai-applications/

Voice interfaces are essential to enhance customer experience in different areas such as customer support call automation, gaming, interactive education, and language learning. However, there are challenges when building voice-enabled applications.

Traditional approaches in building voice-enabled applications require complex orchestration of multiple models, such as speech recognition to convert speech to text, language models to understand and generate responses, and text-to-speech to convert text back to audio.

This fragmented approach not only increases development complexity but also fails to preserve crucial linguistic context such as tone, prosody, and speaking style that are essential for natural conversations. This can affect conversational AI applications that need low latency and nuanced understanding of verbal and non-verbal cues for fluid dialog handling and natural turn-taking.

To streamline the implementation of speech-enabled applications, today we are introducing Amazon Nova Sonic, the newest addition to the Amazon Nova family of foundation models (FMs) available in Amazon Bedrock.

Amazon Nova Sonic unifies speech understanding and generation into a single model that developers can use to create natural, human-like conversational AI experiences with low latency and industry-leading price performance. This integrated approach streamlines development and reduces complexity when building conversational applications.

Its unified model architecture delivers expressive speech generation and real-time text transcription without requiring a separate model. The result is an adaptive speech response that dynamically adjusts its delivery based on prosody, such as pace and timbre, of input speech.

When using Amazon Nova Sonic, developers have access to function calling (also known as tool use) and agentic workflows to interact with external services and APIs and perform tasks in the customer’s environment, including knowledge grounding with enterprise data using Retrieval-Augmented Generation.

At launch, Amazon Nova Sonic provides robust speech understanding for American and British English across various speaking styles and acoustic conditions, with additional languages coming soon.

Amazon Nova Sonic is developed with responsible AI at the forefront of innovation, featuring built-in protections for content moderation and watermarking.

Amazon Nova Sonic in action
The scenario for this demo is a contact center in the telecommunication industry. A customer reaches out to improve their subscription plan, and Amazon Nova Sonic handles the conversation.

With tool use, the model can interact with other systems and use agentic RAG with Amazon Bedrock Knowledge Bases to gather updated, customer-specific information such as account details, subscription plans, and pricing info.

The demo shows streaming transcription of speech input and displays streaming speech responses as text. The sentiment of the conversation is displayed in two ways: a time chart illustrating how it evolves, and a pie chart representing the overall distribution. There’s also an AI insights section providing contextual tips for a call center agent. Other interesting metrics shown in the web interface are the overall talk time distribution between the customer and the agent, and the average response time.

During the conversation with the support agent, you can observe through the metrics and hear in the voices how customer sentiment improves.

The video includes an example of how Amazon Nova Sonic handles interruptions smoothly, stopping to listen and then continuing the conversation in a natural way.

Now, let’s explore how you can integrate voice capabilities in your applications.

Using Amazon Nova Sonic
To get started with Amazon Nova Sonic, you first need to toggle model access in the Amazon Bedrock console, similar to how you would enable other FMs. Navigate to the Model access section of the navigation pane, find Amazon Nova Sonic under the Amazon models, and enable it for your account.

Amazon Bedrock provides a new bidirectional streaming API (InvokeModelWithBidirectionalStream) to help you implement real-time, low-latency conversational experiences on top of the HTTP/2 protocol. With this API, you can stream audio input to the model and receive audio output in real time, so that the conversation flows naturally.

You can use Amazon Nova Sonic with the new API with this model ID: amazon.nova-sonic-v1:0

After the session initialization, where you can configure inference parameters, the model operate through an event-driven architecture on both the input and output streams.

There are three key event types in the input stream:

System prompt – To set the overall system prompt for the conversation

Audio input streaming – To process continuous audio input in real-time

Tool result handling – To send the result of tool use calls back to the model (after tool use is requested in the output events)

Similarly, there are three groups of events in the output streams:

Automatic speech recognition (ASR) streaming – Speech-to-text transcript is generated, containing the result of realtime speech recognition.

Tool use handling – If there are a tool use events, they need to be handled using the information provided here, and the results sent back as input events.

Audio output streaming – To play output audio in real-time, a buffer is needed, because Amazon Nova Sonic model generates audio faster than real-time playback.

You can find examples of using Amazon Nova Sonic in the Amazon Nova model cookbook repository.

Prompt engineering for speech
When crafting prompts for Amazon Nova Sonic, your prompts should optimize content for auditory comprehension rather than visual reading, focusing on conversational flow and clarity when heard rather than seen.

When defining roles for your assistant, focus on conversational attributes (such as warm, patient, concise) rather than text-oriented attributes (detailed, comprehensive, systematic). A good baseline system prompt might be:

You are a friend. The user and you will engage in a spoken dialog exchanging the transcripts of a natural real-time conversation. Keep your responses short, generally two or three sentences for chatty scenarios.

More generally, when creating prompts for speech models, avoid requesting visual formatting (such as bullet points, tables, or code blocks), voice characteristic modifications (accent, age, or singing), or sound effects.

Things to know
Amazon Nova Sonic is available today in the US East (N. Virginia) AWS Region. Visit Amazon Bedrock pricing to see the pricing models.

Amazon Nova Sonic can understand speech in different speaking styles and generates speech in expressive voices, including both masculine-sounding and feminine-sounding voices, in different English accents, including American and British. Support for additional languages will be coming soon.

Amazon Nova Sonic handles user interruptions gracefully without dropping the conversational context and is robust to background noise. The model supports a context window of 32K tokens for audio with a rolling window to handle longer conversations and has a default session limit of 8 minutes.

The following AWS SDKs support the new bidirectional streaming API:

Python developers can use this new experimental SDK that makes it easier to use the bidirectional streaming capabilities of Amazon Nova Sonic. We’re working to add support to the other AWS SDKs.

I’d like to thank Reilly Manton and Chad Hendren, who set up the demo with the contact center in the telecommunication industry, and Anuj Jauhari, who helped me understand the rich landscape in which speech-to-speech models are being deployed.

To learn more, these articles that enter into the details of how to use the new bidirectional streaming API with compelling demos:

Whether you’re creating customer service solutions, language learning applications, or other conversational experiences, Amazon Nova Sonic provides the foundation for natural, engaging voice interactions. To get started, visit the Amazon Bedrock console today. To learn more, visit the Amazon Nova section of the user guide.

– Danilo

How is the News Blog doing? Take this 1 minute survey!

(This survey is hosted by an external company. AWS handles your information as described in the AWS Privacy Notice. AWS will own the data gathered via this survey and will not share the information collected with survey respondents.)

AWS Weekly Roundup: Omdia recognition, Amazon Bedrock RAG evaluation, International Women’s Day events, and more (March 24, 2025)

2025-03-24 Betty Zheng (郑予彬)

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-omdia-recognition-amazon-bedrock-rag-evaluation-international-womens-day-events-and-more-march-24-2025/

As we celebrate International Women’s Day (IWD) this March, I had the privilege of attending the ‘Women in Tech’ User Group meetup in Shenzhen last weekend. I was inspired to see over 100 women in tech from different industries come together to discuss AI ethics from a female perspective. Together, we explored strategies such as reducing gender bias in AI systems and promoting diverse representation in model training data. In the AWS Cloud Lab, participants used Amazon Bedrock with large language models (LLMs) to generate rose bloom videos, which was the most popular part of this meetup.

These gatherings are crucial to our efforts to engage more women in AI technology exploration and development, and to help make sure that the generative AI era evolves without gender bias. The collaborative spirit and technical curiosity displayed throughout the event is further proof that diverse teams truly build inclusive and effective solutions.

Speaking of vibrant community engagement, I also had the honor of presenting at Kubernetes Community Day (KCD) Beijing 2025 this weekend. The enthusiasm for container technologies was remarkable, with nearly 300 developers gathering to share experiences and best practices. During my keynote introducing the DoEKS project from Amazon Web Services (AWS), I was struck by the depth of interest in managed Kubernetes services. The audience’s questions revealed how widely adopted services such as Amazon Elastic Kubernetes Service (Amazon EKS) and Amazon Elastic Container Service (Amazon ECS) have become among Chinese developers building mission-critical applications.This strong community interest aligns perfectly with findings from the Omdia Universe: Cloud Container Management & Services 2024–25 report. In this comprehensive evaluation of container management solutions hosted on public clouds, AWS was recognized as a Leader. The report specifically highlights that AWS offers “widest range of options for working with Kubernetes or its own container management service, across cloud, edge, and on-premises environments.” You can read the full report about AWS offerings to learn more about our comprehensive container portfolio and how we’re helping builders deploy scalable, reliable containerized applications.

Last Week’s launches

In addition to the inspiring community events, here are some AWS launches that caught my attention.

Amazon Q Business browser extension gets upgrades – The Amazon Q Business browser extension now features significant enhancements designed to streamline browser-based tasks. Users gain access to their company’s indexed knowledge alongside web content, direct PDF support within the browser, image file attachment capabilities, and controls to remove irrelevant attachments from conversation context. The expanded context window accommodates larger web pages and more detailed prompts, resulting in more helpful responses. For advanced needs, the extension offers seamless transition to the full Amazon Q Business web experience with access to Actions and Amazon Q Apps. Review the Enhancing web browsing with Amazon Q Business in the documentation for detailed setup instructions and feature descriptions to learn more about this announcement.

Amazon Bedrock RAG evaluation is now generally available – Offering comprehensive assessment of both Bedrock Knowledge Bases and custom Retrieval Augmented Generation (RAG) systems through LLM-as-a-judge methodology. The service evaluates retrieval quality and end-to-end generation with metrics for relevance, correctness, and hallucination detection, and the newly added support for custom RAG pipeline evaluations lets you bring your own input-output pairs and retrieved contexts directly into the evaluation job, along with new citation precision metrics and Amazon Bedrock Guardrails integration for more flexible RAG system optimization. To learn more, visit the Amazon Bedrock Evaluations page and What is Amazon Bedrock? in the documentation.

Amazon Nova expands Tool Choice options for Converse API – We’ve enhanced Amazon Nova with expanded Tool Choice capabilities for the Converse API, giving developers more flexibility in building sophisticated AI applications. This update allows models to determine when to use tools to fulfill user requests more effectively. Learn more in the announcement about expands Tool Choice options.

Amazon Bedrock Guardrails adds policy-based enforcement for responsible AI – Our builders can now enforce responsible AI policies at scale with Amazon Bedrock Guardrails’ new AWS Identity and Access Management (IAM) policy-based enforcement capabilities. This feature helps you to specify required guardrails through IAM policies using the bedrock:GuardrailIdentifiercondition key, so that all model inference calls comply with your organization’s AI safety standards. When your teams make Amazon Bedrock Invoke or Converse API calls, requests are automatically rejected if they don’t include the mandated guardrails, providing consistent protection against undesirable content, sensitive information exposure, and model hallucinations. Refer to the Set up permissions to use Guaidrails for content filtering in the technical documentation and the Amazon Bedrock Guardrails product page to learn more about the announcement about policy based enforcement for responsible AI.

Next generation of Amazon Connect released – We’ve launched the next generation of Amazon Connect, featuring AI-powered interactions designed to strengthen customer relationships and improve business outcomes. This major update brings enhanced agent experiences, smarter customer interactions, and deeper operational insights to contact centers of all sizes. Learn more from the new launch post in the AWS Contact Center Blog.

Amazon Redshift Serverless introduces Current and Trailing release tracks – Amazon Redshift Serverless now offers two release tracks to give users more control over their update cadence. The Current track delivers the most up-to-date certified release with the latest features and security updates, while the Trailing track remains on the previous certified release. This dual-track approach allows organizations to validate new releases on select workgroups before implementing them across production environments. Users can easily switch between tracks through the Amazon Redshift console, providing the flexibility to balance innovation with stability for mission-critical workloads. This capability is available in all AWS Regions where Amazon Redshift Serverless is offered. Refer to Tracks for Amazon Redshift provisioned cluster and serverless work groups to learn more about the Current and Trailing tracks in Amazon Redshift Serverless.

AWS WAF now supports URI fragment field matching – AWS WAF has expanded its capability to include URI fragment field matching, allowing security teams to create rules that inspect and match against the fragment portion of URLs. This enhancement enables more precise security controls for web applications that use URI fragments to identify specific sections within pages. Security professionals can now implement more targeted protections, such as restricting access to sensitive page elements, detecting suspicious navigation patterns, and enhancing bot mitigation by analyzing fragment usage patterns characteristic of automated attacks. This feature is available in all AWS Regions where AWS WAF is supported. For more information about URI field for matching, visit the AWS WAF Developer Guide.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS.

Other AWS news

Here are some other additional projects and blog posts that you might find interesting.

Build your generative AI skills at AWS Gen AI Lofts – AWS has established more than 10 global hubs offering training and networking for developers and startups in 2025, where you can gain practical, hands-on experience with the latest AI technologies. These revamped spaces feature dedicated zones where you can participate in workshops on prompt engineering, foundation model (FM) selection, and implementing AI in production environments. If you’re near San Francisco, New York, Tokyo, or other major tech hubs with AWS Gen AI Lofts, stop by to access these free resources and accelerate your generative AI development skills. Check out all of the AWS Gen AI Loft locations and events and to read 5 ways to build your AI skills on AWS Gen AI Loft to learn more.

AWS Lambda‘s architecture for billions of asynchronous invocations – A recent technical article reveals how AWS Lambda handles massive scale through sophisticated engineering approaches. The Lambda asynchronous invocation path employs multiple queuing strategies, consistent hashing for intelligent partitioning, and shuffle-sharding techniques to minimize noisy neighbor effects. The system relies on key observability metrics (AsyncEventReceived, AsyncEventAge, and AsyncEventDropped) to maintain optimal performance. These architectural decisions enable Lambda to process tens of trillions of monthly invocations across 1.5 million active customers while providing reliable scalability and performance isolation. For details read Handling billions of invocations – best practices from AWS Lambda in the AWS computing blog.

AWS is reducing prices by more than 11% for its high-memory U7i instances across all Regions and pricing models. The reduction applies to four instances: u7i-12tb.224xlarge, u7in-16tb.224xlarge, u7in-24tb.224xlarge, and u7in-32tb.224xlarge. The new On-Demand pricing, which covers shared, dedicated, and host tenancy options is retroactive, to March 1, 2025. For new Savings Plan purchases, pricing is effective immediately.

Create your AWS Builder ID and reserve your alias – Builder ID is a universal login credential that gives you access beyond the AWS Management Console to AWS tools and resources, including over 600 free training courses, community features, and developer tools such as Amazon Q Developer.

From community.aws
Here are some of my favorite posts from community.aws.

Model Context Protocol (MCP): why it matters – The recently introduced Model Context Protocol (MCP) creates a standardized way for AI applications to communicate with multiple FMs using consistent prompts and tools.

Build serverless GenAI Apps faster with Amazon Q Developer CLI agent – Discover how Amazon Q Developer CLI Agent revolutionizes cloud development by building a complete serverless generative AI application in minutes instead of days.

Automating code reviews with Amazon Q and GitHub actions – A new developer tutorial demonstrates how to integrate Amazon Q Developer with GitHub Actions to automatically analyze pull requests and provide AI-powered code feedback.

DeepSeek on AWS – A new technical guide demonstrates how to deploy DeepSeek’s powerful open-source AI models on AWS infrastructure. The tutorial provides step-by-step instructions for setting up these cutting-edge models using Amazon SageMaker, Amazon Elastic Compute Cloud (Amazon EC2) instances with GPUs, or through integration with Amazon Bedrock. The guide covers optimization techniques, sample applications, and best practices for balancing performance with cost efficiency.

Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events.

Empowering Futures – Women Leading the Way in Tech and Non-Tech Careers – Whether you’re here to expand your professional circle, learn about the AWS Cloud or gain wisdom from inspiring speakers, this event has something for everyone. This is a public event open to everyone in the Seattle area—for free—on March 27, 2025.

AWS at KubeCon + CloudNativeCon London 2025 – Join us at KubeCon London on April 1 – April 4 , at Excel booth S300 for live product demonstrations that help you simplify Kubernetes operations, optimize costs and performance, harness the power of artificial learning and machine learning (AI/ML), and build scalable platform strategies.

That’s all for this week. Check back next Monday for another Weekly Roundup!

– Betty

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

How is the News Blog doing? Take this 1 minute survey!

AWS Pi Day 2025: Data foundation for analytics and AI

2025-03-14 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-pi-day-data-foundation-for-analytics-and-ai/

Every year on March 14 (3.14), AWS Pi Day highlights AWS innovations that help you manage and work with your data. What started in 2021 as a way to commemorate the fifteenth launch anniversary of Amazon Simple Storage Service (Amazon S3) has now grown into an event that highlights how cloud technologies are transforming data management, analytics, and AI.

This year, AWS Pi Day returns with a focus on accelerating analytics and AI innovation with a unified data foundation on AWS. The data landscape is undergoing a profound transformation as AI emerges in most enterprise strategies, with analytics and AI workloads increasingly converging around a lot of the same data and workﬂows. You need an easy way to access all your data and use all your preferred analytics and AI tools in a single integrated experience. This AWS Pi Day, we’re introducing a slate of new capabilities that help you build unified and integrated data experiences.

The next generation of Amazon SageMaker: The center of all your data, analytics, and AI
At re:Invent 2024, we introduced the next generation of Amazon SageMaker, the center of all your data, analytics, and AI. SageMaker includes virtually all the components you need for data exploration, preparation and integration, big data processing, fast SQL analytics, machine learning (ML) model development and training, and generative AI application development. With this new generation of Amazon SageMaker, SageMaker Lakehouse provides you with unified access to your data and SageMaker Catalog helps you to meet your governance and security requirements. You can read the launch blog post written by my colleague Antje to learn more details.

Core to the next generation of Amazon SageMaker is SageMaker Unified Studio, a single data and AI development environment where you can use all your data and tools for analytics and AI. SageMaker Unified Studio is now generally available.

SageMaker Unified Studio facilitates collaboration among data scientists, analysts, engineers, and developers as they work on data, analytics, AI workflows, and applications. It provides familiar tools from AWS analytics and artificial intelligence and machine learning (AI/ML) services, including data processing, SQL analytics, ML model development, and generative AI application development, into a single user experience.

SageMaker Unified Studio also brings selected capabilities from Amazon Bedrock into SageMaker. You can now rapidly prototype, customize, and share generative AI applications using foundation models (FMs) and advanced features such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, Amazon Bedrock Agents, and Amazon Bedrock Flows to create tailored solutions aligned with your requirements and responsible AI guidelines all within SageMaker.

Last but not least, Amazon Q Developer is now generally available in SageMaker Unified Studio. Amazon Q Developer provides generative AI powered assistance for data and AI development. It helps you with tasks like writing SQL queries, building extract, transform, and load (ETL) jobs, and troubleshooting, and is available in the Free tier and Pro tier for existing subscribers.

You can learn more about SageMaker Unified Studio in this recent blog post written by my colleague Donnie.

During re:Invent 2024, we also launched Amazon SageMaker Lakehouse as part of the next generation of SageMaker. SageMaker Lakehouse unifies all your data across Amazon S3 data lakes, Amazon Redshift data warehouses, and third-party and federated data sources. It helps you build powerful analytics and AI/ML applications on a single copy of your data. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with Apache Iceberg–compatible tools and engines. In addition, zero-ETL integrations automate the process of bringing data into SageMaker Lakehouse from AWS data sources such as Amazon Aurora or Amazon DynamoDB and from applications such as Salesforce, Facebook Ads, Instagram Ads, ServiceNow, SAP, Zendesk, and Zoho CRM. The full list of integrations is available in the SageMaker Lakehouse FAQ.

Building a data foundation with Amazon S3
Building a data foundation is the cornerstone of accelerating analytics and AI workloads, enabling organizations to seamlessly manage, discover, and utilize their data assets at any scale. Amazon S3 is the world’s best place to build a data lake, with virtually unlimited scale, and it provides the essential foundation for this transformation.

I’m always astonished to learn about the scale at which we operate Amazon S3: It currently holds over 400 trillion objects, exabytes of data, and processes a mind-blowing 150 million requests per second. Just a decade ago, not even 100 customers were storing more than a petabyte (PB) of data on S3. Today, thousands of customers have surpassed the 1 PB milestone.

Amazon S3 stores exabytes of tabular data, and it averages over 15 million requests to tabular data per second. To help you reduce the undifferentiated heavy lifting when managing your tabular data in S3 buckets, we announced Amazon S3 Tables at AWS re:Invent 2024. S3 Tables are the first cloud object store with built-in support for Apache Iceberg. S3 tables are specifically optimized for analytics workloads, resulting in up to threefold faster query throughput and up to tenfold higher transactions per second compared to self-managed tables.

Today, we’re announcing the general availability of Amazon S3 Tables integration with Amazon SageMaker Lakehouse Amazon S3 Tables now integrate with Amazon SageMaker Lakehouse, making it easy for you to access S3 Tables from AWS analytics services such as Amazon Redshift, Amazon Athena, Amazon EMR, AWS Glue, and Apache Iceberg–compatible engines such as Apache Spark or PyIceberg. SageMaker Lakehouse enables centralized management of fine-grained data access permissions for S3 Tables and other sources and consistently applies them across all engines.

For those of you who use a third-party catalog, have a custom catalog implementation, or only need basic read and write access to tabular data in a single table bucket, we’ve added new APIs that are compatible with the Iceberg REST Catalog standard. This enables any Iceberg-compatible application to seamlessly create, update, list, and delete tables in an S3 table bucket. For unified data management across all of your tabular data, data governance, and fine-grained access controls, you can also use S3 Tables with SageMaker Lakehouse.

To help you access S3 Tables, we’ve launched updates in the AWS Management Console. You can now create a table, populate it with data, and query it directly from the S3 console using Amazon Athena, making it easier to get started and analyze data in S3 table buckets.

The following screenshot shows how to access Athena directly from the S3 console.

When I select Query tables with Athena or Create table with Athena, it opens the Athena console on the correct data source, catalog, and database.

Since re:Invent 2024, we’ve continued to add new capabilities to S3 Tables at a rapid pace. For example, we added schema definition support to the CreateTable API and you can now create up to 10,000 tables in an S3 table bucket. We also launched S3 Tables into eight additional AWS Regions, with the most recent being Asia Pacific (Seoul, Singapore, Sydney) on March 4, with more to come. You can refer to the S3 Tables AWS Regions page of the documentation to get the list of the eleven Regions where S3 Tables are available today.

Amazon S3 Metadata—announced during re:Invent 2024— has been generally available since January 27. It’s the fastest and easiest way to help you discover and understand your S3 data with automated, effortlessly-queried metadata that updates in near real time. S3 Metadata works with S3 object tags. Tags help you logically group data for a variety of reasons, such as to apply IAM policies to provide fine-grained access, specify tag-based filters to manage object lifecycle rules, and selectively replicate data to another Region. In Regions where S3 Metadata is available, you can capture and query custom metadata that is stored as object tags. To reduce the cost associated with object tags when using S3 Metadata, Amazon S3 reduced pricing for S3 object tagging by 35 percent in all Regions, making it cheaper to use custom metadata.

AWS Pi Day 2025
Over the years, AWS Pi Day has showcased major milestones in cloud storage and data analytics. This year, the AWS Pi Day virtual event will feature a range of topics designed for developers and technical decision-makers, data engineers, AI/ML practitioners, and IT leaders. Key highlights include deep dives, live demos, and expert sessions on all the services and capabilities I discussed in this post.

By attending this event, you’ll learn how you can accelerate your analytics and AI innovation. You’ll learn how you can use S3 Tables with native Apache Iceberg support and S3 Metadata to build scalable data lakes that serve both traditional analytics and emerging AI/ML workloads. You’ll also discover the next generation of Amazon SageMaker, the center for all your data, analytics, and AI, to help your teams collaborate and build faster from a unified studio, using familiar AWS tools with access to all your data whether it’s stored in data lakes, data warehouses, or third-party or federated data sources.

For those looking to stay ahead of the latest cloud trends, AWS Pi Day 2025 is an event you can’t miss. Whether you’re building data lakehouses, training AI models, building generative AI applications, or optimizing analytics workloads, the insights shared will help you maximize the value of your data.

Tune in today and explore the latest in cloud data innovation. Don’t miss the opportunity to engage with AWS experts, partners, and customers shaping the future of data, analytics, and AI.

If you missed the virtual event on March 14, you can visit the event page at any time—we will keep all the content available on-demand there!

— seb

How is the News Blog doing? Take this 1 minute survey!

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

2025-03-14 G2 Krishnamoorthy

Post Syndicated from G2 Krishnamoorthy original https://aws.amazon.com/blogs/big-data/accelerate-analytics-and-ai-innovation-with-the-next-generation-of-amazon-sagemaker/

At AWS re:Invent 2024, we announced the next generation of Amazon SageMaker, the center for all your data, analytics, and AI. Amazon SageMaker brings together widely adopted AWS machine learning (ML) and analytics capabilities and addresses the challenges of harnessing organizational data for analytics and AI through unified access to tools and data with governance built in. It enables teams to securely find, prepare, and collaborate on data assets and build analytics and AI applications through a single experience, accelerating the path from data to value.

At the core of the next generation of Amazon SageMaker is Amazon SageMaker Unified Studio, a single data and AI development environment where you can find and access your organization’s data and act on it using the best tool for the job across virtually any use case. We are excited to announce the general availability of SageMaker Unified Studio.

In this post, we explore the benefits of SageMaker Unified Studio and how to get started.

Benefits of SageMaker Unified Studio

SageMaker Unified Studio brings together the functionality and tools from existing AWS Analytics and AI/ML services, including Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Amazon Bedrock, and Amazon SageMaker AI. From within the unified studio, you can discover data and AI assets from across your organization, then work together in projects to securely build and share analytics and AI artifacts, including data, models, and generative AI applications. Governance features including fine-grained access control are built into SageMaker Unified Studio using Amazon SageMaker Catalog to help you meet enterprise security requirements across your entire data estate.

Unified access to your data is provided by Amazon SageMaker Lakehouse, a unified, open, and secure data lakehouse built on Apache Iceberg open standards. Whether your data is stored in Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, or third-party and federated data sources, you can access it from one place and use it with Iceberg-compatible engines and tools. In addition, SageMaker Lakehouse now integrates with Amazon S3 Tables, the first cloud object store with native Apache Iceberg support, so you can use SageMaker Lakehouse to create, query, and process S3 Tables efficiently using various analytics engines in SageMaker Unified Studio as well as Iceberg-compatible engines like Apache Spark and PyIceberg.

Capabilities from Amazon Bedrock are now generally available in SageMaker Unified Studio, allowing you to rapidly prototype, customize, and share generative AI applications in a governed environment. Users have an intuitive interface to access high-performing foundation models (FMs) in Amazon Bedrock, including the Amazon Nova model series, and the ability to create Agents, Flows, Knowledge Bases, and Guardrails with a few clicks.

Amazon Q Developer, the most capable generative AI assistant for software development, can be used within SageMaker Unified Studio to streamline tasks across the data and AI development lifecycle, including code authoring, SQL generation, data discovery, and troubleshooting.

A new integrated way of working

The general availability of SageMaker Unified Studio represents another meaningful step in our journey to offer our customers a streamlined way to work with their data, whether for analytics or AI. Many of our customers have told us that you are building data-driven applications to guide business decisions, improve agility, and drive innovation, but that these applications are complex to build because they require collaboration across teams and the integration of data and tools. Not only is it time consuming for users to learn multiple development experiences, but because data, code, and other development artifacts are stored separately, it is challenging for users to understand how they interact with each other and to use them cohesively. Configuring and governing access is also a cumbersome manual process. To overcome these hurdles, many organizations are building bespoke integrations between services, tools, and homegrown access management systems. However, what you need is the flexibility to adopt the best services for your use case while empowering your data teams with a unified development experience.

“When we build data-driven applications for our customers, we want a unified platform where the technologies work together in an integrated way. Amazon SageMaker Unified Studio streamlines our solution delivery processes through comprehensive analytics capabilities, a unified studio experience, and a lakehouse that integrates data management across data warehouses and data lakes. Amazon SageMaker Unified Studio reduces the time-to-value for our customers’ data projects by up to 40%, helping us with our mission to accelerate our customers’ digital transformation journey.”

—Akihiro Suzue, Head of Solutions Sector, NTT DATA; Yuji Shono, Senior Manager, Apps & Data Technology Department, NTT DATA; Yuki Saito, Manager, Digital Success Solutions Division, NTT DATA

Millions of organizations trust AWS and utilize our comprehensive set of purpose-built analytics, AI/ML, and generative AI capabilities to power data-driven applications without compromising on performance, scale, or cost. Our goal for the next generation of Amazon SageMaker, including SageMaker Unified Studio, is to make data and AI workers more productive by providing access to all your data and tools in a single development environment.

Building from a single data and AI development environment

Let’s explore a common business challenge: increasing revenue through better lead generation. Consider an organization implementing an intelligent digital assistant on their website to engage with customers—a process that traditionally requires multiple tools and data sources. With SageMaker Unified Studio, this entire process can now be carried out within a single data and AI development environment.

First, the data team uses the generative AI playground within SageMaker Unified Studio to quickly evaluate and select the best model for their customer interactions. They then create a project to house the tools and resources necessary for their use case and use Amazon Bedrock within the project to build and deploy a sophisticated virtual assistant that quickly begins qualifying leads through their website.

To identify the most promising opportunities, the team develops a segmentation strategy. The data engineer asks Amazon Q Developer to identify datasets that contain lead data and uses zero-ETL integrations to bring the data into SageMaker Lakehouse. The data analyst then discovers it and creates a comprehensive view of their market. They use the SQL query editor to build out marketing segments, which they then write back to SageMaker Lakehouse, where they are available to other team members.

Finally, the data scientist accesses the same dataset, which they use to train and deploy an automated lead scoring model using tools available from SageMaker AI. During the model development phase, they use Amazon Q Developer’s inline code authoring and troubleshooting capabilities to efficiently write error free-code in their JupyterLab notebook. The final model provides sales teams with the highest-value opportunities, which they can visualize in a business intelligence dashboard and take action on immediately.

Reducing time-to-value in a unified environment

What is remarkable about this example is that entire process happens in one integrated environment. Without SageMaker Unified Studio, the team would have had to work with multiple data sources, tools, and services, spending time learning multiple development environments, creating resources shares, and manually configuring access controls. The data engineer and data analyst would have worked in various data warehouses, data lakes, and analytics tools, the data scientist would have worked in an ML studio and notebook environment, and the application builder in a generative AI tool. Now, they’re able to build and collaborate with their data and tools available in one experience, dramatically reducing time-to-value.

That’s why we’re so excited about the next generation of Amazon SageMaker and the general availability of SageMaker Unified Studio. We believe that by putting everything you need for analytics and AI in one place, you can solve complex end-to-end problems more efficiently and get to innovative outcomes faster than ever before.

Getting started with SageMaker Unified Studio

To learn more, check out the following resources:

About the authors

G2 Krishnamoorthy is VP of Analytics, leading AWS data lake services, data integration, Amazon OpenSearch Service, and Amazon QuickSight. Prior to his current role, G2 built and ran the Analytics and ML Platform at Facebook/Meta, and built various parts of the SQL Server database, Azure Analytics, and Azure ML at Microsoft.

Rahul Pathak is VP of Relational Database Engines, leading Amazon Aurora, Amazon Redshift, and Amazon QLDB. Prior to his current role, he was VP of Analytics at AWS, where he worked across the entire AWS database portfolio. He has co-founded two companies, one focused on digital media analytics and the other on IP-geolocation.

Helping us help you: Practical applications of AI in the SOC

2025-03-11 Conner Goldstein

Post Syndicated from Conner Goldstein original https://blog.rapid7.com/2025/03/11/helping-us-help-you-practical-applications-of-ai-in-the-soc/

Helping us help you: Practical applications of AI in the SOC

Security teams can be understandably hesitant to integrate artificial intelligence (AI) into incident response workflows. A single mistaken action could lead to widespread disruption, monetary loss, or reputational harm. Meanwhile, attackers are increasingly leveraging AI to enhance the scale and sophistication of their operations. According to former CISA chief, Jen Easterly, “[it’s] not just teaching cyber bad guys new tricks — it’s also making it easier for anyone to become a bad guy.”

This escalation in AI-driven threats contributes to a more complex attack landscape, intensifying pressures on security teams already grappling with limited resources, and an ever-increasing volume of alerts. As a result, the risks of ignoring AI now outweigh the risks of embracing it.

Whether or not you’re a customer of Rapid7’s managed security offerings, it’s worth understanding how AI is already transforming security operations today – not as a vague promise of the future, but as a real, tangible advantage in the fight against cyber threats. Rapid7 has been at the forefront of this shift. Last summer, my colleague Laura Ellis detailed in her blog post how Rapid7 first infused AI into our MDR workflows; and just a few weeks ago Kelcey Morgan outlined some of the ways AI is essential to integrate into SOC workflows. Now, we’re taking it even further, and customers are seeing the impact firsthand.

Below, we explore some of the key ways AI is actively driving secure, efficient, and transparent outcomes within Rapid7’s global Security Operations Center (SOC), and how customers of our Managed Threat Complete service are benefitting from these advancements firsthand.

AI-Powered Auto-Triage

Currently Available

What it is: AI-driven models that automatically analyze and close low-risk alerts, allowing analysts to focus on real threats. Using a layered ensemble approach, these machine learning models harness the collective expertise of Rapid7’s MDR analysts to instantly identify and resolve low-risk security alerts, as well as highlight potentially dangerous alerts. This allows our analysts to quickly identify and respond to the greatest threats to our customers’ networks.

Real-world impact: In a recent incident, a customer’s MDR environment generated over 8,000 benign alerts in a short time span. While Rapid7’s 24x7x365 SOC could have manually processed them, our AI models accurately triaged and identified them as benign without human intervention – freeing up analysts to focus on actual threats.

Why it matters: AI allows our SOC to reallocate human expertise to more complex investigations, reducing fatigue and response times while improving detection accuracy. Customers get faster, higher-quality security outcomes without being overwhelmed by false positives.

NEW: AI Alert Triage Decisioning Transparency

What it is: Complete transparency into alerts closed by the SOC with the assistance of AI-powered auto-triage capabilities.

Real-world impact: Transparency in auto-triage decisions is crucial for maintaining trust and security oversight. If an alert for potentially malicious certutil activity is closed as benign via our AI-powered Alert Triage capability, customers can review what input was relevant in driving the AI model’s rationale. Likewise, if a PowerShell execution on a critical server is escalated, they can see exactly why, based on factors like anomalous command sequences or credential access attempts. This visibility eliminates black-box decision-making, allowing security teams to confidently verify and act on AI-driven decisions.

Why it matters: Without visibility into auto-triage decisions, security teams risk over-reliance on automation without understanding its reasoning – potentially leading to missed threats or unnecessary escalations. By ensuring transparency, Rapid7’s AI-Powered Alert Triage empowers customers with insight into decision logic, helping them maintain security control, verify actions, and confidently respond to threats. This aligns with Rapid7’s TRISM Framework, which emphasizes trust in AI-driven security environments to ensure customers can harness AI without compromising visibility or control.

Helping us help you: Practical applications of AI in the SOC

AI-Generated Incident Reports

Currently Available

What it is: AI-powered automation that initiates detailed incident reports, including root cause analysis and impacted systems, arming the SOC with foundational information to recommend next steps.

Real-world impact: Traditionally, analysts manually compile post-incident reports, a process that can take hours. With AI-driven automation, incident summaries are generated in minutes, pulling in relevant data, impact analysis, and remediation insights automatically. Analysts then validate and refine these reports before sharing them with customers.

Why it matters: Customers get faster, more actionable insights following security incidents, reducing downtime and allowing for quicker remediation. AI doesn’t replace expert analysis – it enhances it, giving security teams the information they need to act decisively.

AI-Powered MDR SOC Assistant

Currently Available

What it is: AI-driven assistants that provide real-time recommendations, enrichment, and decision support for Rapid7 SOC analysts during investigations.

Real-world impact: When Rapid7 SOC analysts investigate a suspicious event, AI automatically enriches it with historical attack patterns, threat intelligence, and behavioral context to provide suggested next steps. If similar cases exist in other environments, AI identifies patterns and highlights potential threats before they escalate.

Why it matters:The AI-Powered MDR SOC Assistant acts as an on-demand expert for Rapid7’s MDR analysts that speeds up investigations, helping analysts make data-driven decisions, and ensures no critical detail is overlooked. This translates to faster investigation and response times for customers.

AI-Driven Threat Detections

Currently Available

What it is: AI identifies subtle patterns and anomalies that might indicate emerging threats before they trigger traditional detection rules.

Real-world impact: AI-driven analytics help uncover a multi-stage attack in its earliest phase by detecting an unusual combination of process executions across multiple endpoints. Analysts are alerted to the activity and mitigate the threat before it can escalate into a full-blown breach.

Why it matters: Traditional security tools rely on known signatures or predefined rules. AI allows for earlier detection of nuanced threats, helping customers stay ahead of sophisticated attacks that might otherwise go unnoticed. Learn more about these detections.

The time to embrace AI is now

AI-powered SOC automations are no longer futuristic ideas – they are practical, real-world solutions already making security teams faster, smarter, and more effective. The question is no longer “Should we leverage AI?” but rather “How can we leverage AI responsibly and effectively within our Security Operations teams and workflows?”

As we outlined in our previous blog, the introduction of AI into security workflows is not about replacing humans – it’s about empowering them. At Rapid7, we’ve seen firsthand how AI can reduce noise, accelerate investigations, and help security teams stay ahead of evolving threats – and we’re just getting started.

Fortunately, security teams don’t have to navigate this new frontier alone. With Rapid7’s AI-enhanced MDR services, customers get the best of both worlds – AI-powered efficiency combined with expert human oversight. Whether through AI-Powered Alert Triage, AI-Generated Incident Reports, or AI-assisted investigation and threat detection, the message is clear: embracing AI isn’t just about adopting new technology – it’s about accelerating outcomes in an increasingly unpredictable digital world.

If you’re ready to explore how AI helps us to help you bolster your security operations, let’s talk.

Unlock the power of optimization in Amazon Redshift Serverless

2025-03-10 Ricardo Serafim

Post Syndicated from Ricardo Serafim original https://aws.amazon.com/blogs/big-data/unlock-the-power-of-optimization-in-amazon-redshift-serverless/

Amazon Redshift Serverless automatically scales compute capacity to match workload demands, measuring this capacity in Redshift Processing Units (RPUs). Although traditional scaling primarily responds to query queue times, the new AI-driven scaling and optimization feature offers a more sophisticated approach by considering multiple factors including query complexity and data volume. Intelligent scaling addresses key data warehouse challenges by preventing both over-provisioning of resources for performance and under-provisioning to save costs, particularly for workloads that fluctuate based on daily patterns or monthly cycles.

Amazon Redshift serverless now offers enhanced flexibility in configuring workgroups through two primary methods. Users can either set a base capacity, specifying the baseline RPUs for query execution, with options ranging from 8 to 1024 RPUs and each RPU providing 16 GB of memory, or they can opt for the price-performance target. Amazon Redshift Serverless AI-driven scaling and optimization can adapt more precisely to diverse workload requirements and employs intelligent resource management, automatically adjusting resources during query execution for optimal performance. Consider using AI-driven scaling and optimization if your current workload requires 32 to 512 base RPUs. We don’t recommend using this feature for less than 32 base RPU or more than 512 base RPU workloads.

In this post, we demonstrate how Amazon Redshift Serverless AI-driven scaling and optimization impacts performance and cost across different optimization profiles.

Options in AI-driven scaling and optimization

Amazon Redshift Serverless AI-driven scaling and optimization offers an intuitive slider interface, letting you balance price and performance goals. You can select from five optimization profiles, ranging from Optimized for Cost to Optimized for Performance, as shown in the following diagram. Your slider position determines how Amazon Redshift allocates resources and implements AI-driven scaling and optimizations, to achieve your desired price-performance target.

The slider offers the following options:

Optimized for Cost (1)
- Prioritizes cost savings over performance
- Allocates minimum resources in favor of saving on costs
- Best for workloads where performance isn’t time-critical
Cost-Balanced (25)
- Balances towards cost savings while maintaining reasonable performance
- Allocates moderate resources
- Suitable for mixed workloads with some flexibility in query time
Balanced (50)
- Provides equal emphasis on cost efficiency and performance
- Allocates optimal resources for most use cases
- Ideal for general-purpose workloads
Performance-Balanced (75)
- Favors performance while maintaining some cost control
- Allocates additional resources when needed
- Suitable for workloads requiring consistently fast query elapsed time
Optimized for Performance (100)
- Maximizes performance regardless of cost
- Provides maximum available resources
- Best for time-critical workloads requiring fastest possible query delivery

Which workloads to consider for AI-driven scaling and optimizations

The Amazon Redshift Serverless AI-driven scaling and optimization capabilities can be applied to almost every analytical workload. Amazon Redshift will assess and apply optimizations according to your price-performance target—cost, balance, or performance.

Most analytical workloads operate on millions or even billions of rows and generate aggregations and complex calculations. These workloads have high variability for query patterns and number of queries. The Amazon Redshift Serverless AI-driven scaling and optimization will improve the price, performance, or both because it learns the patterns (the repeatability of your workload) and will allocate more resources towards performance improvements if you’re performance-focused or fewer resources if you’re cost-focused.

Cost-effectiveness of AI-driven scaling and optimization

To effectively determine the effectiveness of Amazon Redshift Serverless AI-driven scaling and optimization we need to be able to measure your current state of price-performance. We encourage you to measure your current price-performance by using sys_query_history to calculate the total elapsed time of your workload and note the start time and end time. Then use sys_serverless_usage to calculate the cost. You can use the query from the Amazon Redshift documentation and add the same start and end times. This will establish your current price performance, and now you have a baseline to compare against.

If such measurement isn’t practical because your workloads are continuously running and it’s impractical for you to determine a fixed start and end time, then another way is to compare holistically, check your month over month cost, check your user sentiment towards performance, towards system stability, improvements in data delivery, or reduction in overall monthly processing times.

Benchmark conducted and results

We evaluated the optimization options using the TPCDS 3TB dataset from the AWS Labs GitHub repository (amazon-redshift-utils). We deployed this dataset across three Amazon Redshift Serverless workgroups configured as Optimized for Cost, Balanced, and Optimized for Performance. To create a realistic reporting environment, we configured three Amazon Elastic Compute Cloud (Amazon EC2) instances with JMeter (one per endpoint) and ran 15 selected TPCDS queries concurrently for approximately 1 hour, as shown in the following screenshot.

We disabled the result cache to make sure Amazon Redshift Serverless ran all queries directly, providing accurate measurements. This setup helped us capture authentic performance characteristics across each optimization profile. Also, we designed our test environment without setting the Amazon Redshift Serverless workgroup max capacity parameter—a key configuration that controls the maximum RPUs available to your data warehouse. By removing this limit, we could clearly showcase how different configurations affect scaling behavior in our test endpoints.

Our comprehensive test plan included running each of the 15 queries 355 times, generating 5,325 queries per test cycle. The AI-driven scaling and optimization needs multiple iterations to identify patterns and optimize RPUs, so we ran this workload 10 times. Through these repetitions, the AI learned and adapted its behavior, processing a total of 53,250 queries throughout our testing period.

The testing revealed how the AI-driven scaling and optimization system adapts and optimizes performance across three distinct configuration profiles: Optimized for Cost, Balanced, and Optimized for Performance.

Queries and elapsed time

Although we ran the same core workload repeatedly, we used variable parameters in JMeter to generate different values for the WHERE clause conditions. This approach created similar but not identical workloads, introducing natural variations that showed how the system handles real-world scenarios with varying query patterns.

Our elapsed time analysis demonstrates how each configuration achieved its performance objectives, as shown by the average consumption metrics for each endpoint, as shown in the following screenshot.

The results matched our expectations: the Optimized for Performance configuration delivered significant speed improvements, running queries approximately two times as the Balanced configuration and four times as the Optimized for Cost setup.

The following screenshots show the elapsed time breakdown for each test.

The following screenshot shows tenth and final test iteration demonstrates distinct performance differences across configurations.

To clarify more, we categorized our query elapsed times into three groups:

Short queries – Less than 10 seconds
Medium queries – From 10 seconds to 10 minutes
Long queries: More than 10 minutes

Considering our last test, the analysis shows:

Duration per configuration	Optimized for Cost	Balanced	Optimized for Performance
Short queries (<10 sec)	1488	1743	3290
Medium queries (10 sec – 10 min)	3633	3579	2035
Long queries (>10 min)	204	3	0
TOTAL	5325	5325	5325

The configuration’s capacity directly impacts query elapsed time. The Optimized for Cost configuration limits resources to save money, resulting in longer query times, making it best suited for workloads that aren’t time critical, where cost savings are prioritized. The Balanced configuration provides moderate resource allocation, striking a middle ground by effectively handling medium-duration queries and maintaining reasonable performance for short queries while nearly eliminating long-running queries. In contrast, the Optimized for Performance configuration allocates more resources, which increases costs but delivers faster query results, making it best for latency-sensitive workloads where query speed is critical.

Capacity used during the tests

Our comparison of the three configurations reveals how Amazon Redshift Serverless AI-driven scaling and optimization technology adapts resource allocation to meet user expectations. The monitoring showed both Base RPU variations and distinct scaling patterns across configurations—scaling up aggressively for faster performance or maintaining lower RPUs to optimize costs.

The Optimized for Cost configuration starts at 128 RPUs and increases to 256 RPUs after three tests. To maintain cost-efficiency, this setup limits the maximum RPU allocation during scaling, even when facing query queuing.

In the following table, we can observe the costs for this Optimized for Cost configuration.

Test#	Starting RPUs	Scaled up to	Cost incurred
1	128	1408	$254.17
2	128	1408	$258.39
3	128	1408	$261.92
4	256	1408	$245.57
5	256	1408	$247.11
6	256	1408	$257.25
7	256	1408	$254.27
8	256	1408	$254.27
9	256	1408	$254.11
10	256	1408	$256.15

The strategic RPU allocation by Amazon Redshift Serverless helps optimize costs, as demonstrated in tests 3 and 4, where we observed significant cost savings. This is shown in the following graph.

Although the optimization for cost changed the base RPU, the balanced configuration didn’t change the base RPUs but scaled up to 2176, further than the 1408 RPUs that were the maximum used by the cost optimization setup. The following table shows the figures for the Balanced configuration.

Test#	Starting RPUs	Scaled up to	Cost incurred
1	192	2176	$261.48
2	192	2112	$270.90
3	192	2112	$265.26
4	192	2112	$260.20
5	192	2112	$262.12
6	192	2112	$253.18
7	192	2112	$272.80
8	192	2112	$272.80
9	192	2112	$263.72
10	192	2112	$243.28

The Balanced configuration, averaging $262.57 per test, delivered significantly better performance while costing only 3% more than the Optimized for Cost configuration, which averaged $254.32 per test. As demonstrated in the previous section, this performance advantage is evident in the elapsed time comparisons. The following graph shows the costs for the Balanced configuration.

As expected from the Optimized for Performance configuration, the usage of resources was higher to attend the high performance. In this configuration, we can also observe that after two tests, the engine adapted itself to start with a higher number of RPUs to attend the queries faster.

Test#	Starting RPUs	Scaled Up to	Cost incurred
1	512	2753	$295.07
2	512	2327	$280.29
3	768	2560	$333.52
4	768	2991	$295.36
5	768	2479	$308.72
6	768	2816	$324.08
7	768	2413	$300.45
8	768	2413	$300.45
9	768	2107	$321.07
10	768	2304	$284.93

Despite a 19% cost increase in the third test, most subsequent tests remained below the $304.39 average cost.

The Optimized for Performance configuration maximizes resource usage to achieve faster query times, prioritizing speed over cost efficiency.

The final cost-performance analysis reveals compelling results:

The Balanced configuration delivered twofold better performance while costing only 3.25% more than the Optimized for Cost setup
The Optimized for Performance configuration achieved fourfold faster elapsed time with a 19.39% cost increase compared to the Optimized for Cost option.

The following chart illustrates our cost-performance findings:

It’s important to note that these results reflect our specific test scenario. Each workload has unique characteristics, and the performance and cost differences between configurations might vary significantly in other use cases. Our findings serve as a reference point rather than a universal benchmark. Additionally, we didn’t test two intermediate configurations available in Amazon Redshift Serverless: one between Optimized for Cost and Balanced, and another between Balanced and Optimized for Performance.

Conclusion

The test results demonstrate the effectiveness of Amazon Redshift Serverless AI-driven scaling and optimization across different workload requirements. These findings highlight how Amazon Redshift Serverless AI-driven scaling and optimization can help organizations find their ideal balance between cost and performance. Although our test results serve as a reference point, each organization should evaluate their specific workload requirements and price-performance targets. The flexibility of five different optimization profiles, combined with intelligent resource allocation, enables teams to fine-tune their data warehouse operations for optimal efficiency.

To get started with Amazon Redshift Serverless AI-driven scaling and optimization, we recommend:

Establishing your current price-performance baseline
Identifying your workload patterns and requirements
Testing different optimization profiles with your specific workloads
Monitoring and adjusting based on your results

By using these capabilities, organizations can achieve better resource utilization while meeting their specific performance and cost objectives.

Ready to optimize your Amazon Redshift Serverless workloads? Visit the AWS Management Console today to create your own Amazon Redshift Serverless AI-driven scaling and optimization to start exploring the different optimization profiles. For more information, check out our documentation on Amazon Redshift Serverless AI-driven scaling and optimization, or contact your AWS account team to discuss your specific use case.

About the Authors

Ricardo Serafim is a Senior Analytics Specialist Solutions Architect at AWS. He has been helping companies with Data Warehouse solutions since 2007.

Milind Oke is a Data Warehouse Specialist Solutions Architect based out of New York. He has been building data warehouse solutions for over 15 years and specializes in Amazon Redshift.

Andre Hass is a Senior Technical Account Manager at AWS, specialized in AWS Data Analytics workloads. With more than 20 years of experience in databases and data analytics, he helps customers optimize their data solutions and navigate complex technical challenges. When not immersed in the world of data, Andre can be found pursuing his passion for outdoor adventures. He enjoys camping, hiking, and exploring new destinations with his family on weekends or whenever an opportunity arises.

DeepSeek-R1 now available as a fully managed serverless model in Amazon Bedrock

2025-03-10 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/deepseek-r1-now-available-as-a-fully-managed-serverless-model-in-amazon-bedrock/

As of January 30, DeepSeek-R1 models became available in Amazon Bedrock through the Amazon Bedrock Marketplace and Amazon Bedrock Custom Model Import. Since then, thousands of customers have deployed these models in Amazon Bedrock. Customers value the robust guardrails and comprehensive tooling for safe AI deployment. Today, we’re making it even easier to use DeepSeek in Amazon Bedrock through an expanded range of options, including a new serverless solution.

The fully managed DeepSeek-R1 model is now generally available in Amazon Bedrock. Amazon Web Services (AWS) is the first cloud service provider (CSP) to deliver DeepSeek-R1 as a fully managed, generally available model. You can accelerate innovation and deliver tangible business value with DeepSeek on AWS without having to manage infrastructure complexities. You can power your generative AI applications with DeepSeek-R1’s capabilities using a single API in the Amazon Bedrock’s fully managed service and get the benefit of its extensive features and tooling.

According to DeepSeek, their model is publicly available under MIT license and offers strong capabilities in reasoning, coding, and natural language understanding. These capabilities power intelligent decision support, software development, mathematical problem-solving, scientific analysis, data insights, and comprehensive knowledge management systems.

As is the case for all AI solutions, give careful consideration to data privacy requirements when implementing in your production environments, check for bias in output, and monitor your results. When implementing publicly available models like DeepSeek-R1, consider the following:

Data security – You can access the enterprise-grade security, monitoring, and cost control features of Amazon Bedrock that are essential for deploying AI responsibly at scale, all while retaining complete control over your data. Users’ inputs and model outputs aren’t shared with any model providers. You can use these key security features by default, including data encryption at rest and in transit, fine-grained access controls, secure connectivity options, and download various compliance certifications while communicating with the DeepSeek-R1 model in Amazon Bedrock.
Responsible AI – You can implement safeguards customized to your application requirements and responsible AI policies with Amazon Bedrock Guardrails. This includes key features of content filtering, sensitive information filtering, and customizable security controls to prevent hallucinations using contextual grounding and Automated Reasoning checks. This means you can control the interaction between users and the DeepSeek-R1 model in Bedrock with your defined set of policies by filtering undesirable and harmful content in your generative AI applications.
Model evaluation – You can evaluate and compare models to identify the optimal model for your use case, including DeepSeek-R1, in a few steps through either automatic or human evaluations by using Amazon Bedrock model evaluation tools. You can choose automatic evaluation with predefined metrics such as accuracy, robustness, and toxicity. Alternatively, you can choose human evaluation workflows for subjective or custom metrics such as relevance, style, and alignment to brand voice. Model evaluation provides built-in curated datasets, or you can bring in your own datasets.

We strongly recommend integrating Amazon Bedrock Guardrails and using Amazon Bedrock model evaluation features with your DeepSeek-R1 model to add robust protection for your generative AI applications. To learn more, visit Protect your DeepSeek model deployments with Amazon Bedrock Guardrails and Evaluate the performance of Amazon Bedrock resources.

Get started with the DeepSeek-R1 model in Amazon Bedrock
If you’re new to using DeepSeek-R1 models, go to the Amazon Bedrock console, choose Model access under Bedrock configurations in the left navigation pane. To access the fully managed DeepSeek-R1 model, request access for DeepSeek-R1 in DeepSeek. You’ll then be granted access to the model in Amazon Bedrock.

Next, to test the DeepSeek-R1 model in Amazon Bedrock, choose Chat/Text under Playgrounds in the left menu pane. Then choose Select model in the upper left, and select DeepSeek as the category and DeepSeek-R1 as the model. Then choose Apply.

Using the selected DeepSeek-R1 model, I run the following prompt example:

A family has $5,000 to save for their vacation next year. They can place the money in a savings account earning 2% interest annually or in a certificate of deposit earning 4% interest annually but with no access to the funds until the vacation. If they need $1,000 for emergency expenses during the year, how should they divide their money between the two options to maximize their vacation fund?

This prompt requires a complex chain of thought and produces very precise reasoning results.

To learn more about usage recommendations for prompts, refer to the README of the DeepSeek-R1 model in its GitHub repository.

By choosing View API request, you can also access the model using code examples in the AWS Command Line Interface (AWS CLI) and AWS SDK. You can use us.deepseek.r1-v1:0 as the model ID.

Here is a sample of the AWS CLI command:

aws bedrock-runtime invoke-model \
     --model-id us.deepseek-r1-v1:0 \
     --body "{\"messages\":[{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"[n\"}]}],max_tokens\":2000,\"temperature\":0.6,\"top_k\":250,\"top_p\":0.9,\"stop_sequences\":[\"\\n\\nHuman:\"]}" \
     --cli-binary-format raw-in-base64-out \
     --region us-west-2 \
     invoke-model-output.txt

The model supports both the InvokeModel and Converse API. The following Python code examples show how to send a text message to the DeepSeek-R1 model using the Amazon Bedrock Converse API for text generation.

import boto3
from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region you want to use.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the model ID, e.g., Llama 3 8b Instruct.
model_id = "us.deepseek.r1-v1:0"

# Start a conversation with the user message.
user_message = "Describe the purpose of a 'hello world' program in one line."
conversation = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

try:
    # Send the message to the model, using a basic inference configuration.
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 2000, "temperature": 0.6, "topP": 0.9},
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

To enable Amazon Bedrock Guardrails on the DeepSeek-R1 model, select Guardrails under Safeguards in the left navigation pane, and create a guardrail by configuring as many filters as you need. For example, if you filter for “politics” word, your guardrails will recognize this word in the prompt and show you the blocked message.

4. Apply the Bedrock Guardrails to the DeepSeek-R1 model

You can test the guardrail with different inputs to assess the guardrail’s performance. You can refine the guardrail by setting denied topics, word filters, sensitive information filters, and blocked messaging until it matches your needs.

To learn more about Amazon Bedrock Guardrails, visit Stop harmful content in models using Amazon Bedrock Guardrails in the AWS documentation or other deep dive blog posts about Amazon Bedrock Guardrails on the AWS Machine Learning Blog channel.

Here’s a demo walkthrough highlighting how you can take advantage of the fully managed DeepSeek-R1 model in Amazon Bedrock:

Now available
DeepSeek-R1 is now available fully managed in Amazon Bedrock in the US East (N. Virginia), US East (Ohio), and US West (Oregon) AWS Regions through cross-Region inference. Check the full Region list for future updates. To learn more, check out the DeepSeek in Amazon Bedrock product page and the Amazon Bedrock pricing page.

Give the DeepSeek-R1 model a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— Channy

Updated on March 10, 2025 — Fixed a screenshot of model selection and model ID.

Get insights from multimodal content with Amazon Bedrock Data Automation, now generally available

2025-03-03 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/get-insights-from-multimodal-content-with-amazon-bedrock-data-automation-now-generally-available/

Many applications need to interact with content available through different modalities. Some of these applications process complex documents, such as insurance claims and medical bills. Mobile apps need to analyze user-generated media. Organizations need to build a semantic index on top of their digital assets that include documents, images, audio, and video files. However, getting insights from unstructured multimodal content is not easy to set up: you have to implement processing pipelines for the different data formats and go through multiple steps to get the information you need. That usually means having multiple models in production for which you have to handle cost optimizations (through fine-tuning and prompt engineering), safeguards (for example, against hallucinations), integrations with the target applications (including data formats), and model updates.

To make this process easier, we introduced in preview during AWS re:Invent Amazon Bedrock Data Automation, a capability of Amazon Bedrock that streamlines the generation of valuable insights from unstructured, multimodal content such as documents, images, audio, and videos. With Bedrock Data Automation, you can reduce the development time and effort to build intelligent document processing, media analysis, and other multimodal data-centric automation solutions.

You can use Bedrock Data Automation as a standalone feature or as a parser for Amazon Bedrock Knowledge Bases to index insights from multimodal content and provide more relevant responses for Retrieval-Augmented Generation (RAG).

Today, Bedrock Data Automation is now generally available with support for cross-region inference endpoints to be available in more AWS Regions and seamlessly use compute across different locations. Based on your feedback during the preview, we also improved accuracy and added support for logo recognition for images and videos.

Let’s have a look at how this works in practice.

Using Amazon Bedrock Data Automation with cross-region inference endpoints
The blog post published for the Bedrock Data Automation preview shows how to use the visual demo in the Amazon Bedrock console to extract information from documents and videos. I recommend you go through the console demo experience to understand how this capability works and what you can do to customize it. For this post, I focus more on how Bedrock Data Automation works in your applications, starting with a few steps in the console and following with code samples.

The Data Automation section of the Amazon Bedrock console now asks for confirmation to enable cross-region support the first time you access it. For example:

From an API perspective, the InvokeDataAutomationAsync operation now requires an additional parameter (dataAutomationProfileArn) to specify the data automation profile to use. The value for this parameter depends on the Region and your AWS account ID:

arn:aws:bedrock:<REGION>:<ACCOUNT_ID>:data-automation-profile/us.data-automation-v1

Also, the dataAutomationArn parameter has been renamed to dataAutomationProjectArn to better reflect that it contains the project Amazon Resource Name (ARN). When invoking Bedrock Data Automation, you now need to specify a project or a blueprint to use. If you pass in blueprints, you will get custom output. To continue to get standard default output, configure the parameter DataAutomationProjectArn to use arn:aws:bedrock:<REGION>:aws:data-automation-project/public-default.

As the name suggests, the InvokeDataAutomationAsync operation is asynchronous. You pass the input and output configuration and, when the result is ready, it’s written on an Amazon Simple Storage Service (Amazon S3) bucket as specified in the output configuration. You can receive an Amazon EventBridge notification from Bedrock Data Automation using the notificationConfiguration parameter.

With Bedrock Data Automation, you can configure outputs in two ways:

Standard output delivers predefined insights relevant to a data type, such as document semantics, video chapter summaries, and audio transcripts. With standard outputs, you can set up your desired insights in just a few steps.
Custom output lets you specify extraction needs using blueprints for more tailored insights.

To see the new capabilities in action, I create a project and customize the standard output settings. For documents, I choose plain text instead of markdown. Note that you can automate these configuration steps using the Bedrock Data Automation API.

For videos, I want a full audio transcript and a summary of the entire video. I also ask for a summary of each chapter.

To configure a blueprint, I choose Custom output setup in the Data automation section of the Amazon Bedrock console navigation pane. There, I search for the US-Driver-License sample blueprint. You can browse other sample blueprints for more examples and ideas.

Sample blueprints can’t be edited, so I use the Actions menu to duplicate the blueprint and add it to my project. There, I can fine-tune the data to be extracted by modifying the blueprint and adding custom fields that can use generative AI to extract or compute data in the format I need.

I upload the image of a US driver’s license on an S3 bucket. Then, I use this sample Python script that uses Bedrock Data Automation through the AWS SDK for Python (Boto3) to extract text information from the image:

import json
import sys
import time

import boto3

DEBUG = False

AWS_REGION = '<REGION>'
BUCKET_NAME = '<BUCKET>'
INPUT_PATH = 'BDA/Input'
OUTPUT_PATH = 'BDA/Output'

PROJECT_ID = '<PROJECT_ID>'
BLUEPRINT_NAME = 'US-Driver-License-demo'

# Fields to display
BLUEPRINT_FIELDS = [
    'NAME_DETAILS/FIRST_NAME',
    'NAME_DETAILS/MIDDLE_NAME',
    'NAME_DETAILS/LAST_NAME',
    'DATE_OF_BIRTH',
    'DATE_OF_ISSUE',
    'EXPIRATION_DATE'
]

# AWS SDK for Python (Boto3) clients
bda = boto3.client('bedrock-data-automation-runtime', region_name=AWS_REGION)
s3 = boto3.client('s3', region_name=AWS_REGION)
sts = boto3.client('sts')


def log(data):
    if DEBUG:
        if type(data) is dict:
            text = json.dumps(data, indent=4)
        else:
            text = str(data)
        print(text)

def get_aws_account_id() -> str:
    return sts.get_caller_identity().get('Account')


def get_json_object_from_s3_uri(s3_uri) -> dict:
    s3_uri_split = s3_uri.split('/')
    bucket = s3_uri_split[2]
    key = '/'.join(s3_uri_split[3:])
    object_content = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    return json.loads(object_content)


def invoke_data_automation(input_s3_uri, output_s3_uri, data_automation_arn, aws_account_id) -> dict:
    params = {
        'inputConfiguration': {
            's3Uri': input_s3_uri
        },
        'outputConfiguration': {
            's3Uri': output_s3_uri
        },
        'dataAutomationConfiguration': {
            'dataAutomationProjectArn': data_automation_arn
        },
        'dataAutomationProfileArn': f"arn:aws:bedrock:{AWS_REGION}:{aws_account_id}:data-automation-profile/us.data-automation-v1"
    }

    response = bda.invoke_data_automation_async(**params)
    log(response)

    return response

def wait_for_data_automation_to_complete(invocation_arn, loop_time_in_seconds=1) -> dict:
    while True:
        response = bda.get_data_automation_status(
            invocationArn=invocation_arn
        )
        status = response['status']
        if status not in ['Created', 'InProgress']:
            print(f" {status}")
            return response
        print(".", end='', flush=True)
        time.sleep(loop_time_in_seconds)


def print_document_results(standard_output_result):
    print(f"Number of pages: {standard_output_result['metadata']['number_of_pages']}")
    for page in standard_output_result['pages']:
        print(f"- Page {page['page_index']}")
        if 'text' in page['representation']:
            print(f"{page['representation']['text']}")
        if 'markdown' in page['representation']:
            print(f"{page['representation']['markdown']}")


def print_video_results(standard_output_result):
    print(f"Duration: {standard_output_result['metadata']['duration_millis']} ms")
    print(f"Summary: {standard_output_result['video']['summary']}")
    statistics = standard_output_result['statistics']
    print("Statistics:")
    print(f"- Speaket count: {statistics['speaker_count']}")
    print(f"- Chapter count: {statistics['chapter_count']}")
    print(f"- Shot count: {statistics['shot_count']}")
    for chapter in standard_output_result['chapters']:
        print(f"Chapter {chapter['chapter_index']} {chapter['start_timecode_smpte']}-{chapter['end_timecode_smpte']} ({chapter['duration_millis']} ms)")
        if 'summary' in chapter:
            print(f"- Chapter summary: {chapter['summary']}")


def print_custom_results(custom_output_result):
    matched_blueprint_name = custom_output_result['matched_blueprint']['name']
    log(custom_output_result)
    print('\n- Custom output')
    print(f"Matched blueprint: {matched_blueprint_name}  Confidence: {custom_output_result['matched_blueprint']['confidence']}")
    print(f"Document class: {custom_output_result['document_class']['type']}")
    if matched_blueprint_name == BLUEPRINT_NAME:
        print('\n- Fields')
        for field_with_group in BLUEPRINT_FIELDS:
            print_field(field_with_group, custom_output_result)


def print_results(job_metadata_s3_uri) -> None:
    job_metadata = get_json_object_from_s3_uri(job_metadata_s3_uri)
    log(job_metadata)

    for segment in job_metadata['output_metadata']:
        asset_id = segment['asset_id']
        print(f'\nAsset ID: {asset_id}')

        for segment_metadata in segment['segment_metadata']:
            # Standard output
            standard_output_path = segment_metadata['standard_output_path']
            standard_output_result = get_json_object_from_s3_uri(standard_output_path)
            log(standard_output_result)
            print('\n- Standard output')
            semantic_modality = standard_output_result['metadata']['semantic_modality']
            print(f"Semantic modality: {semantic_modality}")
            match semantic_modality:
                case 'DOCUMENT':
                    print_document_results(standard_output_result)
                case 'VIDEO':
                    print_video_results(standard_output_result)
            # Custom output
            if 'custom_output_status' in segment_metadata and segment_metadata['custom_output_status'] == 'MATCH':
                custom_output_path = segment_metadata['custom_output_path']
                custom_output_result = get_json_object_from_s3_uri(custom_output_path)
                print_custom_results(custom_output_result)


def print_field(field_with_group, custom_output_result) -> None:
    inference_result = custom_output_result['inference_result']
    explainability_info = custom_output_result['explainability_info'][0]
    if '/' in field_with_group:
        # For fields part of a group
        (group, field) = field_with_group.split('/')
        inference_result = inference_result[group]
        explainability_info = explainability_info[group]
    else:
        field = field_with_group
    value = inference_result[field]
    confidence = explainability_info[field]['confidence']
    print(f'{field}: {value or '<EMPTY>'}  Confidence: {confidence}')


def main() -> None:
    if len(sys.argv) < 2:
        print("Please provide a filename as command line argument")
        sys.exit(1)
      
    file_name = sys.argv[1]
    
    aws_account_id = get_aws_account_id()
    input_s3_uri = f"s3://{BUCKET_NAME}/{INPUT_PATH}/{file_name}" # File
    output_s3_uri = f"s3://{BUCKET_NAME}/{OUTPUT_PATH}" # Folder
    data_automation_arn = f"arn:aws:bedrock:{AWS_REGION}:{aws_account_id}:data-automation-project/{PROJECT_ID}"

    print(f"Invoking Bedrock Data Automation for '{file_name}'", end='', flush=True)

    data_automation_response = invoke_data_automation(input_s3_uri, output_s3_uri, data_automation_arn, aws_account_id)
    data_automation_status = wait_for_data_automation_to_complete(data_automation_response['invocationArn'])

    if data_automation_status['status'] == 'Success':
        job_metadata_s3_uri = data_automation_status['outputConfiguration']['s3Uri']
        print_results(job_metadata_s3_uri)


if __name__ == "__main__":
    main()

The initial configuration in the script includes the name of the S3 bucket to use in input and output, the location of the input file in the bucket, the output path for the results, the project ID to use to get custom output from Bedrock Data Automation, and the blueprint fields to show in output.

I run the script passing the name of the input file. In output, I see the information extracted by Bedrock Data Automation. The US-Driver-License is a match and the name and dates in the driver’s license are printed in output.

python bda-ga.py bda-drivers-license.jpeg

Invoking Bedrock Data Automation for 'bda-drivers-license.jpeg'................ Success

Asset ID: 0

- Standard output
Semantic modality: DOCUMENT
Number of pages: 1
- Page 0
NEW JERSEY

Motor Vehicle
 Commission

AUTO DRIVER LICENSE

Could DL M6454 64774 51685                      CLASS D
        DOB 01-01-1968
ISS 03-19-2019          EXP     01-01-2023
        MONTOYA RENEE MARIA 321 GOTHAM AVENUE TRENTON, NJ 08666 OF
        END NONE
        RESTR NONE
        SEX F HGT 5'-08" EYES HZL               ORGAN DONOR
        CM ST201907800000019 CHG                11.00

[SIGNATURE]



- Custom output
Matched blueprint: US-Driver-License-copy  Confidence: 1
Document class: US-drivers-licenses

- Fields
FIRST_NAME: RENEE  Confidence: 0.859375
MIDDLE_NAME: MARIA  Confidence: 0.83203125
LAST_NAME: MONTOYA  Confidence: 0.875
DATE_OF_BIRTH: 1968-01-01  Confidence: 0.890625
DATE_OF_ISSUE: 2019-03-19  Confidence: 0.79296875
EXPIRATION_DATE: 2023-01-01  Confidence: 0.93359375

As expected, I see in output the information I selected from the blueprint associated with the Bedrock Data Automation project.

Similarly, I run the same script on a video file from my colleague Mike Chambers. To keep the output small, I don’t print the full audio transcript or the text displayed in the video.

python bda.py mike-video.mp4
Invoking Bedrock Data Automation for 'mike-video.mp4'.......................................................................................................................................................................................................................................................................... Success

Asset ID: 0

- Standard output
Semantic modality: VIDEO
Duration: 810476 ms
Summary: In this comprehensive demonstration, a technical expert explores the capabilities and limitations of Large Language Models (LLMs) while showcasing a practical application using AWS services. He begins by addressing a common misconception about LLMs, explaining that while they possess general world knowledge from their training data, they lack current, real-time information unless connected to external data sources.

To illustrate this concept, he demonstrates an "Outfit Planner" application that provides clothing recommendations based on location and weather conditions. Using Brisbane, Australia as an example, the application combines LLM capabilities with real-time weather data to suggest appropriate attire like lightweight linen shirts, shorts, and hats for the tropical climate.

The demonstration then shifts to the Amazon Bedrock platform, which enables users to build and scale generative AI applications using foundation models. The speaker showcases the "OutfitAssistantAgent," explaining how it accesses real-time weather data to make informed clothing recommendations. Through the platform's "Show Trace" feature, he reveals the agent's decision-making process and how it retrieves and processes location and weather information.

The technical implementation details are explored as the speaker configures the OutfitAssistant using Amazon Bedrock. The agent's workflow is designed to be fully serverless and managed within the Amazon Bedrock service.

Further diving into the technical aspects, the presentation covers the AWS Lambda console integration, showing how to create action group functions that connect to external services like the OpenWeatherMap API. The speaker emphasizes that LLMs become truly useful when connected to tools providing relevant data sources, whether databases, text files, or external APIs.

The presentation concludes with the speaker encouraging viewers to explore more AWS developer content and engage with the channel through likes and subscriptions, reinforcing the practical value of combining LLMs with external data sources for creating powerful, context-aware applications.
Statistics:
- Speaket count: 1
- Chapter count: 6
- Shot count: 48
Chapter 0 00:00:00:00-00:01:32:01 (92025 ms)
- Chapter summary: A man with a beard and glasses, wearing a gray hooded sweatshirt with various logos and text, is sitting at a desk in front of a colorful background. He discusses the frequent release of new large language models (LLMs) and how people often test these models by asking questions like "Who won the World Series?" The man explains that LLMs are trained on general data from the internet, so they may have information about past events but not current ones. He then poses the question of what he wants from an LLM, stating that he desires general world knowledge, such as understanding basic concepts like "up is up" and "down is down," but does not need specific factual knowledge. The man suggests that he can attach other systems to the LLM to access current factual data relevant to his needs. He emphasizes the importance of having general world knowledge and the ability to use tools and be linked into agentic workflows, which he refers to as "agentic workflows." The man encourages the audience to add this term to their spell checkers, as it will likely become commonly used.
Chapter 1 00:01:32:01-00:03:38:18 (126560 ms)
- Chapter summary: The video showcases a man with a beard and glasses demonstrating an "Outfit Planner" application on his laptop. The application allows users to input their location, such as Brisbane, Australia, and receive recommendations for appropriate outfits based on the weather conditions. The man explains that the application generates these recommendations using large language models, which can sometimes provide inaccurate or hallucinated information since they lack direct access to real-world data sources.

The man walks through the process of using the Outfit Planner, entering Brisbane as the location and receiving weather details like temperature, humidity, and cloud cover. He then shows how the application suggests outfit options, including a lightweight linen shirt, shorts, sandals, and a hat, along with an image of a woman wearing a similar outfit in a tropical setting.

Throughout the demonstration, the man points out the limitations of current language models in providing accurate and up-to-date information without external data connections. He also highlights the need to edit prompts and adjust settings within the application to refine the output and improve the accuracy of the generated recommendations.
Chapter 2 00:03:38:18-00:07:19:06 (220620 ms)
- Chapter summary: The video demonstrates the Amazon Bedrock platform, which allows users to build and scale generative AI applications using foundation models (FMs). [speaker_0] introduces the platform's overview, highlighting its key features like managing FMs from AWS, integrating with custom models, and providing access to leading AI startups. The video showcases the Amazon Bedrock console interface, where [speaker_0] navigates to the "Agents" section and selects the "OutfitAssistantAgent" agent. [speaker_0] tests the OutfitAssistantAgent by asking it for outfit recommendations in Brisbane, Australia. The agent provides a suggestion of wearing a light jacket or sweater due to cool, misty weather conditions. To verify the accuracy of the recommendation, [speaker_0] clicks on the "Show Trace" button, which reveals the agent's workflow and the steps it took to retrieve the current location details and weather information for Brisbane. The video explains that the agent uses an orchestration and knowledge base system to determine the appropriate response based on the user's query and the retrieved data. It highlights the agent's ability to access real-time information like location and weather data, which is crucial for generating accurate and relevant responses.
Chapter 3 00:07:19:06-00:11:26:13 (247214 ms)
- Chapter summary: The video demonstrates the process of configuring an AI assistant agent called "OutfitAssistant" using Amazon Bedrock. [speaker_0] introduces the agent's purpose, which is to provide outfit recommendations based on the current time and weather conditions. The configuration interface allows selecting a language model from Anthropic, in this case the Claud 3 Haiku model, and defining natural language instructions for the agent's behavior. [speaker_0] explains that action groups are groups of tools or actions that will interact with the outside world. The OutfitAssistant agent uses Lambda functions as its tools, making it fully serverless and managed within the Amazon Bedrock service. [speaker_0] defines two action groups: "get coordinates" to retrieve latitude and longitude coordinates from a place name, and "get current time" to determine the current time based on the location. The "get current weather" action requires calling the "get coordinates" action first to obtain the location coordinates, then using those coordinates to retrieve the current weather information. This demonstrates the agent's workflow and how it utilizes the defined actions to generate outfit recommendations. Throughout the video, [speaker_0] provides details on the agent's configuration, including its name, description, model selection, instructions, and action groups. The interface displays various options and settings related to these aspects, allowing [speaker_0] to customize the agent's behavior and functionality.
Chapter 4 00:11:26:13-00:13:00:17 (94160 ms)
- Chapter summary: The video showcases a presentation by [speaker_0] on the AWS Lambda console and its integration with machine learning models for building powerful agents. [speaker_0] demonstrates how to create an action group function using AWS Lambda, which can be used to generate text responses based on input parameters like location, time, and weather data. The Lambda function code is shown, utilizing external services like OpenWeatherMap API for fetching weather information. [speaker_0] explains that for a large language model to be useful, it needs to connect to tools providing relevant data sources, such as databases, text files, or external APIs. The presentation covers the process of defining actions, setting up Lambda functions, and leveraging various tools within the AWS environment to build intelligent agents capable of generating context-aware responses.
Chapter 5 00:13:00:17-00:13:28:10 (27761 ms)
- Chapter summary: A man with a beard and glasses, wearing a gray hoodie with various logos and text, is sitting at a desk in front of a colorful background. He is using a laptop computer that has stickers and logos on it, including the AWS logo. The man appears to be presenting or speaking about AWS (Amazon Web Services) and its services, such as Lambda functions and large language models. He mentions that if a Lambda function can do something, then it can be used to augment a large language model. The man concludes by expressing hope that the viewer found the video useful and insightful, and encourages them to check out other videos on the AWS developers channel. He also asks viewers to like the video, subscribe to the channel, and watch other videos.

Things to know
Amazon Bedrock Data Automation is now available via cross-region inference in the following two AWS Regions: US East (N. Virginia) and US West (Oregon). When using Bedrock Data Automation from those Regions, data can be processed using cross-region inference in any of these four Regions: US East (Ohio, N. Virginia) and US West (N. California, Oregon). All these Regions are in the US so that data is processed within the same geography. We’re working to add support for more Regions in Europe and Asia later in 2025.

There’s no change in pricing compared to the preview and when using cross-region inference. For more information, visit Amazon Bedrock pricing.

Bedrock Data Automation now also includes a number of security, governance and manageability related capabilities such as AWS Key Management Service (AWS KMS) customer managed keys support for granular encryption control, AWS PrivateLink to connect directly to the Bedrock Data Automation APIs in your virtual private cloud (VPC) instead of connecting over the internet, and tagging of Bedrock Data Automation resources and jobs to track costs and enforce tag-based access policies in AWS Identity and Access Management (IAM).

I used Python in this blog post but Bedrock Data Automation is available with any AWS SDKs. For example, you can use Java, .NET, or Rust for a backend document processing application; JavaScript for a web app that processes images, videos, or audio files; and Swift for a native mobile app that processes content provided by end users. It’s never been so easy to get insights from multimodal data.

Here are a few reading suggestions to learn more (including code samples):

– Danilo

—

How is the News Blog doing? Take this 1 minute survey!

Improve search results for AI using Amazon OpenSearch Service as a vector database with Amazon Bedrock

2025-02-22 Jon Handler

Post Syndicated from Jon Handler original https://aws.amazon.com/blogs/big-data/improve-search-results-for-ai-using-amazon-opensearch-service-as-a-vector-database-with-amazon-bedrock/

Artificial intelligence (AI) has transformed how humans interact with information in two major ways—search applications and generative AI. Search applications include ecommerce websites, document repository search, customer support call centers, customer relationship management, matchmaking for gaming, and application search. Generative AI use cases include chatbots with Retrieval-Augmented Generation (RAG), intelligent log analysis, code generation, document summarization, and AI assistants. AWS recommends Amazon OpenSearch Service as a vector database for Amazon Bedrock as the building blocks to power your solution for these workloads.

In this post, you’ll learn how to use OpenSearch Service and Amazon Bedrock to build AI-powered search and generative AI applications. You’ll learn about how AI-powered search systems employ foundation models (FMs) to capture and search context and meaning across text, images, audio, and video, delivering more accurate results to users. You’ll learn how generative AI systems use these search results to create original responses to questions, supporting interactive conversations between humans and machines.

The post addresses common questions such as:

What is a vector database and how does it support generative AI applications?
Why is Amazon OpenSearch Service recommended as a vector database for Amazon Bedrock?
How do vector databases help prevent AI hallucinations?
How can vector databases improve recommendation systems?
What are the scaling capabilities of OpenSearch as a vector database?

How vector databases work in the AI workflow

When you’re building for search, FMs and other AI models convert various types of data (text, images, audio, and video) into mathematical representations called vectors. When you use vectors for search, you encode your data as vectors and store those vectors in a vector database. You further convert your query into a vector and then query the vector database to find related items by minimizing the distance between vectors.

When you’re building for generative AI, you use FMs such as large language models (LLMs), to generate text, video, audio, images, code, and more from a prompt. The prompt might contain text, such as a user’s question, along with other media such as images, audio, or video. However, generative AI models can produce hallucinations—outputs that appear convincing but contain factual errors. To solve for this challenge, you employ vector search to retrieve accurate information from a vector database. You add this information to the prompt in a process called Retrieval-Augmented Generation (RAG).

Why is Amazon OpenSearch Service the recommended vector database for Amazon Bedrock?

Amazon Bedrock is a fully managed service that provides FMs from leading AI companies, and the tools to customize these FMs with your data to improve their accuracy. With Amazon Bedrock, you get a serverless, no-fuss solution to adopt your selected FM and use it for your generative AI application.

Amazon OpenSearch Service is a fully managed service that you can use to deploy and operate OpenSearch in the AWS Cloud. OpenSearch is an open source search, log analytics, and vector database solution, composed of a search engine and vector database; and OpenSearch Dashboards, a log analytics, observability, security analytics, and dashboarding solution. OpenSearch Service can help you to deploy and operate your search infrastructure with native vector database capabilities, pre-built templates, and simplified setup. API calls and integration templates streamline connectivity with Amazon Bedrock FMs, while the OpenSearch Service vector engine can deliver as low as single-digit millisecond latencies for searches across billions of vectors, making it ideal for real-time AI applications.

OpenSearch is a specialized type of database technology that was originally designed for latency- and throughput-optimized matching and retrieval of large and small blocks of unstructured text with ranked results. OpenSearch ranks results based on a measure of similarity to the search query, returning the most similar results. This similarity matching has evolved over time. Before FMs, search engines used a word-frequency scoring system called term frequency/inverse document frequency (TF/IDF). OpenSearch Service uses TF/IDF to score a document based on the rarity of the search terms in all documents and how often the search terms appeared in the document it’s scoring.

With the rise of AI/ML, OpenSearch added the ability to compute a similarity score for the distance between vectors. To search with vectors, you add vector embeddings produced by FMs and other AI/ML technologies to your documents. To score documents for a query, OpenSearch computes the distance from the document’s vector to a vector from the query. OpenSearch further provides field-based filtering and matching and hybrid vector and lexical search, which you use to incorporate terms in your queries. OpenSearch hybrid search performs a lexical and a vector query in parallel, producing a similarity score with built-in score normalization and blending to improve the accuracy of the search result compared with lexical or vector similarity alone.

OpenSearch Service supports three vector engines: Facebook AI Similarity (FAISS), Non-Metric Space Library (NMSLib), and Apache Lucene. It supports exact nearest neighbor search, and approximate nearest neighbor (ANN) search with either hierarchical navigable small world (HNSW), or Inverted File (IVF) engines. OpenSearch Service supports vector quantization methods, including disk-based vector quantization so you can optimize cost, latency, and retrieval accuracy for your solution.

Use case 1: Improve your search results with AI/ML

To improve your search results with AI/ML, you use a vector-generating ML model, most frequently an LLM or multi-modal model that produces embeddings for text and image inputs. You use Amazon OpenSearch Ingestion, or a similar technology to send your data to OpenSearch Service with OpenSearch Neural Plugin to integrate the model, using a model ID, into an OpenSearch ingest pipeline. The ingest pipeline calls Amazon Bedrock to create vector embeddings for every document during ingestion.

To query OpenSearch Service as a vector database, you use an OpenSearch neural query to call Amazon Bedrock to create an embedding for the query. The neural query uses the vector database to retrieve nearest neighbors.

The service offers pre-built CloudFormation templates that construct OpenSearch Service integrations to connect to Amazon Bedrock foundation models for remote inference. These templates simplify the setup of the connector that OpenSearch Service uses to contact Amazon Bedrock.

After you’ve created the integration, you can refer to the model_id when you set up your ingest and search pipelines.

Use case 2: Amazon OpenSearch Serverless as an Amazon Bedrock knowledge base

Amazon OpenSearch Serverless offers an auto-scaled, high-performing vector database that you can use to build with Amazon Bedrock for RAG, and AI agents, without having to manage the vector database infrastructure. When you use OpenSearch Serverless, you create a collection—a collection of indexes for your application’s search, vector, and logging needs. For vector database use cases, you send your vector data to your collection’s indices, and OpenSearch Serverless creates a vector database that provides fast vector similarity and retrieval.

When you use OpenSearch Serverless as a vector database, you pay only for storage for your vectors and the compute needed to serve your queries. Serverless compute capacity is measured in OpenSearch Compute Units (OCUs). You can deploy OpenSearch Serverless starting at just one OCU for development and test workloads for about $175/month. OpenSearch Serverless scales up and down automatically to accommodate your ingestion and search workloads.

With Amazon OpenSearch Serverless, you get an autoscaled, performant vector database that is seamlessly integrated with Amazon Bedrock as a knowledge base for your generative AI solution. You use the Amazon Bedrock console to automatically create vectors from your data in up to five data stores, including an Amazon Simple Storage Service (Amazon S3) bucket and store them in an Amazon OpenSearch Serverless collection.

When you’ve configured your data source, and selected a model, select Amazon OpenSearch Serverless as your vector store, and Amazon Bedrock and OpenSearch Serverless will take it from there. Amazon Bedrock will automatically retrieve source data from your data source, apply the parsing and chunking strategies you have configured, and index vector embeddings in OpenSearch Serverless. An API call will synchronize your data source with OpenSearch Serverless vector store.

The Amazon Bedrock retrieve_and_generate() runtime API call makes it straightforward for you to implement RAG with Amazon Bedrock and your OpenSearch Serverless knowledge base.

response = bedrock_agent_runtime_client.retrieve_and_generate(
  input={
    'text': prompt,
  },
  retrieveAndGenerateConfiguration={
    'type': 'KNOWLEDGE_BASE',
    'knowledgeBaseConfiguration': {
      'knowledgeBaseId': knowledge_base_id,
      'modelArn': model_arn,
}})

Conclusion

In this post, you learned how Amazon OpenSearch Service and Amazon Bedrock work together to deliver AI-powered search and generative AI applications and why OpenSearch Service is the AWS recommended vector database for Amazon Bedrock. You learned how to add Amazon Bedrock FMs to generate vector embeddings for OpenSearch Service semantic search to bring meaning and context to your search results. You learned how OpenSearch Serverless provides a tightly integrated knowledge base for Amazon Bedrock that simplifies using foundation models for RAG and other generative AI. Get started with Amazon OpenSearch Service and Amazon Bedrock today to enhance your AI-powered applications with improved search capabilities with more reliable generative AI outputs.

About the author

Jon Handler is Director of Solutions Architecture for Search Services at Amazon Web Services, based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads for OpenSearch. Prior to joining AWS, Jon’s career as a software developer included four years of coding a large-scale ecommerce search engine. Jon holds a Bachelor of the Arts from the University of Pennsylvania, and a Master’s of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.

Rapid7 Fills Gaps in the CVE Assessment Process with AI-Generated Vulnerability Scoring in Exposure Command

2025-02-19 Rapid7

Post Syndicated from Rapid7 original https://blog.rapid7.com/2025/02/19/rapid7-fills-gaps-in-the-cve-assessment-process-with-ai-generated-vulnerability-scoring-in-exposure-command/

Rapid7 Fills Gaps in the CVE Assessment Process with AI-Generated Vulnerability Scoring in Exposure Command

The National Vulnerability Database (NVD) announced in February 2024 that it would no longer provide common vulnerability scoring system (CVSS) scores for all CVEs. Due to resource constraints and an inability to keep up with the volume of newly-disclosed vulnerabilities, NVD shifted its focus to processing vulnerabilities more efficiently by relying on vendor-provided and third-party scores rather than scoring each CVE independently.

Many organizations rely on NVD’s CVSS scores as a consistent, centralized guide to measuring the potential risk of vulnerabilities. This is especially useful for teams that don’t have the resources to conduct their own in-depth vulnerability analysis given the pace at which new CVEs are cropping up.

To address this widening gap in vulnerability scoring and ensure our customers are making informed decisions with the most accurate understanding of their current risk posture we’re excited to announce the release of AI-Generated Risk Scoring in Exposure Command. By integrating an advanced machine learning model, Exposure Command supplements existing CVSS scores by providing AI-Generated Risk Scores for CVEs where NVD does not provide them, ensuring all vulnerabilities are provided an accurate score.

The need to evolve from traditional vulnerability management practices to continuous threat and Exposure Management

Moving beyond simple risk scoring methodologies is critical for modern vulnerability management teams to stay ahead of advanced threats. For many organizations, this means adopting a Risk-Based Vulnerability Management (RBVM) approach.

Put simply, this means incorporating not just a deep and accurate understanding of how risky a given CVE is in a vacuum, but also layering on additional context related to reachability and exploitability, asset criticality, and a real-world understanding of what threat actors are actively targeting in the wild. And how all these inputs relate to the organization’s specific environment.

AI-Generated CVSS scoring in Exposure Command feeds directly into our broader Active Risk scoring methodology. More importantly, it empowers Rapid7 to produce predictive CVSS scores by analyzing vulnerability information and comparing with previous expert vulnerability analysis.

The model generates each vector individually, and once combined to form a score, results in 76% of these generated scores being in the correct severity classification. Combined with Rapid7’s Active Risk calculator, this increases to 87% of scores returning the correct classification. The remaining scores are never more than one classification out.

This insight will feed directly into and improve the overall accuracy of our Active Risk scoring models, as well as, ensure severity scores are assigned and provided to security teams faster than humanly possible, making your entire security program more resilient to external change.

By leveraging AI/ML to generate predictive risk scores, security teams benefit from:

Enhanced accuracy: Our expertly designed model trained on historical NVD data accurately provides CVSS scores.
Predictive scoring: Get immediate insight into the severity of newly-disclosed CVEs that are left unscored, without the need for manual aggregation and analysis.
Improved security posture: Ensuring all CVEs are assigned an accurate severity score, organizations are equipped with the necessary context to effectively prioritize remediation efforts and in turn strengthen their organization’s security posture.

This release represents a major step forward in our mission to provide industry-leading cybersecurity solutions. We expect these enhancements will significantly improve your ability to assess and manage vulnerabilities, giving you the confidence to stay ahead of potential threats.For more detailed information and implementation guidelines, please refer to the release notes. If you’d like to learn more about the Rapid7 AI Engine and how we’re leveraging AI across the platform, download the eBook today!

AWS Weekly Roundup: DeepSeek-R1, S3 Metadata, Elastic Beanstalk updates, and more (February 3, 2024)

2025-02-03 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-deepseek-r1-s3-metadata-elastic-beanstalk-updates-and-more-february-3-2024/

Last week, I had an amazing time attending AWS Community Day Thailand in Bangkok. This event came at an exciting time, following the recent launch of the AWS Asia Pacific (Bangkok) Region. We had over 300 attendees and featured 15 speakers from the community, including an AWS Hero and 4 AWS Community Builders who shared their technical expertise and experiences.

The highlight was definitely Jeff Barr, AWS Vice President & Chief Evangelist, delivering an inspiring keynote titled “Next-Generation Software Development”, which set the perfect tone for the day. The day kicked off with welcoming remarks from Vatsun Thirapatarapong, AWS Country Manager for Thailand, and was made even more special thanks to the tremendous support from both the AWS User Group volunteers and the AWS Thailand team.

Here’s a photo capturing the excitement from the event:

Last week’s AWS Launches
There are 30+ launches last week and here are some launches that caught my attention:

DeepSeek-R1 models now available on AWS — Channy wrote on how you can now deploy DeepSeek-R1 models in Amazon Bedrock and Amazon SageMaker AI. This helps you to build and scale generative AI applications with minimal infrastructure investment.

Amazon S3 Tables increases table limit to 10,000 per bucket — S3 Tables now supports creating up to 10,000 tables in each table bucket, allowing you to scale up to 100,000 tables across 10 buckets within an AWS Region per account.

Amazon S3 Metadata now generally available — S3 Metadata provides automated and easily queried metadata that updates in near real-time, simplifying business analytics and real-time inference applications. It supports both system-defined and custom metadata, including integration with AWS analytics services.

AWS Amplify adds TypeScript Data client support for Lambda functions — Developers can now use the Amplify Data client within AWS Lambda functions, enabling consistent type-safe data operations across frontend and backend applications.

AWS Elastic Beanstalk adds Python 3.13, .NET 9, and PHP 8.4 support on Amazon Linux 2023 — AWS Elastic Beanstalk brings the latest language features and improvements to application deployments while benefiting from Amazon Linux 2023 enhanced security and performance features.

From community.aws
Here’s my top 5 personal favorites posts from community.aws:

Deploying DeepSeek-R1 Distill Llama Models on Amazon Bedrock by Manu Mishra
Building Tile Slide by Yash
Farm, Build, Fight! The survival RPG you never knew you needed by AgentOk
PEP and PDP for Secure Authorization with Cognito by Jimmy Dahlqvist
Q-Bits: Simplifying VPC configurations setup with AWS CloudFormation using Amazon Q Developer by Suruchi Saxena

Upcoming AWS and community events
Check your calendars and sign up for upcoming AWS and community events:

AWS Korea re:Invent reCap Online, February 2-4 — A virtual event recapping key announcements and innovations from re:Invent 2023 for the Korean audience.
AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs. Upcoming AWS Community Day is in Ahmedabad (February 8).
AWS Public Sector Day London, February 27 — Join public sector leaders and innovators to explore how AWS is enabling digital transformation in government, education, and healthcare.
AWS Innovate GenAI + Data Edition — A free online conference focusing on generative AI and data innovations. Available in multiple Regions: APJC and EMEA (March 6), North America (March 13), Greater China Region (March 14), and Latin America (April 8).

Browse more upcoming AWS led in-person and virtual developer-focused events.

AWS Community re:Invent re:Caps

Lastly, if you want to learn about top announcements and innovations from AWS re:Invent, the AWS Community shares a summary from a community perspective of these announcements so you can get up to speed. Download the AWS Community re:Invent re:Caps deck.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Donnie

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

DeepSeek-R1 models now available on AWS

2025-01-31 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/deepseek-r1-models-now-available-on-aws/

During this past AWS re:Invent, Amazon CEO Andy Jassy shared valuable lessons learned from Amazon’s own experience developing nearly 1,000 generative AI applications across the company. Drawing from this extensive scale of AI deployment, Jassy offered three key observations that have shaped Amazon’s approach to enterprise AI implementation.

First is that as you get to scale in generative AI applications, the cost of compute really matters. People are very hungry for better price performance. The second is actually quite difficult to build a really good generative AI application. The third is the diversity of the models being used when we gave our builders freedom to pick what they want to do. It doesn’t surprise us, because we keep learning the same lesson over and over and over again, which is that there is never going to be one tool to rule the world.

As Andy emphasized, a broad and deep range of models provided by Amazon empowers customers to choose the precise capabilities that best serve their unique needs. By closely monitoring both customer needs and technological advancements, AWS regularly expands our curated selection of models to include promising new models alongside established industry favorites. This ongoing expansion of high-performing and differentiated model offerings helps customers stay at the forefront of AI innovation.

This leads us to Chinese AI startup DeepSeek. DeepSeek launched DeepSeek-V3 on December 2024 and subsequently released DeepSeek-R1, DeepSeek-R1-Zero with 671 billion parameters, and DeepSeek-R1-Distill models ranging from 1.5–70 billion parameters on January 20, 2025. They added their vision-based Janus-Pro-7B model on January 27, 2025. The models are publicly available and are reportedly 90-95% more affordable and cost-effective than comparable models. Per Deepseek, their model stands out for its reasoning capabilities, achieved through innovative training techniques such as reinforcement learning.

Today, you can now deploy DeepSeek-R1 models in Amazon Bedrock and Amazon SageMaker AI. Amazon Bedrock is best for teams seeking to quickly integrate pre-trained foundation models through APIs. Amazon SageMaker AI is ideal for organizations that want advanced customization, training, and deployment, with access to the underlying infrastructure. Additionally, you can also use AWS Trainium and AWS Inferentia to deploy DeepSeek-R1-Distill models cost-effectively via Amazon Elastic Compute Cloud (Amazon EC2) or Amazon SageMaker AI.

With AWS, you can use DeepSeek-R1 models to build, experiment, and responsibly scale your generative AI ideas by using this powerful, cost-efficient model with minimal infrastructure investment. You can also confidently drive generative AI innovation by building on AWS services that are uniquely designed for security. We highly recommend integrating your deployments of the DeepSeek-R1 models with Amazon Bedrock Guardrails to add a layer of protection for your generative AI applications, which can be used by both Amazon Bedrock and Amazon SageMaker AI customers.

You can choose how to deploy DeepSeek-R1 models on AWS today in a few ways: 1/ Amazon Bedrock Marketplace for the DeepSeek-R1 model, 2/ Amazon SageMaker JumpStart for the DeepSeek-R1 model, 3/ Amazon Bedrock Cust om Model Import for the DeepSeek-R1-Distill models, and 4/ Amazon EC2 Trn1 instances for the DeepSeek-R1-Distill models.

Let me walk you through the various paths for getting started with DeepSeek-R1 models on AWS. Whether you’re building your first AI application or scaling existing solutions, these methods provide flexible starting points based on your team’s expertise and requirements.

1. The DeepSeek-R1 model in Amazon Bedrock Marketplace
Amazon Bedrock Marketplace offers over 100 popular, emerging, and specialized FMs alongside the current selection of industry-leading models in Amazon Bedrock. You can easily discover models in a single catalog, subscribe to the model, and then deploy the model on managed endpoints.

To access the DeepSeek-R1 model in Amazon Bedrock Marketplace, go to the Amazon Bedrock console and select Model catalog under the Foundation models section. You can quickly find DeepSeek by searching or filtering by model providers.

After checking out the model detail page including the model’s capabilities, and implementation guidelines, you can directly deploy the model by providing an endpoint name, choosing the number of instances, and selecting an instance type.

You can also configure advanced options that let you customize the security and infrastructure settings for the DeepSeek-R1 model including VPC networking, service role permissions, and encryption settings. For production deployments, you should review these settings to align with your organization’s security and compliance requirements.

With Amazon Bedrock Guardrails, you can independently evaluate user inputs and model outputs. You can control the interaction between users and DeepSeek-R1 with your defined set of policies by filtering undesirable and harmful content in generative AI applications. The DeepSeek-R1 model in Amazon Bedrock Marketplace can only be used with Bedrock’s ApplyGuardrail API to evaluate user inputs and model responses for custom and third-party FMs available outside of Amazon Bedrock. To learn more, read Implement model-independent safety measures with Amazon Bedrock Guardrails.

Amazon Bedrock Guardrails can also be integrated with other Bedrock tools including Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases to build safer and more secure generative AI applications aligned with responsible AI policies. To learn more, visit the AWS Responsible AI page.

Refer to this step-by-step guide on how to deploy the DeepSeek-R1 model in Amazon Bedrock Marketplace. To learn more, visit Deploy models in Amazon Bedrock Marketplace.

2. The DeepSeek-R1 model in Amazon SageMaker JumpStart
Amazon SageMaker JumpStart is a machine learning (ML) hub with FMs, built-in algorithms, and prebuilt ML solutions that you can deploy with just a few clicks. To deploy DeepSeek-R1 in SageMaker JumpStart, you can discover the DeepSeek-R1 model in SageMaker Unified Studio, SageMaker Studio, SageMaker AI console, or programmatically through the SageMaker Python SDK.

In the Amazon SageMaker AI console, open SageMaker Unified Studio or SageMaker Studio. In case of SageMaker Studio, choose JumpStart and search for “DeepSeek-R1” in the All public models page.

You can select the model and choose deploy to create an endpoint with default settings. When the endpoint comes InService, you can make inferences by sending requests to its endpoint.

You can derive model performance and ML operations controls with Amazon SageMaker AI features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your virtual private cloud (VPC) controls, helping to support data security.

As like Bedrock Marketpalce, you can use the ApplyGuardrail API in the SageMaker JumpStart to decouple safeguards for your generative AI applications from the DeepSeek-R1 model. You can now use guardrails without invoking FMs, which opens the door to more integration of standardized and thoroughly tested enterprise safeguards to your application flow regardless of the models used.

Refer to this step-by-step guide on how to deploy DeepSeek-R1 in Amazon SageMaker JumpStart. To learn more, visit Discover SageMaker JumpStart models in SageMaker Unified Studio or Deploy SageMaker JumpStart models in SageMaker Studio.

3. DeepSeek-R1-Distill models using Amazon Bedrock Custom Model Import
Amazon Bedrock Custom Model Import provides the ability to import and use your customized models alongside existing FMs through a single serverless, unified API without the need to manage underlying infrastructure. With Amazon Bedrock Custom Model Import, you can import DeepSeek-R1-Distill Llama models ranging from 1.5–70 billion parameters. As I highlighted in my blog post about Amazon Bedrock Model Distillation, the distillation process involves training smaller, more efficient models to mimic the behavior and reasoning patterns of the larger DeepSeek-R1 model with 671 billion parameters by using it as a teacher model.

After storing these publicly available models in an Amazon Simple Storage Service (Amazon S3) bucket or an Amazon SageMaker Model Registry, go to Imported models under Foundation models in the Amazon Bedrock console and import and deploy them in a fully managed and serverless environment through Amazon Bedrock. This serverless approach eliminates the need for infrastructure management while providing enterprise-grade security and scalability.

Refer to this step-by-step guide on how to deploy DeepSeek-R1 models using Amazon Bedrock Custom Model Import. To learn more, visit Import a customized model into Amazon Bedrock.

4. DeepSeek-R1-Distill models using AWS Trainium and AWS Inferentia
AWS Deep Learning AMIs (DLAMI) provides customized machine images that you can use for deep learning in a variety of Amazon EC2 instances, from a small CPU-only instance to the latest high-powered multi-GPU instances. You can deploy the DeepSeek-R1-Distill models on AWS Trainuim1 or AWS Inferentia2 instances to get the best price-performance.

To get started, go to Amazon EC2 console and launch a trn1.32xlarge EC2 instance with the Neuron Multi Framework DLAMI called Deep Learning AMI Neuron (Ubuntu 22.04).

Once you have connected to your launched ec2 instance, install vLLM, an open-source tool to serve Large Language Models (LLMs) and download the DeepSeek-R1-Distill model from Hugging Face. You can deploy the model using vLLM and invoke the model server.

To learn more, refer to this step-by-step guide on how to deploy DeepSeek-R1-Distill Llama models on AWS Inferentia and Trainium.

You can also visit the DeepSeek-R1-Distill-Llama-8B or deepseek-ai/DeepSeek-R1-Distill-Llama-70B model cards on Hugging Face. Choose Deploy and then Amazon SageMaker. From the AWS Inferentia and Trainium tab, copy the example code for deploy DeepSeek-R1-Distill Llama models.

Since the release of DeepSeek-R1, various guides of its deployment for Amazon EC2 and Amazon Elastic Kubernetes Service (Amazon EKS) have been posted. Here is some additional material for you to check out:

Things to know
Here are a few important things to know.

Pricing – For publicly available models like DeepSeek-R1, you are charged only the infrastructure price based on inference instance hours you select for Amazon Bedrock Markeplace, Amazon SageMaker JumpStart, and Amazon EC2. For the Bedrock Custom Model Import, you are only charged for model inference, based on the number of copies of your custom model is active, billed in 5-minute windows. To learn more, check out the Amazon Bedrock Pricing, Amazon SageMaker AI Pricing, and Amazon EC2 Pricing pages.
Data security – You can use enterprise-grade security features in Amazon Bedrock and Amazon SageMaker to help you make your data and applications secure and private. This means your data is not shared with model providers, and is not used to improve the models. This applies to all models—proprietary and publicly available—like DeepSeek-R1 models on Amazon Bedrock and Amazon SageMaker. To learn more, visit Amazon Bedrock Security and Privacy and Security in Amazon SageMaker AI.

Now available
DeepSeek-R1 is generally available today in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. You can also use DeepSeek-R1-Distill models using Amazon Bedrock Custom Model Import and Amazon EC2 instances with AWS Trainum and Inferentia chips.

Give DeepSeek-R1 models a try today in the Amazon Bedrock console, Amazon SageMaker AI console, and Amazon EC2 console, and send feedback to AWS re:Post for Amazon Bedrock and AWS re:Post for SageMaker AI or through your usual AWS Support contacts.

— Channy

Luma AI’s Ray2 video model is now available in Amazon Bedrock

2025-01-23 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/luma-ai-ray-2-video-model-is-now-available-in-amazon-bedrock/

As we preannounced at AWS re:Invent 2024, you can now use Luma AI Ray2 video model in Amazon Bedrock to generate high-quality video clips from text, creating captivating motion graphics from static concepts. AWS is the first and only cloud provider to offer fully managed models from Luma AI.

On January 16, 2025, Luma AI introduced Luma Ray2, the large–scale video generative model capable of creating realistic visuals with natural, coherent motion with strong understanding of text instructions. Luma Ray2 exhibits advanced capabilities as a result of being trained on Luma’s new multi-modal architecture. It scales to ten times compute of Ray1, enabling it to produce 5 second or 9 second video clips that show fast coherent motion, ultra-realistic details, and logical event sequences with 540p and 720p resolution.

With Luma Ray2 in Amazon Bedrock, you can add high-quality, realistic, production-ready videos generated from text in your generative AI application through a single API. Luma Ray2 video model understands the interactions between people, animals, and objects, and you can create consistent and physically accurate characters through state-of-the-art natural language instruction understanding and reasoning.

You can use Ray2 video generations for content creation, entertainment, advertising, and media use cases, streamlining the creative process, from concept to execution. You can generate smooth, cinematic, and lifelike camera movements that match the intended emotion of the scene. You can rapidly experiment with different camera angles and styles and deliver creative outputs for architecture, fashion, film, graphic design, and music.

Let’s take a look at the impressive video generations by Luma Ray2 that Luma has published.

Get started with Luma Ray2 model in Amazon Bedrock
Before getting started, if you are new to using Luma models, go to the Amazon Bedrock console and choose Model access on the bottom left pane. To access the latest Luma AI models, request access for Luma Ray2 in Luma AI.

To test the Luma AI model in Amazon Bedrock, choose Image/Video under Playgrounds in the left menu pane. Choose Select model, then select Luma AI as the category and Ray as the model.

For video generation models, you should have an Amazon Simple Storage Service (Amazon S3) bucket to store all generated videos. This bucket will be created in your AWS account, and Amazon Bedrock will have read and write permissions for it. Choose Confirm to create a bucket and generate a video.

I will generate a 5-second video with 720P and 24 frames per second with 16:9 aspect ratio for my prompt.

Here is an example prompt and generated video. You can download it stored in the S3 bucket.
a humpback whale swimming through space particles

Here are another featured examples to demonstrate Ray2 model.

Prompt 1: A miniature baby cat is walking and exploring on the surface of a fingertip

Prompt 2: A massive orb of water floating in a backlit forest

Prompt 3: A man plays saxophone by @ziguratt

Prompt 4: Macro closeup of a bee pollinating

To check out more examples and generated videos, visit the Luma Ray2 page.

By choosing View API request in the Bedrock console, you can also access the model using code examples in the AWS Command Line Interface (AWS CLI) and AWS SDKs. You can use luma.ray-v2:0 as the model ID.

Here is a sample of the AWS CLI command:

aws bedrock-runtime invoke-model \
    --model-id luma.ray-v2:0 \
    --region us-west-2 \
    --body "{\"modelInput\":{\"taskType\":\"TEXT_VIDEO\",\"textToVideoParams\":{\"text\":\"a humpback whale swimming through space particles\"},\"videoGenerationConfig\":{\"seconds\":6,\"fps\":24,\"dimension\":\"1280x720\"}},\"outputDataConfig\":{\"s3OutputDataConfig\":{\"s3Uri\":\"s3://your-bucket-name\"}}}"
     invoke-model-output.txt

You can use Converse API examples to generate videos using AWS SDKs to build your applications using various programming languages.

Now available
Luma Ray2 video model is generally available today in Amazon Bedrock in the US West (Oregon) AWS Region. Check the full Region list for future updates. To learn more, check out the Luma AI in Amazon Bedrock product page and the Amazon Bedrock Pricing page.

Give Luma Ray2 a try in the Amazon Bedrock console today, and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— Channy

Unlocking the Power of AI in Cybersecurity: Key Takeaways from Our Latest Webinar

2025-01-10 Emma Burdett

Post Syndicated from Emma Burdett original https://blog.rapid7.com/2025/01/10/unlocking-the-power-of-ai-in-cybersecurity-key-takeaways-from-our-latest-webinar/

Unlocking the Power of AI in Cybersecurity: Key Takeaways from Our Latest Webinar

Today’s SOC teams have to face dramatic challenges that include overwhelming volumes of alerts, blurred perimeter protections, and resource constraints; meanwhile, AI is bursting into SOC workflows as one of the most important elements in addressing these issues more productively and letting teams truly focus on what matters most.

In our recent webinar, “Enhancing MDR with AI: Real-World Use Cases & Security Insights,” cybersecurity and AI experts shared their perspectives on how advancements in artificial intelligence are reshaping security operations. The session featured Hannah Coakley (Product Manager, Rapid7), Katie Wilbur (Senior Data Scientist, Rapid7), and Steven Warwick (Solutions Architect, AWS), who discussed the role of AI in addressing today’s most pressing challenges in SOC environments.

Here’s a snapshot of what we covered and why you’ll want to watch the full webinar.

AI-Powered Auto Triaging Enhances SOC Efficiency
AI models can categorize thousands of daily alerts, filtering out noise and prioritizing critical threats. This allows analysts to focus their attention on incidents that matter most, improving response times and reducing manual workloads.
Generative AI Speeds Up and Standardizes Reporting
Incident reporting, a traditionally time-intensive task, is streamlined with generative AI. By producing consistent first drafts, it saves time and ensures clarity in reports, enabling quicker decision-making in high-pressure environments.
Responsible AI Practices Build Trust and Transparency
Effective AI implementation requires keeping humans in the loop to verify outputs and reduce biases. Responsible AI supports analysts rather than replacing them, ensuring its use enhances security efforts while maintaining trust.

You’ll Also Learn

The challenges SOCs face with alert volume and how AI helps address this issue.
The trade-off between explainability and accuracy when selecting AI models for cybersecurity.
How rigorous testing ensures AI models adapt to evolving threats in the cybersecurity landscape.

These are just a few of the insights that came out of an engaging session on the future of AI in cybersecurity. For a deeper dive into how AI is transforming SOC workflows and reshaping the field, watch the full webinar.

Watch the full webinar here to find out how integrating AI into your SOC closes the security gap and enables your team to work at its best.

New Research: Enhancing Botnet Detection with AI using LLMs and Similarity Search

2025-01-08 Tom Caiazza

Post Syndicated from Tom Caiazza original https://blog.rapid7.com/2025/01/08/new-research-enhancing-botnet-detection-with-ai-using-llms-and-similarity-search/

New Research: Enhancing Botnet Detection with AI using LLMs and Similarity Search

As botnets continue to evolve, so do the techniques required to detect them. While Transport Layer Security (TLS) encryption is widely adopted for secure communications, botnets leverage TLS to obscure command-and-control (C2) traffic. These malicious actors often have identifiable characteristics embedded within their TLS certificates, opening a potential pathway for advanced detection techniques.

In first-of-its-kind research, Rapid7’s Dr. Stuart Millar, in collaboration with Kumar Shashwat, Francis Hahn and Prof. Xinming Ou, at the University of South Florida, studied the use of AI large language models (LLMs) to detect botnets’ use of TLS encryption by analyzing embedding similarities to weed out botnets within a sea of benign TLS certificates. The work was presented at AISec 2024 in Salt Lake City as part of the leading ACM CCS conference toward the end of last year, where previously Rapid7 collected the best paper award.

Botnets — networks of hacked devices that attackers control remotely — often use TLS encryption to hide their activity. This encryption keeps the traffic secure, making it challenging for traditional security tools to detect whether a device is part of a botnet. Millar and company found they could detect botnets by analyzing the unique characteristics in the TLS certificates that each server uses to identify itself, dramatically reducing the time and human effort required.

Large language models can represent text as embeddings, or numerical vectors that capture the meaning and structure of the text. These embeddings were used to create vector representations of the text in TLS certificates, such as the organization names and country codes listed on them. By projecting these representations into a vector space and then using a similarity search, any new certificate can first be compared to a known set of botnet and benign certificates, and then a decision made as to whether or not it is malicious.

They found that in using an open-source LLM called C-BERT, the model achieved an accuracy rate of 0.994, surpassing proprietary alternatives in accuracy, speed, and cost-efficiency. This means it could reliably distinguish between botnet and benign certificates far more effectively and efficiently than standard practices, which was confirmed through random sampling.

In order to simulate a real world scenario, the researchers tested the model on 150,000 TLS certificates. They found 13 certificates as potential botnets which, when verified against a malware detection service, yielded one certificate that was found to be malicious. This approach eliminated the time intensive and costly process of identifying malicious botnets manually.

The model was also able to identify zero-day botnets, or those that had not yet been documented before. By omitting certain known botnets during training and then testing with these omitted samples, they demonstrated that the model could still detect them, even without prior exposure.

Deploying this AI solution in a real-world environment offers cybersecurity teams a substantial advantage in botnet detection by reducing false positives and minimizing manual inspection. Future research aims to expand the range of certificate attributes used in embeddings, improve real-time processing capabilities, and integrate additional datasets for a broader scope. Explore the full research paper for an in-depth look at the methodology and results of an LLM-based approach to botnet TLS certificate detection.

Rapid7 Extends Cloud Security Capabilities with Updates to Exposure Command

2024-12-06 Ryan Blanchard

Post Syndicated from Ryan Blanchard original https://blog.rapid7.com/2024/12/06/rapid7-extends-cloud-security-capabilities-with-updates-to-exposure-command/

Rapid7 Extends Cloud Security Capabilities with Updates to Exposure Command

The cloud has become the backbone of modern innovation, powering everything from AI to remote work. But as organizations embrace the cloud, they also face an ever-expanding and increasingly complex attack surface. With purpose-built harvesting technology providing real-time visibility into everything running across multi-cloud environments, Exposure Command from Rapid7 ensures teams have an up-to-date inventory, mapping their cloud attack surface and enriching asset data with risk and business context.

To ensure teams can keep up with the torrid pace of innovation and overcome increased complexity, Rapid7 remains dedicated to investing in advancing the cloud security capabilities available within Exposure Command. To that end, we’ve made a few significant updates across AI resource coverage, third-party CNAPP enrichment and more. Let’s dive right in.

Extending coverage for securing AI/ML development in the cloud

AI and machine learning (ML) are transforming industries, but the speed of adoption can often leave organizations vulnerable. AI/ML workloads often process sensitive or proprietary data, requiring robust protections to ensure compliance with ever-evolving regulations. Safeguarding these environments isn’t just about securing the infrastructure; it’s about understanding the unique workflows and ensuring compliance at every step.

These workloads also introduce unique risks, such as model poisoning attacks or vulnerabilities in APIs, creating new vectors for data exfiltration and service disruption. Additionally, the dynamic nature of cloud-hosted AI services presents challenges in maintaining secure configurations as resources scale elastically, potentially exposing sensitive endpoints or misconfigured setups.

To that end, Exposure Command has expanded support for critical AI services like Amazon Comprehend and Polly, AWS’s natural language processing and text-to-speech services.This provides comprehensive visibility across an organization’s attack surface, aligning AI-specific risks with broader enterprise priorities.

Shifting left and securing the software supply chain

Developers are at the forefront of modern cloud environments, making “shift-left” strategies essential for effective security. By addressing risks during development rather than after deployment, teams can eliminate vulnerabilities before they become costly issues.

Exposure Command now offers more robust Infrastructure-as-Code (IaC) scanning and deeper CI/CD integration, with Terraform and CloudFormation support across hundreds of resource types. For development teams, integrations like GitLab, GitHub Actions, AWS CloudFormation, and Azure DevOps bring security checks directly into their workflows. Whether it’s identifying misconfigurations in AWS Glue Catalogs or assessing risks in SES configurations, these tools help teams secure their code without breaking their stride.

Bridging the hybrid cloud gap with native and third-party CNAPP connectors

For many organizations, the challenge isn’t just securing the cloud – it’s securing everything holistically. Hybrid environments that span on-prem systems and multiple cloud providers can create silos, leading to gaps in visibility and risk management. To tackle this, we’ve integrated InsightCloudSec data directly into Surface Command, empowering security teams with a unified view of their entire attack surface in one place.

But we didn’t stop at consolidating our own native CNAPP capabilities. Teams now get out-of-the-box integrations with popular cloud security tools like Wiz and Orca as well as CSP-native services like AWS Inspector, all making it easier than ever to identify risks across cloud-native and hybrid environments. Everything can now be seen in one place – from endpoint vulnerabilities to cloud misconfigurations and overly permissive roles – allowing for faster action with clarity and precision.

Tackling virtual desktop risks with custom registry keys

With the rise of remote work, virtual desktop infrastructures (VDIs) like AWS Workspaces have become essential. Yet, their dynamic nature makes tracking vulnerabilities a challenge. Exposure Command addresses this with features like custom registry keys for golden images, ensuring you can trace a risk back to its source and effectively prioritize remediation.

Commanding the cloud attack surface

The challenges of securing modern environments aren’t going away. Attack surfaces will continue to expand, threats will grow more sophisticated, and organizations will face increasing pressure to innovate securely.

Keep an eye out for more updates coming soon as we continue to invest in helping organizations effectively manage exposures from endpoint to cloud.

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

2024-12-04 G2 Krishnamoorthy

Post Syndicated from G2 Krishnamoorthy original https://aws.amazon.com/blogs/big-data/the-next-generation-of-amazon-sagemaker-the-center-for-all-your-data-analytics-and-ai/

This week on the keynote stages at AWS re:Invent 2024, you heard from Matt Garman, CEO, AWS, and Swami Sivasubramanian, VP of AI and Data, AWS, speak about the next generation of Amazon SageMaker, the center for all of your data, analytics, and AI.

The relationship between analytics and AI is rapidly evolving. Our customers are telling us that they are seeing their analytics and AI workloads increasingly converge around a lot of the same data, and this is changing how they are using analytics tools with their data. They aren’t using analytics and AI tools in isolation. They’re taking data they’ve historically used for analytics or business reporting and putting it to work in machine learning (ML) models and AI-powered applications.

We want to make it streamlined for our customers to work with their data, whether for analytics or AI, help them get to AI-ready data faster, and improve productivity of all data and AI workers. The next generation of SageMaker is set to do just that.

Introducing the next generation of SageMaker

The rise of generative AI is changing how data and AI teams work together. For example, when a retail data analyst creates customer segmentation reports, those same datasets are now being used by AI teams to train recommendation engines. Or customer service teams analyzing call logs to track common issues are now using that data to train AI chatbots to handle routine inquiries. Our customers tell us that they need tools that help data and AI teams collaborate seamlessly, but they face real challenges: data is siloed and scattered across systems, they have to build and maintain complex data pipelines, and teams struggle to access and use data efficiently due to inconsistent access controls. Customers also need to make sure that their data practices remain secure, reliable, and compliant with regulations. They need data that’s not just accessible, but also trustworthy and properly governed to keep up with growing business demands and AI opportunities.

The next generation of SageMaker, an integrated experience for data, analytics, and AI, addresses these challenges and more. SageMaker brings together widely adopted AWS ML and analytics capabilities—virtually all of the components you need for data exploration, preparation, and integration; petabyte-scale big data processing; fast SQL analytics; model development and training; governance; and generative AI development. SageMaker helps you work faster and smarter with your data and build powerful analytics and AI solutions that are deeply rooted in your unique data assets, giving you an edge over the competition.

Unified tools: Collaborate and build faster with one data and AI development environment

The rapid evolution of data and AI roles demands a revolution in the services and tools that power your work, driving a need for collaboration and teamwork across your entire organization. Amazon SageMaker Unified Studio (Preview) solves this challenge by providing an integrated authoring experience to use all your data and tools for analytics and AI. Collaborate and build faster using familiar AWS tools for model development, generative AI, data processing, and SQL analytics with Amazon Q Developer, the most capable generative AI assistant for software development, helping you along the way. All your favorite functionality and tools, like standalone studios, query editors, and visual tools, are now available in one place, helping you discover and prepare data with ease, author queries or code, and get to insights faster.

SageMaker also comes with built-in generative AI powered by Amazon Q Developer that guides you along the way of your data and AI journey, transforming complex tasks into intuitive conversations. Ask questions in plain English to find the right datasets, automatically generate SQL queries, or create data pipelines without writing code. This isn’t just about making data management effortless—it’s about using AI to make your data work harder for you, unlocking insights that might otherwise remain hidden, and enabling everyone in your organization to work with data confidently, regardless of their technical expertise.

SageMaker still includes all the existing ML and AI capabilities you’ve come to know and love for data wrangling, human-in-the-loop data labeling with Amazon SageMaker Ground Truth, experiments, MLOps, Amazon SageMaker HyperPod managed distributed training, and more. Moving forward, we’ll refer to this set of AI/ML capabilities as SageMaker AI, and we’ll continue to innovate and expand on them to make sure the new SageMaker remains the premier center for building, training, and deploying AI models. With improved access and collaboration, you’ll be able to create and securely share analytics and AI artifacts and bring data and AI products to market faster.

Unified data: Reduce data silos with an open lakehouse to unify all your data

We see organizations embarking on digital transformations and needing to quickly adapt to ever-evolving customer demands. In doing so, a unified view across all their data is required—one that breaks down data silos and simplifies data usage for teams, without sacrificing the depth and breadth of capabilities that make AWS tools unbelievably valuable. This balance between unification and maintaining advanced capabilities is key to supporting our customers’ ongoing innovation and adaptability in a rapidly changing technological landscape.

Amazon SageMaker Lakehouse, now generally available, unifies all your data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. This innovation drives an important change: you’ll no longer have to copy or move data between data lake and data warehouses. SageMaker Lakehouse enables seamless data access directly in the new SageMaker Unified Studio and provides the flexibility to access and query your data with all Apache Iceberg-compatible tools on a single copy of analytics data. With this launch, you can query data regardless of where it is stored with support for a wide range of use cases, including analytics, ad-hoc querying, data science, machine learning, and generative AI. You’ll get a single unified view of all your data for your data and AI workers, regardless of where the data sits, breaking down your data siloes. We’ve simplified data architectures, saving you time and costs on unnecessary data movement, data duplication, and custom solutions.

Additionally, we are advancing towards a zero-ETL future by expanding integrations that make data from multiple operational, transactional, and application sources available in SageMaker Lakehouse and Amazon Redshift. Zero-ETL integrations simplify data movement and ingestion, enabling increased agility, reduced costs, and minimized operational overhead while providing near real-time insights for AI and ML initiatives. All the existing Amazon Redshift zero-ETL integrations are seamlessly available within SageMaker—you can move transactional data from databases like Amazon Aurora, Amazon Relational Database Service (Amazon RDS), and Amazon DynamoDB into Amazon Redshift without performance impact and ingest high-volume real-time data from Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK) with native streaming services integrations. We announced SageMaker Lakehouse and Amazon Redshift support for zero-ETL integrations from eight applications, including Salesforce, Zendesk, ServiceNow, Zoho CRM, Salesforce Pardot, SAP, Facebook Ads, and Instagram Ads. This new capability streamlines data replication and ingestion into a unified process, minimizing the need for custom data replication pipelines. With automatic pipeline maintenance, the solution minimizes the complexity of building in-house connectors, reduces implementation and operational costs, and accelerates insights by unifying data from diverse applications.

“We have spent the last 18 months working with AWS to transform our data foundation to use best-in-class solutions that are cost-effective as well. With advancements like SageMaker Unified Studio and SageMaker Lakehouse, we expect to accelerate our velocity of delivery through seamless access to data and services, thus enabling our engineers, analysts, and scientists to surface insights that provide material value to our business.”

– Lee Slezak, SVP of Data and Analytic, Lennar

Unified governance: Meet your enterprise security needs with built-in data and AI governance

When it comes to data and AI governance, discipline equals freedom. The right governance practices can enable your teams to move faster. Data teams struggle to find a unified approach that enables effortless discovery, understanding, and assurance of data quality and security across various sources. Our customers tell us that the fragmented nature of permissions and access controls, managed separately within individual data sources and tools, leads to inconsistent implementation and potential security risks.

SageMaker simplifies the discovery, governance, and collaboration for data and AI across your lakehouse, AI models, and applications. With Amazon SageMaker Catalog, built on Amazon DataZone, you can define and enforce access policies consistently using a single permission model with fine-grained access controls. This unified catalog enables engineers, data scientists, and analysts to securely discover and access approved data and models using semantic search with generative AI-created metadata. Collaboration is seamless, with straightforward publishing and subscribing workflows, fostering a more connected and efficient work environment.

Having confidence in your data is key. SageMaker Catalog provides comprehensive data quality capabilities, including data profiling, data quality recommendations, monitoring of data quality rules, and alerts. By combining rule-based and ML approaches, we help you reconcile entities and deliver high-quality data, giving you the tools to make confident business decisions. You’ll have trust in your data, with real-time visibility of data quality and data and ML lineage, allowing you to resolve hard-to-find quality challenges. Automate data profiling and data quality recommendations, monitor data quality rules, and receive alerts. Resolve hard-to-find data quality challenges by using rule-based and ML approaches to reconcile entities, enabling you to deliver high-quality data to make confident business decisions.

Beyond discovery and collaboration, SageMaker takes AI governance to the next level by providing robust safeguards and tools to develop responsible AI policies. This holistic approach not only streamlines operations, but also builds and maintains trust throughout the organization, setting a new standard for responsible and efficient AI development and deployment.

Innovate faster with the convergence of data, analytics and AI

The next generation of SageMaker delivers an integrated experience to access, govern, and act on all your data by bringing together widely adopted AWS data, analytics, and AI capabilities. Collaborate and build faster from a unified studio using familiar AWS tools for model development, generative AI, data processing, and SQL analytics, with Amazon Q Developer assisting you along the way. Access all your data, whether it’s stored in data lakes, data warehouses, or third-party or federated data sources. And move with confidence and trust with built-in governance to address enterprise security needs. The tools to transform your business are here. We’re excited to see what you’ll build next!

To learn more, check out the following AWS News blog announcements:

About the authors

Use Amazon Q Developer to build ML models in Amazon SageMaker Canvas

2024-12-04 Elizabeth Fuentes

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/use-amazon-q-developer-to-build-ml-models-in-amazon-sagemaker-canvas/

As a data scientist, I’ve experienced firsthand the challenges of making machine learning (ML) accessible to business analysts, marketing analysts, data analysts, and data engineers who are experts in their domains without ML experience. That’s why I’m particularly excited about today’s Amazon Web Services (AWS) announcement that Amazon Q Developer is now available in Amazon SageMaker Canvas. What catches my attention is how Amazon Q Developer helps connect ML expertise with business needs, making ML more accessible across organizations.

Amazon Q Developer helps domain experts build accurate, production-quality ML models through natural language interactions, even if they don’t have ML expertise. Amazon Q Developer guides these users by breaking down their business problems and analyzing their data to recommend step-by-step guidance for building custom ML models. It transforms users’ data to remove anomalies, and builds and evaluates custom ML models to recommend the best one, while providing users control and visibility into every step of the guided ML workflow. This empowers organizations to innovate faster with reduced time to market. It also reduces their reliance on ML experts so their specialists can focus on more complex technical challenges.

For example, a marketing analyst can state, “I want to predict home sales prices using home characteristics and past sales data”, and Amazon Q Developer will translate this into a set of ML steps, analyzing relevant customer data, building multiple models, and recommending the best approach.

Let’s see it in action
To start using Amazon Q Developer, I follow the Getting started with using Amazon SageMaker Canvas guide to launch the Canvas application. In this demo, I use natural language instructions to create a model to predict house prices for marketing and finance teams. From the SageMaker Canvas page, I select Amazon Q and then choose Start a new conversation.

In the new conversation I write:

I am an analyst and need to predict house prices for my marketing and finance teams.

Next, Amazon Q Developer explains the problem and recommends the appropriate ML model type. It also outlines the solution requirements, including the necessary dataset characteristics. Amazon Q Developer then asks if I want to upload my dataset or I want to choose a target column. I select it to upload my dataset.

In the next step, Amazon Q Developer lists the dataset requirements, which include relevant information about houses, current house prices, and the target variable for the regression model. It then recommended next steps, including: I want to upload my dataset, Select an existing dataset, Create a new dataset or I want to choose a target column. For this demo, I’ll use the canvas-sample-housing.csv sample dataset as my existing dataset.

After selecting and loading the dataset, Amazon Q Developer analyzes it and suggests median_house_value as the target column for the regression model. I accept by selecting I would like to predict the “median_house_value” column. Moving on to the next step, Amazon Q Developer details which dataset features (such as “location”, “housing_median_age”, and “total_rooms”) it will use to predict the median_house_value.

Before moving forward with model training, I ask about the data quality, because without good data we can’t build a reliable model. Amazon Q Developer responds with quality insights for my entire dataset.

I can ask specific questions about individual features and their distributions to better understand the data quality.

To my surprise, through the previous question, I discovered that the “households” column has a wide variation between extreme values, which could affect the model’s prediction accuracy. Therefore, I ask Amazon Q Developer to fix this outlier problem.

After the transformation is done, I can ask what steps Amazon Q Developer followed to make this change. Behind the scenes, Amazon Q Developer applies advanced data preparation steps using SageMaker Canvas data preparation capabilities, which I can review and see the steps so that I can visualize and replicate the process to get the final, prepared dataset for training the model.

After reviewing the data preparation steps, I select Launch my training job.

After the training job is launched, I can see its progress in the conversation, and the datasets created.

As a data scientist, I particularly appreciate that, with Amazon Q Developer, Ican see detailed metrics such as the confusion matrix and precision-recall scores for classification models and root mean square error (RMSE) for regression models. These are crucial elements I always look for when evaluating model performance and making data-driven decisions, and it’s refreshing to see them presented in a way that’s accessible to nontechnical users to build trust and enable proper governance while maintaining the depth that technical teams need.

You can access these metrics by selecting the new model from My Models or from the Amazon Q conversation menu:

Overview – This tab shows the Column impact analysis. In this case, median_income emerges as the primary factor influencing my model.
Scoring – This tab provides model accuracy insights, including RMSE metrics.
Advanced metrics – This tab displays the detailed Metrics table, Residuals and Error density for in-depth model evaluation.

Analyze My Model

After reviewing these metrics and validating the model’s performance, I can move to the final stages of the ML workflow:

Predictions – I can test my model using the Predictions tab to validate its real-world performance.
Deployment – I can create an endpoint deployment to make my model available for production use.

This simplifies the deployment process, a step that traditionally requires significant DevOps knowledge, into a straightforward operation that business analysts can handle confidently.

predictions and deploy

Things to know
Amazon Q Developer democratizes ML across organizations:

Empowering all skill levels with ML – Amazon Q Developer is now available in SageMaker Canvas, helping business analysts, marketing analysts, and data professionals who don’t have ML experience create solutions for business problems through a guided ML workflow. From data analysis and model selection to deployment, users can solve business problems using natural language, reducing dependence on ML experts such as data scientists and enabling organizations to innovate faster with reduced time to market.

Streamlining the ML workflow – With Amazon Q Developer available in SageMaker Canvas, users can prepare data, and build, analyze, and deploy ML models through a guided, transparent workflow. Amazon Q Developer provides advanced data preparation and AutoML capabilities that democratize ML, and allows non-ML experts to produce highly-accurate ML models.

Providing full visibility into the ML workflow – Amazon Q Developer provides full transparency by generating the underlying code and technical artifacts such as data transformation steps, model explainability, and accuracy measures. This allows cross-functional teams, including ML experts, to review, validate, and update the models as needed, facilitating collaboration in a secure environment.

Availability – Amazon Q Developer is now in preview release in Amazon SageMaker Canvas.

Pricing – Amazon Q Developer is now available in SageMaker Canvas at no additional cost to both Amazon Q Developer Pro Tier and Amazon Q Developer Free tier users. However, standard charges apply for resources such as SageMaker Canvas workspace instances and any resources used for building or deploying models. For detailed pricing information, visit the Amazon SageMaker Canvas Pricing.

To learn more about getting started visit the Amazon Q Developer product web page.

— Eli

Benefits of SageMaker Unified Studio

A new integrated way of working

Building from a single data and AI development environment

Reducing time-to-value in a unified environment

Getting started with SageMaker Unified Studio

About the authors

AI-Powered Auto-Triage

Currently Available

NEW: AI Alert Triage Decisioning Transparency

AI-Generated Incident Reports

Currently Available

AI-Powered MDR SOC Assistant

Currently Available

AI-Driven Threat Detections

Currently Available

The time to embrace AI is now

Options in AI-driven scaling and optimization

Which workloads to consider for AI-driven scaling and optimizations

Cost-effectiveness of AI-driven scaling and optimization

Benchmark conducted and results

Queries and elapsed time

Capacity used during the tests

Conclusion

About the Authors

How vector databases work in the AI workflow

Why is Amazon OpenSearch Service the recommended vector database for Amazon Bedrock?

Use case 1: Improve your search results with AI/ML

Use case 2: Amazon OpenSearch Serverless as an Amazon Bedrock knowledge base

Conclusion

About the author

#10 Deploy Stable Diffusion ComfyUI on AWS elastically and efficiently

#9 Let’s Architect! Designing Well-Architected systems

#8 Let’s Architect! Learn About Machine Learning on AWS

#7 Creating an organizational multi-Region failover strategy

#6 Building a three-tier architecture on a budget

#5 Announcing updates to the AWS Well-Architected Framework guidance

#4 Let’s Architect! Serverless developer experience in AWS

#3 London Stock Exchange Group uses chaos engineering on AWS to improve resilience

#2 Achieving Frugal Architecture using the AWS Well-Architected Framework Guidance

#1 How an insurance company implements disaster recovery of 3-tier applications

Thank you!

You’ll Also Learn

Extending coverage for securing AI/ML development in the cloud

Shifting left and securing the software supply chain

Bridging the hybrid cloud gap with native and third-party CNAPP connectors

Tackling virtual desktop risks with custom registry keys

Commanding the cloud attack surface

Introducing the next generation of SageMaker

Unified tools: Collaborate and build faster with one data and AI development environment

Unified data: Reduce data silos with an open lakehouse to unify all your data

Unified governance: Meet your enterprise security needs with built-in data and AI governance

Innovate faster with the convergence of data, analytics and AI

About the authors

The collective thoughts of the interwebz