Tag Archives: news

STH Q4 2024 Letter from the Editor Re-aligning

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/sth-q4-2024-letter-from-the-editor-re-aligning/

Every quarter, I like to do a small update to give our readers a behind-the-scenes look at what is happening. Often, there is a big difference between what folks see publicly and the inner workings of STH, so I like to peel that back. This quarter was a big lift on the growth side, and […]

The post STH Q4 2024 Letter from the Editor Re-aligning appeared first on ServeTheHome.

Stable Diffusion 3.5 Large is now available in Amazon Bedrock

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/stable-diffusion-3-5-large-is-now-available-in-amazon-bedrock/

As we preannounced at AWS re:Invent 2024, you can now use Stable Diffusion 3.5 Large in Amazon Bedrock to generate high-quality images from text descriptions in a wide range of styles to accelerate the creation of concept art, visual effects, and detailed product imagery for customers in media, gaming, advertising, and retail.

In October 2024, Stability AI introduced Stable Diffusion 3.5 Large, the most powerful model in the Stable Diffusion family at 8.1 billion parameters trained on Amazon SageMaker HyperPod, with superior quality and prompt adherence. Stable Diffusion 3.5 Large can accelerate storyboarding, concept art creation, and rapid prototyping of visual effects. You can quickly generate high-quality 1-megapixel images for campaigns, social media posts, and advertisements, saving time and resources while maintaining creative control.

Stable Diffusion 3.5 Large offers users nearly endless creative possibilities, including:

  • Versatile Styles – You can generate images in a wide range of styles and aesthetics, including 3-dimensional, photography, painting, line art, and virtually any visual style you can imagine.
  • Prompt Adherence – You can use Stable Diffusion 3.5 Large’s advanced prompt adherence to closely follow your text prompts, making it a top choice for efficient, high-quality performance.
  • Diverse Outputs – You can create images representative of the diverse world around you, featuring people with different skin tones and features, without the need for extensive prompting.

Today, Stable Image Ultra in Amazon Bedrock has been updated to include Stable Diffusion 3.5 Large in the model’s underlying architecture. Stable Image Ultra, powered by Stability AI’s most advanced models, including Stable Diffusion 3.5, sets a new standard in image generation. It excels in typography, intricate compositions, dynamic lighting, vibrant colors, and artistic cohesion.

With the latest update of Stable Diffusion models in Amazon Bedrock, you have a broader set of solutions to boost your creativity and accelerate image generation workflows.

Get started with Stable Diffusion 3.5 Large in Amazon Bedrock
Before getting started, if you are new to using Stability AI models, go to the Amazon Bedrock console and choose Model access on the bottom left pane. To access the latest Stability AI models, request access for Stable Diffusion 3.5 Large in Stability AI.

To test the Stability AI models in Amazon Bedrock, choose Image/Video under Playgrounds in the left menu pane. Then choose Select model and select Stability AI as the category and Stable Diffusion 3.5 Large as the model.

You can generate an image with your prompt. Here is a sample prompt to generate the image:

High-energy street scene in a neon-lit Tokyo alley at night, where steam rises from food carts, and colorful neon signs illuminate the rain-slicked pavement.

By choosing View API request, you can also access the model using code examples in the AWS Command Line Interface (AWS CLI) and AWS SDKs. You can use stability.sd3-5-large-v1:0 as the model ID.

To get the image with a single command, I write the output JSON file to standard output and use the jq tool to extract the encoded image so that it can be decoded on the fly. The output is written in the img.png file.

Here is a sample of the AWS CLI command:

$ aws bedrock-runtime invoke-model \
   --model-id stability.sd3-5-large-v1:0 \
   --body "{\"text_prompts\":[{\"text\":\"High-energy street scene in a neon-lit Tokyo alley at night, where steam rises from food carts, and colorful neon signs illuminate the rain-slicked pavement.\",\"weight\":1}],\"cfg_scale\":0,\"steps\":10,\"seed\":0,\"width\":1024,\"height\":1024,\"samples\":1}" \
   --cli-binary-format raw-in-base64-out \
   --region us-west-2 \
/dev/stdout | jq -r '.images[0]' | base64 --decode > img.png
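The jq and base64 stages of that pipeline can also be reproduced in Python. The following is a minimal sketch, assuming the response body is a JSON document with an `images` array of base64 strings (as the jq filter `.images[0]` implies). The function name `extract_first_image` and the stand-in response are illustrative only and are not part of any AWS SDK.

```python
import base64
import json


def extract_first_image(response_json: str) -> bytes:
    """Return the decoded bytes of the first image in a response body."""
    payload = json.loads(response_json)
    # Equivalent to `jq -r '.images[0]'` followed by `base64 --decode`.
    return base64.b64decode(payload["images"][0])


# Stand-in response body for illustration; a real response carries a full image.
fake_image = b"\x89PNG placeholder bytes"
sample_body = json.dumps({"images": [base64.b64encode(fake_image).decode("ascii")]})

image_bytes = extract_first_image(sample_body)
```

Writing `image_bytes` to a file opened in binary mode gives the same result as redirecting the decoded output in the shell.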

Here’s how you can use Stable Image Ultra 1.1, which now includes Stable Diffusion 3.5 Large in its underlying architecture, with the AWS SDK for Python (Boto3). This simple application interactively asks for a text-to-image prompt and then calls Amazon Bedrock to generate the image with stability.stable-image-ultra-v1:1 as the model ID.

import base64
import json
import os

import boto3

MODEL_ID = "stability.stable-image-ultra-v1:1"

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

print("Enter a prompt for the text-to-image model:")
prompt = input()

# Request body for a text-to-image call.
body = {
    "prompt": prompt,
    "mode": "text-to-image"
}
response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=json.dumps(body))

# The response body is JSON; each entry in "images" is a base64-encoded image.
model_response = json.loads(response["body"].read())

base64_image_data = model_response["images"][0]

# Create the output directory if needed, then find the first unused file name.
i, output_dir = 1, "output"
os.makedirs(output_dir, exist_ok=True)
while os.path.exists(os.path.join(output_dir, f"img_{i}.png")):
    i += 1

image_data = base64.b64decode(base64_image_data)

# Write the decoded bytes in binary mode.
image_path = os.path.join(output_dir, f"img_{i}.png")
with open(image_path, "wb") as file:
    file.write(image_data)

print(f"The generated image has been saved to {image_path}")

The application writes the resulting image in an output directory that is created if not present. To not overwrite existing files, the code checks for existing files to find the first file name available with the img_<number>.png format.
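The file-naming step can be isolated into a small helper. This is a sketch of the same idea, not code from the original application: the function name `next_available_path` and its `prefix` and `ext` parameters are hypothetical, and a temporary directory stands in for the output directory so the example has no side effects.

```python
import os
import tempfile


def next_available_path(output_dir: str, prefix: str = "img_", ext: str = ".png") -> str:
    """Return the first path of the form <prefix><number><ext> that does not exist yet."""
    os.makedirs(output_dir, exist_ok=True)
    i = 1
    while os.path.exists(os.path.join(output_dir, f"{prefix}{i}{ext}")):
        i += 1
    return os.path.join(output_dir, f"{prefix}{i}{ext}")


with tempfile.TemporaryDirectory() as tmp:
    first = next_available_path(tmp)
    open(first, "wb").close()          # simulate saving the first image
    second = next_available_path(tmp)  # the helper now skips the existing file
    first_name = os.path.basename(first)
    second_name = os.path.basename(second)
```

Note that this check-then-write approach is not safe if several processes write to the same directory at once, which is fine for an interactive script like the one above.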

To learn more, visit the Invoke API examples using AWS SDKs to build your applications to generate an image using various programming languages.

Interesting examples
Here are a few images created with Stable Diffusion 3.5 Large.

Prompt: Full-body university students working on a tech project with the words Stable Diffusion 3.5 in Amazon Bedrock, cheerful cursive typography font in the foreground.
Prompt: Photo of three potions: the first potion is blue with the label "MANA", the second potion is red with the label "HEALTH", the third potion is green with the label "POISON". Old apothecary.
Prompt: Photography, pink rose flowers in the twilight, glowing, tile houses in the background.
Prompt: 3D animation scene of an adventurer traveling the world with his pet dog.

Now available
The Stable Diffusion 3.5 Large model is generally available today in Amazon Bedrock in the US West (Oregon) AWS Region. Check the full Region list for future updates. To learn more, check out the Stability AI in Amazon Bedrock product page and the Amazon Bedrock Pricing page.

Give Stable Diffusion 3.5 Large a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Channy

New Amazon EC2 High Memory U7inh instance on HPE Server for large in-memory databases

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-amazon-ec2-high-memory-u7inh-instance-on-hpe-server-for-large-in-memory-databases/

Today we’re announcing the general availability of the Amazon Elastic Compute Cloud (Amazon EC2) U7inh instance, a new addition to the EC2 High Memory family, built in collaboration with Hewlett Packard Enterprise (HPE). The Amazon EC2 U7inh instance runs on the 16-socket HPE Compute Scale-up Server 3200 and is built on the AWS Nitro System to deliver a fully integrated and managed experience consistent with other EC2 instances.

Powered by fourth-generation Intel® Xeon® Scalable processors (Sapphire Rapids), the U7inh instance supports 32 TB of memory and 1920 vCPUs. This instance offers the highest compute performance and the largest compute and memory size in the Amazon Web Services (AWS) Cloud for running large, mission-critical database workloads, like SAP HANA.

In May 2024, we launched U7i instances to support up to 896 vCPUs and up to 32 TB of memory, which our enterprise customers could use to successfully migrate their large mission-critical in-memory databases to AWS and benefit from the flexibility, scalability, reliability, and cost advantages that AWS offers.

As customers continue to scale their business applications, they want more performance, with additional CPUs and memory along with SAP certification, to generate real-time business insights. Other customers that currently run on premises with HPE servers have also asked how we can help them migrate to AWS to take advantage of cloud benefits while continuing to use HPE hardware.

Here are the detailed specs of the new U7inh instance:

Instance name | vCPUs | Memory (DDR5) | EBS bandwidth | Network bandwidth
U7inh-32tb.480xlarge | 1920 | 32,768 GiB | 160 Gbps | 200 Gbps

The U7inh instance offers up to twice the vCPUs and 1.6 times the EBS bandwidth in a single instance, compared with the largest U7i instance. You can run your largest in-memory database workloads like SAP HANA or seamlessly migrate workloads running on HPE hardware to AWS.
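The comparison can be checked with a few lines of arithmetic. This is a rough sketch that uses only figures quoted in this post (1920 vCPUs, 32,768 GiB, 160 Gbps, and the 896 vCPU maximum from the May 2024 U7i launch). The U7i EBS figure is not stated here, so it is derived from the 1.6x claim rather than taken from a specification.

```python
# Figures quoted in the post for the U7inh instance.
u7inh = {"vcpus": 1920, "memory_gib": 32768, "ebs_gbps": 160}

# Largest U7i vCPU count quoted in the post (May 2024 launch).
largest_u7i_vcpus = 896

# Ratio of vCPUs: a little over two.
vcpu_ratio = u7inh["vcpus"] / largest_u7i_vcpus

# 32,768 GiB expressed in TiB.
memory_tib = u7inh["memory_gib"] / 1024

# EBS bandwidth of the largest U7i implied by the "1.6 times" claim (an inference).
implied_u7i_ebs_gbps = u7inh["ebs_gbps"] / 1.6

print(f"vCPU ratio: {vcpu_ratio:.2f}x, memory: {memory_tib:.0f} TiB, "
      f"implied U7i EBS bandwidth: {implied_u7i_ebs_gbps:.0f} Gbps")
```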

The U7inh instance supports Amazon Linux, Red Hat Enterprise Linux, and SUSE Linux Enterprise Server. Operating system support for SAP HANA workloads on High Memory instances includes SUSE Linux Enterprise Server 15 SP3 for SAP and above and Red Hat Enterprise Linux 8.6/9.0 for SAP and above.

The U7inh instance is SAP certified to run Business Suite on HANA (SoH), Business Suite S/4HANA, Business Warehouse on HANA (BW), and SAP BW/4HANA in production environments. It is also certified for scale-out SAP HANA OLTP workloads such as S/4HANA, and customers can deploy up to four U7inh instances (128 TB) in a cluster for even larger SAP HANA workloads.

To learn more about how to migrate, visit Migrating SAP HANA on AWS to an EC2 High Memory Instance in the SAP HANA on AWS Guides and AWS Launch Wizard for SAP in the AWS Launch Wizard User Guide.

Now available
The Amazon EC2 U7inh instance is available in the US East (N. Virginia) and US West (Oregon) AWS Regions.

To learn more, visit the U7i instance product page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

Channy

And that’s a wrap!

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/and-thats-a-wrap/

After 20 years and 3,283 posts adding up to 1,577,106 words, I am wrapping up my time as the lead blogger on the AWS News Blog.

It has been a privilege to be able to “live in the future” and to get to learn and write about so many of our innovations over the last two decades: message queuing, storage, on-demand computing, serverless, and quantum computing to name just a few and to leave many others out. It has also been a privilege to be able to meet and to hear from so many of you that have faithfully read and (hopefully) learned from my content over the years. I treasure those interactions and your kind words, and I keep both in mind when I write.

Next for Jeff
I began my career as a builder. Over the years I have written tens of thousands of lines of assembly code (6502, Z80, and 68000), Visual Basic, and PHP, along with hundreds of thousands of lines of C. However, over the years I’ve progressively spent less time building and more time talking about building. As each new service and feature whizzed past my eyes I would reminisce about days and decades past, when I could actually use these goodies to create something cool. I went from being a developer who could market, to a marketer who used to be able to develop. There’s absolutely nothing wrong with that, but I like to build. The medium could be code, 3D printing, LEGO bricks, electronics components, or even cardboard – creating and innovating is what motivates and sustains me.

With that as my driving force, my goal for the next step of my career is to invest more time focused on learning and using fewer things, building cool stuff, and creating fresh, developer-focused content as I do so. I’m still working to figure out the form that this will take, so stay tuned. I am also going to continue to make my weekly appearances at AWS OnAir (our Friday Twitch show), and I will continue to speak at AWS community events around the globe.

Next for the Blog
As for the AWS News Blog, it has long been backed by an awesome team, both visible and invisible. Here we are at the recent AWS re:Invent celebration of the blog’s 20th anniversary (photo courtesy of Liz Fuentes with edits by Channy Yun to add those who were otherwise occupied):

During the celebration I told the team that I look forward to celebrating the 30 year anniversary with them at re:Invent 2034.

Going forward, the team will continue to grow and the goal remains the same: to provide our customers with carefully chosen, high-quality information about the latest and most meaningful AWS launches. The blog is in great hands and this team will continue to keep you informed even as the AWS pace of innovation continues to accelerate.

Thanks Again
Once again I need to thank all of you for the very kind words and gestures over the years. Once in your life, if you work hard and get really lucky, you get a unique opportunity to do something that really and truly matters to people. And I have been lucky.

Jeff;

AWS Weekly Roundup: Amazon EC2 F2 instances, Amazon Bedrock Guardrails price reduction, Amazon SES update, and more (December 16, 2024)

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-ec2-f2-instances-amazon-bedrock-guardrails-price-reduction-amazon-ses-update-and-more-december-16-2024/

The week after AWS re:Invent builds on the excitement and energy of the event and is a good time to learn more and understand how the recent announcements can help you solve your challenges. As usual, we have you covered with our top announcements of AWS re:Invent 2024 post.

You can now watch keynotes and sessions on the AWS Event YouTube channel. This year Andy Jassy, now President and CEO at Amazon, returned to re:Invent and shared some thoughts in these videos.

Drawing on experiences Amazon has had building distributed systems at massive scale, Werner Vogels, VP and CTO at Amazon, shared critical lessons and strategies he has learned for managing complex systems in his keynote.

Last week’s launches
Here are the launches that got my attention.

Amazon Elastic Compute Cloud (Amazon EC2) – A new generation of FPGA-powered instances (F2) is now available. In contrast to a purpose-built chip designed with a single function in mind and then hard-wired to implement it, a field programmable gate array (FPGA) can be programmed in the field, after it has been plugged in to a socket on a PC board. We’re also introducing Amazon EC2 High Memory U7i instances with 6TiB and 8TiB of memory. U7i instances are ideal to run large in-memory databases such as SAP HANA, Oracle, and SQL Server. Graviton-based 8th generation instances now support bandwidth configurations for Amazon VPC and Amazon EBS.

Amazon Bedrock Guardrails – We are reducing pricing by up to 85% to help you implement safeguards for your generative AI applications. Also, we’re adding multilingual capabilities with support for Spanish and French languages.

Amazon Simple Email Service (Amazon SES) – Now offers Global Endpoints for multi-region sending resilience and announces the availability of Deterministic Easy DKIM (DEED), a new form of global identity that simplifies DomainKeys Identified Mail (DKIM) management.

AWS CloudFormation – Introduces an enhanced version of the AWS Secrets Manager transform with automatic AWS Lambda upgrades.

Amazon Lex – Launches new multilingual streaming speech recognition models that enhance recognition accuracy through two specialized groupings: a European-based model (for Portuguese, Catalan, French, Italian, German, and Spanish) and an Asia Pacific-based model (for Chinese, Korean, and Japanese).

Amazon Connect – Now supports push notifications for mobile chat on iOS and Android devices. In this way, you can be proactively notified as soon as there is a new message from an agent or chatbot, even when not actively chatting. You can now also configure holidays and other variances to your contact center hours of operation.

AWS Security Hub – Now supports automated security checks aligned to the Payment Card Industry Data Security Standard (PCI DSS) v4.0.1, a compliance framework that provides a set of rules and guidelines for safely handling credit and debit card information.

AWS Resource Explorer – Supports 59 new resource types including Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Kendra, AWS Identity and Access Management (IAM) Access Analyzer, and Amazon SageMaker.

Amazon SageMaker AI – Inference optimized Amazon EC2 G6e instances (powered by NVIDIA L40S Tensor Core GPUs) and P5e (powered by NVIDIA H200 Tensor Core GPUs) are now available on Amazon SageMaker.

Amazon Redshift – Now supports automatically and incrementally refreshable materialized views on tables in a zero-ETL integration. Previously, in this case, you had to run a full refresh.

AWS Toolkit for Visual Studio Code – Now includes Amazon CloudWatch Logs Live Tail, an interactive log streaming and analytics capability that provides real-time visibility into your logs and makes it easier to develop and troubleshoot applications.

Other AWS news
Here are some additional projects, blog posts, and news items that you might find interesting:

Build a managed transactional data lake with Amazon S3 Tables – Just introduced at re:Invent 2024, Amazon S3 Tables is the first cloud object store with built-in Apache Iceberg support and the easiest way to store tabular data at scale. This post on the AWS Storage Blog provides an overview of S3 Tables and an example of how to build a transactional data lake with S3 Tables using Apache Spark on Amazon EMR.

Introducing Cross-Region Connectivity for AWS PrivateLink – More information on this recent launch that can be used to share and access Amazon Virtual Private Cloud (Amazon VPC) endpoint services across different AWS Regions.

Marc Brooker, VP/Distinguished Engineer at AWS, shared on his personal blog a few posts about what Amazon Aurora DSQL is, how it works, and how to make the best use of it.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Danilo

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Now Available – Second-Generation FPGA-Powered Amazon EC2 instances (F2)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-second-generation-fpga-powered-amazon-ec2-instances-f2/

Equipped with up to eight AMD Field-Programmable Gate Arrays (FPGAs), AMD EPYC (Milan) processors with up to 192 cores, High Bandwidth Memory (HBM), up to 8 TiB of SSD-based instance storage, and up to 2 TiB of memory, the new F2 instances are available in two sizes, and are ready to accelerate your genomics, multimedia processing, big data, satellite communication, networking, silicon simulation, and live video workloads.

A Quick FPGA Recap
Here’s how I explained the FPGA model when we previewed the first generation of FPGA-powered Amazon Elastic Compute Cloud (Amazon EC2) instances:

One of the more interesting routes to a custom, hardware-based solution is known as a Field Programmable Gate Array, or FPGA. In contrast to a purpose-built chip which is designed with a single function in mind and then hard-wired to implement it, an FPGA is more flexible. It can be programmed in the field, after it has been plugged in to a socket on a PC board. Each FPGA includes a fixed, finite number of simple logic gates. Programming an FPGA is “simply” a matter of connecting them up to create the desired logical functions (AND, OR, XOR, and so forth) or storage elements (flip-flops and shift registers). Unlike a CPU which is essentially serial (with a few parallel elements) and has fixed-size instructions and data paths (typically 32 or 64 bit), the FPGA can be programmed to perform many operations in parallel, and the operations themselves can be of almost any width, large or small.
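To make the "connecting simple gates" idea concrete, here is a small software sketch (not FPGA tooling, and not code from the original post). It builds a one-bit full adder from XOR, AND, and OR operations, then chains as many of them as the chosen width requires. The names `full_adder` and `ripple_add` are illustrative. On an FPGA the adders would exist side by side as hardware, whereas this loop only simulates them one after another.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder built only from XOR, AND, and OR gates."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out


def ripple_add(x: int, y: int, width: int) -> int:
    """Add two unsigned integers with `width` chained full adders.

    The result wraps around at 2**width, as a fixed-width hardware adder would.
    """
    carry, result = 0, 0
    for bit in range(width):
        s, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result


# The same gate-level building block serves an unusual 5-bit or 12-bit data path.
sum_5_bit = ripple_add(19, 7, width=5)
sum_12_bit = ripple_add(3000, 1000, width=12)
```

The width is just a parameter here, which mirrors the point that FPGA operations are not tied to a fixed 32-bit or 64-bit data path.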

Since that launch, AWS customers have used F1 instances to host many different types of applications and services. With a newer FPGA, more processing power, and more memory bandwidth, the new F2 instances are an even better host for highly parallelizable, compute-intensive workloads.

Each of the AMD Virtex UltraScale+ HBM VU47P FPGAs has 2.85 million system logic cells and 9,024 DSP slices (up to 28 TOPS of DSP compute performance when processing INT8 values). The FPGA Accelerator Card associated with each F2 instance provides 16 GiB of High Bandwidth Memory and 64 GiB of DDR4 memory per FPGA.

Inside the F2
F2 instances are powered by 3rd generation AMD EPYC (Milan) processors. In comparison to F1 instances, they offer up to 3x as many processor cores, up to twice as much system memory and NVMe storage, and up to 4x the network bandwidth. Each FPGA comes with 16 GiB High Bandwidth Memory (HBM) with up to 460 GiB/s bandwidth. Here are the instance sizes and specs:

Instance Name | vCPUs | FPGAs | FPGA Memory (HBM / DDR4) | Instance Memory | NVMe Storage | EBS Bandwidth | Network Bandwidth
f2.12xlarge | 48 | 2 | 32 GiB / 128 GiB | 512 GiB | 1900 GiB (2x 950 GiB) | 15 Gbps | 25 Gbps
f2.48xlarge | 192 | 8 | 128 GiB / 512 GiB | 2,048 GiB | 7600 GiB (8x 950 GiB) | 60 Gbps | 100 Gbps
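As a quick sanity check, the FPGA memory and NVMe totals in the table follow from the per-FPGA figures given earlier (16 GiB of HBM and 64 GiB of DDR4 per FPGA) and the 950 GiB device size. The sketch below recomputes them. The dictionary layout and names are only for illustration.

```python
# Per-unit figures quoted in the post.
HBM_PER_FPGA_GIB = 16
DDR4_PER_FPGA_GIB = 64
NVME_DEVICE_GIB = 950

# FPGA and NVMe device counts per instance size, taken from the table.
sizes = {
    "f2.12xlarge": {"fpgas": 2, "nvme_devices": 2},
    "f2.48xlarge": {"fpgas": 8, "nvme_devices": 8},
}

# Derive the totals shown in the FPGA Memory and NVMe Storage columns.
totals = {
    name: {
        "hbm_gib": spec["fpgas"] * HBM_PER_FPGA_GIB,
        "ddr4_gib": spec["fpgas"] * DDR4_PER_FPGA_GIB,
        "nvme_gib": spec["nvme_devices"] * NVME_DEVICE_GIB,
    }
    for name, spec in sizes.items()
}

for name, t in totals.items():
    print(f"{name}: {t['hbm_gib']} GiB HBM / {t['ddr4_gib']} GiB DDR4, {t['nvme_gib']} GiB NVMe")
```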

The high-end f2.48xlarge instance supports the AWS Cloud Digital Interface (CDI) to reliably transport uncompressed live video between applications, with instance-to-instance latency as low as 8 milliseconds.

Building FPGA Applications
The AWS EC2 FPGA Development Kit contains the tools that you will use to develop, simulate, debug, compile, and run your hardware-accelerated FPGA applications. You can launch the kit’s FPGA Developer AMI on a memory-optimized or compute-optimized instance for development and simulation, then use an F2 instance for final debugging and testing.

The tools included in the developer kit support a variety of development paradigms, tools, accelerator languages, and debugging options. Regardless of your choice, you will ultimately create an Amazon FPGA Image (AFI) which contains your custom acceleration logic and the AWS Shell which implements access to the FPGA memory, PCIe bus, interrupts, and external peripherals. You can deploy AFIs to as many F2 instances as desired, share with other AWS accounts or publish on AWS Marketplace.

If you have already created an application that runs on F1 instances, you will need to update your development environment to use the latest AMD tools, then rebuild and validate before upgrading to F2 instances.

FPGA Instances in Action
Here are some cool examples of how F1 and F2 instances can support unique and highly demanding workloads:

Genomics – Multinational pharmaceutical and biotechnology company AstraZeneca used thousands of F1 instances to build the world’s fastest genomics pipeline, able to process over 400K whole genome samples in under two months. They will adopt Illumina DRAGEN for F2 to realize better performance at a lower cost, while accelerating disease discovery, diagnosis, and treatment.

Satellite Communication – Satellite operators are moving from inflexible and expensive physical infrastructure (modulators, demodulators, combiners, splitters, and so forth) toward agile, software-defined, FPGA-powered solutions. Using the digital signal processor (DSP) elements on the FPGA, these solutions can be reconfigured in the field to support new waveforms and to meet changing requirements. Key F2 features such as support for up to 8 FPGAs per instance, generous amounts of network bandwidth, and support for the Data Plane Development Kit (DPDK) using Virtual Ethernet can be used to support processing of multiple, complex waveforms in parallel.

Analytics – NeuroBlade’s SQL Processing Unit (SPU) integrates with Presto, Apache Spark, and other open source query engines, delivering faster query processing and market-leading query throughput efficiency when run on F2 instances.

Things to Know
Here are a couple of final things that you should know about the F2 instances:

Regions – F2 instances are available today in the US East (N. Virginia) and Europe (London) AWS Regions, with plans to extend availability to additional regions over time.

Operating Systems – F2 instances are Linux-only.

Purchasing Options – F2 instances are available in On-Demand, Spot, Savings Plan, Dedicated Instance, and Dedicated Host form.

Jeff;

See what’s possible in Zabbix 7.2!

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/see-whats-possible-in-zabbix-7-2/29373/

Zabbix 7.2 is out now and available for download! The latest Zabbix major release introduces a range of new visualization features and widgets while adding a variety of updated monitoring features to support new use cases and scenarios. Read more to find out about the latest Zabbix features and improvements.

Top items widget

The previously deprecated Data overview widget has been converted to the new Top items widget. The Top items widget enables item selection via item patterns. The selected items are then displayed for hosts based on host and host group filters. This means that users are not limited to explicitly selected items or hosts, which enables dynamically matching items in rapidly changing environments.

 

Items can be matched using pattern matching in the Top items widget
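The effect of pattern-based selection can be illustrated outside of Zabbix with plain wildcard matching. The sketch below uses Python's `fnmatch` module as a stand-in. It is not how Zabbix implements item patterns, and the item names and the helper `match_items` are made up for the example.

```python
from fnmatch import fnmatchcase


def match_items(item_names, patterns):
    """Return the item names that match at least one wildcard pattern."""
    return [name for name in item_names if any(fnmatchcase(name, p) for p in patterns)]


# Hypothetical item names on a monitored host.
items = [
    "CPU utilization",
    "CPU iowait time",
    "Memory utilization",
    "Interface eth0: Bits received",
]

# New items such as "CPU steal time" would be picked up without editing the selection.
matched = match_items(items, ["CPU*", "*utilization"])
```

Because selection is by pattern rather than by explicit list, items that appear later on a host are included automatically as long as their names match.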

The widget supports Bar, Indicator, Sparkline, and As-is value visualization as well as defining value thresholds, enabling value highlighting for values exceeding the defined threshold.

Top items widget supports As-is, Bar, Indicator, and Sparkline value visualization

Host card widget

The Host card widget adds the ability to display host information on Zabbix dashboards. The widget configuration supports selecting and ordering fields containing a variety of information about the host.

The Host card widget allows for selecting and ordering host information fields

The widget also supports a multi-column layout. Host information can be displayed in 1-3 columns, depending on how the widget is placed on the dashboard.

The host card widget layout can be customized by resizing the widget

Sparkline chart

Sparkline charts have been introduced in Zabbix 7.2 as an additional visualization option for existing widgets. The goal of a sparkline chart is to provide additional over-time context when viewing collected values in widgets, such as the Item value widget. Sparkline charts are supported in Top items, Top hosts, and Item value widgets.

Sparkline charts can be displayed in Item value, Top Items, and Top hosts widgets

NVIDIA GPU monitoring template and Zabbix agent 2 plugin

Starting with Zabbix release 7.2.1, the newly released NVIDIA GPU monitoring template and Zabbix agent 2 plugin will allow agent 2 to automatically discover NVIDIA GPUs on Windows and Linux environments and start monitoring items such as GPU temperature, power usage, memory, frequency, and much more. The list of discovered and supported metrics may vary depending on the GPU model.

GPU metrics can be automatically discovered and displayed on Zabbix dashboards

NETCONF monitoring with SSH item subsystem support

SSH subsystems are a set of remote commands predefined on the monitored endpoint. A common use case of an SSH subsystem is the NETCONF subsystem, used to manage network device configuration.

Zabbix 7.2 introduces a new parameter for the SSH monitoring item: ssh.run[unique short description,<ip>,<port>,<encoding>,<ssh options>,<subsystem>]

The subsystem parameter is used to specify an SSH subsystem and can be used to execute commands via SSH subsystems such as NETCONF or SFTP.
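For illustration, the item key for a NETCONF check could be assembled as shown below. This is a sketch under assumptions: the helper `ssh_run_key` is hypothetical, the host address and description are placeholders, port 830 is the port conventionally used for NETCONF over SSH, and the helper does not quote parameters that contain commas or brackets.

```python
def ssh_run_key(description, ip="", port="", encoding="", ssh_options="", subsystem=""):
    """Build an ssh.run item key string, leaving unused positional parameters empty."""
    params = [description, ip, port, encoding, ssh_options, subsystem]
    # Drop trailing empty parameters so the key stays short when they are unused.
    while len(params) > 1 and params[-1] == "":
        params.pop()
    return "ssh.run[" + ",".join(params) + "]"


# Placeholder address from the documentation range 192.0.2.0/24.
netconf_key = ssh_run_key("netconf-config", ip="192.0.2.10", port="830", subsystem="netconf")
plain_key = ssh_run_key("uptime-check")
```

Here `netconf_key` keeps the encoding and SSH options positions empty so that the subsystem lands in the sixth position shown in the key signature above.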

New and updated macros

  • New {*.TIMESTAMP} macros can be used to populate alerts with the UNIXTIME value of problem detection, recovery, and update timestamps.
  • The {EVENT.UPDATE.ACTIONJSON} macro resolves to a JSON array containing details of the actions performed during a problem update. This JSON value can be later used in integrations or scripts.
  • The {SERVICE.ID} macro resolves to the numeric ID of the service that triggered the action.
  • The {HOST.PORT} macro can now be used in the same locations as the {HOST.CONN} macro.
  • The new {FUNCTION.VALUE<1-9>} and {FUNCTION.RECOVERY.VALUE<1-9>} macros can be used in expression macros to display a value of the Nth item-based function in the trigger expression. This can be used to display values in map labels or graph names.
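A script or integration that receives a UNIXTIME value from one of the timestamp macros will usually want to convert it to a readable form. The sketch below assumes the value arrives as a string of seconds since the epoch. The function name `unixtime_to_iso` and the sample value are illustrative.

```python
from datetime import datetime, timezone


def unixtime_to_iso(value: str) -> str:
    """Convert a UNIXTIME string (seconds since the epoch) to an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc).isoformat()


# Sample value standing in for a resolved timestamp macro.
detected_at = unixtime_to_iso("1734307200")
epoch_start = unixtime_to_iso("0")
```

Keeping the conversion in UTC avoids ambiguity when alerts are consumed by systems in different time zones.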

VMware monitoring improvements

VMware monitoring has received multiple improvements and fixes in Zabbix 7.2:

  • In addition to the previously supported VMware hypervisor discovery workflow, the template VMware Hypervisor can now be manually linked to a stand-alone hypervisor host.
  • There is now a new item used to monitor the VMware virtual machine hypervisor maintenance status: vmware.vm.hv.maintenance[url,uuid]
  • VMware event collection has been improved by adding the support of pagination. This reduces memory consumption resulting from a large number of collected VMware events.

New and updated templates

Zabbix 7.2 introduces multiple new templates:

  • A variety of templates for LAMP stack monitoring by Zabbix agent active
  • NVIDIA GPU
  • Juniper MX series
  • Huawei OceanStor V6 Dorado
  • Nutanix Prism Element
  • Website certificate by Zabbix agent 2 active

The following existing templates have also received fixes and updates:

  • Dell iDrac and PowerEdge updated to use SNMP walk items
  • Proxmox VE by HTTP – new disk space usage items/triggers
  • MSSQL by ODBC performance counter query fixes
  • Linux and Nextcloud – removed unnecessary discard unchanged preprocessing from LLD rules
  • Microsoft 365 reports by HTTP description fixes

 

Additional changes and improvements

Additional changes and improvements introduced in Zabbix 7.2:

  • Added support for CP_SPIN CPU state on OpenBSD
  • Implemented new column configuration options in the Top hosts widget and support for binary item display
  • Added support for LLD Macro {#UNIT.SERVICETYPE} in systemd.unit.discovery for Zabbix agent 2
  • Updated maximum supported TimescaleDB version to 2.17
  • Updated maximum supported PostgreSQL version to 17
  • Added PubkeyAcceptedKeyTypes SSH public key algorithm configuration option
  • Items now become unsupported when there are no pollers
  • Removed support for Oracle DB
  • Removed the dependent item count limit
  • Added support of logarithmic Y-axis scaling in graphs
  • Increased the max number of rows for some widgets, such as Top hosts
  • Enabled usage of the mediatype.get method for users with the User role with a limited field scope
  • Added the ability to assign override host (Widget, Dashboard) for graph widget data sets
  • Implemented automatic selection of the first element of a broadcast-capable widget
  • Implemented a new filter in media type list view to filter out media types by their usage in action

Download and install Zabbix 7.2

You can find instructions and download the new version on the download page.

In order to upgrade to Zabbix 7.2, you need to upgrade your repository package and download and install the new Zabbix component packages (Zabbix server, proxy, frontend, and other Zabbix components). When you start the Zabbix server, an automatic database schema upgrade will be performed. Zabbix agents are backward compatible, so installing the new agent versions is not required. Agent upgrade can be performed at a later time.

You can find detailed step-by-step upgrade instructions on our Upgrade procedure page.

Learn about new features and changes introduced in Zabbix 7.2 by visiting the “What’s new in Zabbix 7.2” page.

A detailed description of the new features can be found in the “What’s new” documentation section.

Take a look at the release notes to see the full list of new features and improvements.

 

The post See what’s possible in Zabbix 7.2! appeared first on Zabbix Blog.

Introducing Buy with AWS: an accelerated procurement experience on AWS Partner sites, powered by AWS Marketplace

Post Syndicated from Prasad Rao original https://aws.amazon.com/blogs/aws/introducing-buy-with-aws-an-accelerated-procurement-experience-on-aws-partner-sites-powered-by-aws-marketplace/

Today, we are announcing Buy with AWS, a new way to discover and purchase solutions available in AWS Marketplace from AWS Partner sites. You can use Buy with AWS to accelerate and streamline your product procurement process on websites outside of Amazon Web Services (AWS). This feature lets you find, try, and buy solutions from Partner websites using your AWS account.

AWS Marketplace is a curated digital store for you to find, buy, deploy, and manage cloud solutions from Partners. Buy with AWS is another step towards AWS Marketplace making it easy for you to find and procure the right Partner solutions, when and where you need them. You can conveniently find and procure solutions in AWS Marketplace, through integrated AWS service consoles, and now on Partner websites.

Accelerate cloud solution discovery and evaluation

You can now discover solutions from Partners available for purchase through AWS Marketplace as you explore solutions on the web beyond AWS.

Look for products that are “Available in AWS Marketplace” when browsing on Partner sites, then accelerate your evaluation process with fast access to free trials, demo requests, and inquiries for custom pricing.

For example, I want to evaluate Wiz to see how it can help with my cloud security requirements. While browsing the Wiz website, I come across a page where I see “Connect Wiz with Amazon Web Services (AWS)”.

Wiz webpage featuring Buy With AWS

I choose Try with AWS. It asks me to sign in to my AWS account if I’m not signed in already. I’m then presented with a Wiz and AWS co-branded page for me to sign up for the free trial.

Wiz and AWS co-branded page to sign up for free trial using Buy with AWS through AWS Marketplace

The discovery experience that you see will vary depending on the type of Partner website you’re shopping from. Wiz is an example of how Buy with AWS can be implemented by an independent software vendor (ISV). Now, let’s look at an example of an AWS Marketplace Channel Partner, or reseller, who operates a storefront of their own.

I browse to the Bytes storefront with product listings from AWS Marketplace. I have the option to filter and search from the curated product listings, which are available in AWS Marketplace, on the Bytes site.

Bytes storefront with product listings from AWS Marketplace

I choose View Details for Fortinet and see an option to Request Private Offer from AWS.

Bytes storefront with option to Request Private Offer for Fortinet from AWS Marketplace

As you can tell, on a Channel Partner site, you can browse curated product listings available in AWS Marketplace, filter products, and request custom pricing using your AWS account directly from their website.

Streamline product procurement on AWS Partner sites
I had a seamless experience using Buy with AWS to access a free trial for Wiz and browse through the Bytes storefront to request a private offer.

Now I want to try Databricks for one of the applications I’m building. I sign up for a Databricks trial through their website.

Databricks homepage after login with option to Upgrade

I choose Upgrade and see that Databricks is available in AWS Marketplace, which gives me the option to Buy with AWS.

Option to upgrade to Databricks premium using Buy with AWS feature of AWS marketplace

I choose Buy with AWS, and after I sign in to my AWS account, I land on a Databricks and AWS Marketplace co-branded procurement page.

Databricks and AWS co-branded page to subscribe using Buy with AWS

I complete the purchase on the co-branded procurement page and continue to set up my Databricks account.

Databricks and AWS co-branded page after subscribing using Buy with AWS

As you can tell, I didn’t have to navigate the challenge of managing procurement processes for multiple vendors. I also didn’t have to speak with a sales representative or onboard a new vendor in my billing system, which would have required multiple approvals and delayed the overall process.

Access centralized billing and benefits through AWS Marketplace
Because Buy with AWS purchases are transacted through and managed in AWS Marketplace, you also benefit from the post-purchase experience of AWS Marketplace, including consolidated AWS billing, centralized subscription management, and access to cost optimization tools.

For example, through the AWS Billing and Cost Management console, I can centrally manage all my AWS purchases, including Buy with AWS purchases, from one dashboard. I can easily access and process invoices for all of my organization’s AWS purchases. I also need to have valid AWS Identity and Access Management (IAM) permissions to manage subscriptions and make a purchase through AWS Marketplace.
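
As a rough illustration of the IAM permissions mentioned above, the following sketch builds a policy document that allows viewing, creating, and cancelling AWS Marketplace subscriptions. The statement ID and the function name are hypothetical, and the three action names reflect my understanding of the AWS Marketplace IAM actions, so verify them against the IAM documentation before use. Your organization may need a narrower or broader policy.

```python
import json


def marketplace_subscription_policy():
    """Return a sketch of an IAM policy for managing Marketplace subscriptions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Hypothetical statement ID chosen for readability.
                "Sid": "ManageMarketplaceSubscriptions",
                "Effect": "Allow",
                "Action": [
                    "aws-marketplace:ViewSubscriptions",
                    "aws-marketplace:Subscribe",
                    "aws-marketplace:Unsubscribe",
                ],
                "Resource": "*",
            }
        ],
    }


policy = marketplace_subscription_policy()
print(json.dumps(policy, indent=2))
```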

AWS Marketplace not only simplifies my billing but also helps in maintaining governance over spending by helping me manage purchasing authority and subscription access for my organization with centralized visibility and controls. I can manage my budget with pricing flexibility, cost transparency, and AWS cost management tools.

Buy with AWS for Partners
Buy with AWS enables Partners who sell or resell products in AWS Marketplace to create new solution discovery and buying experiences for customers on their own websites. By adding call to action (CTA) buttons to their websites such as “Buy with AWS”, “Try free with AWS”, “Request private offer”, and “Request demo”, Partners can help accelerate product evaluation and the path-to-purchase for customers.

By integrating AWS Marketplace APIs, Partners can display products from the AWS Marketplace catalog, allow customers to sort and filter products, and streamline private offers. Partners implementing Buy with AWS can access AWS Marketplace creative and messaging resources for guidance on building their own web experiences. Partners who implement Buy with AWS can access metrics for insights into engagement and conversion performance.

The Buy with AWS onboarding guide in the AWS Marketplace Management Portal details how Partners can get started.

Learn more
Visit the Buy with AWS page to learn more and explore Partner sites that offer Buy with AWS.

To learn more about selling or reselling products using Buy with AWS on your website, visit:

Prasad

Accelerate foundation model training and fine-tuning with new Amazon SageMaker HyperPod recipes

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/accelerate-foundation-model-training-and-fine-tuning-with-new-amazon-sagemaker-hyperpod-recipes/

Today, we’re announcing the general availability of Amazon SageMaker HyperPod recipes to help data scientists and developers of all skill sets to get started training and fine-tuning foundation models (FMs) in minutes with state-of-the-art performance. They can now access optimized recipes for training and fine-tuning popular publicly available FMs such as Llama 3.1 405B, Llama 3.2 90B, or Mixtral 8x22B.

At AWS re:Invent 2023, we introduced SageMaker HyperPod to reduce time to train FMs by up to 40 percent and scale across more than a thousand compute resources in parallel with preconfigured distributed training libraries. With SageMaker HyperPod, you can find the required accelerated compute resources for training, create optimal training plans, and run training workloads across different blocks of capacity based on the availability of compute resources.

SageMaker HyperPod recipes include a training stack tested by AWS, removing tedious work experimenting with different model configurations, eliminating weeks of iterative evaluation and testing. The recipes automate several critical steps, such as loading training datasets, applying distributed training techniques, automating checkpoints for faster recovery from faults, and managing the end-to-end training loop.

With a simple recipe change, you can seamlessly switch between GPU- or Trainium-based instances to further optimize training performance and reduce costs. You can easily run workloads in production on SageMaker HyperPod or SageMaker training jobs.

SageMaker HyperPod recipes in action
To get started, visit the SageMaker HyperPod recipes GitHub repository to browse training recipes for popular publicly available FMs.

You only need to edit straightforward recipe parameters to specify an instance type and the location of your dataset in the cluster configuration, then run the recipe with a single-line command to achieve state-of-the-art performance.

You need to edit the recipe config.yaml file to specify the model and cluster type after cloning the repository.

$ git clone --recursive https://github.com/aws/sagemaker-hyperpod-recipes.git
$ cd sagemaker-hyperpod-recipes
$ pip3 install -r requirements.txt
$ cd ./recipes_collection
$ vim config.yaml

The recipes support SageMaker HyperPod with Slurm, SageMaker HyperPod with Amazon Elastic Kubernetes Service (Amazon EKS), and SageMaker training jobs. For example, you can set a cluster type (Slurm orchestrator), a model name (Meta Llama 3.1 405B language model), an instance type (ml.p5.48xlarge), and your data locations for the training data, results, logs, and so on.

defaults:
- cluster: slurm # support: slurm / k8s / sm_jobs
- recipes: fine-tuning/llama/hf_llama3_405b_seq8k_gpu_qlora # name of model to be trained
debug: False # set to True to debug the launcher configuration
instance_type: ml.p5.48xlarge # or other supported cluster instances
base_results_dir: # Location(s) to store the results, checkpoints, logs etc.

You can optionally adjust model-specific training parameters in this YAML file, which outlines the optimal configuration, including the number of accelerator devices, instance type, training precision, parallelization and sharding techniques, the optimizer, and logging to monitor experiments through TensorBoard.

run:
  name: llama-405b
  results_dir: ${base_results_dir}/${.name}
  time_limit: "6-00:00:00"
restore_from_path: null
trainer:
  devices: 8
  num_nodes: 2
  accelerator: gpu
  precision: bf16
  max_steps: 50
  log_every_n_steps: 10
  ...
exp_manager:
  exp_dir: # location for TensorBoard logging
  name: helloworld 
  create_tensorboard_logger: True
  create_checkpoint_callback: True
  checkpoint_callback_params:
    ...
  auto_checkpoint: True # for automated checkpointing
use_smp: True 
distributed_backend: smddp # optimized collectives
# Start training from pretrained model
model:
  model_type: llama_v3
  train_batch_size: 4
  tensor_model_parallel_degree: 1
  expert_model_parallel_degree: 1
  # other model-specific params
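
To make two of the values in this recipe concrete, here is a small sketch that computes how many accelerators the job spans and converts the time limit into a duration. It assumes that time_limit follows the Slurm-style "days-hours:minutes:seconds" format and that devices counts accelerators per node; both function names are hypothetical helpers, not part of the recipes repository.

```python
from datetime import timedelta


def total_accelerators(devices_per_node, num_nodes):
    """Accelerators the job spans: devices on each node times the node count."""
    return devices_per_node * num_nodes


def parse_time_limit(value):
    """Parse a Slurm-style "D-HH:MM:SS" limit into a timedelta (assumed format)."""
    days, clock = value.split("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days), hours=hours, minutes=minutes, seconds=seconds)


# Values copied from the recipe excerpt above.
print(total_accelerators(devices_per_node=8, num_nodes=2))  # 16
print(parse_time_limit("6-00:00:00"))  # 6 days, 0:00:00
```

With devices set to 8 and num_nodes set to 2, the recipe as shown would run on 16 GPUs for at most six days.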

To run this recipe in SageMaker HyperPod with Slurm, you must prepare the SageMaker HyperPod cluster following the cluster setup instructions.

Then, connect to the SageMaker HyperPod head node, access the Slurm controller, and copy the edited recipe. Next, you run a helper file to generate a Slurm submission script for the job that you can use for a dry run to inspect the content before starting the training job.

$ python3 main.py --config-path recipes_collection --config-name=config

After training completion, the trained model is automatically saved to your assigned data location.

To run this recipe on SageMaker HyperPod with Amazon EKS, clone the recipe from the GitHub repository, install the requirements, and edit the recipe (cluster: k8s) on your laptop. Then, create a link between your laptop and the running EKS cluster, and use the HyperPod Command Line Interface (CLI) to run the recipe.

$ hyperpod start-job --recipe fine-tuning/llama/hf_llama3_405b_seq8k_gpu_qlora \
--persistent-volume-claims fsx-claim:data \
--override-parameters \
'{
  "recipes.run.name": "hf-llama3-405b-seq8k-gpu-qlora",
  "recipes.exp_manager.exp_dir": "/data/<your_exp_dir>",
  "cluster": "k8s",
  "cluster_type": "k8s",
  "container": "658645717510.dkr.ecr.<region>.amazonaws.com/smdistributed-modelparallel:2.4.1-gpu-py311-cu121",
  "recipes.model.data.train_dir": "<your_train_data_dir>",
  "recipes.model.data.val_dir": "<your_val_data_dir>"
}'
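
Hand-written JSON inside a shell command is easy to get wrong, so one option is to generate the override payload programmatically. The sketch below assembles the same hyperpod start-job arguments in Python and serializes the overrides with json.dumps, which always produces valid JSON. The function name build_start_job_command is a hypothetical helper, only the flags shown in the command above are used, and the placeholder values are kept as they appear there.

```python
import json
import shlex


def build_start_job_command(recipe, volume_claim, overrides):
    """Assemble the hyperpod start-job argument list shown above.

    json.dumps guarantees the override payload is valid JSON
    (for example, no trailing comma after the last entry).
    """
    return [
        "hyperpod", "start-job",
        "--recipe", recipe,
        "--persistent-volume-claims", volume_claim,
        "--override-parameters", json.dumps(overrides),
    ]


# Placeholder values are kept exactly as in the command above.
overrides = {
    "recipes.run.name": "hf-llama3-405b-seq8k-gpu-qlora",
    "recipes.exp_manager.exp_dir": "/data/<your_exp_dir>",
    "cluster": "k8s",
    "cluster_type": "k8s",
}
command = build_start_job_command(
    "fine-tuning/llama/hf_llama3_405b_seq8k_gpu_qlora",
    "fsx-claim:data",
    overrides,
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```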

You can also run a recipe on SageMaker training jobs using the SageMaker Python SDK. The following example runs a PyTorch training job on SageMaker with overridden recipe parameters.

...
from sagemaker.pytorch import PyTorch

# Override recipe defaults so outputs land in the standard SageMaker paths.
recipe_overrides = {
    "run": {
        "results_dir": "/opt/ml/model",
    },
    "exp_manager": {
        "exp_dir": "",
        "explicit_log_dir": "/opt/ml/output/tensorboard",
        "checkpoint_dir": "/opt/ml/checkpoints",
    },
    "model": {
        "data": {
            "train_dir": "/opt/ml/input/data/train",
            "val_dir": "/opt/ml/input/data/val",
        },
    },
}
# <output_path> and <role> are placeholders for your S3 output URI and IAM role.
pytorch_estimator = PyTorch(
    output_path=<output_path>,
    base_job_name="llama-recipe",
    role=<role>,
    instance_type="ml.p5.48xlarge",
    training_recipe="fine-tuning/llama/hf_llama3_405b_seq8k_gpu_qlora",
    recipe_overrides=recipe_overrides,
    sagemaker_session=sagemaker_session,
    tensorboard_output_config=tensorboard_output_config,
)
...

As training progresses, the model checkpoints are stored on Amazon Simple Storage Service (Amazon S3) with the fully automated checkpointing capability, enabling faster recovery from training faults and instance restarts.

Now available
Amazon SageMaker HyperPod recipes are now available in the SageMaker HyperPod recipes GitHub repository. To learn more, visit the SageMaker HyperPod product page and the Amazon SageMaker AI Developer Guide.

Give SageMaker HyperPod recipes a try and send feedback to AWS re:Post for SageMaker or through your usual AWS Support contacts.

Channy

AWS Education Equity Initiative: Applying generative AI to educate the next wave of innovators

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-education-equity-initiative-applying-generative-ai-to-educate-the-next-wave-of-innovators/

Building on the work that we and our partners have been doing for many years, Amazon is committing up to $100 million in cloud technology and technical resources to help existing, dedicated learning organizations reach more learners by creating new and innovative digital learning solutions, all as part of the AWS Education Equity Initiative.

The Work So Far
AWS and Amazon have a long-standing commitment to learning and education. Here’s a sampling of what we have already done:

AWS AI & ML Scholarship Program – This program has awarded $28 million in scholarships to approximately 6000 students.

Machine Learning University – MLU offers a free program helping community colleges and Historically Black Colleges and Universities (HBCUs) teach data management, artificial intelligence, and machine learning concepts. The program is designed to address opportunity gaps by supporting students who are historically underserved and underrepresented in technology disciplines.

Amazon Future Engineer – Since 2021, up to $46 million in scholarships has been awarded to 1150 students through this program. In the past year, more than 2.1 million students received over 17 million hours of STEM education, literacy, and career exploration courses through this and other Amazon philanthropic education programs in the United States. I was able to speak at one such session last year, and it was an amazing experience.

Free Cloud Training – In late 2020 we set a goal of helping 29 million people grow their tech skills with free cloud computing training by 2025. We worked hard and met that target a year ahead of time!

There’s More To Do
Despite all of this work and progress, there’s still more to be done. The future is definitely not evenly distributed: over half a billion students cannot be reached by digital learning today.

We believe that Generative AI can amplify the good work that socially-minded edtech organizations, non-profits, and governments are already doing. Our goal is to empower them to build new and innovative digital learning systems that can amplify their work and allow them to reach a bigger audience.

With the launch of the AWS Education Equity Initiative, we want to help pave the way for the next generation of technology pioneers as they build powerful tools, train foundation models at scale, and create AI-powered teaching assistants.

We are committing up to $100 million in cloud technology and comprehensive technical advising over the next five years. The awardees will have access to the portfolio of AWS services and technical expertise so that they can build and scale learning management systems, mobile apps, chatbots, and other digital learning tools. As part of the application process, applicants will be asked to demonstrate how their proposed solution will benefit students from underserved and underrepresented communities.

As I mentioned earlier, our partners are already doing a lot of great work in this area. For example:

Code.org has already used AWS to scale their free computer science curriculum to millions of students in more than 100 countries. With this initiative, they will expand their use of Amazon Bedrock to provide an automated assessment of student projects, freeing up educator time that can be used for individual instruction and tailored learning.

Rocket Learning focuses on early childhood education in India. They will use Amazon Q in QuickSight to enhance learning outcomes for more than three million children.

I’m super excited about this initiative and look forward to seeing how it will help to create and educate the next generation of technology pioneers!

Jeff;

Solve complex problems with new scenario analysis capability in Amazon Q in QuickSight

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/solve-complex-problems-with-new-scenario-analysis-capability-in-amazon-q-in-quicksight/

Today, we announced a new capability of Amazon Q in QuickSight that helps users perform scenario analyses to find answers to complex problems quickly. This AI-assisted data analysis experience helps business users find answers to complex problems by guiding them step-by-step through in-depth data analysis—suggesting analytical approaches, automatically analyzing data, and summarizing findings with suggested actions—using natural language prompts. This new capability eliminates hours of tedious and error-prone manual work traditionally required to perform analyses using spreadsheets or other alternatives. In fact, Amazon Q in QuickSight enables business users to perform complex scenario analysis up to 10x faster than spreadsheets. This capability expands upon existing data Q&A capabilities of Amazon QuickSight so business professionals can start their analysis by simply asking a question.

How it works
Business users are often faced with complex questions that have traditionally required specialized training and days or weeks of time analyzing data in spreadsheets or other tools to address. For example, let’s say you’re a franchisee with multiple locations to manage. You might use this new capability in Amazon Q in QuickSight to ask, “How can I help our new Chicago store perform as well as the flagship store in New York?” Using an agentic approach, Amazon Q would then suggest analytical approaches needed to address the underlying business goal, automatically analyze data, and present results complete with visualizations and suggested actions. You can conduct this multistep analysis in an expansive analysis canvas, giving you the flexibility to make changes, explore multiple analysis paths simultaneously, and adapt to situations over time.

This new analysis experience is part of Amazon QuickSight, meaning it can read from QuickSight dashboards that connect to sources such as Amazon Athena, Amazon Aurora, Amazon Redshift, Amazon Simple Storage Service (Amazon S3), and Amazon OpenSearch Service. Specifically, this new experience is part of Amazon Q in QuickSight, which allows it to seamlessly integrate with other generative business intelligence (BI) capabilities such as data Q&A. You can also upload either a .csv or a single-table, single-sheet .xlsx file to incorporate into your analysis.

Here’s a visual walkthrough of this new analysis experience in Amazon Q in QuickSight.

I’m planning a customer event, and I’ve received an Excel spreadsheet of all who’ve registered to attend the event. I want to learn more about the attendees, so I analyze the spreadsheet and ask a few questions. I start by describing what I want to explore.

I upload the spreadsheet to start my analysis. Firstly, I want to understand how many people have registered for the event.

To design an agenda that’s suitable for the audience, I want to understand the various roles that will be attending. I select the + icon to add a new block for asking a question that follows the thread from the previous block.

I can continue to ask more questions. However, there are suggested questions for analyzing my data even further, and I now select one of these suggested questions. I want to increase marketing efforts at companies that don’t currently have a lot of attendees (in this case, companies with fewer than two attendees).

Amazon Q executes the required analysis and keeps me updated on the progress. Step 1 of the process identifies companies that have fewer than two attendees and lists them.

Step 2 gives an estimate of how many more attendees I might get from each company if marketing efforts are increased.

In Step 3 I can see the potential increase in total attendees (including the percentage increase) in line with the increase in marketing efforts.

Lastly, Step 4 goes even further to highlight companies I should prioritize for these increased marketing efforts.

To increase the potential number of attendees even more, I want to change the analysis to identify companies with fewer than three attendees instead of two. I choose the AI sparkle icon in the upper right to launch a modal that I then use to provide more context and make specific changes to the previous result.


This change resulted in new projections, and I can choose to consider them for my marketing efforts or keep to the previous projections.
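
For comparison with the manual approach, the first step of this analysis (finding companies with fewer than a given number of attendees) can be sketched in a few lines of Python. This is not how Amazon Q performs the analysis internally; the registration rows, the column names, and the function name companies_below are all hypothetical stand-ins for the uploaded spreadsheet.

```python
from collections import Counter


def companies_below(registrations, threshold):
    """Return companies with fewer than `threshold` registered attendees."""
    counts = Counter(row["company"] for row in registrations)
    return sorted(name for name, count in counts.items() if count < threshold)


# Hypothetical registration rows standing in for the uploaded spreadsheet.
registrations = [
    {"name": "Ana", "company": "Example Corp"},
    {"name": "Ben", "company": "Example Corp"},
    {"name": "Cy", "company": "Example Corp"},
    {"name": "Dee", "company": "Sample LLC"},
    {"name": "Eli", "company": "Sample LLC"},
    {"name": "Fay", "company": "Demo Inc"},
]

print(companies_below(registrations, threshold=2))  # ['Demo Inc']
print(companies_below(registrations, threshold=3))  # ['Demo Inc', 'Sample LLC']
```

Raising the threshold from two to three widens the target list, which mirrors the change made through the modal in the walkthrough.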


Now available
Amazon Q in QuickSight Pro users can use this new capability in preview in the following AWS Regions at launch: US East (N. Virginia) and US West (Oregon). Get started with a free 30-day trial of QuickSight today. To learn more, visit the Amazon QuickSight User Guide. You can submit your questions to AWS re:Post for Amazon QuickSight, or through your usual AWS Support contacts.

Veliswa.

Use Amazon Q Developer to build ML models in Amazon SageMaker Canvas

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/use-amazon-q-developer-to-build-ml-models-in-amazon-sagemaker-canvas/

As a data scientist, I’ve experienced firsthand the challenges of making machine learning (ML) accessible to business analysts, marketing analysts, data analysts, and data engineers who are experts in their domains without ML experience. That’s why I’m particularly excited about today’s Amazon Web Services (AWS) announcement that Amazon Q Developer is now available in Amazon SageMaker Canvas. What catches my attention is how Amazon Q Developer helps connect ML expertise with business needs, making ML more accessible across organizations.

Amazon Q Developer helps domain experts build accurate, production-quality ML models through natural language interactions, even if they don’t have ML expertise. Amazon Q Developer guides these users by breaking down their business problems and analyzing their data to recommend step-by-step guidance for building custom ML models. It transforms users’ data to remove anomalies, and builds and evaluates custom ML models to recommend the best one, while providing users control and visibility into every step of the guided ML workflow. This empowers organizations to innovate faster with reduced time to market. It also reduces their reliance on ML experts so their specialists can focus on more complex technical challenges.

For example, a marketing analyst can state, “I want to predict home sales prices using home characteristics and past sales data”, and Amazon Q Developer will translate this into a set of ML steps, analyzing relevant customer data, building multiple models, and recommending the best approach.

Let’s see it in action
To start using Amazon Q Developer, I follow the Getting started with using Amazon SageMaker Canvas guide to launch the Canvas application. In this demo, I use natural language instructions to create a model to predict house prices for marketing and finance teams. From the SageMaker Canvas page, I select Amazon Q and then choose Start a new conversation.

In the new conversation I write:

I am an analyst and need to predict house prices for my marketing and finance teams.

Next, Amazon Q Developer explains the problem and recommends the appropriate ML model type. It also outlines the solution requirements, including the necessary dataset characteristics. Amazon Q Developer then offers two options: I want to upload my dataset or I want to choose a target column. I select I want to upload my dataset.

In the next step, Amazon Q Developer lists the dataset requirements, which include relevant information about houses, current house prices, and the target variable for the regression model. It then recommends next steps, including: I want to upload my dataset, Select an existing dataset, Create a new dataset, or I want to choose a target column. For this demo, I’ll use the canvas-sample-housing.csv sample dataset as my existing dataset.

Select an existing dataset

After selecting and loading the dataset, Amazon Q Developer analyzes it and suggests median_house_value as the target column for the regression model. I accept by selecting I would like to predict the “median_house_value” column. Moving on to the next step, Amazon Q Developer details which dataset features (such as “location”, “housing_median_age”, and “total_rooms”) it will use to predict the median_house_value.

Before moving forward with model training, I ask about the data quality, because without good data we can’t build a reliable model. Amazon Q Developer responds with quality insights for my entire dataset.

I can ask specific questions about individual features and their distributions to better understand the data quality.

columns in dataset

To my surprise, through the previous question, I discovered that the “households” column has a wide variation between extreme values, which could affect the model’s prediction accuracy. Therefore, I ask Amazon Q Developer to fix this outlier problem.
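
The post does not say which transformation Amazon Q Developer applies to the outliers, so the following is only a sketch of one common technique: clipping values to the interquartile range (IQR) fences. The sample values, the multiplier k=1.5, and the function name clip_outliers_iqr are assumptions for illustration.

```python
import statistics


def clip_outliers_iqr(values, k=1.5):
    """Clip each value to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical "households" values with one extreme entry.
households = [300, 320, 350, 380, 400, 420, 450, 6000]
clipped = clip_outliers_iqr(households)
print(max(households), max(clipped))
```

Clipping keeps every row in the dataset while limiting the influence of extreme values; dropping the rows instead is another option when the extremes are data errors.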

After the transformation is done, I can ask what steps Amazon Q Developer followed to make this change. Behind the scenes, Amazon Q Developer applies advanced data preparation steps using SageMaker Canvas data preparation capabilities. I can review these steps to visualize and replicate the process that produces the final, prepared dataset for training the model.

After reviewing the data preparation steps, I select Launch my training job.

launch training job

After the training job is launched, I can see its progress in the conversation, and the datasets created.

As a data scientist, I particularly appreciate that, with Amazon Q Developer, I can see detailed metrics such as the confusion matrix and precision-recall scores for classification models and root mean square error (RMSE) for regression models. These are crucial elements I always look for when evaluating model performance and making data-driven decisions, and it’s refreshing to see them presented in a way that’s accessible to nontechnical users to build trust and enable proper governance while maintaining the depth that technical teams need.
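
For readers new to RMSE, this short sketch shows how the metric is computed: take the difference between each actual and predicted value, square it, average the squares, and take the square root. The sample prices are hypothetical and are not taken from the housing dataset used in this demo.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical actual and predicted house prices (in thousands).
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones, and the result is expressed in the same unit as the target column.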

You can access these metrics by selecting the new model from My Models or from the Amazon Q conversation menu:

  • Overview – This tab shows the Column impact analysis. In this case, median_income emerges as the primary factor influencing my model.
  • Scoring – This tab provides model accuracy insights, including RMSE metrics.
  • Advanced metrics – This tab displays the detailed Metrics table, Residuals and Error density for in-depth model evaluation.

Analyze My Model

After reviewing these metrics and validating the model’s performance, I can move to the final stages of the ML workflow:

  • Predictions – I can test my model using the Predictions tab to validate its real-world performance.
  • Deployment – I can create an endpoint deployment to make my model available for production use.

This simplifies the deployment process, a step that traditionally requires significant DevOps knowledge, into a straightforward operation that business analysts can handle confidently.

predictions and deploy

Things to know
Amazon Q Developer democratizes ML across organizations:

Empowering all skill levels with ML – Amazon Q Developer is now available in SageMaker Canvas, helping business analysts, marketing analysts, and data professionals who don’t have ML experience create solutions for business problems through a guided ML workflow. From data analysis and model selection to deployment, users can solve business problems using natural language, reducing dependence on ML experts such as data scientists and enabling organizations to innovate faster with reduced time to market.

Streamlining the ML workflow – With Amazon Q Developer available in SageMaker Canvas, users can prepare data, and build, analyze, and deploy ML models through a guided, transparent workflow. Amazon Q Developer provides advanced data preparation and AutoML capabilities that democratize ML and allow non-ML experts to produce highly accurate ML models.

Providing full visibility into the ML workflow – Amazon Q Developer provides full transparency by generating the underlying code and technical artifacts such as data transformation steps, model explainability, and accuracy measures. This allows cross-functional teams, including ML experts, to review, validate, and update the models as needed, facilitating collaboration in a secure environment.

Availability – Amazon Q Developer is now in preview release in Amazon SageMaker Canvas.

Pricing – Amazon Q Developer is now available in SageMaker Canvas at no additional cost to both Amazon Q Developer Pro Tier and Amazon Q Developer Free Tier users. However, standard charges apply for resources such as SageMaker Canvas workspace instances and any resources used for building or deploying models. For detailed pricing information, visit the Amazon SageMaker Canvas Pricing page.

To learn more about getting started, visit the Amazon Q Developer product web page.

Eli

Amazon Bedrock Guardrails now supports multimodal toxicity detection with image support (preview)

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/amazon-bedrock-guardrails-now-supports-multimodal-toxicity-detection-with-image-support/

Today, we’re announcing the preview of multimodal toxicity detection with image support in Amazon Bedrock Guardrails. This new capability detects and filters out undesirable image content in addition to text, helping you improve user experiences and manage model outputs in your generative AI applications.

Amazon Bedrock Guardrails helps you implement safeguards for generative AI applications by filtering undesirable content, redacting personally identifiable information (PII), and enhancing content safety and privacy. You can configure policies for denied topics, content filters, word filters, PII redaction, contextual grounding checks, and Automated Reasoning checks (preview), to tailor safeguards to your specific use cases and responsible AI policies.

With this launch, you can now use the existing content filter policy in Amazon Bedrock Guardrails to detect and block harmful image content across categories such as hate, insults, sexual, and violence. You can configure thresholds from low to high to match your application’s needs.

This new image support works with all foundation models (FMs) in Amazon Bedrock that support image data, as well as any custom fine-tuned models you bring. It provides a consistent layer of protection across text and image modalities, making it easier to build responsible AI applications.

Tero Hottinen, VP, Head of Strategic Partnerships at KONE, envisions the following use case:

In its ongoing evaluation, KONE recognizes the potential of Amazon Bedrock Guardrails as a key component in protecting gen AI applications, particularly for relevance and contextual grounding checks, as well as the multimodal safeguards. The company envisions integrating product design diagrams and manuals into its applications, with Amazon Bedrock Guardrails playing a crucial role in enabling more accurate diagnosis and analysis of multimodal content.

Here’s how it works.

Multimodal toxicity detection in action
To get started, create a guardrail in the AWS Management Console and configure the content filters for either text or image data or both. You can also use AWS SDKs to integrate this capability into your applications.

Create guardrail
On the console, navigate to Amazon Bedrock and select Guardrails. From there, you can create a new guardrail and use the existing content filters to detect and block image data in addition to text data. The categories for Hate, Insults, Sexual, and Violence under Configure content filters can be configured for either text or image content or both. The Misconduct and Prompt attacks categories can be configured for text content only.

Amazon Bedrock Guardrails Multimodal Support

After you’ve selected and configured the content filters you want to use, you can save the guardrail and start using it to build safe and responsible generative AI applications.
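The same configuration can be sketched for an AWS SDK. The helper below builds the content filter list that would go into a `create_guardrail` call with the AWS SDK for Python (Boto3). The `inputModalities` and `outputModalities` field names reflect my understanding of the preview API and should be checked against the current reference; the helper name and the default strength are illustrative only.

```python
# Categories that accept image content, and those limited to text,
# as described for the console above.
IMAGE_CAPABLE = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE"]
TEXT_ONLY = ["MISCONDUCT", "PROMPT_ATTACK"]


def build_content_filters(strength="HIGH"):
    """Return a filtersConfig list (hypothetical helper, not part of Boto3)."""
    filters = []
    for category in IMAGE_CAPABLE:
        filters.append({
            "type": category,
            "inputStrength": strength,
            "outputStrength": strength,
            # Assumed preview field names for enabling image checks.
            "inputModalities": ["TEXT", "IMAGE"],
            "outputModalities": ["TEXT", "IMAGE"],
        })
    for category in TEXT_ONLY:
        filters.append({
            "type": category,
            "inputStrength": strength,
            # Prompt attack filtering applies to inputs only.
            "outputStrength": "NONE" if category == "PROMPT_ATTACK" else strength,
        })
    return filters
```

The resulting list would be passed as `contentPolicyConfig={"filtersConfig": build_content_filters()}` to `create_guardrail` on a `boto3.client("bedrock")` client, together with a guardrail name and the blocked-input and blocked-output messages.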

To test the new guardrail in the console, select the guardrail and choose Test. You have two options: test the guardrail by choosing and invoking a model, or test the guardrail without invoking a model by using the Amazon Bedrock Guardrails independent ApplyGuardrail API.

With the ApplyGuardrail API, you can validate content at any point in your application flow before processing or serving results to the user. You can also use the API to evaluate inputs and outputs for any self-managed (custom), or third-party FMs, regardless of the underlying infrastructure. For example, you could use the API to evaluate a Meta Llama 3.2 model hosted on Amazon SageMaker or a Mistral NeMo model running on your laptop.
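A minimal sketch of that validation step in Python follows. It assumes `client` is a `boto3.client("bedrock-runtime")` client and that the preview accepts an image block next to the text block in `content`; the two helper names are hypothetical.

```python
def build_guardrail_content(prompt_text, image_bytes, image_format="jpeg"):
    """Build the content list for ApplyGuardrail: one text block, one image block."""
    return [
        {"text": {"text": prompt_text}},
        {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
    ]


def check_content(client, guardrail_id, guardrail_version, content, source="INPUT"):
    """Return True when the guardrail intervened on the given content.

    `client` is assumed to be boto3.client("bedrock-runtime").
    """
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source=source,  # "INPUT" for prompts, "OUTPUT" for model responses
        content=content,
    )
    return response["action"] == "GUARDRAIL_INTERVENED"
```

Because no model is invoked, the same check can run before a prompt is sent to a self-managed model or after its response comes back, by switching `source` between `"INPUT"` and `"OUTPUT"`.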

Test guardrail by choosing and invoking a model
Select a model that supports image inputs or outputs, for example, Anthropic’s Claude 3.5 Sonnet. Verify that the prompt and response filters are enabled for image content. Next, provide a prompt, upload an image file, and choose Run.

Amazon Bedrock Guardrails Multimodal Support

In my example, Amazon Bedrock Guardrails intervened. Choose View trace for more details.

The guardrail trace provides a record of how safety measures were applied during an interaction. It shows whether Amazon Bedrock Guardrails intervened or not and what assessments were made on both input (prompt) and output (model response). In my example, the content filters blocked the input prompt because they detected insults in the image with a high confidence.
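To read the same information programmatically, a small helper can pull the blocking content filters out of the assessments returned by the API. This is a sketch that assumes each assessment carries a `contentPolicy.filters` list with `type`, `confidence`, and `action` keys; the sample data is illustrative and not copied from a real trace.

```python
def blocked_filters(assessments):
    """List (type, confidence) pairs for content filters that blocked content."""
    hits = []
    for assessment in assessments:
        filters = assessment.get("contentPolicy", {}).get("filters", [])
        for item in filters:
            if item.get("action") == "BLOCKED":
                hits.append((item.get("type"), item.get("confidence")))
    return hits


# Sample shaped like the trace described above (values are illustrative).
sample = [{"contentPolicy": {"filters": [
    {"type": "INSULTS", "confidence": "HIGH", "action": "BLOCKED"},
]}}]
print(blocked_filters(sample))  # [('INSULTS', 'HIGH')]
```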

Amazon Bedrock Guardrails Multimodal Support

Test guardrail without invoking a model
In the console, choose Use Guardrails independent API to test the guardrail without invoking a model. Choose whether you want to validate an input prompt or an example of a model-generated output. Then, repeat the steps from before. Verify that the prompt and response filters are enabled for image content, provide the content to validate, and choose Run.

Amazon Bedrock Guardrails Multimodal Support

I reused the same image and input prompt for my demo, and Amazon Bedrock Guardrails intervened again. Choose View trace again for more details.

Amazon Bedrock Guardrails Multimodal Support

Join the preview
Multimodal toxicity detection with image support is available today in preview in Amazon Bedrock Guardrails in the US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Mumbai, Seoul, Singapore, Tokyo), Europe (Frankfurt, Ireland, London), and AWS GovCloud (US-West) AWS Regions. To learn more, visit Amazon Bedrock Guardrails.

Give the multimodal toxicity detection content filter a try today in the Amazon Bedrock console and let us know what you think! Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— Antje

New Amazon Bedrock capabilities enhance data processing and retrieval

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-amazon-bedrock-capabilities-enhance-data-processing-and-retrieval/

Today, Amazon Bedrock introduces four enhancements that streamline how you can analyze data with generative AI:

Amazon Bedrock Data Automation (preview) – A fully managed capability of Amazon Bedrock that streamlines the generation of valuable insights from unstructured, multimodal content such as documents, images, audio, and videos. With Amazon Bedrock Data Automation, you can build automated intelligent document processing (IDP), media analysis, and Retrieval-Augmented Generation (RAG) workflows quickly and cost-effectively. Insights include video summaries of key moments, detection of inappropriate image content, automated analysis of complex documents, and much more. You can customize outputs to tailor insights into your specific business needs. Amazon Bedrock Data Automation can be used as a standalone feature or as a parser when setting up a knowledge base for RAG workflows.

Amazon Bedrock Knowledge Bases now processes multimodal data – To help build applications that process both text and visual elements in documents and images, you can configure a knowledge base to parse documents using either Amazon Bedrock Data Automation or a foundation model (FM) as the parser. Multimodal data processing can improve the accuracy and relevancy of the responses you get from a knowledge base that includes information embedded in both images and text.

Amazon Bedrock Knowledge Bases now supports GraphRAG (preview) – We now offer one of the first fully-managed GraphRAG capabilities. GraphRAG enhances generative AI applications by providing more accurate and comprehensive responses to end users by using RAG techniques combined with graphs.

Amazon Bedrock Knowledge Bases now supports structured data retrieval – This capability extends a knowledge base to support natural language querying of data warehouses and data lakes so that applications can access business intelligence (BI) through conversational interfaces and improve the accuracy of the responses by including critical enterprise data. Amazon Bedrock Knowledge Bases provides one of the first fully-managed out-of-the-box RAG solutions that can natively query structured data from where it resides. This capability helps break data silos across data sources and accelerates building generative AI applications from over a month to just a few days.

These new capabilities make it easier to build comprehensive AI applications that can process, understand, and retrieve information from structured and unstructured data sources. For example, a car insurance company can use Amazon Bedrock Data Automation to automate their claims adjudication workflow to reduce the time taken to process automobile claims, improving the productivity of their claims department.

Similarly, a media company can analyze TV shows and extract insights needed for smart advertisement placement such as scene summaries, industry standard advertising taxonomies (IAB), and company logos. A media production company can generate scene-by-scene summaries and capture key moments in their video assets. A financial services company can process complex financial documents containing charts and tables and use GraphRAG to understand relationships between different financial entities. All these companies can use structured data retrieval to query their data warehouse while retrieving information from their knowledge base.

Let’s take a closer look at these features.

Introducing Amazon Bedrock Data Automation
Amazon Bedrock Data Automation is a capability of Amazon Bedrock that simplifies the process of extracting valuable insights from multimodal, unstructured content, such as documents, images, videos, and audio files.

Amazon Bedrock Data Automation provides a unified, API-driven experience that developers can use to process multimodal content through a single interface, eliminating the need to manage and orchestrate multiple AI models and services. With built-in safeguards, such as visual grounding and confidence scores, Amazon Bedrock Data Automation helps promote the accuracy and trustworthiness of the extracted insights, making it easier to integrate into enterprise workflows.

Amazon Bedrock Data Automation supports 4 modalities (documents, images, video, and audio). When used in an application, all modalities use the same asynchronous inference API, and results are written to an Amazon Simple Storage Service (Amazon S3) bucket.

For each modality, you can configure the output based on your processing needs and generate two types of outputs:

Standard output – With standard output, you get predefined default insights that are relevant to the input data type. Examples include semantic representation of documents, summaries of videos by scene, audio transcripts and more. You can configure which insights you want to extract with just a few steps.

Custom output – With custom output, you have the flexibility to define and specify your extraction needs using artifacts called “blueprints” to generate insights tailored to your business needs. You can also transform the generated output into a specific format or schema that is compatible with your downstream systems such as databases or other applications.

Standard output can be used with all formats (audio, documents, images, and videos). During the preview, custom output can only be used with documents and images.

Both standard and custom output configurations can be saved in a project to reference in the Amazon Bedrock Data Automation inference API. A project can be configured to generate both standard output and custom output for each processed file.

Let’s look at an example of processing a document for both standard and custom outputs.

Using Amazon Bedrock Data Automation
On the Amazon Bedrock console, I choose Data Automation in the navigation pane. Here, I can review how this capability works with a few sample use cases.

Console screenshot.

Then, I choose Demo in the Data Automation section of the navigation pane. I can try this capability using one of the provided sample documents or by uploading my own. For example, let’s say I am working on an application that needs to process birth certificates.

I start by uploading a birth certificate to see the standard output results. The first time I upload a document, I’m asked to confirm to create an S3 bucket to store the assets. When I look at the standard output, I can tailor the result with a few quick settings.

Console screenshot.

I choose the Custom output tab. The document is recognized by one of the sample blueprints and information is extracted across multiple fields.

Console screenshot.

Most of the data for my application is there, but I need a few customizations. For example, the date the birth certificate was issued (JUNE 10, 2022) is in a different format than the other dates in the document. I also need the state that issued the certificate and a couple of flags that tell me if the child’s last name matches the one from the mother or the father.

Most of the fields in the previous blueprint use the Explicit extraction type. That means they’re extracted as they are from the document.

If I want a date in a specific format, I can create a new field using the Inferred extraction type and add instructions on how to format the result starting from the content of the document. Inferred extractions can be used to perform transformations, such as date or Social Security number (SSN) format, or validations, for example, to check if a person is over 21 based on today’s date.
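To make the two kinds of inferred extraction concrete, here is what such a transformation and validation compute, written as plain Python. This only illustrates the expected results; in Amazon Bedrock Data Automation they come from the natural language instruction attached to the field, not from code like this. The function names are hypothetical, and "over 21" is taken to mean that the 21st birthday has been reached.

```python
from datetime import date


def age_in_years(birth_date, today):
    """Whole years elapsed between birth_date and today."""
    years = today.year - birth_date.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def is_over_21(birth_date, today):
    """Validation: has the person reached their 21st birthday?"""
    return age_in_years(birth_date, today) >= 21


def format_ssn(raw):
    """Transformation: normalize nine digits to the NNN-NN-NNNN layout."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) != 9:
        raise ValueError("expected exactly 9 digits")
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
```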

Sample blueprints cannot be edited. I choose Duplicate blueprint to create a new blueprint that I can edit and then Add field from the Fields drop down.

I add four fields with extraction type Inferred and these instructions:

  1. The date the birth certificate was issued in MM/DD/YYYY format
  2. The state that issued the birth certificate 
  3. Is ChildLastName equal to FatherLastName
  4. Is ChildLastName equal to MotherLastName

The first two fields are strings and the last two booleans.

Console screenshot.

After I create the new fields, I can apply the new blueprint to the document I previously uploaded.

I choose Get result and look for the new fields in the results. I see the date formatted as I need, the two flags, and the state.
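As a cross-check of what these inferred fields should contain, the following sketch derives three of them from explicitly extracted values. `ChildLastName`, `FatherLastName`, and `MotherLastName` come from the instructions above; `DateIssued` and the output key names are hypothetical. The issuing state is left out because it has to be read from the document itself.

```python
MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
          "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]


def to_mm_dd_yyyy(text):
    """Convert a date such as 'JUNE 10, 2022' to '06/10/2022'."""
    month_name, day, year = text.replace(",", " ").split()
    month = MONTHS.index(month_name.upper()) + 1
    return f"{month:02d}/{int(day):02d}/{int(year):04d}"


def same_name(first, second):
    """Compare two names, ignoring case and surrounding whitespace."""
    return first.strip().casefold() == second.strip().casefold()


def derive_fields(explicit):
    """Compute the date and the two boolean flags from explicit fields."""
    return {
        "IssueDateFormatted": to_mm_dd_yyyy(explicit["DateIssued"]),
        "ChildMatchesFather": same_name(explicit["ChildLastName"],
                                        explicit["FatherLastName"]),
        "ChildMatchesMother": same_name(explicit["ChildLastName"],
                                        explicit["MotherLastName"]),
    }
```

Comparing results like these with the output of the blueprint on a few sample documents is one way to confirm that the instructions are being interpreted as intended.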

Console screenshot.

Now that I have created this custom blueprint tailored to the needs of my application, I can add it to a project. I can associate multiple blueprints with a project for the different document types I want to process, such as a blueprint for passports, a blueprint for birth certificates, a blueprint for invoices, and so on. When processing documents, Amazon Bedrock Data Automation matches each document to a blueprint within the project to extract relevant information.

I can also create a new blueprint from scratch. In that case, I can start with a prompt where I declare any fields I expect to find in the uploaded document and perform normalizations or validations.

Amazon Bedrock Data Automation can also process audio and video files. For example, here’s the standard output when uploading a video from a keynote presentation by Swami Sivasubramanian, VP of AI and Data at AWS.

Console screenshot.

It takes a few minutes to get the output. The results include a summarization of the overall video, a summary scene by scene, and the text that appears during the video. From here, I can toggle the options to have a full audio transcript, content moderation, or Interactive Advertising Bureau (IAB) taxonomy.

I can also use Amazon Bedrock Data Automation as a parser when creating a knowledge base to extract insights from visually rich documents and images, for retrieval and response generation. Let’s see that in the next section.

Using multimodal data processing in Amazon Bedrock Knowledge Bases
Multimodal data processing support enables applications to understand both text and visual elements in documents.

With multimodal data processing, applications can use a knowledge base to:

  • Retrieve answers from visual elements in addition to existing support of text.
  • Generate responses based on the context that includes both text and visual data.
  • Provide source attribution that references visual elements from the original documents.

When creating a knowledge base in the Amazon Bedrock console, I now have the option to select Amazon Bedrock Data Automation as Parsing strategy.

When I select Amazon Bedrock Data Automation as parser, Amazon Bedrock Data Automation handles the extraction, transformation, and generation of insights from visually rich content, while Amazon Bedrock Knowledge Bases manages ingestion, retrieval, model response generation, and source attribution.

Alternatively, I can use the existing Foundation models as a parser option. With this option, there’s now support for Anthropic’s Claude 3.5 Sonnet as parser, and I can use the default prompt or modify it to suit a specific use case.

Console screenshot.

In the next step, I specify the Multimodal storage destination on Amazon S3 that will be used by Amazon Bedrock Knowledge Bases to store images extracted from my documents in the knowledge base data source. These images can be retrieved based on a user query, used to generate the response, and cited in the response.

Console screenshot.

When using the knowledge base, the information extracted by Amazon Bedrock Data Automation or FMs as parser is used to retrieve information about visual elements, understand charts and diagrams, and provide responses that reference both textual and visual content.
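Outside the console, the parser choice is part of the data source configuration. The sketch below builds a parsing configuration for either option in Python. The key names follow my reading of the CreateDataSource API for Amazon Bedrock Knowledge Bases and should be verified against the current reference; the function name is hypothetical.

```python
def parsing_configuration(strategy, model_arn=None, prompt_text=None):
    """Build a parsingConfiguration dict for one of the two parser options."""
    if strategy == "BEDROCK_DATA_AUTOMATION":
        return {
            "parsingStrategy": "BEDROCK_DATA_AUTOMATION",
            "bedrockDataAutomationConfiguration": {
                "parsingModality": "MULTIMODAL",
            },
        }
    if strategy == "BEDROCK_FOUNDATION_MODEL":
        if not model_arn:
            raise ValueError("model_arn is required when an FM is the parser")
        fm_config = {"modelArn": model_arn, "parsingModality": "MULTIMODAL"}
        if prompt_text:
            # Optional: replace the default parsing prompt with a custom one.
            fm_config["parsingPrompt"] = {"parsingPromptText": prompt_text}
        return {
            "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
            "bedrockFoundationModelConfiguration": fm_config,
        }
    raise ValueError(f"unknown parsing strategy: {strategy}")
```

The returned dictionary would sit under `vectorIngestionConfiguration` as `parsingConfiguration` when creating the data source.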

Using GraphRAG in Amazon Bedrock Knowledge Bases
Extracting insights from scattered data sources presents significant challenges for RAG applications, requiring multi-step reasoning across these data sources to generate relevant responses. For example, a customer might ask a generative AI-powered travel application to identify family-friendly beach destinations with direct flights from their home location that also offer good seafood restaurants. This requires a connected workflow to identify suitable beaches that other families have enjoyed, match these to flight routes, and select highly-rated local restaurants. A traditional RAG system may struggle to synthesize all these pieces into a cohesive recommendation because the information lives in disparate sources and is not interlinked.

Knowledge graphs can address this challenge by modeling complex relationships between entities in a structured way. However, building and integrating graphs into an application requires significant expertise and effort.

Amazon Bedrock Knowledge Bases now offers one of the first fully managed GraphRAG capabilities that enhances generative AI applications by providing more accurate and comprehensive responses to end users by using RAG techniques combined with graphs.

When creating a knowledge base, I can now enable GraphRAG in just a few steps by choosing Amazon Neptune Analytics as database, automatically generating vector and graph representations of the underlying data, entities and their relationships, and reducing development effort from several weeks to just a few hours.

I start the creation of a new knowledge base. In the Vector database section, when creating a new vector store, I select Amazon Neptune Analytics (GraphRAG). If I don’t want to create a new graph, I can provide an existing vector store and select a Neptune Analytics graph from the list. GraphRAG uses Anthropic’s Claude 3 Haiku to automatically build graphs for a knowledge base.

Console screenshot.

After I complete the creation of the knowledge base, Amazon Bedrock automatically builds a graph, linking related concepts and documents. When retrieving information from the knowledge base, GraphRAG traverses these relationships to provide more comprehensive and accurate responses.
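The idea of traversing relationships can be illustrated with a toy example. The sketch below is not how Amazon Bedrock or Neptune Analytics implement GraphRAG; it only shows, with made-up data based on the travel scenario above, how following shared entities from one matching chunk pulls in related chunks that a plain similarity search might miss.

```python
from collections import defaultdict


def build_entity_index(chunks):
    """Map each entity to the set of chunk IDs that mention it."""
    index = defaultdict(set)
    for chunk_id, entities in chunks.items():
        for entity in entities:
            index[entity].add(chunk_id)
    return index


def expand(seed_ids, chunks, index, hops=1):
    """Follow shared entities from the seed chunks for a number of hops."""
    found = set(seed_ids)
    frontier = set(seed_ids)
    for _ in range(hops):
        reachable = set()
        for chunk_id in frontier:
            for entity in chunks[chunk_id]:
                reachable |= index[entity]
        frontier = reachable - found
        found |= frontier
    return found


# Made-up chunks, each tagged with the entities it mentions.
chunks = {
    "beaches": {"Family Beach", "Coastal Town"},
    "flights": {"Coastal Town", "Home Airport"},
    "restaurants": {"Coastal Town", "Seafood Grill"},
    "ski": {"Mountain Resort"},
}
index = build_entity_index(chunks)
print(sorted(expand({"beaches"}, chunks, index)))
# ['beaches', 'flights', 'restaurants']
```

Starting from the chunk about beaches, one hop through the shared "Coastal Town" entity brings in the flight and restaurant chunks, while the unrelated ski chunk stays out.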

Using structured data retrieval in Amazon Bedrock Knowledge Bases
Structured data retrieval allows natural language querying of databases and data warehouses. For example, a business analyst might ask, “What were our top-selling products last quarter?” and the system automatically generates and runs the appropriate SQL query for a data warehouse stored in an Amazon Redshift database.

When creating a knowledge base, I now have the option to use a structured data store.

Console screenshot.

I enter a name and description for the knowledge base. In Data source details, I use Amazon Redshift as Query engine. I create a new AWS Identity and Access Management (IAM) service role to manage the knowledge base resources and choose Next.

Console screenshot.

I choose Redshift serverless in Connection options and the Workgroup to use. Amazon Redshift provisioned clusters are also supported. I use the previously created IAM role for Authentication. Storage metadata can be managed with AWS Glue Data Catalog or directly within an Amazon Redshift database. I select a database from the list.

Console screenshot.

In the configuration of the knowledge base, I can define the maximum duration for a query and include or exclude access to tables or columns. To improve the accuracy of query generation from natural language, I can optionally add a description for tables and columns and a list of curated queries that provides practical examples of how to translate a question into a SQL query for my database. I choose Next, review the settings, and complete the creation of the knowledge base.

After a few minutes, the knowledge base is ready. Once synced, Amazon Bedrock Knowledge Bases handles generating, running, and formatting the result of the query, making it easy to build natural language interfaces to structured data. When invoking a knowledge base using structured data, I can ask to only generate SQL, retrieve data, or summarize the data in natural language.
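Two of these options can be sketched with the AWS SDK for Python (Boto3). The helpers below assume `client` is a `boto3.client("bedrock-agent-runtime")` client and use the `generate_query` and `retrieve_and_generate` operations as I understand their request shapes; treat the parameter layout as an assumption to confirm in the API reference. The function names are hypothetical.

```python
def generate_sql(client, knowledge_base_arn, question):
    """Ask the knowledge base to generate SQL only, without running it."""
    response = client.generate_query(
        queryGenerationInput={"type": "TEXT", "text": question},
        transformationConfiguration={
            "mode": "TEXT_TO_SQL",
            "textToSqlConfiguration": {
                "type": "KNOWLEDGE_BASE",
                "knowledgeBaseConfiguration": {
                    "knowledgeBaseArn": knowledge_base_arn,
                },
            },
        },
    )
    return [query["sql"] for query in response["queries"]]


def summarize(client, knowledge_base_id, model_arn, question):
    """Run the query and return a natural language summary of the data."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    )
    return response["output"]["text"]
```

Generating SQL without running it is useful when the query should be reviewed or logged before it touches the data warehouse.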

Things to know
These new capabilities are available today in the following AWS Regions:

  • Amazon Bedrock Data Automation is available in preview in US West (Oregon).
  • Multimodal data processing support in Amazon Bedrock Knowledge Bases using Amazon Bedrock Data Automation as parser is available in preview in US West (Oregon). FM as a parser is available in all Regions where Amazon Bedrock Knowledge Bases is offered.
  • GraphRAG in Amazon Bedrock Knowledge Bases is available in preview in all commercial Regions where Amazon Bedrock Knowledge Bases and Amazon Neptune Analytics are offered.
  • Structured data retrieval is available in Amazon Bedrock Knowledge Bases in all commercial Regions where Amazon Bedrock Knowledge Bases is offered.

As usual with Amazon Bedrock, pricing is based on usage:

  • Amazon Bedrock Data Automation charges per image, per page for documents, and per minute for audio or video.
  • Multimodal data processing in Amazon Bedrock Knowledge Bases is charged based on the use of either Amazon Bedrock Data Automation or the FM as parser.
  • There is no additional cost for using GraphRAG in Amazon Bedrock Knowledge Bases but you pay for using Amazon Neptune Analytics as the vector store. For more information, visit Amazon Neptune pricing.
  • There is an additional cost when using structured data retrieval in Amazon Bedrock Knowledge Bases.

For detailed pricing information, see Amazon Bedrock pricing.

Each capability can be used independently or in combination. Together, they make it easier and faster to build applications that use AI to process data. To get started, visit the Amazon Bedrock console. To learn more, you can access the Amazon Bedrock documentation and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with these new capabilities!

Danilo

Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and prompt caching (preview)

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/reduce-costs-and-latency-with-amazon-bedrock-intelligent-prompt-routing-and-prompt-caching-preview/

Today, Amazon Bedrock has introduced in preview two capabilities that help reduce costs and latency for generative AI applications:

Amazon Bedrock Intelligent Prompt Routing – When invoking a model, you can now use a combination of foundation models (FMs) from the same model family to help optimize for quality and cost. For example, with Anthropic’s Claude model family, Amazon Bedrock can intelligently route requests between Claude 3.5 Sonnet and Claude 3 Haiku depending on the complexity of the prompt. Similarly, Amazon Bedrock can route requests between Meta Llama 3.1 70B and 8B. The prompt router predicts which model will provide the best performance for each request while optimizing the quality of response and cost. This is particularly useful for applications such as customer service assistants, where uncomplicated queries can be handled by smaller, faster, and more cost-effective models, and complex queries are routed to more capable models. Intelligent Prompt Routing can reduce costs by up to 30 percent without compromising on accuracy.

Amazon Bedrock now supports prompt caching – You can now cache frequently used context in prompts across multiple model invocations. This is especially valuable for applications that repeatedly use the same context, such as document Q&A systems where users ask multiple questions about the same document or coding assistants that need to maintain context about code files. The cached context remains available for up to 5 minutes after each access. Prompt caching in Amazon Bedrock can reduce costs by up to 90% and latency by up to 85% for supported models.

These features make it easier to reduce latency and balance performance with cost efficiency. Let’s look at how you can use them in your applications.
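For prompt caching, the reusable context is separated from the changing part of the prompt with a cache checkpoint. The sketch below builds Converse API messages for a document Q&A case, assuming the `cachePoint` content block is accepted by the chosen model during the preview; the helper name is hypothetical.

```python
def cached_document_messages(document_text, question):
    """Build Converse messages with the document placed before a cache point."""
    return [{
        "role": "user",
        "content": [
            {"text": document_text},
            # Content before this marker is eligible for caching.
            {"cachePoint": {"type": "default"}},
            # The question changes per call and stays outside the cache.
            {"text": question},
        ],
    }]


messages = cached_document_messages("<long document text>", "What is the summary?")
```

The list would be passed as `messages` to `converse` on a `boto3.client("bedrock-runtime")` client. Asking a second question about the same document within the cache lifetime reuses the cached prefix instead of processing the document again.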

Using Amazon Bedrock Intelligent Prompt Routing in the console
Amazon Bedrock Intelligent Prompt Routing uses advanced prompt matching and model understanding techniques to predict the performance of each model for every request, optimizing for quality of responses and cost. During the preview, you can use the default prompt routers for Anthropic’s Claude and Meta Llama model families.

Intelligent prompt routing can be accessed through the AWS Management Console, the AWS Command Line Interface (AWS CLI), and the AWS SDKs. In the Amazon Bedrock console, I choose Prompt routers in the Foundation models section of the navigation pane.

Console screenshot.

I choose the Anthropic Prompt Router default router to get more information.

Console screenshot.

From the configuration of the prompt router, I see that it’s routing requests between Claude 3.5 Sonnet and Claude 3 Haiku using cross-Region inference profiles. The routing criteria define the quality difference between the response of the largest model and the smallest model for each prompt, as predicted by the router’s internal model at runtime. The fallback model, used when none of the chosen models meet the desired performance criteria, is Anthropic’s Claude 3.5 Sonnet.

I choose Open in Playground to chat using the prompt router and enter this prompt:

Alice has N brothers and she also has M sisters. How many sisters does Alice’s brothers have?

The result is quickly provided. I choose the new Router metrics icon on the right to see which model was selected by the prompt router. In this case, because the question is rather complex, Anthropic’s Claude 3.5 Sonnet was used.

Console screenshot.

Now I ask a straightforward question to the same prompt router:

Describe the purpose of a 'hello world' program in one line.

This time, Anthropic’s Claude 3 Haiku has been selected by the prompt router.

Console screenshot.

I select the Meta Prompt Router to check its configuration. It’s using the cross-Region inference profiles for Llama 3.1 70B and 8B with the 70B model as fallback.

Console screenshot.

Prompt routers are integrated with other Amazon Bedrock capabilities, such as Amazon Bedrock Knowledge Bases and Amazon Bedrock Agents, or when performing evaluations. For example, here I create a model evaluation to help me compare, for my use case, a prompt router to another model or prompt router.

Console screenshot.

To use a prompt router in an application, I need to set the prompt router Amazon Resource Name (ARN) as model ID in the Amazon Bedrock API. Let’s see how this works with the AWS CLI and an AWS SDK.

Using Amazon Bedrock Intelligent Prompt Routing with the AWS CLI
The Amazon Bedrock API has been extended to handle prompt routers. For example, I can list the existing prompt routers in an AWS Region using ListPromptRouters:

aws bedrock list-prompt-routers

In output, I receive a summary of the existing prompt routers, similar to what I saw in the console.

Here’s the full output of the previous command:

{
    "promptRouterSummaries": [
        {
            "promptRouterName": "Anthropic Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.26
            },
            "description": "Routes requests among models in the Claude family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/anthropic.claude:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        },
        {
            "promptRouterName": "Meta Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.0
            },
            "description": "Routes requests among models in the LLaMA family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        }
    ]
}
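The same response can be condensed in Python. This sketch takes a ListPromptRouters-style dictionary, such as the one returned by `list_prompt_routers` on a `boto3.client("bedrock")` client, and produces one line per router; the helper name is hypothetical.

```python
def summarize_routers(response):
    """Return one summary line per router in a ListPromptRouters response."""
    lines = []
    for router in response["promptRouterSummaries"]:
        # Keep only the inference profile ID at the end of each ARN.
        models = [m["modelArn"].split("/")[-1] for m in router["models"]]
        fallback = router["fallbackModel"]["modelArn"].split("/")[-1]
        lines.append(
            f'{router["promptRouterName"]}: {", ".join(models)} '
            f"(fallback: {fallback})"
        )
    return lines
```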

I can get information about a specific prompt router using GetPromptRouter with a prompt router ARN. For example, for the Meta Llama model family:

aws bedrock get-prompt-router --prompt-router-arn arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1
{
    "promptRouterName": "Meta Prompt Router",
    "routingCriteria": {
        "responseQualityDifference": 0.0
    },
    "description": "Routes requests among models in the LLaMA family",
    "createdAt": "2024-11-20T00:00:00+00:00",
    "updatedAt": "2024-11-20T00:00:00+00:00",
    "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1",
    "models": [
        {
            "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
        },
        {
            "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
        }
    ],
    "fallbackModel": {
        "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
    },
    "status": "AVAILABLE",
    "type": "default"
}

To use a prompt router with Amazon Bedrock, I set the prompt router ARN as model ID when making API calls. For example, here I use the Anthropic Prompt Router with the AWS CLI and the Amazon Bedrock Converse API:

aws bedrock-runtime converse \
    --model-id arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/anthropic.claude:1 \
    --messages '[{ "role": "user", "content": [ { "text": "Alice has N brothers and she also has M sisters. How many sisters does Alice’s brothers have?" } ] }]'

In the output, invocations using a prompt router include a new trace section that tells which model was actually used. In this case, it’s Anthropic’s Claude 3.5 Sonnet:

{
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "text": "To solve this problem, let's think it through step-by-step:\n\n1) First, we need to understand the relationships:\n   - Alice has N brothers\n   - Alice has M sisters\n\n2) Now, we need to consider who Alice's brothers' sisters are:\n   - Alice herself is a sister to all her brothers\n   - All of Alice's sisters are also sisters to Alice's brothers\n\n3) So, the total number of sisters that Alice's brothers have is:\n   - The number of Alice's sisters (M)\n   - Plus Alice herself (+1)\n\n4) Therefore, the answer can be expressed as: M + 1\n\nThus, Alice's brothers have M + 1 sisters."
                }
            ]
        }
    },
    . . .
    "trace": {
        "promptRouter": {
            "invokedModelId": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
        }
    }
}
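Because every routed response carries this trace section, it can be tallied to see how requests are distributed across models. This is a minimal sketch under stated assumptions: routing_counts and the "no-router-trace" label are hypothetical names, and the sample responses are trimmed to the trace fields shown above.

```python
from collections import Counter
from typing import Iterable


def routing_counts(responses: Iterable[dict]) -> Counter:
    """Count how often each model was selected, based on the trace section."""
    counts = Counter()
    for response in responses:
        model_arn = (
            response.get("trace", {})
            .get("promptRouter", {})
            .get("invokedModelId")
        )
        # Responses without a router trace are grouped under a separate label.
        key = model_arn.rsplit("/", 1)[-1] if model_arn else "no-router-trace"
        counts[key] += 1
    return counts


SONNET_ARN = (
    "arn:aws:bedrock:us-east-1:123412341234:inference-profile/"
    "us.anthropic.claude-3-5-sonnet-20240620-v1:0"
)
sample_responses = [
    {"trace": {"promptRouter": {"invokedModelId": SONNET_ARN}}},
    {"trace": {"promptRouter": {"invokedModelId": SONNET_ARN}}},
    {"output": {}},  # a response that did not go through a prompt router
]

counts = routing_counts(sample_responses)
print(counts)
```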

Using Amazon Bedrock Intelligent Prompt Routing with an AWS SDK
Using an AWS SDK with a prompt router is similar to the previous command line experience. When invoking a model, I set the model ID to the prompt router ARN. For example, in this Python code I’m using the Meta Llama router with the ConverseStream API:

import json
import boto3

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

MODEL_ID = "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1"

user_message = "Describe the purpose of a 'hello world' program in one line."
messages = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

streaming_response = bedrock_runtime.converse_stream(
    modelId=MODEL_ID,
    messages=messages,
)

for chunk in streaming_response["stream"]:
    if "contentBlockDelta" in chunk:
        text = chunk["contentBlockDelta"]["delta"]["text"]
        print(text, end="")
    if "messageStop" in chunk:
        print()
    if "metadata" in chunk:
        if "trace" in chunk["metadata"]:
            print(json.dumps(chunk['metadata']['trace'], indent=2))

This script prints the response text and the content of the trace in response metadata. For this uncomplicated request, the faster and more affordable model has been selected by the prompt router:

A "Hello World" program is a simple, introductory program that serves as a basic example to demonstrate the fundamental syntax and functionality of a programming language, typically used to verify that a development environment is set up correctly.
{
  "promptRouter": {
    "invokedModelId": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
  }
}
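When the text and the selected model are needed as values rather than printed output, the same event loop can be wrapped in a function. The following is a sketch, not the article's code: collect_stream is a hypothetical helper, and the events below are hand-written stand-ins that mimic the shape of the ConverseStream events handled in the script above.

```python
from typing import Iterable, Optional, Tuple


def collect_stream(events: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Join streamed text deltas and return them with the invoked model ARN."""
    parts = []
    invoked_model = None
    for event in events:
        if "contentBlockDelta" in event:
            # Not every delta carries text, so fall back to an empty string.
            parts.append(event["contentBlockDelta"]["delta"].get("text", ""))
        if "metadata" in event:
            trace = event["metadata"].get("trace", {})
            invoked_model = trace.get("promptRouter", {}).get(
                "invokedModelId", invoked_model
            )
    return "".join(parts), invoked_model


# Hand-written events for illustration; a real call would pass
# streaming_response["stream"] instead.
fake_events = [
    {"contentBlockDelta": {"delta": {"text": "Hello, "}}},
    {"contentBlockDelta": {"delta": {"text": "world"}}},
    {"messageStop": {"stopReason": "end_turn"}},
    {
        "metadata": {
            "trace": {
                "promptRouter": {
                    "invokedModelId": "arn:aws:bedrock:us-east-1:123412341234:"
                    "inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
                }
            }
        }
    },
]

text, model = collect_stream(fake_events)
print(text)
print(model)
```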

Using prompt caching with an AWS SDK
You can use prompt caching with the Amazon Bedrock Converse API. When you tag content for caching and send it to the model for the first time, the model processes the input and saves the intermediate results in a cache. For subsequent requests containing the same content, the model loads the preprocessed results from the cache, significantly reducing both costs and latency.

You can implement prompt caching in your applications with a few steps:

  1. Identify the portions of your prompts that are frequently reused.
  2. Tag these sections for caching in the list of messages using the new cachePoint block.
  3. Monitor cache usage and latency improvements in the response metadata usage section.
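Step 2 can be sketched as a function that builds a user message with a cachePoint block. This is an illustrative sketch: build_cached_user_message is a hypothetical helper, and it assumes, as described later in this post, that the content preceding the cache point is what gets cached. Here the cache point sits right after the reusable context, so the question that follows it can change between requests.

```python
def build_cached_user_message(shared_context: str, question: str) -> dict:
    """Build a Converse-style user message with a cache point after the reused part."""
    return {
        "role": "user",
        "content": [
            {"text": shared_context},             # frequently reused content
            {"cachePoint": {"type": "default"}},  # content before this block is cached
            {"text": question},                   # varies from request to request
        ],
    }


message = build_cached_user_message(
    shared_context="<long reference text reused across requests>",
    question="Summarize the reference text in one sentence.",
)
print(message["content"][1])
```

The resulting dictionary can be used as an item in the messages list passed to the Converse API.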

Here’s an example of implementing prompt caching when working with documents.

First, I download three decision guides in PDF format from the AWS website. These guides help choose the AWS services that fit your use case.

Then, I use a Python script to ask three questions about the documents. In the code, I create a converse() function to handle the conversation with the model. The first time I call the function, I include a list of documents and a flag to add a cachePoint block.

import json

import boto3

MODEL_ID = "us.anthropic.claude-3-5-sonnet-20241022-v2:0"
AWS_REGION = "us-west-2"

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name=AWS_REGION,
)

DOCS = [
    "bedrock-or-sagemaker.pdf",
    "generative-ai-on-aws-how-to-choose.pdf",
    "machine-learning-on-aws-how-to-choose.pdf",
]

messages = []


def converse(new_message, docs=None, cache=False):
    # Add new content to the pending user message, creating one if needed.
    if len(messages) == 0 or messages[-1]["role"] != "user":
        messages.append({"role": "user", "content": []})

    for doc in docs or []:
        print(f"Adding document: {doc}")
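        # Split "name.pdf" into the document name and its format,
        # without shadowing the built-in names format and bytes.
        name, doc_format = doc.rsplit('.', maxsplit=1)
        with open(doc, "rb") as f:
            doc_bytes = f.read()
        messages[-1]["content"].append({
            "document": {
                "name": name,
                "format": doc_format,
                "source": {"bytes": doc_bytes},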
            }
        })

    messages[-1]["content"].append({"text": new_message})

    if cache:
        messages[-1]["content"].append({"cachePoint": {"type": "default"}})

    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=messages,
    )

    output_message = response["output"]["message"]
    response_text = output_message["content"][0]["text"]

    print("Response text:")
    print(response_text)

    print("Usage:")
    print(json.dumps(response["usage"], indent=2))

    messages.append(output_message)


converse("Compare AWS Trainium and AWS Inferentia in 20 words or less.", docs=DOCS, cache=True)
converse("Compare Amazon Textract and Amazon Transcribe in 20 words or less.")
converse("Compare Amazon Q Business and Amazon Q Developer in 20 words or less.")

For each invocation, the script prints the response and the usage counters.

Adding document: bedrock-or-sagemaker.pdf
Adding document: generative-ai-on-aws-how-to-choose.pdf
Adding document: machine-learning-on-aws-how-to-choose.pdf
Response text:
AWS Trainium is optimized for machine learning training, while AWS Inferentia is designed for low-cost, high-performance machine learning inference.
Usage:
{
  "inputTokens": 4,
  "outputTokens": 34,
  "totalTokens": 29879,
  "cacheReadInputTokenCount": 0,
  "cacheWriteInputTokenCount": 29841
}
Response text:
Amazon Textract extracts text and data from documents, while Amazon Transcribe converts speech to text from audio or video files.
Usage:
{
  "inputTokens": 59,
  "outputTokens": 30,
  "totalTokens": 29930,
  "cacheReadInputTokenCount": 29841,
  "cacheWriteInputTokenCount": 0
}
Response text:
Amazon Q Business answers questions using enterprise data, while Amazon Q Developer assists with building and operating AWS applications and services.
Usage:
{
  "inputTokens": 108,
  "outputTokens": 26,
  "totalTokens": 29975,
  "cacheReadInputTokenCount": 29841,
  "cacheWriteInputTokenCount": 0
}

The usage section of the response contains two new counters: cacheReadInputTokenCount and cacheWriteInputTokenCount. The total number of tokens for an invocation is the sum of the input and output tokens plus the tokens read and written into the cache.
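This relationship can be checked against the three usage sections printed above. A minimal sketch, where expected_total_tokens is a hypothetical helper name:

```python
def expected_total_tokens(usage: dict) -> int:
    """Sum input, output, cache-read, and cache-write token counters."""
    return (
        usage["inputTokens"]
        + usage["outputTokens"]
        + usage.get("cacheReadInputTokenCount", 0)
        + usage.get("cacheWriteInputTokenCount", 0)
    )


# Usage sections copied from the script output above.
usages = [
    {"inputTokens": 4, "outputTokens": 34, "totalTokens": 29879,
     "cacheReadInputTokenCount": 0, "cacheWriteInputTokenCount": 29841},
    {"inputTokens": 59, "outputTokens": 30, "totalTokens": 29930,
     "cacheReadInputTokenCount": 29841, "cacheWriteInputTokenCount": 0},
    {"inputTokens": 108, "outputTokens": 26, "totalTokens": 29975,
     "cacheReadInputTokenCount": 29841, "cacheWriteInputTokenCount": 0},
]

totals = [expected_total_tokens(u) for u in usages]
print(totals)
```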

Each invocation processes a list of messages. The messages in the first invocation contain the documents, the first question, and the cache point. Because the messages preceding the cache point aren’t currently in the cache, they’re written to cache. According to the usage counters, 29,841 tokens have been written into the cache.

"cacheWriteInputTokenCount": 29841

For the next invocations, the previous response and the new question are appended to the list of messages. The messages before the cachePoint are unchanged, so they are found in the cache.

As expected, we can tell from the usage counters that the same number of tokens previously written is now read from the cache.

"cacheReadInputTokenCount": 29841

In my tests, the next invocations take 55 percent less time to complete compared to the first one. Depending on your use case (for example, with more cached content), prompt caching can improve latency by up to 85 percent.

Depending on the model, you can set more than one cache point in a list of messages. To find the right cache points for your use case, try different configurations and look at the effect on the reported usage.

Things to know
Amazon Bedrock Intelligent Prompt Routing is available in preview today in US East (N. Virginia) and US West (Oregon) AWS Regions. During the preview, you can use the default prompt routers, and there is no additional cost for using a prompt router. You pay the cost of the selected model. You can use prompt routers with other Amazon Bedrock capabilities such as performing evaluations, using knowledge bases, and configuring agents.

Because the internal model used by the prompt routers needs to understand the complexity of a prompt, intelligent prompt routing currently only supports English language prompts.

Amazon Bedrock support for prompt caching is available in preview in US West (Oregon) for Anthropic’s Claude 3.5 Sonnet V2 and Claude 3.5 Haiku. Prompt caching is also available in US East (N. Virginia) for Amazon Nova Micro, Amazon Nova Lite, and Amazon Nova Pro.

With prompt caching, cache reads receive a 90 percent discount compared to noncached input tokens. There are no additional infrastructure charges for cache storage. When using Anthropic models, you pay an additional cost for tokens written in the cache. There are no additional costs for cache writes with Amazon Nova models. For more information, see Amazon Bedrock pricing.
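To get a feel for the effect of the discount, the input-side cost of a request can be estimated from its usage counters. This is a rough sketch under loud assumptions: input_token_cost is a hypothetical helper, the price of 1.0 per 1,000 tokens is a placeholder rather than a real price, and the cache write multiplier is a parameter because the actual surcharge depends on the model (see Amazon Bedrock pricing).

```python
def input_token_cost(
    usage: dict,
    price_per_1k_tokens: float,
    cache_read_discount: float = 0.90,    # 90 percent discount on cache reads
    cache_write_multiplier: float = 1.0,  # assumption: set per model from the pricing page
) -> float:
    """Estimate the input-side cost of one request from its usage counters."""
    price = price_per_1k_tokens / 1000.0
    uncached = usage.get("inputTokens", 0) * price
    cache_reads = (
        usage.get("cacheReadInputTokenCount", 0) * price * (1.0 - cache_read_discount)
    )
    cache_writes = (
        usage.get("cacheWriteInputTokenCount", 0) * price * cache_write_multiplier
    )
    return uncached + cache_reads + cache_writes


# Counters from the second invocation shown earlier in this post.
second_call = {
    "inputTokens": 59,
    "cacheReadInputTokenCount": 29841,
    "cacheWriteInputTokenCount": 0,
}
HYPOTHETICAL_PRICE_PER_1K = 1.0  # placeholder, not an actual price

with_cache = input_token_cost(second_call, HYPOTHETICAL_PRICE_PER_1K)
without_cache = (59 + 29841) / 1000.0 * HYPOTHETICAL_PRICE_PER_1K
print(f"with cache: {with_cache:.4f}, without cache: {without_cache:.4f}")
```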

When using prompt caching, content is cached for up to 5 minutes, with each cache hit resetting this countdown. Prompt caching has been implemented to transparently support cross-Region inference. In this way, your applications can get the cost optimization and latency benefit of prompt caching with the flexibility of cross-Region inference.
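The countdown behavior can be pictured with a toy model of a sliding expiration. This is only an illustration of the behavior described above, not how Amazon Bedrock implements its cache: SlidingTtlCache is a hypothetical class, and it assumes a fixed lifetime of exactly 300 seconds even though content is cached for up to 5 minutes.

```python
class SlidingTtlCache:
    """Toy model: each access restarts the expiration countdown for a key."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._expires_at = {}

    def touch(self, key: str, now: float) -> bool:
        """Return True on a cache hit; in both cases restart the countdown."""
        hit = key in self._expires_at and now < self._expires_at[key]
        # A miss models a cache write, a hit models a read; both reset the timer.
        self._expires_at[key] = now + self.ttl
        return hit


cache = SlidingTtlCache()
results = [
    cache.touch("docs-prefix", now=0),     # first request: miss, content is written
    cache.touch("docs-prefix", now=200),   # within 5 minutes: hit, countdown resets
    cache.touch("docs-prefix", now=450),   # 250 s after the last hit: still a hit
    cache.touch("docs-prefix", now=1100),  # 650 s after the last hit: expired, miss
]
print(results)
```

The third request is a hit even though it arrives more than 5 minutes after the first one, because the second request reset the countdown.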

These new capabilities make it easier to build cost-effective and high-performing generative AI applications. By intelligently routing requests and caching frequently used content, you can significantly reduce your costs while maintaining and even improving application performance.

To learn more and start using these new capabilities today, visit the Amazon Bedrock documentation and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws.

Danilo

Meet your training timelines and budgets with new Amazon SageMaker HyperPod flexible training plans

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/meet-your-training-timelines-and-budgets-with-new-amazon-sagemaker-hyperpod-flexible-training-plans/

Today, we’re announcing the general availability of Amazon SageMaker HyperPod flexible training plans to help data scientists train large foundation models (FMs) within their timelines and budgets and save them weeks of effort in managing the training process based on compute availability.

At AWS re:Invent 2023, we introduced SageMaker HyperPod to reduce the time to train FMs by up to 40 percent and scale across thousands of compute resources in parallel with preconfigured distributed training libraries and built-in resiliency. Most generative AI model development tasks need accelerated compute resources in parallel. Our customers struggle to find timely access to compute resources to complete their training within their timeline and budget constraints.

With today’s announcement, you can find the accelerated compute resources required for training, create optimal training plans, and run training workloads across different blocks of capacity based on the availability of the compute resources. Within a few steps, you can specify your training completion date, budget, and compute resource requirements, create optimal training plans, and run fully managed training jobs without manual intervention.

SageMaker HyperPod training plans in action
To get started, go to the Amazon SageMaker AI console, choose Training plans in the left navigation pane, and choose Create training plan.

For example, choose your preferred training start date and duration (10 days) and the instance type and count (16 ml.p5.48xlarge) for the SageMaker HyperPod cluster, then choose Find training plan.

SageMaker HyperPod suggests a training plan that is split into two five-day segments. This includes the total upfront price for the plan.

If you accept this training plan, add your training details in the next step and choose Create your plan.

After creating your training plan, you can see the list of training plans. When you’ve created a training plan, you have to pay upfront for the plan within 12 hours. One plan is in the Active state and already started, with all the instances being used. The second plan is Scheduled to start later, but you can already submit jobs that start automatically when the plan begins.

In the Active state, the compute resources are available in SageMaker HyperPod, resume automatically after pauses in availability, and terminate at the end of the plan. One segment is currently running, and another is queued to run after it.

This is similar to Managed Spot training in SageMaker AI, where SageMaker AI takes care of instance interruptions and continues the training with no manual intervention. To learn more, see SageMaker HyperPod training plans in the Amazon SageMaker AI Developer Guide.

Now available
Amazon SageMaker HyperPod training plans are now available in the US East (N. Virginia), US East (Ohio), and US West (Oregon) AWS Regions and support ml.p4d.48xlarge, ml.p5.48xlarge, ml.p5e.48xlarge, ml.p5en.48xlarge, and ml.trn2.48xlarge instances. Trn2 and P5en instances are available only in the US East (Ohio) Region. To learn more, visit the SageMaker HyperPod product page and SageMaker AI pricing page.

Give HyperPod training plans a try in the Amazon SageMaker AI console and send feedback to AWS re:Post for SageMaker AI or through your usual AWS Support contacts.

Channy