Tag Archives: Amazon Sagemaker

Let’s Architect! Discovering Generative AI on AWS

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-generative-ai/

Generative artificial intelligence (generative AI) is a type of AI used to generate content, including conversations, images, videos, and music. Generative AI can be used directly to build customer-facing features (a chatbot or an image generator), or it can serve as an underlying component in a more complex system. For example, it can generate embeddings (or compressed representations) or any other artifact necessary to improve downstream machine learning (ML) models or back-end services.

With the advent of generative AI, it’s fundamental to understand what it is, how it works under the hood, and which options are available for putting it into production. In some cases, it can also be helpful to move closer to the underlying model in order to fine tune or drive domain-specific improvements. With this edition of Let’s Architect!, we’ll cover these topics and share an initial set of methodologies to put generative AI into production. We’ll start with a broad introduction to the domain and then share a mix of videos, blogs, and hands-on workshops.

Navigating the future of AI 

Many teams are turning to open source tools running on Kubernetes to help accelerate their ML and generative AI journeys. In this video session, experts discuss why Kubernetes is ideal for ML, then tackle challenges like dependency management and security. You will learn how tools like Ray, JupyterHub, Argo Workflows, and Karpenter can accelerate your path to building and deploying generative AI applications on Amazon Elastic Kubernetes Service (Amazon EKS). A real-world example showcases how Adobe leveraged Amazon EKS to achieve faster time-to-market and reduced costs. You will be also introduced to Data on EKS, a new AWS project offering best practices for deploying various data workloads on Amazon EKS.

Take me to this video!

Containers are a powerful tool for creating reproducible research and production environments for ML.

Figure 1. Containers are a powerful tool for creating reproducible research and production environments for ML.

Generative AI: Architectures and applications in depth

This video session aims to provide an in-depth exploration of the emerging concepts in generative AI. By delving into practical applications and detailing best practices for implementation, the session offers a concrete understanding that empowers businesses to harness the full potential of these technologies. You can gain valuable insights into navigating the complexities of generative AI, equipping you with the knowledge and strategies necessary to stay ahead of the curve and capitalize on the transformative power of these new methods. If you want to dive even deeper, check this generative AI best practices post.

Take me to this video!

Models are growing exponentially: improved capabilities come with higher costs for productionizing them.

Figure 2. Models are growing exponentially: improved capabilities come with higher costs for productionizing them.

SaaS meets AI/ML & generative AI: Multi-tenant patterns & strategies

Working with AI/ML workloads and generative AI in a production environment requires appropriate system design and careful considerations for tenant separation in the context of SaaS. You’ll need to think about how the different tenants are mapped to models, how inferencing is scaled, how solutions are integrated with other upstream/downstream services, and how large language models (LLMs) can be fine-tuned to meet tenant-specific needs.

This video drills down into the concept of multi-tenancy for AI/ML workloads, including the common design, performance, isolation, and experience challenges that you can find during your journey. You will also become familiar with concepts like RAG (used to enrich the LLMs with contextual information) and fine tuning through practical examples.

Take me to this video!

Supporting different tenants might need fetching different context information with RAGs or offering different options for fine-tuning.

Figure 3. Supporting different tenants might need fetching different context information with RAGs or offering different options for fine-tuning.

Achieve DevOps maturity with BMC AMI zAdviser Enterprise and Amazon Bedrock

DevOps Research and Assessment (DORA) metrics, which measure critical DevOps performance indicators like lead time, are essential to engineering practices, as shown in the Accelerate book‘s research. By leveraging generative AI technology, the zAdviser Enterprise platform can now offer in-depth insights and actionable recommendations to help organizations optimize their DevOps practices and drive continuous improvement. This blog demonstrates how generative AI can go beyond language or image generation, applying to a wide spectrum of domains.

Take me to this blog post!

Generative AI is used to provide summarization, analysis, and recommendations for improvement based on the DORA metrics.

Figure 4. Generative AI is used to provide summarization, analysis, and recommendations for improvement based on the DORA metrics.

Hands-on Generative AI: AWS workshops

Getting hands on is often the best way to understand how everything works in practice and create the mental model to connect theoretical foundations with some real-world applications.

Generative AI on Amazon SageMaker shows how you can build, train, and deploy generative AI models. You can learn about options to fine-tune, use out-of-the-box existing models, or even customize the existing open source models based on your needs.

Building with Amazon Bedrock and LangChain demonstrates how an existing fully-managed service provided by AWS can be used when you work with foundational models, covering a wide variety of use cases. Also, if you want a quick guide for prompt engineering, you can check out the PartyRock lab in the workshop.

An image replacement example that you can find in the workshop.

Figure 5. An image replacement example that you can find in the workshop.

See you next time!

Thanks for reading! We hope you got some insight into the applications of generative AI and discovered new strategies for using it. In the next blog, we will dive deeper into machine learning.

To revisit any of our previous posts or explore the entire series, visit the Let’s Architect! page.

Genomics workflows, Part 6: cost prediction

Post Syndicated from Rostislav Markov original https://aws.amazon.com/blogs/architecture/part-6-genomics-workflows-cost-prediction/

Genomics workflows run on large pools of compute resources and take petabyte-scale datasets as inputs. Workflow runs can cost as much as hundreds of thousands of US dollars. Given this large scale, scientists want to estimate the projected cost of their genomics workflow runs before deciding to launch them.

In Part 6 of this series, we build on the benchmarking concepts presented in Part 5. You will learn how to train machine learning (ML) models on historical data to predict the cost of future runs. While we focus on genomics, the design pattern is broadly applicable to any compute-intensive workflow use case.

Use case

In large life-sciences organizations, multiple research teams often use the same genomics applications. The actual cost of consuming shared resources is only periodically shown or charged back to research teams.

In this blog post’s scenario, scientists want to predict the cost of future workflow runs based on the following input parameters:

  • Workflow name
  • Input data set size
  • Expected output dataset size

In our experience, scientists might not know how to reliably estimate compute cost based on the preceding parameters. This is because workflow run cost doesn’t linearly correlate to the input dataset size. For example, some workflow steps might be highly parallelizable while others aren’t. Otherwise, scientists could simply use the AWS Pricing Calculator or interact programmatically with the AWS Price List API. To solve this problem, we use ML to model the pattern of correlation and predict workflow cost.

Business benefits of predicting the cost of genomics workflow runs

Price prediction brings the following benefits:

  • Prioritizing workflow runs based on financial impact
  • Promoting cost awareness and frugality with application users
  • Supporting enterprise resource planning and prevention of budget overruns by integrating estimation data into management reporting and approval workflows

Prerequisites

To build this solution, you must have workflows running on AWS for which you collect actual cost data after each workflow run. This setup is demonstrated in Part 3 and Part 5 of this blog series. This data provides training data for the solution’s cost prediction models.

Solution overview

This solution includes a friendly user interface, ML models that predict usage parameters, and a metadata storage mechanism to estimate the cost of a workflow run. We use the automated workflow manager presented in Part 3 and the benchmarking solution from Part 5. The data on historical workflow launches and their cost serves as training and testing data for our ML models. We store this in Amazon DynamoDB. We use AWS Amplify to host a serverless user interface and a library/framework such as React to build it.

Scientists input the required parameters about their genomics workflow run to the Amplify frontend React application. The latter makes a request to an Amazon API Gateway REST API. This invokes an AWS Lambda function, which calls an Amazon SageMaker hosted endpoint to return predicted costs (Figure 1).

This visual summarizes the cost prediction and model training processes. Users request cost predictions for future workflow runs on a web frontend hosted in AWS Amplify. The frontend passes the requests to an Amazon API Gateway endpoint with Lambda integration. The Lambda function retrieves the suitable model endpoint from the DynamoDB table and invokes the model via the Amazon SageMaker API. Model training runs on a schedule and is orchestrated by an AWS Step Functions state machine. The state machine queries training datasets from the DynamoDB table. If the new model performs better, it is registered in the SageMaker model registry. Otherwise, the state machine sends a notification to an Amazon Simple Notification Service topic stating that there are no updates.

Figure 1. Automated cost prediction of genomics workflow runs

Each workflow captured in the DynamoDB table has a corresponding ML model trained for the specific use case. Separating out models for specific workflows simplifies the model development process. This solution periodically trains ML models to improve their overall accuracy and performance. A rule in Amazon EventBridge Scheduler invokes model training on a regular basis. An AWS Step Functions state machine automates the model training process.

Implementation considerations

When a scientist submits a request (which includes the name of the workflow they’re running), API Gateway uses Lambda integration. The Lambda function retrieves a record from the DynamoDB table that keeps track of the SageMaker hosted endpoints. The partition key of the table is the workflow name (indicated as workflow_name), as shown in the following example:

This visual displays an exemplary DynamoDB record. The record includes the Amazon SageMaker hosted endpoint that AWS Lambda would retrieve for a regenie workflow.

Using the input parameters, the Lambda function invokes the SageMaker hosted endpoint and returns the inference values back to the frontend.

Automating model training

Our Step Functions state machine for model training uses native SageMaker SDK integration. It runs as follows:

  1. The state machine invokes a SageMaker training job to train a new ML model. The training job uses the historical workflow run data sourced from the DynamoDB table. After the training job completes, it outputs the ML model to an Amazon Simple Storage Service (Amazon S3) bucket.
  2. The state machine registers the new model in the SageMaker model registry.
  3. A Lambda function compares the performance of the new model with the prior version on the training dataset.
  4. If the new model performs better than the prior model, the state machine creates a new SageMaker hosted endpoint configuration and puts the endpoint name in the DynamoDB table.
  5. Otherwise, the state machine sends a notification to an Amazon Simple Notification Service (Amazon SNS) topic stating that there are no updates.

Conclusion

In this blog post, we demonstrated how genomics research teams can build a price estimator to predict genomics workflow run cost. This solution trains ML models for each workflow based on data from historical workflow runs. A state machine helps automate the entire model training process. You can use price estimation to promote cost awareness in your organization and reduce the risk of budget overruns.

Our solution is particularly suitable if you want to predict the price of individual workflow runs. If you want forecast overall consumption of your shared application infrastructure, consider deploying a forecasting workflow with Amazon Forecast. The Build workflows for Amazon Forecast with AWS Step Functions blog post provides details on the specific use case for using Amazon Forecast workflows.

Related information

Import custom models in Amazon Bedrock (preview)

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/import-custom-models-in-amazon-bedrock-preview/

With Amazon Bedrock, you have access to a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies that make it easier to build and scale generative AI applications. Some of these models provide publicly available weights that can be fine-tuned and customized for specific use cases. However, deploying customized FMs in a secure and scalable way is not an easy task.

Starting today, Amazon Bedrock adds in preview the capability to import custom weights for supported model architectures (such as Meta Llama 2, Llama 3, and Mistral) and serve the custom model using On-Demand mode. You can import models with weights in Hugging Face safetensors format from Amazon SageMaker and Amazon Simple Storage Service (Amazon S3).

In this way, you can use Amazon Bedrock with existing customized models such as Code Llama, a code-specialized version of Llama 2 that was created by further training Llama 2 on code-specific datasets, or use your data to fine-tune models for your own unique business case and import the resulting model in Amazon Bedrock.

Let’s see how this works in practice.

Bringing a custom model to Amazon Bedrock
In the Amazon Bedrock console, I choose Imported models from the Foundation models section of the navigation pane. Now, I can create a custom model by importing model weights from an Amazon Simple Storage Service (Amazon S3) bucket or from an Amazon SageMaker model.

I choose to import model weights from an S3 bucket. In another browser tab, I download the MistralLite model from the Hugging Face website using this pull request (PR) that provides weights in safetensors format. The pull request is currently Ready to merge, so it might be part of the main branch when you read this. MistralLite is a fine-tuned Mistral-7B-v0.1 language model with enhanced capabilities of processing long context up to 32K tokens.

When the download is complete, I upload the files to an S3 bucket in the same AWS Region where I will import the model. Here are the MistralLite model files in the Amazon S3 console:

Console screenshot.

Back at the Amazon Bedrock console, I enter a name for the model and keep the proposed import job name.

Console screenshot.

I select Model weights in the Model import settings and browse S3 to choose the location where I uploaded the model weights.

Console screenshot.

To authorize Amazon Bedrock to access the files on the S3 bucket, I select the option to create and use a new AWS Identity and Access Management (IAM) service role. I use the View permissions details link to check what will be in the role. Then, I submit the job.

About ten minutes later, the import job is completed.

Console screenshot.

Now, I see the imported model in the console. The list also shows the model Amazon Resource Name (ARN) and the creation date.

Console screenshot.

I choose the model to get more information, such as the S3 location of the model files.

Console screenshot.

In the model detail page, I choose Open in playground to test the model in the console. In the text playground, I type a question using the prompt template of the model:

<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>

The MistralLite imported model is quick to reply and describe some of those challenges.

Console screenshot.

In the playground, I can tune responses for my use case using configurations such as temperature and maximum length or add stop sequences specific to the imported model.

To see the syntax of the API request, I choose the three small vertical dots at the top right of the playground.

Console screenshot.

I choose View API syntax and run the command using the AWS Command Line Interface (AWS CLI):

aws bedrock-runtime invoke-model \
--model-id arn:aws:bedrock:us-east-1:123412341234:imported-model/a82bkefgp20f \
--body "{\"prompt\":\"<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>\",\"max_tokens\":512,\"top_k\":200,\"top_p\":0.9,\"stop\":[],\"temperature\":0.5}" \
--cli-binary-format raw-in-base64-out \
--region us-east-1 \
invoke-model-output.txt

The output is similar to what I got in the playground. As you can see, for imported models, the model ID is the ARN of the imported model. I can use the model ID to invoke the imported model with the AWS CLI and AWS SDKs.

Things to know
You can bring your own weights for supported model architectures to Amazon Bedrock in the US East (N. Virginia) AWS Region. The model import capability is currently available in preview.

When using custom weights, Amazon Bedrock serves the model with On-Demand mode, and you only pay for what you use with no time-based term commitments. For detailed information, see Amazon Bedrock pricing.

The ability to import models is managed using AWS Identity and Access Management (IAM), and you can allow this capability only to the roles in your organization that need to have it.

With this launch, it’s now easier to build and scale generative AI applications using custom models with security and privacy built in.

To learn more:

Danilo

Hybrid Search with Amazon OpenSearch Service

Post Syndicated from Hajer Bouafif original https://aws.amazon.com/blogs/big-data/hybrid-search-with-amazon-opensearch-service/

Amazon OpenSearch Service has been a long-standing supporter of both lexical and semantic search, facilitated by its utilization of the k-nearest neighbors (k-NN) plugin. By using OpenSearch Service as a vector database, you can seamlessly combine the advantages of both lexical and vector search. The introduction of the neural search feature in OpenSearch Service 2.9 further simplifies integration with artificial intelligence (AI) and machine learning (ML) models, facilitating the implementation of semantic search.

Lexical search using TF/IDF or BM25 has been the workhorse of search systems for decades. These traditional lexical search algorithms match user queries with exact words or phrases in your documents. Lexical search is more suitable for exact matches, provides low latency, and offers good interpretability of results and generalizes well across domains. However, this approach does not consider the context or meaning of the words, which can lead to irrelevant results.

In the past few years, semantic search methods based on vector embeddings have become increasingly popular to enhance search. Semantic search enables a more context-aware search, understanding the natural language questions of user queries. However, semantic search powered by vector embeddings requires fine-tuning of the ML model for the associated domain (such as healthcare or retail) and more memory resources compared to basic lexical search.

Both lexical search and semantic search have their own strengths and weaknesses. Combining lexical and vector search improves the quality of search results by using their best features in a hybrid model. OpenSearch Service 2.11 now supports out-of-the-box hybrid query capabilities that make it straightforward for you to implement a hybrid search model combining lexical search and semantic search.

This post explains the internals of hybrid search and how to build a hybrid search solution using OpenSearch Service. We experiment with sample queries to explore and compare lexical, semantic, and hybrid search. All the code used in this post is publicly available in the GitHub repository.

Hybrid search with OpenSearch Service

In general, hybrid search to combine lexical and semantic search involves the following steps:

  1. Run a semantic and lexical search using a compound search query clause.
  2. Each query type provides scores on different scales. For example, a Lucene lexical search query will return a score between 1 and infinity. On the other hand, a semantic query using the Faiss engine returns scores between 0 and 1. Therefore, you need to normalize the scores coming from each type of query to put them on the same scale before combining the scores. In a distributed search engine, this normalization needs to happen at the global level rather than shard or node level.
  3. After the scores are all on the same scale, they’re combined for every document.
  4. Reorder the documents based on the new combined score and render the documents as a response to the query.

Prior to OpenSearch Service 2.11, search practitioners would need to use compound query types to combine lexical and semantic search queries. However, this approach does not address the challenge of global normalization of scores as mentioned in Step 2.

OpenSearch Service 2.11 added the support of hybrid query by introducing the score normalization processor in search pipelines. Search pipelines take away the heavy lifting of building normalization of score results and combination outside your OpenSearch Service domain. Search pipelines run inside the OpenSearch Service domain and support three types of processors: search request processor, search response processor, and search phase results processor.

In a hybrid search, the search phase results processor runs between the query phase and fetch phase at the coordinator node (global) level. The following diagram illustrates this workflow.

Search Pipeline

The hybrid search workflow in OpenSearch Service contains the following phases:

  • Query phase – The first phase of a search request is the query phase, where each shard in your index runs the search query locally and returns the document ID matching the search request with relevance scores for each document.
  • Score normalization and combination – The search phase results processor runs between the query phase and fetch phase. It uses the normalization processer to normalize scoring results from BM25 and KNN subqueries. The search processor supports min_max and L2-Euclidean distance normalization methods. The processor combines all scores, compiles the final list of ranked document IDs, and passes them to the fetch phase. The processor supports arithmetic_mean, geometric_mean, and harmonic_mean to combine scores.
  • Fetch phase – The final phase is the fetch phase, where the coordinator node retrieves the documents that matches the final ranked list and returns the search query result.

Solution overview

In this post, you build a web application where you can search through a sample image dataset in the retail space, using a hybrid search system powered by OpenSearch Service. Let’s assume that the web application is a retail shop and you as a consumer need to run queries to search for women’s shoes.

For a hybrid search, you combine a lexical and semantic search query against the text captions of images in the dataset. The end-to-end search application high-level architecture is shown in the following figure.

Solution Architecture

The workflow contains the following steps:

  1. You use an Amazon SageMaker notebook to index image captions and image URLs from the Amazon Berkeley Objects Dataset stored in Amazon Simple Storage Service (Amazon S3) into OpenSearch Service using the OpenSearch ingest pipeline. This dataset is a collection of 147,702 product listings with multilingual metadata and 398,212 unique catalog images. You only use the item images and item names in US English. For demo purposes, you use approximately 1,600 products.
  2. OpenSearch Service calls the embedding model hosted in SageMaker to generate vector embeddings for the image caption. You use the GPT-J-6B variant embedding model, which generates 4,096 dimensional vectors.
  3. Now you can enter your search query in the web application hosted on an Amazon Elastic Compute Cloud (Amazon EC2) instance (c5.large). The application client triggers the hybrid query in OpenSearch Service.
  4. OpenSearch Service calls the SageMaker embedding model to generate vector embeddings for the search query.
  5. OpenSearch Service runs the hybrid query, combines the semantic search and lexical search scores for the documents, and sends back the search results to the EC2 application client.

Let’s look at Steps 1, 2, 4, and 5 in more detail.

Step 1: Ingest the data into OpenSearch

In Step 1, you create an ingest pipeline in OpenSearch Service using the text_embedding processor to generate vector embeddings for the image captions.

After you define a k-NN index with the ingest pipeline, you run a bulk index operation to store your data into the k-NN index. In this solution, you only index the image URLs, text captions, and caption embeddings where the field type for the caption embeddings is k-NN vector.

Step 2 and Step 4: OpenSearch Service calls the SageMaker embedding model

In these steps, OpenSearch Service uses the SageMaker ML connector to generate the embeddings for the image captions and query. The blue box in the preceding architecture diagram refers to the integration of OpenSearch Service with SageMaker using the ML connector feature of OpenSearch. This feature is available in OpenSearch Service starting from version 2.9. It enables you to create integrations with other ML services, such as SageMaker.

Step 5: OpenSearch Service runs the hybrid search query

OpenSearch Service uses the search phase results processor to perform a hybrid search. For hybrid scoring, OpenSearch Service uses the normalization, combination, and weights configuration settings that are set in the normalization processor of the search pipeline.

Prerequisites

Before you deploy the solution, make sure you have the following prerequisites:

  • An AWS account
  • Familiarity with the Python programming language
  • Familiarity with AWS Identity and Access Management (IAM), Amazon EC2, OpenSearch Service, and SageMaker

Deploy the hybrid search application to your AWS account

To deploy your resources, use the provided AWS CloudFormation template. Supported AWS Regions are us-east-1, us-west-2, and eu-west-1. Complete the following steps to launch the stack:

  1. On the AWS CloudFormation console, create a new stack.
  2. For Template source, select Amazon S3 URL.
  3. For Amazon S3 URL, enter the path for the template for deploying hybrid search.
  4. Choose Next.Deploy CloudFormation
  5. Name the stack hybridsearch.
  6. Keep the remaining settings as default and choose Submit.
  7. The template stack should take 15 minutes to deploy. When it’s done, the stack status will show as CREATE_COMPLETE.Check completion of CloudFormation deployment
  8. When the stack is complete, navigate to the stack Outputs tab.
  9. Choose the SagemakerNotebookURL link to open the SageMaker notebook in a separate tab.Open SageMaker Notebook URL
  10. In the SageMaker notebook, navigate to the AI-search-with-amazon-opensearch-service/opensearch-hybridsearch directory and open HybridSearch.ipynb.Open HybridSearch notebook
  11. If the notebook prompts to set the kernel, Choose the conda_pytorch_p310 kernel from the drop-down menu, then choose Set Kernel.Set the Kernel
  12. The notebook should look like the following screenshot.HybridSearch Implementation steps

Now that the notebook is ready to use, follow the step-by-step instructions in the notebook. With these steps, you create an OpenSearch SageMaker ML connector and a k-NN index, ingest the dataset into an OpenSearch Service domain, and host the web search application on Amazon EC2.

Run a hybrid search using the web application

The web application is now deployed in your account and you can access the application using the URL generated at the end of the SageMaker notebook.

Open the Application using the URL

Copy the generated URL and enter it in your browser to launch the application.

Application UI components

Complete the following steps to run a hybrid search:

  1. Use the search bar to enter your search query.
  2. Use the drop-down menu to select the search type. The available options are Keyword Search, Vector Search, and Hybrid Search.
  3. Choose GO to render results for your query or regenerate results based on your new settings.
  4. Use the left pane to tune your hybrid search configuration:
    • Under Weight for Semantic Search, adjust the slider to choose the weight for semantic subquery. Be aware that the total weight for both lexical and semantic queries should be 1.0. The closer the weight is to 1.0, the more weight is given to the semantic subquery, and this setting minus 1.0 goes as weightage to the lexical query.
    • For Select the normalization type, choose the normalization technique (min_max or L2).
    • For Select the Score Combination type, choose the score combination techniques: arithmetic_mean, geometric_mean, or harmonic_mean.

Experiment with Hybrid Search

In this post, you run four experiments to understand the differences between the outputs of each search type.

As a customer of this retail shop, you are looking for women’s shoes, and you don’t know yet what style of shoes you would like to purchase. You expect that the retail shop should be able to help you decide according to the following parameters:

  • Not to deviate from the primary attributes of what you search for.
  • Provide versatile options and styles to help you understand your preference of style and then choose one.

As your first step, enter the search query “women shoes” and choose 5 as the number of documents to output.

Next, run the following experiments and review the observation for each search type

Experiment 1: Lexical search

Experiment 1: Lexical searchFor a lexical search, choose Keyword Search as your search type, then choose GO.

The keyword search runs a lexical query, looking for same words between the query and image captions. In the first four results, two are women’s boat-style shoes identified by common words like “women” and “shoes.” The other two are men’s shoes, linked by the common term “shoes.” The last result is of style “sandals,” and it’s identified based on the common term “shoes.”

In this experiment, the keyword search provided three relevant results out of five—it doesn’t completely capture the user’s intention to have shoes only for women.

Experiment 2: Semantic search

Experiment 2: Semantic search

For a semantic search, choose Semantic search as the search type, then choose GO.

The semantic search provided results that all belong to one particular style of shoes, “boots.” Even though the term “boots” was not part of the search query, the semantic search understands that terms “shoes” and “boots” are similar because they are found to be nearest neighbors in the vector space.

In this experiment, when the user didn’t mention any specific shoe styles like boots, the results limited the user’s choices to a single style. This hindered the user’s ability to explore a variety of styles and make a more informed decision on their preferred style of shoes to purchase.

Let’s see how hybrid search can help in this use case.

Experiment 3: Hybrid search

Experiment 3: Hybrid search

Choose Hybrid Search as the search type, then choose GO.

In this example, the hybrid search uses both lexical and semantic search queries. The results show two “boat shoes” and three “boots,” reflecting a blend of both lexical and semantic search outcomes.

In the top two results, “boat shoes” directly matched the user’s query and were obtained through lexical search. In the lower-ranked items, “boots” was identified through semantic search.

In this experiment, the hybrid search gave equal weighs to both lexical and semantic search, which allowed users to quickly find what they were looking for (shoes) while also presenting additional styles (boots) for them to consider.

Experiment 4: Fine-tune the hybrid search configuration

Experiment 4: Fine-tune the hybrid search configuration

In this experiment, set the weight of the vector subquery to 0.8, which means the keyword search query has a weightage of 0.2. Keep the normalization and score combination settings set to default. Then choose GO to generate new results for the preceding query.

Providing more weight to the semantic search subquery resulted in higher scores to the semantic search query results. You can see a similar outcome as the semantic search results from the second experiment, with five images of boots for women.

You can further fine-tune the hybrid search results by adjusting the combination and normalization techniques.

In a benchmark conducted by the OpenSearch team using publicly available datasets such as BEIR and Amazon ESCI, they concluded that the min_max normalization technique combined with the arithmetic_mean score combination technique provides the best results in a hybrid search.

You need to thoroughly test the different fine-tuning options to choose what is the most relevant to your business requirements.

Overall observations

From all the previous experiments, we can conclude that the hybrid search in the third experiment had a combination of results that looks relevant to the user in terms of giving exact matches and also additional styles to choose from. The hybrid search matches the expectation of the retail shop customer.

Clean up

To avoid incurring continued AWS usage charges, make sure you delete all the resources you created as part of this post.

To clean up your resources, make sure you delete the S3 bucket you created within the application before you delete the CloudFormation stack.

OpenSearch Service integrations

In this post, you deployed a CloudFormation template to host the ML model in a SageMaker endpoint and spun up a new OpenSearch Service domain, then you used a SageMaker notebook to run steps to create the SageMaker-ML connector and deploy the ML model in OpenSearch Service.

You can achieve the same setup for an existing OpenSearch Service domain by using the ready-made CloudFormation templates from the OpenSearch Service console integrations. These templates automate the steps of SageMaker model deployment and SageMaker ML connector creation in OpenSearch Service.

Conclusion

In this post, we provided a complete solution to run a hybrid search with OpenSearch Service using a web application. The experiments in the post provided an example of how you can combine the power of lexical and semantic search in a hybrid search to improve the search experience for your end-users for a retail use case.

We also explained the new features available in version 2.9 and 2.11 in OpenSearch Service that make it effortless for you to build semantic search use cases such as remote ML connectors, ingest pipelines, and search pipelines. In addition, we showed you how the new score normalization processor in the search pipeline makes it straightforward to establish the global normalization of scores within your OpenSearch Service domain before combining multiple search scores.

Learn more about ML-powered search with OpenSearch and set up hybrid search in your own environment using the guidelines in this post. The solution code is also available on the GitHub repo.


About the Authors

Hajer Bouafif, Analytics Specialist Solutions ArchitectHajer Bouafif is an Analytics Specialist Solutions Architect at Amazon Web Services. She focuses on Amazon OpenSearch Service and helps customers design and build well-architected analytics workloads in diverse industries. Hajer enjoys spending time outdoors and discovering new cultures.

Praveen Mohan Prasad, Analytics Specialist Technical Account ManagerPraveen Mohan Prasad is an Analytics Specialist Technical Account Manager at Amazon Web Services and helps customers with pro-active operational reviews on analytics workloads. Praveen actively researches on applying machine learning to improve search relevance.

Top Architecture Blog Posts of 2023

Post Syndicated from Andrea Courtright original https://aws.amazon.com/blogs/architecture/top-architecture-blog-posts-of-2023/

2023 was a rollercoaster year in tech, and we at the AWS Architecture Blog feel so fortunate to have shared in the excitement. As we move into 2024 and all of the new technologies we could see, we want to take a moment to highlight the brightest stars from 2023.

As always, thanks to our readers and to the many talented and hardworking Solutions Architects and other contributors to our blog.

I give you our 2023 cream of the crop!

#10: Build a serverless retail solution for endless aisle on AWS

In this post, Sandeep and Shashank help retailers and their customers alike in this guided approach to finding inventory that doesn’t live on shelves.

Building endless aisle architecture for order processing

Figure 1. Building endless aisle architecture for order processing

Check it out!

#9: Optimizing data with automated intelligent document processing solutions

Who else dreads wading through large amounts of data in multiple formats? Just me? I didn’t think so. Using Amazon AI/ML and content-reading services, Deependra, Anirudha, Bhajandeep, and Senaka have created a solution that is scalable and cost-effective to help you extract the data you need and store it in a format that works for you.

AI-based intelligent document processing engine

Figure 2: AI-based intelligent document processing engine

Check it out!

#8: Disaster Recovery Solutions with AWS managed services, Part 3: Multi-Site Active/Passive

Disaster recovery posts are always popular, and this post by Brent and Dhruv is no exception. Their creative approach in part 3 of this series is most helpful for customers who have business-critical workloads with higher availability requirements.

Warm standby with managed services

Figure 3. Warm standby with managed services

Check it out!

#7: Simulating Kubernetes-workload AZ failures with AWS Fault Injection Simulator

Continuing with the theme of “when bad things happen,” we have Siva, Elamaran, and Re’s post about preparing for workload failures. If resiliency is a concern (and it really should be), the secret is test, test, TEST.

Architecture flow for Microservices to simulate a realistic failure scenario

Figure 4. Architecture flow for Microservices to simulate a realistic failure scenario

Check it out!

#6: Let’s Architect! Designing event-driven architectures

Luca, Laura, Vittorio, and Zamira weren’t content with their four top-10 spots last year – they’re back with some things you definitely need to know about event-driven architectures.

Let's Architect

Figure 5. Let’s Architect artwork

Check it out!

#5: Use a reusable ETL framework in your AWS lake house architecture

As your lake house increases in size and complexity, you could find yourself facing maintenance challenges, and Ashutosh and Prantik have a solution: frameworks! The reusable ETL template with AWS Glue templates might just save you a headache or three.

Reusable ETL framework architecture

Figure 6. Reusable ETL framework architecture

Check it out!

#4: Invoking asynchronous external APIs with AWS Step Functions

It’s possible that AWS’ menagerie of services doesn’t have everything you need to run your organization. (Possible, but not likely; we have a lot of amazing services.) If you are using third-party APIs, then Jorge, Hossam, and Shirisha’s architecture can help you maintain a secure, reliable, and cost-effective relationship among all involved.

Invoking Asynchronous External APIs architecture

Figure 7. Invoking Asynchronous External APIs architecture

Check it out!

#3: Announcing updates to the AWS Well-Architected Framework

The Well-Architected Framework continues to help AWS customers evaluate their architectures against its six pillars. They are constantly striving for improvement, and Haleh’s diligence in keeping us up to date has not gone unnoticed. Thank you, Haleh!

Well-Architected logo

Figure 8. Well-Architected logo

Check it out!

#2: Let’s Architect! Designing architectures for multi-tenancy

The practically award-winning Let’s Architect! series strikes again! This time, Luca, Laura, Vittorio, and Zamira were joined by Federica to discuss multi-tenancy and why that concept is so crucial for SaaS providers.

Let's Architect

Figure 9. Let’s Architect

Check it out!

And finally…

#1: Understand resiliency patterns and trade-offs to architect efficiently in the cloud

Haresh, Lewis, and Bonnie revamped this 2022 post into a masterpiece that completely stole our readers’ hearts and is among the top posts we’ve ever made!

Resilience patterns and trade-offs

Figure 10. Resilience patterns and trade-offs

Check it out!

Bonus! Three older special mentions

These three posts were published before 2023, but we think they deserve another round of applause because you, our readers, keep coming back to them.

Thanks again to everyone for their contributions during a wild year. We hope you’re looking forward to the rest of 2024 as much as we are!

Power neural search with AI/ML connectors in Amazon OpenSearch Service

Post Syndicated from Aruna Govindaraju original https://aws.amazon.com/blogs/big-data/power-neural-search-with-ai-ml-connectors-in-amazon-opensearch-service/

With the launch of the neural search feature for Amazon OpenSearch Service in OpenSearch 2.9, it’s now effortless to integrate with AI/ML models to power semantic search and other use cases. OpenSearch Service has supported both lexical and vector search since the introduction of its k-nearest neighbor (k-NN) feature in 2020; however, configuring semantic search required building a framework to integrate machine learning (ML) models to ingest and search. The neural search feature facilitates text-to-vector transformation during ingestion and search. When you use a neural query during search, the query is translated into a vector embedding and k-NN is used to return the nearest vector embeddings from the corpus.

To use neural search, you must set up an ML model. We recommend configuring AI/ML connectors to AWS AI and ML services (such as Amazon SageMaker or Amazon Bedrock) or third-party alternatives. Starting with version 2.9 on OpenSearch Service, AI/ML connectors integrate with neural search to simplify and operationalize the translation of your data corpus and queries to vector embeddings, thereby removing much of the complexity of vector hydration and search.

In this post, we demonstrate how to configure AI/ML connectors to external models through the OpenSearch Service console.

Solution Overview

Specifically, this post walks you through connecting to a model in SageMaker. Then we guide you through using the connector to configure semantic search on OpenSearch Service as an example of a use case that is supported through connection to an ML model. Amazon Bedrock and SageMaker integrations are currently supported on the OpenSearch Service console UI, and the list of UI-supported first- and third-party integrations will continue to grow.

For any models not supported through the UI, you can instead set them up using the available APIs and the ML blueprints. For more information, refer to Introduction to OpenSearch Models. You can find blueprints for each connector in the ML Commons GitHub repository.

Prerequisites

Before connecting the model via the OpenSearch Service console, create an OpenSearch Service domain. Map an AWS Identity and Access Management (IAM) role by the name LambdaInvokeOpenSearchMLCommonsRole as the backend role on the ml_full_access role using the Security plugin on OpenSearch Dashboards, as shown in the following video. The OpenSearch Service integrations workflow is pre-filled to use the LambdaInvokeOpenSearchMLCommonsRole IAM role by default to create the connector between the OpenSearch Service domain and the model deployed on SageMaker. If you use a custom IAM role on the OpenSearch Service console integrations, make sure the custom role is mapped as the backend role with ml_full_access permissions prior to deploying the template.

Deploy the model using AWS CloudFormation

The following video demonstrates the steps to use the OpenSearch Service console to deploy a model within minutes on Amazon SageMaker and generate the model ID via the AI connectors. The first step is to choose Integrations in the navigation pane on the OpenSearch Service AWS console, which routes to a list of available integrations. The integration is set up through a UI, which will prompt you for the necessary inputs.

To set up the integration, you only need to provide the OpenSearch Service domain endpoint and provide a model name to uniquely identify the model connection. By default, the template deploys the Hugging Face sentence-transformers model, djl://ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L6-v2.

When you choose Create Stack, you are routed to the AWS CloudFormation console. The CloudFormation template deploys the architecture detailed in the following diagram.

The CloudFormation stack creates an AWS Lambda application that deploys a model from Amazon Simple Storage Service (Amazon S3), creates the connector, and generates the model ID in the output. You can then use this model ID to create a semantic index.

If the default all-MiniLM-L6-v2 model doesn’t serve your purpose, you can deploy any text embedding model of your choice on the chosen model host (SageMaker or Amazon Bedrock) by providing your model artifacts as an accessible S3 object. Alternatively, you can select one of the following pre-trained language models and deploy it to SageMaker. For instructions to set up your endpoint and models, refer to Available Amazon SageMaker Images.

SageMaker is a fully managed service that brings together a broad set of tools to enable high-performance, low-cost ML for any use case, delivering key benefits such as model monitoring, serverless hosting, and workflow automation for continuous training and deployment. SageMaker allows you to host and manage the lifecycle of text embedding models, and use them to power semantic search queries in OpenSearch Service. When connected, SageMaker hosts your models and OpenSearch Service is used to query based on inference results from SageMaker.

View the deployed model through OpenSearch Dashboards

To verify the CloudFormation template successfully deployed the model on the OpenSearch Service domain and get the model ID, you can use the ML Commons REST GET API through OpenSearch Dashboards Dev Tools.

The GET _plugins REST API now provides additional APIs to also view the model status. The following command allows you to see the status of a remote model:

GET _plugins/_ml/models/<modelid>

As shown in the following screenshot, a DEPLOYED status in the response indicates the model is successfully deployed on the OpenSearch Service cluster.

Alternatively, you can view the model deployed on your OpenSearch Service domain using the Machine Learning page of OpenSearch Dashboards.

This page lists the model information and the statuses of all the models deployed.

Create the neural pipeline using the model ID

When the status of the model shows as either DEPLOYED in Dev Tools or green and Responding in OpenSearch Dashboards, you can use the model ID to build your neural ingest pipeline. The following ingest pipeline is run in your domain’s OpenSearch Dashboards Dev Tools. Make sure you replace the model ID with the unique ID generated for the model deployed on your domain.

PUT _ingest/pipeline/neural-pipeline
{
  "description": "Semantic Search for retail product catalog ",
  "processors" : [
    {
      "text_embedding": {
        "model_id": "sfG4zosBIsICJFsINo3X",
        "field_map": {
           "description": "desc_v",
           "name": "name_v"
        }
      }
    }
  ]
}

Create the semantic search index using the neural pipeline as the default pipeline

You can now define your index mapping with the default pipeline configured to use the new neural pipeline you created in the previous step. Ensure the vector fields are declared as knn_vector and the dimensions are appropriate to the model that is deployed on SageMaker. If you have retained the default configuration to deploy the all-MiniLM-L6-v2 model on SageMaker, keep the following settings as is and run the command in Dev Tools.

PUT semantic_demostore
{
  "settings": {
    "index.knn": true,  
    "default_pipeline": "neural-pipeline",
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "desc_v": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "engine": "nmslib",
          "space_type": "cosinesimil"
        }
      },
      "name_v": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "engine": "nmslib",
          "space_type": "cosinesimil"
        }
      },
      "description": {
        "type": "text" 
      },
      "name": {
        "type": "text" 
      } 
    }
  }
}

Ingest sample documents to generate vectors

For this demo, you can ingest the sample retail demostore product catalog to the new semantic_demostore index. Replace the user name, password, and domain endpoint with your domain information and ingest raw data into OpenSearch Service:

curl -XPOST -u 'username:password' 'https://domain-end-point/_bulk' --data-binary @semantic_demostore.json -H 'Content-Type: application/json'

Validate the new semantic_demostore index

Now that you have ingested your dataset to the OpenSearch Service domain, validate if the required vectors are generated using a simple search to fetch all fields. Validate if the fields defined as knn_vectors have the required vectors.

Compare lexical search and semantic search powered by neural search using the Compare Search Results tool

The Compare Search Results tool on OpenSearch Dashboards is available for production workloads. You can navigate to the Compare search results page and compare query results between lexical search and neural search configured to use the model ID generated earlier.

Clean up

You can delete the resources you created following the instructions in this post by deleting the CloudFormation stack. This will delete the Lambda resources and the S3 bucket that contain the model that was deployed to SageMaker. Complete the following steps:

  1. On the AWS CloudFormation console, navigate to your stack details page.
  2. Choose Delete.

  1. Choose Delete to confirm.

You can monitor the stack deletion progress on the AWS CloudFormation console.

Note that, deleting the CloudFormation stack doesn’t delete the model deployed on the SageMaker domain and the AI/ML connector created. This is because these models and the connector can be associated with multiple indexes within the domain. To specifically delete a model and its associated connector, use the model APIs as shown in the following screenshots.

First, undeploy the model from the OpenSearch Service domain memory:

POST /_plugins/_ml/models/<model_id>/_undeploy

Then you can delete the model from the model index:

DELETE /_plugins/_ml/models/<model_id>

Lastly, delete the connector from the connector index:

DELETE /_plugins/_ml/connectors/<connector_id>

Conclusion

In this post, you learned how to deploy a model in SageMaker, create the AI/ML connector using the OpenSearch Service console, and build the neural search index. The ability to configure AI/ML connectors in OpenSearch Service simplifies the vector hydration process by making the integrations to external models native. You can create a neural search index in minutes using the neural ingestion pipeline and the neural search that use the model ID to generate the vector embedding on the fly during ingest and search.

To learn more about these AI/ML connectors, refer to Amazon OpenSearch Service AI connectors for AWS services, AWS CloudFormation template integrations for semantic search, and Creating connectors for third-party ML platforms.


About the Authors

Aruna Govindaraju is an Amazon OpenSearch Specialist Solutions Architect and has worked with many commercial and open source search engines. She is passionate about search, relevancy, and user experience. Her expertise with correlating end-user signals with search engine behavior has helped many customers improve their search experience.

Dagney Braun is a Principal Product Manager at AWS focused on OpenSearch.

AWS Weekly Roundup—Amazon Route53, Amazon EventBridge, Amazon SageMaker, and more – January 15, 2024

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-route53-amazon-eventbridge-amazon-sagemaker-and-more-january-15-2024/

We are in January, the start of a new year, and I imagine many of you have made a new year resolution to learn something new. If you want to learn something new and get a free Amazon Web Services (AWS) Learning Badge, check out the new Events and Workflows Learning Path. This learning path will teach you everything you need to know about AWS Step Functions, Amazon EventBridge, event-driven architectures, and serverless, and when you finish the learning path, you can take an assessment. If you pass the assessment, you get an AWS Learning Badge, credited by Credly, that you can share in your résumé and social media profiles.

Events and workflows learning path badge

Last Week’s Launches
Here are some launches that got my attention during the previous week.

Amazon Route 53 – Now you can enable Route 53 Resolver DNS Firewall to filter DNS traffic based on the query type contained in the question section of the DNS query format. In addition, Route 53 now supports geoproximity routing as an additional routing policy for DNS records. Expand and reduce the geographic area from which traffic is routed to a resource by changing the record’s bias value. This is really helpful for industries that need to deliver highly responsive digital experiences.

Amazon CloudWatch LogsCloudWatch Logs now support creating account-level subscription filters. This capability allows you to forward all the logs groups from an account to other services like Amazon OpenSearch Service or Amazon Kinesis Data Firehose.

Amazon Elastic Container Service (Amazon ECS) Amazon ECS now integrates with Amazon Elastic Block Store (Amazon EBS), allowing you to provision and attach EBS volumes to Amazon ECS tasks running on both AWS Fargate and Amazon Elastic Cloud Compute (Amazon EC2). Read the blog post Channy wrote where he shows this feature in action.

Amazon EventBridgeEventBridge now supports AWS AppSync as a target of EventBridge buses. This enables you to stream real-time updates from your backend applications to your front-end clients. For example, you can get notifications in your mobile application from an order you did when the order status changes on the backend.

Amazon SageMakerSageMaker now supports M7i, C7i, and R7i instances for machine learning (ML) inference. These instances are powered by custom 4th generation Intel Xeon scalable processors and deliver up to 15 percent better price performance than their previous generations.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates and news that you may have missed:

If you are a serverless enthusiast, this week, the AWS Compute Blog published the Serverless ICYMI (in case you missed it) quarterly recap for the last quarter of 2023. This post compiles the announcements made during the months of October, November, and December, with all the relevant content that was produced by AWS Developer Advocates during that time. In addition to that blog post, you can learn about ServerlessVideo, a new demo application that we launched at AWS re:Invent 2023.

ServerlessVideo

This week there were also a couple of really interesting blog posts that explain how to solve very common challenges that customers face. The first one is the blog post in the AWS Security Blog that explains how to customize access tokens in Amazon Cognito user pools. And the second one is from the AWS Database Blog, which explains how to effectively sort data with Amazon DynamoDB.

The Official AWS Podcast – Listen each week for updates on the latest AWS news and deep dives into exciting use cases. There are also official AWS podcasts in several languages. Check out the ones in FrenchGermanItalian, and Spanish.

AWS open source newsletter – This is a newsletter curated by my colleague Ricardo to bring you the latest open source projects, posts, events, and more.

For our customers in Turkey, on January 1, 2024, AWS Turkey Pazarlama Teknoloji ve Danışmanlık Hizmetleri Limited Şirketi (AWS Turkey) replaced AWS EMEA SARL (AWS Europe) as the contracting party and service provider to customers in Türkiye. This enables AWS customers in Türkiye to transact in their local currency (Turkish Lira) and with a local bank. For more information on AWS Turkey, visit the FAQ page.

Upcoming AWS Events
The beginning of the year is the season of AWS re:Invent recaps, which are happening all around the globe during the next two months. You can check the recaps page to find the one closest to you.

You can browse all upcoming AWS led in-person and virtual events, as well as developer-focused events such as AWS DevDay.

That’s all for this week. Check back next Monday for another Week in Review!

— Marcia

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Package and deploy models faster with new tools and guided workflows in Amazon SageMaker

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/package-and-deploy-models-faster-with-new-tools-and-guided-workflows-in-amazon-sagemaker/

I’m happy to share that Amazon SageMaker now comes with an improved model deployment experience to help you deploy traditional machine learning (ML) models and foundation models (FMs) faster.

As a data scientist or ML practitioner, you can now use the new ModelBuilder class in the SageMaker Python SDK to package models, perform local inference to validate runtime errors, and deploy to SageMaker from your local IDE or SageMaker Studio notebooks.

In SageMaker Studio, new interactive model deployment workflows give you step-by-step guidance on which instance type to choose to find the most optimal endpoint configuration. SageMaker Studio also provides additional interfaces to add models, test inference, and enable auto scaling policies on the deployed endpoints.

New tools in SageMaker Python SDK
The SageMaker Python SDK has been updated with new tools, including ModelBuilder and SchemaBuilder classes that unify the experience of converting models into SageMaker deployable models across ML frameworks and model servers. Model builder automates the model deployment by selecting a compatible SageMaker container and capturing dependencies from your development environment. Schema builder helps to manage serialization and deserialization tasks of model inputs and outputs. You can use the tools to deploy the model in your local development environment to experiment with it, fix any runtime errors, and when ready, transition from local testing to deploy the model on SageMaker with a single line of code.

Amazon SageMaker ModelBuilder

Let me show you how this works. In the following example, I choose the Falcon-7B model from the Hugging Face model hub. I first deploy the model locally, run a sample inference, perform local benchmarking to find the optimal configuration, and finally deploy the model with the suggested configuration to SageMaker.

First, import the updated SageMaker Python SDK and define a sample model input and output that matches the prompt format for the selected model.

import sagemaker
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve import Mode

prompt = "Falcons are"
response = "Falcons are small to medium-sized birds of prey related to hawks and eagles."

sample_input = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 32}
}

sample_output = [{"generated_text": response}]

Then, create a ModelBuilder instance with the Hugging Face model ID, a SchemaBuilder instance with the sample model input and output, define a local model path, and set the mode to LOCAL_CONTAINER to deploy the model locally. The schema builder generates the required functions for serializing and deserializing the model inputs and outputs.

model_builder = ModelBuilder(
    model="tiiuae/falcon-7b",
    schema_builder=SchemaBuilder(sample_input, sample_output),
    model_path="/path/to/falcon-7b",
    mode=Mode.LOCAL_CONTAINER,
	env_vars={"HF_TRUST_REMOTE_CODE": "True"}
)

Next, call build() to convert the PyTorch model into a SageMaker deployable model. The build function generates the required artifacts for the model server, including the inferency.py and serving.properties files.

local_mode_model = model_builder.build()

For FMs, such as Falcon, you can optionally run tune() in local container mode that performs local benchmarking to find the optimal model serving configuration. This includes the tensor parallel degree that specifies the number of GPUs to use if your environment has multiple GPUs available. Once ready, call deploy() to deploy the model in your local development environment.

tuned_model = local_mode_model.tune()
tuned_model.deploy()

Let’s test the model.

updated_sample_input = model_builder.schema_builder.sample_input
print(updated_sample_input)

{'inputs': 'Falcons are',
 'parameters': {'max_new_tokens': 32}}
 
local_tuned_predictor.predict(updated_sample_input)[0]["generated_text"]

In my demo, the model returns the following response:

a type of bird that are known for their sharp talons and powerful beaks. They are also known for their ability to fly at high speeds […]

When you’re ready to deploy the model on SageMaker, call deploy() again, set the mode to SAGEMAKLER_ENDPOINT, and provide an AWS Identity and Access Management (IAM) role with appropriate permissions.

sm_predictor = tuned_model.deploy(
    mode=Mode.SAGEMAKER_ENDPOINT, 
	role="arn:aws:iam::012345678910:role/role_name"
)

This starts deploying your model on a SageMaker endpoint. Once the endpoint is ready, you can run predictions.

new_input = {'inputs': 'Eagles are','parameters': {'max_new_tokens': 32}}
sm_predictor.predict(new_input)[0]["generated_text"])

New SageMaker Studio model deployment experience
You can start the new interactive model deployment workflows by selecting one or more models to deploy from the models landing page or SageMaker JumpStart model details page or by creating a new endpoint from the endpoints details page.

Amazon SageMaker - New Model Deployment Experience

The new workflows help you quickly deploy the selected model(s) with minimal inputs. If you used SageMaker Inference Recommender to benchmark your model, the dropdown will show instance recommendations from that benchmarking.

Model deployment experience in SageMaker Studio

Without benchmarking your model, the dropdown will display prospective instances that SageMaker predicts could be a good fit based on its own heuristics. For some of the most popular SageMaker JumpStart models, you’ll see an AWS pretested optimal instance type. For other models, you’ll see generally recommended instance types. For example, if I select the Falcon 40B Instruct model in SageMaker JumpStart, I can see the recommended instance types.

Model deployment experience in SageMaker Studio

Model deployment experience in SageMaker Studio

However, if I want to optimize the deployment for cost or performance to meet my specific use cases, I could open the Alternate configurations panel to view more options based on data from before benchmarking.

Model deployment experience in SageMaker Studio

Once deployed, you can test inference or manage auto scaling policies.

Model deployment experience in SageMaker Studio

Things to know
Here are a couple of important things to know:

Supported ML models and frameworks – At launch, the new SageMaker Python SDK tools support model deployment for XGBoost and PyTorch models. You can deploy FMs by specifying the Hugging Face model ID or SageMaker JumpStart model ID using the SageMaker LMI container or Hugging Face TGI-based container. You can also bring your own container (BYOC) or deploy models using the Triton model server in ONNX format.

Now available
The new set of tools is available today in all AWS Regions where Amazon SageMaker real-time inference is available. There is no cost to use the new set of tools; you pay only for any underlying SageMaker resources that get created.

Learn more

Get started
Explore the new SageMaker model deployment experience in the AWS Management Console today!

— Antje

Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/amazon-sagemaker-adds-new-inference-capabilities-to-help-reduce-foundation-model-deployment-costs-and-latency/

Today, we are announcing new Amazon SageMaker inference capabilities that can help you optimize deployment costs and reduce latency. With the new inference capabilities, you can deploy one or more foundation models (FMs) on the same SageMaker endpoint and control how many accelerators and how much memory is reserved for each FM. This helps to improve resource utilization, reduce model deployment costs on average by 50 percent, and lets you scale endpoints together with your use cases.

For each FM, you can define separate scaling policies to adapt to model usage patterns while further optimizing infrastructure costs. In addition, SageMaker actively monitors the instances that are processing inference requests and intelligently routes requests based on which instances are available, helping to achieve on average 20 percent lower inference latency.

Key components
The new inference capabilities build upon SageMaker real-time inference endpoints. As before, you create the SageMaker endpoint with an endpoint configuration that defines the instance type and initial instance count for the endpoint. The model is configured in a new construct, an inference component. Here, you specify the number of accelerators and amount of memory you want to allocate to each copy of a model, together with the model artifacts, container image, and number of model copies to deploy.

Amazon SageMaker - MME

Let me show you how this works.

New inference capabilities in action
You can start using the new inference capabilities from SageMaker Studio, the SageMaker Python SDK, and the AWS SDKs and AWS Command Line Interface (AWS CLI). They are also supported by AWS CloudFormation.

For this demo, I use the AWS SDK for Python (Boto3) to deploy a copy of the Dolly v2 7B model and a copy of the FLAN-T5 XXL model from the Hugging Face model hub on a SageMaker real-time endpoint using the new inference capabilities.

Create a SageMaker endpoint configuration

import boto3
import sagemaker

role = sagemaker.get_execution_role()
sm_client = boto3.client(service_name="sagemaker")

sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ExecutionRoleArn=role,
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "InstanceType": "ml.g5.12xlarge",
        "InitialInstanceCount": 1,
		"RoutingConfig": {
            "RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"
        }
    }]
)

Create the SageMaker endpoint

sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

Before you can create the inference component, you need to create a SageMaker-compatible model and specify a container image to use. For both models, I use the Hugging Face LLM Inference Container for Amazon SageMaker. These deep learning containers (DLCs) include the necessary components, libraries, and drivers to host large models on SageMaker.

Prepare the Dolly v2 model

from sagemaker.huggingface import get_huggingface_llm_image_uri

# Retrieve the container image URI
hf_inference_dlc = get_huggingface_llm_image_uri(
  "huggingface",
  version="0.9.3"
)

# Configure model container
dolly7b = {
    'Image': hf_inference_dlc,
    'Environment': {
        'HF_MODEL_ID':'databricks/dolly-v2-7b',
        'HF_TASK':'text-generation',
    }
}

# Create SageMaker Model
sagemaker_client.create_model(
    ModelName        = "dolly-v2-7b",
    ExecutionRoleArn = role,
    Containers       = [dolly7b]
)

Prepare the FLAN-T5 XXL model

# Configure model container
flant5xxlmodel = {
    'Image': hf_inference_dlc,
    'Environment': {
        'HF_MODEL_ID':'google/flan-t5-xxl',
        'HF_TASK':'text-generation',
    }
}

# Create SageMaker Model
sagemaker_client.create_model(
    ModelName        = "flan-t5-xxl",
    ExecutionRoleArn = role,
    Containers       = [flant5xxlmodel]
)

Now, you’re ready to create the inference component.

Create an inference component for each model
Specify an inference component for each model you want to deploy on the endpoint. Inference components let you specify the SageMaker-compatible model and the compute and memory resources you want to allocate. For CPU workloads, define the number of cores to allocate. For accelerator workloads, define the number of accelerators. RuntimeConfig defines the number of model copies you want to deploy.

# Inference compoonent for Dolly v2 7B
sm_client.create_inference_component(
    InferenceComponentName="IC-dolly-v2-7b",
    EndpointName=endpoint_name,
    VariantName=variant_name,
    Specification={
        "ModelName": "dolly-v2-7b",
        "ComputeResourceRequirements": {
		    "NumberOfAcceleratorDevicesRequired": 2, 
			"NumberOfCpuCoresRequired": 2, 
			"MinMemoryRequiredInMb": 1024
	    }
    },
    RuntimeConfig={"CopyCount": 1},
)

# Inference component for FLAN-T5 XXL
sm_client.create_inference_component(
    InferenceComponentName="IC-flan-t5-xxl",
    EndpointName=endpoint_name,
    VariantName=variant_name,
    Specification={
        "ModelName": "flan-t5-xxl",
        "ComputeResourceRequirements": {
		    "NumberOfAcceleratorDevicesRequired": 2, 
			"NumberOfCpuCoresRequired": 1, 
			"MinMemoryRequiredInMb": 1024
	    }
    },
    RuntimeConfig={"CopyCount": 1},
)

Once the inference components have successfully deployed, you can invoke the models.

Run inference
To invoke a model on the endpoint, specify the corresponding inference component.

import json
sm_runtime_client = boto3.client(service_name="sagemaker-runtime")
payload = {"inputs": "Why is California a great place to live?"}

response_dolly = sm_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    InferenceComponentName = "IC-dolly-v2-7b",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)

response_flant5 = sm_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    InferenceComponentName = "IC-flan-t5-xxl",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)

result_dolly = json.loads(response_dolly['Body'].read().decode())
result_flant5 = json.loads(response_flant5['Body'].read().decode())

Next, you can define separate scaling policies for each model by registering the scaling target and applying the scaling policy to the inference component. Check out the SageMaker Developer Guide for detailed instructions.

The new inference capabilities provide per-model CloudWatch metrics and CloudWatch Logs and can be used with any SageMaker-compatible container image across SageMaker CPU- and GPU-based compute instances. Given support by the container image, you can also use response streaming.

Now available
The new Amazon SageMaker inference capabilities are available today in AWS Regions US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Jakarta, Mumbai, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Stockholm), Middle East (UAE), and South America (São Paulo). For pricing details, visit Amazon SageMaker Pricing. To learn more, visit Amazon SageMaker.

Get started
Log in to the AWS Management Console and deploy your FMs using the new SageMaker inference capabilities today!

— Antje

Amazon SageMaker Clarify makes it easier to evaluate and select foundation models (preview)

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/amazon-sagemaker-clarify-makes-it-easier-to-evaluate-and-select-foundation-models-preview/

I’m happy to share that Amazon SageMaker Clarify now supports foundation model (FM) evaluation (preview). As a data scientist or machine learning (ML) engineer, you can now use SageMaker Clarify to evaluate, compare, and select FMs in minutes based on metrics such as accuracy, robustness, creativity, factual knowledge, bias, and toxicity. This new capability adds to SageMaker Clarify’s existing ability to detect bias in ML data and models and explain model predictions.

The new capability provides both automatic and human-in-the-loop evaluations for large language models (LLMs) anywhere, including LLMs available in SageMaker JumpStart, as well as models trained and hosted outside of AWS. This removes the heavy lifting of finding the right model evaluation tools and integrating them into your development environment. It also simplifies the complexity of trying to adopt academic benchmarks to your generative artificial intelligence (AI) use case.

Evaluate FMs with SageMaker Clarify
With SageMaker Clarify, you now have a single place to evaluate and compare any LLM based on predefined criteria during model selection and throughout the model customization workflow. In addition to automatic evaluation, you can also use the human-in-the-loop capabilities to set up human reviews for more subjective criteria, such as helpfulness, creative intent, and style, by using your own workforce or managed workforce from SageMaker Ground Truth.

To get started with model evaluations, you can use curated prompt datasets that are purpose-built for common LLM tasks, including open-ended text generation, text summarization, question answering (Q&A), and classification. You can also extend the model evaluation with your own custom prompt datasets and metrics for your specific use case. Human-in-the-loop evaluations can be used for any task and evaluation metric. After each evaluation job, you receive an evaluation report that summarizes the results in natural language and includes visualizations and examples. You can download all metrics and reports and also integrate model evaluations into SageMaker MLOps workflows.

In SageMaker Studio, you can find Model evaluation under Jobs in the left menu. You can also select Evaluate directly from the model details page of any LLM in SageMaker JumpStart.

Evaluate foundation models with Amazon SageMaker Clarify

Select Evaluate a model to set up the evaluation job. The UI wizard will guide you through the selection of automatic or human evaluation, model(s), relevant tasks, metrics, prompt datasets, and review teams.

Evaluate foundation models with Amazon SageMaker Clarify

Once the model evaluation job is complete, you can view the results in the evaluation report.

Evaluate foundation models with Amazon SageMaker Clarify

In addition to the UI, you can also start with example Jupyter notebooks that walk you through step-by-step instructions on how to programmatically run model evaluation in SageMaker.

Evaluate models anywhere with the FMEval open source library
To run model evaluation anywhere, including models trained and hosted outside of AWS, use the FMEval open source library. The following example demonstrates how to use the library to evaluate a custom model by extending the ModelRunner class.

For this demo, I choose GPT-2 from the Hugging Face model hub and define a custom HFModelConfig and HuggingFaceCausalLLMModelRunner class that works with causal decoder-only models from the Hugging Face model hub such as GPT-2. The example is also available in the FMEval GitHub repo.

!pip install fmeval

# ModelRunners invoke FMs
from amazon_fmeval.model_runners.model_runner import ModelRunner

# Additional imports for custom model
import warnings
from dataclasses import dataclass
from typing import Tuple, Optional
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@dataclass
class HFModelConfig:
    model_name: str
    max_new_tokens: int
    normalize_probabilities: bool = False
    seed: int = 0
    remove_prompt_from_generated_text: bool = True

class HuggingFaceCausalLLMModelRunner(ModelRunner):
    def __init__(self, model_config: HFModelConfig):
        self.config = model_config
        self.model = AutoModelForCausalLM.from_pretrained(self.config.model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(self.config.model_name)

    def predict(self, prompt: str) -> Tuple[Optional[str], Optional[float]]:
        input_ids = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        generations = self.model.generate(
            **input_ids,
            max_new_tokens=self.config.max_new_tokens,
            pad_token_id=self.tokenizer.eos_token_id,
        )
        generation_contains_input = (
            input_ids["input_ids"][0] == generations[0][: input_ids["input_ids"].shape[1]]
        ).all()
        if self.config.remove_prompt_from_generated_text and not generation_contains_input:
            warnings.warn(
                "Your model does not return the prompt as part of its generations. "
                "`remove_prompt_from_generated_text` does nothing."
            )
        if self.config.remove_prompt_from_generated_text and generation_contains_input:
            output = self.tokenizer.batch_decode(generations[:, input_ids["input_ids"].shape[1] :])[0]
        else:
            output = self.tokenizer.batch_decode(generations, skip_special_tokens=True)[0]

        with torch.inference_mode():
            input_ids = self.tokenizer(self.tokenizer.bos_token + prompt, return_tensors="pt")["input_ids"]
            model_output = self.model(input_ids, labels=input_ids)
            probability = -model_output[0].item()

        return output, probability

Next, create an instance of HFModelConfig and HuggingFaceCausalLLMModelRunner with the model information.

hf_config = HFModelConfig(model_name="gpt2", max_new_tokens=32)
model = HuggingFaceCausalLLMModelRunner(model_config=hf_config)

Then, select and configure the evaluation algorithm.

# Let's evaluate the FM for FactualKnowledge
from amazon_fmeval.fmeval import get_eval_algorithm
from amazon_fmeval.eval_algorithms.factual_knowledge import FactualKnowledgeConfig

eval_algorithm_config = FactualKnowledgeConfig("<OR>")
eval_algorithm = get_eval_algorithm("factual_knowledge", eval_algorithm_config)

Let’s first test with one sample. The evaluation score is the percentage of factually correct responses.

model_output = model.predict("London is the capital of")[0]
print(model_output)

eval_algo.evaluate_sample(
    target_output="UK<OR>England<OR>United Kingdom", 
	model_output=model_output
)
the UK, and the UK is the largest producer of food in the world.

The UK is the world's largest producer of food in the world.
[EvalScore(name='factual_knowledge', value=1)]

Although it’s not a perfect response, it includes “UK.”

Next, you can evaluate the FM using built-in datasets or define your custom dataset. If you want to use a custom evaluation dataset, create an instance of DataConfig:

config = DataConfig(
    dataset_name="my_custom_dataset",
    dataset_uri="dataset.jsonl",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answer",
)

eval_output = eval_algorithm.evaluate(
    model=model, 
    dataset_config=config, 
    prompt_template="$feature", #$feature is replaced by the input value in the dataset 
    save=True
)

The evaluation results will return a combined evaluation score across the dataset and detailed results for each model input stored in a local output path.

Join the preview
FM evaluation with Amazon SageMaker Clarify is available today in public preview in AWS Regions US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland). The FMEval open source library] is available on GitHub. To learn more, visit Amazon SageMaker Clarify.

Get started
Log in to the AWS Management Console and start evaluating your FMs with SageMaker Clarify today!

— Antje

Introducing Amazon SageMaker HyperPod, a purpose-built infrastructure for distributed training at scale

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/introducing-amazon-sagemaker-hyperpod-a-purpose-built-infrastructure-for-distributed-training-at-scale/

Today, we are introducing Amazon SageMaker HyperPod, which helps reducing time to train foundation models (FMs) by providing a purpose-built infrastructure for distributed training at scale. You can now use SageMaker HyperPod to train FMs for weeks or even months while SageMaker actively monitors the cluster health and provides automated node and job resiliency by replacing faulty nodes and resuming model training from a checkpoint.

The clusters come preconfigured with SageMaker’s distributed training libraries that help you split your training data and model across all the nodes to process them in parallel and fully utilize the cluster’s compute and network infrastructure. You can further customize your training environment by installing additional frameworks, debugging tools, and optimization libraries.

Let me show you how to get started with SageMaker HyperPod. In the following demo, I create a SageMaker HyperPod and show you how to train a Llama 2 7B model using the example shared in the AWS ML Training Reference Architectures GitHub repository.

Create and manage clusters
As the SageMaker HyperPod admin, you can create and manage clusters using the AWS Management Console or AWS Command Line Interface (AWS CLI). In the console, navigate to Amazon SageMaker, select Cluster management under HyperPod Clusters in the left menu, then choose Create a cluster.

Amazon SageMaker HyperPod Clusters

In the setup that follows, provide a cluster name and configure instance groups with your instance types of choice and the number of instances to allocate to each instance group.

Amazon SageMaker HyperPod

You also need to prepare and upload one or more lifecycle scripts to your Amazon Simple Storage Service (Amazon S3) bucket to run in each instance group during cluster creation. With lifecycle scripts, you can customize your cluster environment and install required libraries and packages. You can find example lifecycle scripts for SageMaker HyperPod in the GitHub repo.

Using the AWS CLI
You can also use the AWS CLI to create and manage clusters. For my demo, I specify my cluster configuration in a JSON file. I choose to create two instance groups, one for the cluster controller node(s) that I call “controller-group,” and one for the cluster worker nodes that I call “worker-group.” For the worker nodes that will perform model training, I specify Amazon EC2 Trn1 instances powered by AWS Trainium chips.

// demo-cluster.json
{
   "InstanceGroups": [
        {
            "InstanceGroupName": "controller-group",
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "lifecycleConfig": {
                "SourceS3Uri": "s3://<your-s3-bucket>/<lifecycle-script-directory>/",
                "OnCreate": "on_create.sh"
            },
            "ExecutionRole": "arn:aws:iam::111122223333:role/my-role-for-cluster",
            "ThreadsPerCore": 1
        },
        {
            "InstanceGroupName": "worker-group",
            "InstanceType": "trn1.32xlarge",
            "InstanceCount": 4,
            "lifecycleConfig": {
                "SourceS3Uri": "s3://<your-s3-bucket>/<lifecycle-script-directory>/",
                "OnCreate": "on_create.sh"
            },
            "ExecutionRole": "arn:aws:iam::111122223333:role/my-role-for-cluster",
            "ThreadsPerCore": 1
        }
    ]
}

To create the cluster, I run the following AWS CLI command:

aws sagemaker create-cluster \
--cluster-name antje-demo-cluster \
--instance-groups file://demo-cluster.json

Upon creation, you can use aws sagemaker describe-cluster and aws sagemaker list-cluster-nodes to view your cluster and node details. Note down the cluster ID and instance ID of your controller node. You need that information to connect to your cluster.

You also have the option to attach a shared file system, such as Amazon FSx for Lustre. To use FSx for Lustre, you need to set up your cluster with an Amazon Virtual Private Cloud (Amazon VPC) configuration. Here’s an AWS CloudFormation template that shows how to create a SageMaker VPC and how to deploy FSx for Lustre.

Connect to your cluster
As a cluster user, you need to have access to the cluster provisioned by your cluster admin. With access permissions in place, you can connect to the cluster using SSH to schedule and run jobs. You can use the preinstalled AWS CLI plugin for AWS Systems Manager to connect to the controller node of your cluster.

For my demo, I run the following command specifying my cluster ID and instance ID of the control node as the target.

aws ssm start-session \
--target sagemaker-cluster:ntg44z9os8pn_i-05a854e0d4358b59c \
--region us-west-2

Schedule and run jobs on the cluster using Slurm
At launch, SageMaker HyperPod supports Slurm for workload orchestration. Slurm is a popular an open source cluster management and job scheduling system. You can install and set up Slurm through lifecycle scripts as part of the cluster creation. The example lifecycle scripts show how. Then, you can use the standard Slurm commands to schedule and launch jobs. Check out the Slurm Quick Start User Guide for architecture details and helpful commands.

For this demo, I’m using this example from the AWS ML Training Reference Architectures GitHub repo that shows how to train Llama 2 7B on Slurm with Trn1 instances. My cluster is already setup with Slurm, and I have an FSx for Lustre filesystem mounted.

Note
The Llama 2 model is governed by Meta. You can request access through the Meta request access page.

Set up the cluster environment
SageMaker HyperPod supports training in a range of environments, including Conda, venv, Docker, and enroot. Following the instructions in the README, I build my virtual environment aws_neuron_venv_pytorch and set up the torch_neuronx and neuronx-nemo-megatron libraries for training models on Trn1 instances.

Prepare model, tokenizer, and dataset
I follow the instructions to download the Llama 2 model and tokenizer and convert the model into the Hugging Face format. Then, I download and tokenize the RedPajama dataset. As a final preparation step, I pre-compile the Llama 2 model using ahead-of-time (AOT) compilation to speed up model training.

Launch jobs on the cluster
Now, I’m ready to start my model training job using the sbatch command.

sbatch --nodes 4 --auto-resume=1 run.slurm ./llama_7b.sh

You can use the squeue command to view the job queue. Once the training job is running, the SageMaker HyperPod resiliency features are automatically enabled. SageMaker HyperPod will automatically detect hardware failures, replace nodes as needed, and resume training from checkpoints if the auto-resume parameter is set, as shown in the preceding command.

You can view the output of the model training job in the following file:

tail -f slurm-run.slurm-<JOB_ID>.out

A sample output indicating that model training has started will look like this:

Epoch 0:  22%|██▏       | 4499/20101 [22:26:14<77:48:37, 17.95s/it, loss=2.43, v_num=5563, reduced_train_loss=2.470, gradient_norm=0.121, parameter_norm=1864.0, global_step=4512.0, consumed_samples=1.16e+6, iteration_time=16.40]
Epoch 0:  22%|██▏       | 4500/20101 [22:26:32<77:48:18, 17.95s/it, loss=2.43, v_num=5563, reduced_train_loss=2.470, gradient_norm=0.121, parameter_norm=1864.0, global_step=4512.0, consumed_samples=1.16e+6, iteration_time=16.40]
Epoch 0:  22%|██▏       | 4500/20101 [22:26:32<77:48:18, 17.95s/it, loss=2.44, v_num=5563, reduced_train_loss=2.450, gradient_norm=0.120, parameter_norm=1864.0, global_step=4512.0, consumed_samples=1.16e+6, iteration_time=16.50]

To further monitor and profile your model training jobs, you can use SageMaker hosted TensorBoard or any other tool of your choice.

Now available
SageMaker HyperPod is available today in AWS Regions US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

Learn more:

— Antje

PS: Writing a blog post at AWS is always a team effort, even when you see only one name under the post title. In this case, I want to thank Brad Doran, Justin Pirtle, Ben Snyder, Pierre-Yves Aquilanti, Keita Watanabe, and Verdi March for their generous help with example code and sharing their expertise in managing large-scale model training infrastructures, Slurm, and SageMaker HyperPod.

How to improve your security incident response processes with Jupyter notebooks

Post Syndicated from Tim Manik original https://aws.amazon.com/blogs/security/how-to-improve-your-security-incident-response-processes-with-jupyter-notebooks/

Customers face a number of challenges to quickly and effectively respond to a security event. To start, it can be difficult to standardize how to respond to a partic­ular security event, such as an Amazon GuardDuty finding. Additionally, silos can form with reliance on one security analyst who is designated to perform certain tasks, such as investigate all GuardDuty findings. Jupyter notebooks can help you address these challenges by simplifying both standardization and collaboration.

Jupyter Notebook is an open-source, web-based application to run and document code. Although Jupyter notebooks are most frequently used for data science and machine learning, you can also use them to more efficiently and effectively investigate and respond to security events.

In this blog post, we will show you how to use Jupyter Notebook to investigate a security event. With this solution, you can automate the tasks of gathering data, presenting the data, and providing procedures and next steps for the findings.

Benefits of using Jupyter notebooks for security incident response

The following are some ways that you can use Jupyter notebooks for security incident response:

  • Develop readable code for analysts – Within a notebook, you can combine markdown text and code cells to improve readability. Analysts can read context around the code cell, run the code cell, and analyze the results within the notebook.
  • Standardize analysis and response – You can reuse notebooks after the initial creation. This makes it simpler for you to standardize your incident response processes for how to respond to a certain type of security event. Additionally, you can use notebooks to achieve repeatable responses. You can rerun an entire notebook or a specific cell.
  • Collaborate and share incident response knowledge – After you create a Jupyter notebook, you can share it with peers to more seamlessly collaborate and share knowledge, which helps reduce silos and reliance on certain analysts.
  • Iterate on your incident response playbooks – Developing a security incident response program involves continuous iteration. With Jupyter notebooks, you can start small and iterate on what you have developed. You can keep Jupyter notebooks under source code control by using services such as AWS CodeCommit. This allows you to approve and track changes to your notebooks.

Architecture overview

Figure 1: Architecture for incident response analysis

Figure 1: Architecture for incident response analysis

The architecture shown in Figure 1 consists of the foundational services required to analyze and contain security incidents on AWS. You create and access the playbooks through the Jupyter console that is hosted on Amazon SageMaker. Within the playbooks, you run several Amazon Athena queries against AWS CloudTrail logs hosted in Amazon Simple Storage Service (Amazon S3).

Solution implementation

To deploy the solution, you will complete the following steps:

  1. Deploy a SageMaker notebook instance
  2. Create an Athena table for your CloudTrail trail
  3. Grant AWS Lake Formation access
  4. Access the Credential Compromise playbooks by using JupyterLab

Step 1: Deploy a SageMaker notebook instance

You will host your Jupyter notebooks on a SageMaker notebook instance. We chose to use SageMaker instead of running the notebooks locally because SageMaker provides flexible compute, seamless integration with CodeCommit and GitHub, temporary credentials through AWS Identity and Access Management (IAM) roles, and lower latency for Athena queries.

You can deploy the SageMaker notebook instance by using the AWS CloudFormation template from our jupyter-notebook-for-incident-response GitHub repository. We recommend that you deploy SageMaker in your security tooling account or an equivalent.

The CloudFormation template deploys the following resources:

  • A SageMaker notebook instance to run the analysis notebooks. Because this is a proof of concept (POC), the deployed SageMaker instance is the smallest instance type available. However, within an enterprise environment, you will likely need a larger instance type.
  • An AWS Key Management Service (AWS KMS) key to encrypt the SageMaker notebook instance and protect sensitive data.
  • An IAM role that grants the SageMaker notebook permissions to query CloudTrail, VPC Flow Logs, and other log sources.
  • An IAM role that allows access to the pre-signed URL of the SageMaker notebook from only an allowlisted IP range.
  • A VPC configured for SageMaker with an internet gateway, NAT gateway, and VPC endpoints to access required AWS services securely. The internet gateway and NAT gateway provide internet access to install external packages.
  • An S3 bucket to store results for your Athena log queries—you will reference the S3 bucket in the next step.

Step 2: Create an Athena table for your CloudTrail trail

The solution uses Athena to query CloudTrail logs, so you need to create an Athena table for CloudTrail.

There are two main ways to create an Athena table for CloudTrail:

For either of these methods to create an Athena table, you need to provide the URI of an S3 bucket. For this blog post, use the URI of the S3 bucket that the CloudFormation template created in Step 1. To find the URI of the S3 bucket, see the Output section of the CloudFormation stack.

Step 3: Grant AWS Lake Formation access

If you don’t use AWS Lake Formation in your AWS environment, skip to Step 4. Otherwise, continue with the following instructions. Lake Formation is how data access control for your Athena tables is managed.

To grant permission to the Security Log database

  1. Open the Lake Formation console.
  2. Select the database that you created in Step 2 for your security logs. If you used the Security Analytics Bootstrap, then the table name is either security_analysis or a custom name that you provided—you can find the name in the CloudFormation stack. If you created the Athena table by using the CloudTrail console, then the database is named default.
  3. From the Actions dropdown, select Grant.
  4. In Grant data permissions, select IAM users and roles.
  5. Find the IAM role used by the SageMaker Notebook instance.
  6. In Database permissions, select Describe and then Grant.

To grant permission to the Security Log CloudTrail table

  1. Open the Lake Formation console.
  2. Select the database that you created in Step 2.
  3. Choose View Tables.
  4. Select CloudTrail. If you created VPC flow log and DNS log tables, select those, too.
  5. From the Actions dropdown, select Grant.
  6. In Grant data permissions, select IAM users and roles.
  7. Find the IAM role used by the SageMaker notebook instance.
  8. In Table permissions, select Describe and then Grant.

Step 4: Access the Credential Compromise playbooks by using JupyterLab

The CloudFormation template clones the jupyter-notebook-for-incident-response GitHub repo into your Jupyter workspace.

You can access JupyterLab hosted on your SageMaker notebook instance by following the steps in the Access Notebook Instances documentation.

Your folder structure should match that shown in Figure 2. The parent folder should be jupyter-notebook-for-incident-response, and the child folders should be playbooks and cfn-templates.

Figure 2: Folder structure after GitHub repo is cloned to the environment

Figure 2: Folder structure after GitHub repo is cloned to the environment

Sample investigation of a spike in failed login attempts

In the following sections, you will use the Jupyter notebook that we created to investigate a scenario where failed login attempts have spiked. We designed this notebook to guide you through the process of gathering more information about the spike.

We discuss the important components of these notebooks so that you can use the framework to create your own playbooks. We encourage you to build on top of the playbook, and add additional queries and steps in the playbook to customize it for your organization’s specific business and security goals.

For this blog post, we will focus primarily on the analysis phase of incident response and walk you through how you can use Jupyter notebooks to help with this phase.

Before you get started with the following steps, open the credential-compromise-analysis.ipynb notebook in your JupyterLab environment.

How to import Python libraries and set environment variables

The notebooks require that you have the following Python libraries:

  • Boto3 – to interact with AWS services through API calls
  • Pandas – to visualize the data
  • PyAthena – to simplify the code to connect to Athena

To install the required Python libraries, in the Setup section of the notebook, under Load libraries, edit the variables in the two code cells as follows:

  • region – specify the AWS Region that you want your AWS API commands to run in (for example, us-east-1).
  • athena_bucket – specify the S3 bucket URI that is configured to store your Athena queries. You can find this information at Athena > Query Editor > Settings > Query result location.
  • db_name – specify the database used by Athena that contains your Athena table for CloudTrail.
Figure 3: Load the Python libraries in the notebook

Figure 3: Load the Python libraries in the notebook

This helps ensure that subsequent code cells that run are configured to run in your environment.

Run each code cell by choosing the cell and pressing SHIFT+ENTER or by choosing the play button (▶) in the toolbar at the top of the console.

How to set up the helper function for Athena

The Python query_results function, shown in the following figure, helps you query Athena tables. Run this code cell. You will use the query_results function later in the 2.0 IAM Investigation section of the notebook.

Figure 4: Code cell for the helper function to query with Athena

Figure 4: Code cell for the helper function to query with Athena

Credential Compromise Analysis Notebook

The credential-compromise-analysis.ipynb notebook includes several prebuilt queries to help you start your investigation of a potentially compromised credential. In this post, we discuss three of these queries:

  • The first query provides a broad view by retrieving the CloudTrail events related to authorization failures. By reviewing these results, you get baseline information about where users and roles are attempting to access resources or take actions without having the proper permissions.
  • The second query narrows the focus by identifying the top five IAM entities (such as users, roles, and identities) that are causing most of the authorization failures. Frequent failures from specific entities often indicate that their credentials are compromised.
  • The third query zooms in on one of the suspicious entities from the previous query. It retrieves API activity and events initiated by that entity across AWS services or resource. Analyzing actions performed by a suspicious entity can reveal if valid permissions are being misused or if the entity is systematically trying to access resources it doesn’t have access to.

Investigate authorization failures

The notebook has markdown cells that provide a description of the expected result of the query. The next cell contains the query statement. The final cell calls the query_result function to run your query by using Athena and display your results in tabular format.

In query 2.1, you query for specific error codes such as AccessDenied, and filter for anything that is an IAM entity by looking for useridentity.arn like ‘%iam%’. The notebook orders the entries by eventTime. If you want to look for specific IAM Identity Center entities, update the query to filter by useridentity.sessioncontext.sessionissuer.arn like ‘%sso.amazonaws.com%’.

This query retrieves a list of failed API calls to AWS services. From this list, you can gain additional insight into the context surrounding the spike in failed login attempts.

When you investigate denied API access requests, carefully examine details such as the user identity, timestamp, source IP address, and other metadata. This information helps you determine if the event is a legitimate threat or a false positive. Here are some specific questions to ask:

  • Does the IP address originate from within your network, or is it external? Internal addresses might be less concerning.
  • Is the access attempt occurring during normal working hours for that user? Requests outside of normal times might warrant more scrutiny.
  • What resources or changes is the user trying to access or make? Attempts to modify sensitive data or systems might indicate malicious intent.

By thoroughly evaluating the context around denied API calls, you can more accurately assess the risk they pose and whether you need to take further action. You can use the specifics in the logs to go beyond just the fact that access was denied, and learn the story of who, when, and why.

As shown in the following figure, the queries in the notebook use the following structure.

  1. Markdown cell to explain the purpose of the query (the query statement).
  2. Code cell to run the query and display the query results.

In the figure, the first code cell that runs stores the input for the query statement. After that finishes, the next code block displays the query results.

Figure 5: Run predefined Athena queries in JupyterLab

Figure 5: Run predefined Athena queries in JupyterLab

Figure 6 shows the output of the query that you ran in the 2.1 Investigation Authorization Failures section. It contains critical details for understanding the context around a denied API call:

  • The eventtime field shows the date and time that the request was completed.
  • The useridentity field reveals which IAM identity made a request.
  • The sourceipddress provides the IP address that the request was made from.
  • The useragent shows which client or app was used to make the call.
Figure 6: Results from the first investigative query

Figure 6: Results from the first investigative query

Figure 6 only shows a subset of the many details captured in CloudTrail logs. By scrolling to the right in the query output, you can view additional attributes that provide further context around the event. The CloudTrail record contents guide contains a comprehensive list of the fields included in the logs, along with descriptions of each attribute.

Often, you will need to search for more information to determine if remediation is necessary. For this reason, we have included additional queries to help you further examine the sequence of events leading up to the failed login attempt spike and after the spike occurred.

Triaging suspicious entities (Queries 2.2 and 2.3)

By running the second and third queries you can dig deeper into anomalous authorization failures. As shown in Figure 7, Query 2.2 provides the top five IAM entities with the most frequent access denials. This highlights the specific users, roles, and identities causing the most failures, which indicates potentially compromised credentials.

Query 2.3 takes the investigation further by isolating the activity from one suspicious entity. Retrieving the actions attempted by a single problematic user or role reveals useful context to determine if you need to revoke credentials. For example, is the entity probing resources that it shouldn’t have access to? Are there unusual API calls outside of normal hours? By scrutinizing an entity’s full history, you can make an informed decision on remediation.

Figure 7: Overview of queries 2.2 and 2.3

Figure 7: Overview of queries 2.2 and 2.3

You can use these two queries together to triage authorization failures: query 2 identifies high-risk entities, and query 3 gathers intelligence to drive your response. This progression from a macro view to a micro view is crucial for transforming signals into action.

Although log analysis relies on automation and queries to facilitate insights, human judgment is essential to interpret these signals and determine the appropriate response. You should discuss flagged events with stakeholders and resource owners to benefit from their domain expertise. You can export the results of your analysis by exporting your Jupyter notebook.

By collaborating with other people, you can gather contextual clues that might not be captured in the raw data. For example, an owner might confirm that a suspicious login time is expected for users in a certain time zone. By pairing automated detection with human perspectives, you can accurately assess risk and decide if credential revocation or other remediation is truly warranted. Uptime or downtime technical issues alone can’t dictate if remediation is necessary—the human element provides pivotal context.

Build your own queries

In addition to the existing queries, you can run your own queries and include them in your copy of the Credential-compromise-analysis.ipynb notebook. The AWS Security Analytics Bootstrap contains a library of common Athena queries for CloudTrail. We recommend that you review these queries before you start to build your own queries. The key takeaway is that these notebooks are highly customizable. You can use the Jupyter Notebook application to help meet the specific incident response requirements of your organization.

Contain compromised IAM entities

If the investigation reveals that a compromised IAM entity requires containment, follow these steps to revoke access:

  • For federated users, revoke their active AWS sessions according to the guidance in How to revoke federated users’ active AWS sessions. This uses IAM policies and AWS Organizations service control policies (SCPs) to revoke access to assumed roles.
  • Avoid using long-lived IAM credentials such as access keys. Instead, use temporary credentials through IAM roles. However, if you detect a compromised access key, immediately rotate or deactivate it by following the guidance in What to Do If You Inadvertently Expose an AWS Access Key. Review the permissions granted to the compromised IAM entity and consider if these permissions should be reduced after access is restored. Overly permissive policies might have enabled broader access for the threat actor.

Going forward, implement least privilege access and monitor authorization activity to detect suspicious behavior. By quickly containing compromised entities and proactively improving IAM hygiene, you can minimize the adversaries’ access duration and prevent further unauthorized access.

Additional considerations

In addition to querying CloudTrail, you can use Athena to query other logs, such as VPC Flow Logs and Amazon Route 53 DNS logs. You can also use Amazon Security Lake, which is generally available, to automatically centralize security data from AWS environments, SaaS providers, on-premises environments, and cloud sources into a purpose-built data lake stored in your account. To better understand which logs to collect and analyze as part of your incident response process, see Logging strategies for security incident response.

We recommended that you understand the playbook implementation described in this blog post before you expand the scope of your incident response solution. The running of queries and automation of containment are two elements to consider as you think about the next steps to evolve your incident response processes.

Conclusion

In this blog post, we showed how you can use Jupyter notebooks to simplify and standardize your incident response processes. You reviewed how to respond to a potential credential compromise incident using a Jupyter notebook style playbook. You also saw how this helps reduce the time to resolution and standardize the analysis and response. Finally, we presented several artifacts and recommendations showing how you can tailor this solution to meet your organization’s specific security needs. You can use this framework to evolve your incident response process.

Further resources

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on AWS re:Post or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Tim Manik

Tim Manik

Tim is a Solutions Architect at AWS working with enterprise customers in North America. He specializes in cybersecurity and AI/ML. When he’s not working, you can find Tim exploring new hiking trails, swimming, or playing the guitar.

Daria Pshonkina

Daria Pshonkina

Daria is a Solutions Architect at AWS supporting enterprise customers that started their business journey in the cloud. Her specialization is in security. Outside of work, she enjoys outdoor activities, travelling, and spending quality time with friends and family.

Bryant Pickford

Bryant Pickford

Bryant is a Security Specialist Solutions Architect within the Prototyping Security team under the Worldwide Specialist Organization. He has experience in incident response, threat detection, and perimeter security. In his free time, he likes to DJ, make music, and skydive.

Create, train, and deploy Amazon Redshift ML model integrating features from Amazon SageMaker Feature Store

Post Syndicated from Anirban Sinha original https://aws.amazon.com/blogs/big-data/create-train-and-deploy-amazon-redshift-ml-model-integrating-features-from-amazon-sagemaker-feature-store/

Amazon Redshift is a fast, petabyte-scale, cloud data warehouse that tens of thousands of customers rely on to power their analytics workloads. Data analysts and database developers want to use this data to train machine learning (ML) models, which can then be used to generate insights on new data for use cases such as forecasting revenue, predicting customer churn, and detecting anomalies. Amazon Redshift ML makes it easy for SQL users to create, train, and deploy ML models using SQL commands familiar to many roles such as executives, business analysts, and data analysts. We covered in a previous post how you can use data in Amazon Redshift to train models in Amazon SageMaker, a fully managed ML service, and then make predictions within your Redshift data warehouse.

Redshift ML currently supports ML algorithms such as XGBoost, multilayer perceptron (MLP), KMEANS, and Linear Learner. Additionally, you can import existing SageMaker models into Amazon Redshift for in-database inference or remotely invoke a SageMaker endpoint.

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for ML models. However, one challenge in training a production-ready ML model using SageMaker Feature Store is access to a diverse set of features that aren’t always owned and maintained by the team that is building the model. For example, an ML model to identify fraudulent financial transactions needs access to both identifying (device type, browser) and transaction (amount, credit or debit, and so on) related features. As a data scientist building an ML model, you may have access to the identifying information but not the transaction information, and having access to a feature store solves this.

In this post, we discuss the combined feature store pattern, which allows teams to maintain their own local feature stores using a local Redshift table while still being able to access shared features from the centralized feature store. In a local feature store, you can store sensitive data that can’t be shared across the organization for regulatory and compliance reasons.

We also show you how to use familiar SQL statements to create and train ML models by combining shared features from the centralized store with local features and use these models to make in-database predictions on new data for use cases such as fraud risk scoring.

Overview of solution

For this post, we create an ML model to predict if a transaction is fraudulent or not, given the transaction record. To build this, we need to engineer features that describe an individual credit card’s spending pattern, such as the number of transactions or the average transaction amount, and also information about the merchant, the cardholder, the device used to make the payment, and any other data that may be relevant to detecting fraud.

To get started, we need an Amazon Redshift Serverless data warehouse with the Redshift ML feature enabled and an Amazon SageMaker Studio environment with access to SageMaker Feature Store. For an introduction to Redshift ML and instructions on setting it up, see Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML.

We also need an offline feature store to store features in feature groups. The offline store uses an Amazon Simple Storage Service (Amazon S3) bucket for storage and can also fetch data using Amazon Athena queries. For an introduction to SageMaker Feature Store and instructions on setting it up, see Getting started with Amazon SageMaker Feature Store.

The following diagram illustrates solution architecture.

The workflow contains the following steps:

  1. Create the offline feature group in SageMaker Feature Store and ingest data into the feature group.
  2. Create a Redshift table and load local feature data into the table.
  3. Create an external schema for Amazon Redshift Spectrum to access the offline store data stored in Amazon S3 using the AWS Glue Data Catalog.
  4. Train and validate a fraud risk scoring ML model using local feature data and external offline feature store data.
  5. Use the offline feature store and local store for inference.

Dataset

To demonstrate this use case, we use a synthetic dataset with two tables: identity and transactions. They can both be joined by the TransactionID column. The transaction table contains information about a particular transaction, such as amount, credit or debit card, and so on, and the identity table contains information about the user, such as device type and browser. The transaction must exist in the transaction table, but might not always be available in the identity table.

The following is an example of the transactions dataset.

The following is an example of the identity dataset.

Let’s assume that across the organization, data science teams centrally manage the identity data and process it to extract features in a centralized offline feature store. The data warehouse team ingests and analyzes transaction data in a Redshift table, owned by them.

We work through this use case to understand how the data warehouse team can securely retrieve the latest features from the identity feature group and join it with transaction data in Amazon Redshift to create a feature set for training and inferencing a fraud detection model.

Create the offline feature group and ingest data

To start, we set up SageMaker Feature Store, create a feature group for the identity dataset, inspect and process the dataset, and ingest some sample data. We then prepare the transaction features from the transaction data and store it in Amazon S3 for further loading into the Redshift table.

Alternatively, you can author features using Amazon SageMaker Data Wrangler, create feature groups in SageMaker Feature Store, and ingest features in batches using an Amazon SageMaker Processing job with a notebook exported from SageMaker Data Wrangler. This mode allows for batch ingestion into the offline store.

Let’s explore some of the key steps in this section.

  1. Download the sample notebook.
  2. On the SageMaker console, under Notebook in the navigation pane, choose Notebook instances.
  3. Locate your notebook instance and choose Open Jupyter.
  4. Choose Upload and upload the notebook you just downloaded.
  5. Open the notebook sagemaker_featurestore_fraud_redshiftml_python_sdk.ipynb.
  6. Follow the instructions and run all the cells up to the Cleanup Resources section.

The following are key steps from the notebook:

  1. We create a Pandas DataFrame with the initial CSV data. We apply feature transformations for this dataset.
    identity_data = pd.read_csv(io.BytesIO(identity_data_object["Body"].read()))
    transaction_data = pd.read_csv(io.BytesIO(transaction_data_object["Body"].read()))
    
    identity_data = identity_data.round(5)
    transaction_data = transaction_data.round(5)
    
    identity_data = identity_data.fillna(0)
    transaction_data = transaction_data.fillna(0)
    
    # Feature transformations for this dataset are applied 
    # One hot encode card4, card6
    encoded_card_bank = pd.get_dummies(transaction_data["card4"], prefix="card_bank")
    encoded_card_type = pd.get_dummies(transaction_data["card6"], prefix="card_type")
    
    transformed_transaction_data = pd.concat(
        [transaction_data, encoded_card_type, encoded_card_bank], axis=1
    )

  2. We store the processed and transformed transaction dataset in an S3 bucket. This transaction data will be loaded later in the Redshift table for building the local feature store.
    transformed_transaction_data.to_csv("transformed_transaction_data.csv", header=False, index=False)
    s3_client.upload_file("transformed_transaction_data.csv", default_s3_bucket_name, prefix + "/training_input/transformed_transaction_data.csv")

  3. Next, we need a record identifier name and an event time feature name. In our fraud detection example, the column of interest is TransactionID.EventTime can be appended to your data when no timestamp is available. In the following code, you can see how these variables are set, and then EventTime is appended to both features’ data.
    # record identifier and event time feature names
    record_identifier_feature_name = "TransactionID"
    event_time_feature_name = "EventTime"
    
    # append EventTime feature
    identity_data[event_time_feature_name] = pd.Series(
        [current_time_sec] * len(identity_data), dtype="float64"
    )

  4. We then create and ingest the data into the feature group using the SageMaker SDK FeatureGroup.ingest API. This is a small dataset and therefore can be loaded into a Pandas DataFrame. When we work with large amounts of data and millions of rows, there are other scalable mechanisms to ingest data into SageMaker Feature Store, such as batch ingestion with Apache Spark.
    identity_feature_group.create(
        s3_uri=<S3_Path_Feature_Store>,
        record_identifier_name=record_identifier_feature_name,
        event_time_feature_name=event_time_feature_name,
        role_arn=<role_arn>,
        enable_online_store=False,
    )
    
    identity_feature_group_name = "identity-feature-group"
    
    # load feature definitions to the feature group. SageMaker FeatureStore Python SDK will auto-detect the data schema based on input data.
    identity_feature_group.load_feature_definitions(data_frame=identity_data)
    identity_feature_group.ingest(data_frame=identity_data, max_workers=3, wait=True)
    

  5. We can verify that data has been ingested into the feature group by running Athena queries in the notebook or running queries on the Athena console.

At this point, the identity feature group is created in an offline feature store with historical data persisted in Amazon S3. SageMaker Feature Store automatically creates an AWS Glue Data Catalog for the offline store, which enables us to run SQL queries against the offline data using Athena or Redshift Spectrum.

Create a Redshift table and load local feature data

To build a Redshift ML model, we build a training dataset joining the identity data and transaction data using SQL queries. The identity data is in a centralized feature store where the historical set of records are persisted in Amazon S3. The transaction data is a local feature for training data that needs to made available in the Redshift table.

Let’s explore how to create the schema and load the processed transaction data from Amazon S3 into a Redshift table.

  1. Create the customer_transaction table and load daily transaction data into the table, which you’ll use to train the ML model:
    DROP TABLE customer_transaction;
    CREATE TABLE customer_transaction (
      TransactionID INT,    
      isFraud INT,  
      TransactionDT INT,    
      TransactionAmt decimal(10,2), 
      card1 INT,    
      card2 decimal(10,2),card3 decimal(10,2),  
      card4 VARCHAR(20),card5 decimal(10,2),    
      card6 VARCHAR(20),    
      B1 INT,B2 INT,B3 INT,B4 INT,B5 INT,B6 INT,
      B7 INT,B8 INT,B9 INT,B10 INT,B11 INT,B12 INT,
      F1 INT,F2 INT,F3 INT,F4 INT,F5 INT,F6 INT,
      F7 INT,F8 INT,F9 INT,F10 INT,F11 INT,F12 INT,
      F13 INT,F14 INT,F15 INT,F16 INT,F17 INT,  
      N1 VARCHAR(20),N2 VARCHAR(20),N3 VARCHAR(20), 
      N4 VARCHAR(20),N5 VARCHAR(20),N6 VARCHAR(20), 
      N7 VARCHAR(20),N8 VARCHAR(20),N9 VARCHAR(20), 
      card_type_0  boolean,
      card_type_credit boolean,
      card_type_debit  boolean,
      card_bank_0  boolean,
      card_bank_american_express boolean,
      card_bank_discover  boolean,
      card_bank_mastercard  boolean,
      card_bank_visa boolean  
    );

  2. Load the sample data by using the following command. Replace your Region and S3 path as appropriate. You will find the S3 path in the S3 Bucket Setup For The OfflineStore section in the notebook or by checking the dataset_uri_prefix in the notebook.
    COPY customer_transaction
    FROM '<s3path>/transformed_transaction_data.csv' 
    IAM_ROLE default delimiter ',' 
    region 'your-region';

Now that we have created a local feature store for the transaction data, we focus on integrating a centralized feature store with Amazon Redshift to access the identity data.

Create an external schema for Redshift Spectrum to access the offline store data

We have created a centralized feature store for identity features, and we can access this offline feature store using services such as Redshift Spectrum. When the identity data is available through the Redshift Spectrum table, we can create a training dataset with feature values from the identity feature group and customer_transaction, joining on the TransactionId column.

This section provides an overview of how to enable Redshift Spectrum to query data directly from files on Amazon S3 through an external database in an AWS Glue Data Catalog.

  1. First, check that the identity-feature-group table is present in the Data Catalog under the sagemamker_featurestore database.
  2. Using Redshift Query Editor V2, create an external schema using the following command:
    CREATE EXTERNAL SCHEMA sagemaker_featurestore
    FROM DATA CATALOG
    DATABASE 'sagemaker_featurestore'
    IAM_ROLE default
    create external database if not exists;

All the tables, including identity-feature-group external tables, are visible under the sagemaker_featurestore external schema. In Redshift Query Editor v2, you can check the contents of the external schema.

  1. Run the following query to sample a few records—note that your table name may be different:
    Select * from sagemaker_featurestore.identity_feature_group_1680208535 limit 10;

  2. Create a view to join the latest data from identity-feature-group and customer_transaction on the TransactionId column. Be sure to change the external table name to match your external table name:
    create or replace view public.credit_fraud_detection_v
    AS select  "isfraud",
            "transactiondt",
            "transactionamt",
            "card1","card2","card3","card5",
             case when "card_type_credit" = 'False' then 0 else 1 end as card_type_credit,
             case when "card_type_debit" = 'False' then 0 else 1 end as card_type_debit,
             case when "card_bank_american_express" = 'False' then 0 else 1 end as card_bank_american_express,
             case when "card_bank_discover" = 'False' then 0 else 1 end as card_bank_discover,
             case when "card_bank_mastercard" = 'False' then 0 else 1 end as card_bank_mastercard,
             case when "card_bank_visa" = 'False' then 0 else 1 end as card_bank_visa,
            "id_01","id_02","id_03","id_04","id_05"
    from public.customer_transaction ct left join sagemaker_featurestore.identity_feature_group_1680208535 id
    on id.transactionid = ct.transactionid with no schema binding;

Train and validate the fraud risk scoring ML model

Redshift ML gives you the flexibility to specify your own algorithms and model types and also to provide your own advanced parameters, which can include preprocessors, problem type, and hyperparameters. In this post, we create a customer model by specifying AUTO OFF and the model type of XGBOOST. By turning AUTO OFF and using XGBoost, we are providing the necessary inputs for SageMaker to train the model. A benefit of this can be faster training times. XGBoost is as open-source version of the gradient boosted trees algorithm. For more details on XGBoost, refer to Build XGBoost models with Amazon Redshift ML.

We train the model using 80% of the dataset by filtering on transactiondt < 12517618. The other 20% will be used for inference. A centralized feature store is useful in providing the latest supplementing data for training requests. Note that you will need to provide an S3 bucket name in the create model statement. It will take approximately 10 minutes to create the model.

CREATE MODEL frauddetection_xgboost
FROM (select  "isfraud",
        "transactiondt",
        "transactionamt",
        "card1","card2","card3","card5",
        "card_type_credit",
        "card_type_debit",
        "card_bank_american_express",
        "card_bank_discover",
        "card_bank_mastercard",
        "card_bank_visa",
        "id_01","id_02","id_03","id_04","id_05"
from credit_fraud_detection_v where transactiondt < 12517618
)
TARGET isfraud
FUNCTION ml_fn_frauddetection_xgboost
IAM_ROLE default
AUTO OFF
MODEL_TYPE XGBOOST
OBJECTIVE 'binary:logistic'
PREPROCESSORS 'none'
HYPERPARAMETERS DEFAULT EXCEPT(NUM_ROUND '100')
SETTINGS (S3_BUCKET <s3_bucket>);

When you run the create model command, it will complete quickly in Amazon Redshift while the model training is happening in the background using SageMaker. You can check the status of the model by running a show model command:

show model frauddetection_xgboost;

The output of the show model command shows that the model state is TRAINING. It also shows other information such as the model type and the training job name that SageMaker assigned.
After a few minutes, we run the show model command again:

show model frauddetection_xgboost;

Now the output shows the model state is READY. We can also see the train:error score here, which at 0 tells us we have a good model. Now that the model is trained, we can use it for running inference queries.

Use the offline feature store and local store for inference

We can use the SQL function to apply the ML model to data in queries, reports, and dashboards. Let’s use the function ml_fn_frauddetection_xgboost created by our model against our test dataset by filtering where transactiondt >=12517618, to predict whether a transaction is fraudulent or not. SageMaker Feature Store can be useful in supplementing data for inference requests.

Run the following query to predict whether transactions are fraudulent or not:

select  "isfraud" as "Actual",
        ml_fn_frauddetection_xgboost(
        "transactiondt",
        "transactionamt",
        "card1","card2","card3","card5",
        "card_type_credit",
        "card_type_debit",
        "card_bank_american_express",
        "card_bank_discover",
        "card_bank_mastercard",
        "card_bank_visa",
        "id_01","id_02","id_03","id_04","id_05") as "Predicted"
from credit_fraud_detection_v where transactiondt >= 12517618;

For binary and multi-class classification problems, we compute the accuracy as the model metric. Accuracy can be calculated based on the following:

accuracy = (sum (actual == predicted)/total) *100

Let’s apply the preceding code to our use case to find the accuracy of the model. We use the test data (transactiondt >= 12517618) to test the accuracy, and use the newly created function ml_fn_frauddetection_xgboost to predict and take the columns other than the target and label as the input:

-- check accuracy 
WITH infer_data AS (
SELECT "isfraud" AS label,
ml_fn_frauddetection_xgboost(
        "transactiondt",
        "transactionamt",
        "card1","card2","card3","card5",
        "card_type_credit",
        "card_type_debit",
        "card_bank_american_express",
        "card_bank_discover",
        "card_bank_mastercard",
        "card_bank_visa",
        "id_01","id_02","id_03","id_04","id_05") AS predicted,
CASE 
   WHEN label IS NULL
       THEN 0
   ELSE label
   END AS actual,
CASE 
   WHEN actual = predicted
       THEN 1::INT
   ELSE 0::INT
   END AS correct
FROM credit_fraud_detection_v where transactiondt >= 12517618),
aggr_data AS (
SELECT SUM(correct) AS num_correct,
COUNT(*) AS total
FROM infer_data) 

SELECT (num_correct::FLOAT / total::FLOAT) AS accuracy FROM aggr_data;

Clean up

As a final step, clean up the resources:

  1. Delete the Redshift cluster.
  2. Run the Cleanup Resources section of your notebook.

Conclusion

Redshift ML enables you to bring machine learning to your data, powering fast and informed decision-making. SageMaker Feature Store provides a purpose-built feature management solution to help organizations scale ML development across business units and data science teams.

In this post, we showed how you can train an XGBoost model using Redshift ML with data spread across SageMaker Feature Store and a Redshift table. Additionally, we showed how you can make inferences on a trained model to detect fraud using Amazon Redshift SQL commands.


About the authors

Anirban Sinha is a Senior Technical Account Manager at AWS. He is passionate about building scalable data warehouses and big data solutions working closely with customers. He works with large ISVs customers, in helping them build and operate secure, resilient, scalable, and high-performance SaaS applications in the cloud.

Phil Bates is a Senior Analytics Specialist Solutions Architect at AWS. He has more than 25 years of experience implementing large-scale data warehouse solutions. He is passionate about helping customers through their cloud journey and using the power of ML within their data warehouse.

Gaurav Singh is a Senior Solutions Architect at AWS, specializing in AI/ML and Generative AI. Based in Pune, India, he focuses on helping customers build, deploy, and migrate ML production workloads to SageMaker at scale. In his spare time, Gaurav loves to explore nature, read, and run.

Unstructured data management and governance using AWS AI/ML and analytics services

Post Syndicated from Sakti Mishra original https://aws.amazon.com/blogs/big-data/unstructured-data-management-and-governance-using-aws-ai-ml-and-analytics-services/

Unstructured data is information that doesn’t conform to a predefined schema or isn’t organized according to a preset data model. Unstructured information may have a little or a lot of structure but in ways that are unexpected or inconsistent. Text, images, audio, and videos are common examples of unstructured data. Most companies produce and consume unstructured data such as documents, emails, web pages, engagement center phone calls, and social media. By some estimates, unstructured data can make up to 80–90% of all new enterprise data and is growing many times faster than structured data. After decades of digitizing everything in your enterprise, you may have an enormous amount of data, but with dormant value. However, with the help of AI and machine learning (ML), new software tools are now available to unearth the value of unstructured data.

In this post, we discuss how AWS can help you successfully address the challenges of extracting insights from unstructured data. We discuss various design patterns and architectures for extracting and cataloging valuable insights from unstructured data using AWS. Additionally, we show how to use AWS AI/ML services for analyzing unstructured data.

Why it’s challenging to process and manage unstructured data

Unstructured data makes up a large proportion of the data in the enterprise that can’t be stored in a traditional relational database management systems (RDBMS). Understanding the data, categorizing it, storing it, and extracting insights from it can be challenging. In addition, identifying incremental changes requires specialized patterns and detecting sensitive data and meeting compliance requirements calls for sophisticated functions. It can be difficult to integrate unstructured data with structured data from existing information systems. Some view structured and unstructured data as apples and oranges, instead of being complementary. But most important of all, the assumed dormant value in the unstructured data is a question mark, which can only be answered after these sophisticated techniques have been applied. Therefore, there is a need to being able to analyze and extract value from the data economically and flexibly.

Solution overview

Data and metadata discovery is one of the primary requirements in data analytics, where data consumers explore what data is available and in what format, and then consume or query it for analysis. If you can apply a schema on top of the dataset, then it’s straightforward to query because you can load the data into a database or impose a virtual table schema for querying. But in the case of unstructured data, metadata discovery is challenging because the raw data isn’t easily readable.

You can integrate different technologies or tools to build a solution. In this post, we explain how to integrate different AWS services to provide an end-to-end solution that includes data extraction, management, and governance.

The solution integrates data in three tiers. The first is the raw input data that gets ingested by source systems, the second is the output data that gets extracted from input data using AI, and the third is the metadata layer that maintains a relationship between them for data discovery.

The following is a high-level architecture of the solution we can build to process the unstructured data, assuming the input data is being ingested to the raw input object store.

Unstructured Data Management - Block Level Architecture Diagram

The steps of the workflow are as follows:

  1. Integrated AI services extract data from the unstructured data.
  2. These services write the output to a data lake.
  3. A metadata layer helps build the relationship between the raw data and AI extracted output. When the data and metadata are available for end-users, we can break the user access pattern into additional steps.
  4. In the metadata catalog discovery step, we can use query engines to access the metadata for discovery and apply filters as per our analytics needs. Then we move to the next stage of accessing the actual data extracted from the raw unstructured data.
  5. The end-user accesses the output of the AI services and uses the query engines to query the structured data available in the data lake. We can optionally integrate additional tools that help control access and provide governance.
  6. There might be scenarios where, after accessing the AI extracted output, the end-user wants to access the original raw object (such as media files) for further analysis. Additionally, we need to make sure we have access control policies so the end-user has access only to the respective raw data they want to access.

Now that we understand the high-level architecture, let’s discuss what AWS services we can integrate in each step of the architecture to provide an end-to-end solution.

The following diagram is the enhanced version of our solution architecture, where we have integrated AWS services.

Unstructured Data Management - AWS Native Architecture

Let’s understand how these AWS services are integrated in detail. We have divided the steps into two broad user flows: data processing and metadata enrichment (Steps 1–3) and end-users accessing the data and metadata with fine-grained access control (Steps 4–6).

  1. Various AI services (which we discuss in the next section) extract data from the unstructured datasets.
  2. The output is written to an Amazon Simple Storage Service (Amazon S3) bucket (labeled Extracted JSON in the preceding diagram). Optionally, we can restructure the input raw objects for better partitioning, which can help while implementing fine-grained access control on the raw input data (labeled as the Partitioned bucket in the diagram).
  3. After the initial data extraction phase, we can apply additional transformations to enrich the datasets using AWS Glue. We also build an additional metadata layer, which maintains a relationship between the raw S3 object path, the AI extracted output path, the optional enriched version S3 path, and any other metadata that will help the end-user discover the data.
  4. In the metadata catalog discovery step, we use the AWS Glue Data Catalog as the technical catalog, Amazon Athena and Amazon Redshift Spectrum as query engines, AWS Lake Formation for fine-grained access control, and Amazon DataZone for additional governance.
  5. The AI extracted output is expected to be available as a delimited file or in JSON format. We can create an AWS Glue Data Catalog table for querying using Athena or Redshift Spectrum. Like the previous step, we can use Lake Formation policies for fine-grained access control.
  6. Lastly, the end-user accesses the raw unstructured data available in Amazon S3 for further analysis. We have proposed integrating Amazon S3 Access Points for access control at this layer. We explain this in detail later in this post.

Now let’s expand the following parts of the architecture to understand the implementation better:

  • Using AWS AI services to process unstructured data
  • Using S3 Access Points to integrate access control on raw S3 unstructured data

Process unstructured data with AWS AI services

As we discussed earlier, unstructured data can come in a variety of formats, such as text, audio, video, and images, and each type of data requires a different approach for extracting metadata. AWS AI services are designed to extract metadata from different types of unstructured data. The following are the most commonly used services for unstructured data processing:

  • Amazon Comprehend – This natural language processing (NLP) service uses ML to extract metadata from text data. It can analyze text in multiple languages, detect entities, extract key phrases, determine sentiment, and more. With Amazon Comprehend, you can easily gain insights from large volumes of text data such as extracting product entity, customer name, and sentiment from social media posts.
  • Amazon Transcribe – This speech-to-text service uses ML to convert speech to text and extract metadata from audio data. It can recognize multiple speakers, transcribe conversations, identify keywords, and more. With Amazon Transcribe, you can convert unstructured data such as customer support recordings into text and further derive insights from it.
  • Amazon Rekognition – This image and video analysis service uses ML to extract metadata from visual data. It can recognize objects, people, faces, and text, detect inappropriate content, and more. With Amazon Rekognition, you can easily analyze images and videos to gain insights such as identifying entity type (human or other) and identifying if the person is a known celebrity in an image.
  • Amazon Textract – You can use this ML service to extract metadata from scanned documents and images. It can extract text, tables, and forms from images, PDFs, and scanned documents. With Amazon Textract, you can digitize documents and extract data such as customer name, product name, product price, and date from an invoice.
  • Amazon SageMaker – This service enables you to build and deploy custom ML models for a wide range of use cases, including extracting metadata from unstructured data. With SageMaker, you can build custom models that are tailored to your specific needs, which can be particularly useful for extracting metadata from unstructured data that requires a high degree of accuracy or domain-specific knowledge.
  • Amazon Bedrock – This fully managed service offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon with a single API. It also offers a broad set of capabilities to build generative AI applications, simplifying development while maintaining privacy and security.

With these specialized AI services, you can efficiently extract metadata from unstructured data and use it for further analysis and insights. It’s important to note that each service has its own strengths and limitations, and choosing the right service for your specific use case is critical for achieving accurate and reliable results.

AWS AI services are available via various APIs, which enables you to integrate AI capabilities into your applications and workflows. AWS Step Functions is a serverless workflow service that allows you to coordinate and orchestrate multiple AWS services, including AI services, into a single workflow. This can be particularly useful when you need to process large amounts of unstructured data and perform multiple AI-related tasks, such as text analysis, image recognition, and NLP.

With Step Functions and AWS Lambda functions, you can create sophisticated workflows that include AI services and other AWS services. For instance, you can use Amazon S3 to store input data, invoke a Lambda function to trigger an Amazon Transcribe job to transcribe an audio file, and use the output to trigger an Amazon Comprehend analysis job to generate sentiment metadata for the transcribed text. This enables you to create complex, multi-step workflows that are straightforward to manage, scalable, and cost-effective.

The following is an example architecture that shows how Step Functions can help invoke AWS AI services using Lambda functions.

AWS AI Services - Lambda Event Workflow -Unstructured Data

The workflow steps are as follows:

  1. Unstructured data, such as text files, audio files, and video files, are ingested into the S3 raw bucket.
  2. A Lambda function is triggered to read the data from the S3 bucket and call Step Functions to orchestrate the workflow required to extract the metadata.
  3. The Step Functions workflow checks the type of file, calls the corresponding AWS AI service APIs, checks the job status, and performs any postprocessing required on the output.
  4. AWS AI services can be accessed via APIs and invoked as batch jobs. To extract metadata from different types of unstructured data, you can use multiple AI services in sequence, with each service processing the corresponding file type.
  5. After the Step Functions workflow completes the metadata extraction process and performs any required postprocessing, the resulting output is stored in an S3 bucket for cataloging.

Next, let’s understand how can we implement security or access control on both the extracted output as well as the raw input objects.

Implement access control on raw and processed data in Amazon S3

We just consider access controls for three types of data when managing unstructured data: the AI-extracted semi-structured output, the metadata, and the raw unstructured original files. When it comes to AI extracted output, it’s in JSON format and can be restricted via Lake Formation and Amazon DataZone. We recommend keeping the metadata (information that captures which unstructured datasets are already processed by the pipeline and available for analysis) open to your organization, which will enable metadata discovery across the organization.

To control access of raw unstructured data, you can integrate S3 Access Points and explore additional support in the future as AWS services evolve. S3 Access Points simplify data access for any AWS service or customer application that stores data in Amazon S3. Access points are named network endpoints that are attached to buckets that you can use to perform S3 object operations. Each access point has distinct permissions and network controls that Amazon S3 applies for any request that is made through that access point. Each access point enforces a customized access point policy that works in conjunction with the bucket policy that is attached to the underlying bucket. With S3 Access Points, you can create unique access control policies for each access point to easily control access to specific datasets within an S3 bucket. This works well in multi-tenant or shared bucket scenarios where users or teams are assigned to unique prefixes within one S3 bucket.

An access point can support a single user or application, or groups of users or applications within and across accounts, allowing separate management of each access point. Every access point is associated with a single bucket and contains a network origin control and a Block Public Access control. For example, you can create an access point with a network origin control that only permits storage access from your virtual private cloud (VPC), a logically isolated section of the AWS Cloud. You can also create an access point with the access point policy configured to only allow access to objects with a defined prefix or to objects with specific tags. You can also configure custom Block Public Access settings for each access point.

The following architecture provides an overview of how an end-user can get access to specific S3 objects by assuming a specific AWS Identity and Access Management (IAM) role. If you have a large number of S3 objects to control access, consider grouping the S3 objects, assigning them tags, and then defining access control by tags.

S3 Access Points - Unstructured Data Management - Access Control

If you are implementing a solution that integrates S3 data available in multiple AWS accounts, you can take advantage of cross-account support for S3 Access Points.

Conclusion

This post explained how you can use AWS AI services to extract readable data from unstructured datasets, build a metadata layer on top of them to allow data discovery, and build an access control mechanism on top of the raw S3 objects and extracted data using Lake Formation, Amazon DataZone, and S3 Access Points.

In addition to AWS AI services, you can also integrate large language models with vector databases to enable semantic or similarity search on top of unstructured datasets. To learn more about how to enable semantic search on unstructured data by integrating Amazon OpenSearch Service as a vector database, refer to Try semantic search with the Amazon OpenSearch Service vector engine.

As of writing this post, S3 Access Points is one of the best solutions to implement access control on raw S3 objects using tagging, but as AWS service features evolve in the future, you can explore alternative options as well.


About the Authors

Sakti Mishra is a Principal Solutions Architect at AWS, where he helps customers modernize their data architecture and define their end-to-end data strategy, including data security, accessibility, governance, and more. He is also the author of the book Simplify Big Data Analytics with Amazon EMR. Outside of work, Sakti enjoys learning new technologies, watching movies, and visiting places with family.

Bhavana Chirumamilla is a Senior Resident Architect at AWS with a strong passion for data and machine learning operations. She brings a wealth of experience and enthusiasm to help enterprises build effective data and ML strategies. In her spare time, Bhavana enjoys spending time with her family and engaging in various activities such as traveling, hiking, gardening, and watching documentaries.

Sheela Sonone is a Senior Resident Architect at AWS. She helps AWS customers make informed choices and trade-offs about accelerating their data, analytics, and AI/ML workloads and implementations. In her spare time, she enjoys spending time with her family—usually on tennis courts.

Daniel Bruno is a Principal Resident Architect at AWS. He had been building analytics and machine learning solutions for over 20 years and splits his time helping customers build data science programs and designing impactful ML products.

AWS Weekly Roundup: AWS Control Tower, Amazon Bedrock, Amazon OpenSearch Service, and More (October 9, 2023)

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-control-tower-amazon-bedrock-amazon-opensearch-service-and-more-october-9-2023/

Pumpkins

As the Northern Hemisphere enjoys early fall and pumpkins take over the local farmers markets and coffee flavors here in the United States, we’re also just 50 days away from re:Invent 2023! But before we officially enter pre:Invent sea­­son, let’s have a look at some of last week’s exciting news and announcements.

Last Week’s Launches
Here are some launches that got my attention:

AWS Control Tower – AWS Control Tower released 22 proactive controls and 10 AWS Security Hub detective controls to help you meet regulatory requirements and meet control objectives such as encrypting data in transit, encrypting data at rest, or using strong authentication. For more details and a list of controls, check out the AWS Control Tower user guide.

Amazon Bedrock – Just a week after Amazon Bedrock became available in AWS Regions US East (N. Virginia) and US West (Oregon), Amazon Bedrock is now also available in the Asia Pacific (Tokyo) AWS Region. To get started building and scaling generative AI applications with foundation models, check out the Amazon Bedrock documentation, explore the generative AI space at community.aws, and get hands-on with the Amazon Bedrock workshop.

Amazon OpenSearch Service – You can now run OpenSearch version 2.9 in Amazon OpenSearch Service with improvements to search, observability, security analytics, and machine learning (ML) capabilities. OpenSearch Service has expanded its geospatial aggregations support in version 2.9 to gather insights on high-level overview of trends and patterns and establish correlations within the data. OpenSearch Service 2.9 now also comes with OpenSearch Service Integrations to take advantage of new schema standards such as OpenTelemetry and supports managing and overlaying alerts and anomalies onto dashboard visualization line charts.

Amazon SageMakerSageMaker Feature Store now supports a fully managed, in-memory online store to help you retrieve features for model serving in real time for high throughput ML applications. The new online store is powered by ElastiCache for Redis, an in-memory data store built on open-source Redis. The SageMaker developer guide has all the details.

Also, SageMaker Model Registry added support for private model repositories. You can now register models that are stored in private Docker repositories and track all your models across multiple private AWS and non-AWS model repositories in one central service, simplifying ML operations (MLOps) and ML governance at scale. The SageMaker Developer Guide shows you how to get started.

Amazon SageMaker CanvasSageMaker Canvas expanded its support for ready-to-use models to include foundation models (FMs). You can now access FMs such as Claude 2, Amazon Titan, and Jurassic-2 (powered by Amazon Bedrock) as well as publicly available models such as Falcon and MPT (powered by SageMaker JumpStart) through a no-code chat interface. Check out the SageMaker Developer Guide for more details.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some additional blog posts and news items that you might find interesting:

Behind the scenes on AWS contributions to open-source databases – This post shares some of the more substantial open-source contributions AWS has made in the past two years to upstream databases, introduces some key contributors, and shares how AWS approaches upstream work in our database services.

Fast and cost-effective Llama 2 fine-tuning with AWS Trainium – This post shows you how to fine-tune the Llama 2 model from Meta on AWS Trainium, a purpose-built accelerator for LLM training, to reduce training times and costs.

Code Llama code generation models from Meta are now available via Amazon SageMaker JumpStart – You can now deploy Code Llama FMs, developed by Meta, with one click in SageMaker JumpStart. This post walks you through the details.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

Build On AWS - Generative AIBuild On Generative AI – Season 2 of this weekly Twitch show about all things generative AI is in full swing! Every Monday, 9:00 US PT, my colleagues Emily and Darko look at new technical and scientific patterns on AWS, invite guest speakers to demo their work, and show us how they built something new to improve the state of generative AI. In today’s episode, Emily and Darko discussed how to translate unstructured documents into structured data. Check out show notes and the full list of episodes on community.aws.

AWS Community Days – Join a community-led conference run by AWS user group leaders in your region: DMV (DC, Maryland, Virginia) (October 13), Italy (October 18), UAE (October 21), Jaipur (November 4), Vadodara (November 4), and Brasil (November 4).

AWS InnovateAWS Innovate: Every Application Edition – Join our free online conference to explore cutting-edge ways to enhance security and reliability, optimize performance on a budget, speed up application development, and revolutionize your applications with generative AI. Register for AWS Innovate Online Americas and EMEA on October 19 and AWS Innovate Online Asia Pacific & Japan on October 26.

AWS re:Invent 2023AWS re:Invent (November 27 – December 1) – Join us to hear the latest from AWS, learn from experts, and connect with the global cloud community. Browse the session catalog and attendee guides and check out the re:Invent highlights for generative AI.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Antje

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

How Chime Financial uses AWS to build a serverless stream analytics platform and defeat fraudsters

Post Syndicated from Khandu Shinde original https://aws.amazon.com/blogs/big-data/how-chime-financial-uses-aws-to-build-a-serverless-stream-analytics-platform-and-defeat-fraudsters/

This is a guest post by Khandu Shinde, Staff Software Engineer and Edward Paget, Senior Software Engineering at Chime Financial.

Chime is a financial technology company founded on the premise that basic banking services should be helpful, easy, and free. Chime partners with national banks to design member first financial products. This creates a more competitive market with better, lower-cost options for everyday Americans who aren’t being served well by traditional banks. We help drive innovation, inclusion, and access across the industry.

Chime has a responsibility to protect our members against unauthorized transactions on their accounts. Chime’s Risk Analysis team constantly monitors trends in our data to find patterns that indicate fraudulent transactions.

This post discusses how Chime utilizes AWS Glue, Amazon Kinesis, Amazon DynamoDB, and Amazon SageMaker to build an online, serverless fraud detection solution — the Chime Streaming 2.0 system.

Problem statement

In order to keep up with the rapid movement of fraudsters, our decision platform must continuously monitor user events and respond in real-time. However, our legacy data warehouse-based solution was not equipped for this challenge. It was designed to manage complex queries and business intelligence (BI) use cases on a large scale. However, with a minimum data freshness of 10 minutes, this architecture inherently didn’t align with the near real-time fraud detection use case.

To make high-quality decisions, we need to collect user event data from various sources and update risk profiles in real time. We also need to be able to add new fields and metrics to the risk profiles as our team identifies new attacks, without needing engineering intervention or complex deployments.

We decided to explore streaming analytics solutions where we can capture, transform, and store event streams at scale, and serve rule-based fraud detection models and machine learning (ML) models with milliseconds latency.

Solution overview

The following diagram illustrates the design of the Chime Streaming 2.0 system.

The design included the following key components:

  1. We have Amazon Kinesis Data Streams as our streaming data service to capture and store event streams at scale. Our stream pipelines capture various event types, including user enrollment events, user login events, card swipe events, peer-to-peer payments, and application screen actions.
  2. Amazon DynamoDB is another data source for our Streaming 2.0 system. It acts as the application backend and stores data such as blocked devices list and device-user mapping. We mainly use it as lookup tables in our pipeline.
  3. AWS Glue jobs form the backbone of our Streaming 2.0 system. The simple AWS Glue icon in the diagram represents thousands of AWS Glue jobs performing different transformations. To achieve the 5-15 seconds end-to-end data freshness service level agreement (SLA) for the Steaming 2.0 pipeline, we use streaming ETL jobs in AWS Glue to consume data from Kinesis Data Streams and apply near-real-time transformation. We choose AWS Glue mainly due to its serverless nature, which simplifies infrastructure management with automatic provisioning and worker management, and the ability to perform complex data transformations at scale.
  4. The AWS Glue streaming jobs generate derived fields and risk profiles that get stored in Amazon DynamoDB. We use Amazon DynamoDB as our online feature store due to its millisecond performance and scalability.
  5. Our applications call Amazon SageMaker Inference endpoints for fraud detections. The Amazon DynamoDB online feature store supports real-time inference with single digit millisecond query latency.
  6. We use Amazon Simple Storage Service (Amazon S3) as our offline feature store. It contains historical user activities and other derived ML features.
  7. Our data scientist team can access the dataset and perform ML model training and batch inferencing using Amazon SageMaker.

AWS Glue pipeline implementation deep dive

There are several key design principles for our AWS Glue Pipeline and the Streaming 2.0 project.

  • We want to democratize our data platform and make the data pipeline accessible to all Chime developers.
  • We want to implement cloud financial backend services and achieve cost efficiency.

To achieve data democratization, we needed to enable different personas in the organization to use the platform and define transformation jobs quickly, without worrying about the actual implementation details of the pipelines. The data infrastructure team built an abstraction layer on top of Spark and integrated services. This layer contained API wrappers over integrated services, job tags, scheduling configurations and debug tooling, hiding Spark and other lower-level complexities from end users. As a result, end users were able to define jobs with declarative YAML configurations and define transformation logic with SQL. This simplified the onboarding process and accelerated the implementation phase.

To achieve cost efficiency, our team built a cost attribution dashboard based on AWS cost allocation tags. We enforced tagging with the above abstraction layer and had clear cost attribution for all AWS Glue jobs down to the team level. This enabled us to track down less optimized jobs and work with job owners to implement best practices with impact-based priority. One common misconfiguration we found was sizing of AWS Glue jobs. With data democratization, many users lacked the knowledge to right-size their AWS Glue jobs. The AWS team introduced AWS Glue auto scaling to us as a solution. With AWS Glue Auto Scaling, we no longer needed to plan AWS Glue Spark cluster capacity in advance. We could just set the maximum number of workers and run the jobs. AWS Glue monitors the Spark application execution, and allocates more worker nodes to the cluster in near-real time after Spark requests more executors based on our workload requirements. We noticed a 30–45% cost saving across our AWS Glue Jobs once we turned on Auto Scaling.

Conclusion

In this post, we showed you how Chime’s Streaming 2.0 system allows us to ingest events and make them available to the decision platform just seconds after they are emitted from other services. This enables us to write better risk policies, provide fresher data for our machine learning models, and protect our members from unauthorized transactions on their accounts.

Over 500 developers in Chime are using this streaming pipeline and we ingest more than 1 million events per second. We follow the sizing and scaling process from the AWS Glue streaming ETL jobs best practices blog and land on a 1:1 mapping between Kinesis Shard and vCPU core. The end-to-end latency is less than 15 seconds, and it improves the model score calculation speed by 1200% compared to legacy implementation. This system has proven to be reliable, performant, and cost-effective at scale.

We hope this post will inspire your organization to build a real-time analytics platform using serverless technologies to accelerate your business goals.


About the Authors

Khandu Shinde Khandu Shinde is a Staff Engineer focused on Big Data Platforms and Solutions for Chime. He helps to make the platform scalable for Chime’s business needs with architectural direction and vision. He’s based in San Francisco where he plays cricket and watches movies.

Edward Paget Edward Paget is a Software Engineer working on building Chime’s capabilities to mitigate risk to ensure our members’ financial peace of mind. He enjoys being at the intersection of big data and programming language theory. He’s based in Chicago where he spends his time running along the lake shore.

Dylan Qu is a Specialist Solutions Architect focused on Big Data & Analytics with Amazon Web Services. He helps customers architect and build highly scalable, performant, and secure cloud-based solutions on AWS.

AWS Weekly Roundup: Farewell EC2-Classic, EBS at 15 Years, and More (Sept. 4, 2023)

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-farewell-ec2-classic-ebs-at-15-years-and-more-sept-4-2023/

Last week, there was some great reading about Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Block Store (Amazon EBS) written by AWS tech leaders.

Dr. Werner Vogels wrote Farewell EC2-Classic, it’s been swell, celebrating the 17 years of loyal duty of the original version that started what we now know as cloud computing. You can read how it made the process of acquiring compute resources simple, even though the stack running behind the scenes was incredibly complex.

We have come a long way since 2006, and we’re not done innovating for our customers. As celebrated in this year’s AWS Storage Day, Amazon EBS was launched 15 years ago this month. James Hamilton, SVP and distinguished engineer at Amazon, wrote Amazon EBS at 15 Years, about how the service has evolved to handle over 100 trillion I/O operations a day, and transfers over 13 exabytes of data daily.

As Dr. Werner said in his piece, “it’s a reminder that building evolvable systems is a strategy, and revisiting your architectures with an open mind is a must.” Our innovation efforts driven by customer feedback continue today, and this week is no different.

Last Week’s Launches
Here are some launches that got my attention:

Renaming Amazon Kinesis Data Analytics to Amazon Managed Service for Apache Flink – You can now use Amazon Managed Service for Apache Flink, a fully managed and serverless service for you to build and run real-time streaming applications using Apache Flink. All your existing running applications in Kinesis Data Analytics will work as-is, without any changes. To learn more, see my blog post.

Extended Support for Amazon Aurora and Amazon RDS – You can now get more time for support, up to three years, for Amazon Aurora and Amazon RDS database instances running MySQL 5.7, PostgreSQL 11, and higher major versions. This e will allow you time to upgrade to a new major version to help you meet your business requirements even after the community ends support for these versions.

Enhanced Starter Template for AWS Step Functions Workflow Studio – You can now use starter templates to streamline the process of creating and prototyping workflows swiftly, plus a new code mode, which enables builders to move easily between design and code authoring views. With the improved authoring experience in Workflow Studio, you can seamlessly alternate between a drag-and-drop visual builder experience or the new code editor so that you can pick your preferred tool to accelerate development.

To learn more, see Enhancing Workflow Studio with new features for streamlined authoring in the AWS Compute Blog.

Email Delivery History for Every Email in Amazon SES – You can now troubleshoot individual email delivery problems, confirm delivery of critical messages, and identify engaged recipients on a granular, single email basis. Email senders can investigate trends in delivery performance and see delivery and engagement status for each email sent using Amazon SES Virtual Deliverability Manager.

Response Streaming through Amazon SageMaker Real-time Inference – You can now continuously stream inference responses back to the client to help you build interactive experiences for various generative AI applications such as chatbots, virtual assistants, and music generators.

For more details on how to use response streaming along with examples, see Invoke to Stream an Inference Response and How containers should respond in the AWS documentation, and Elevating the generative AI experience: Introducing streaming support in Amazon SageMaker hosting in the AWS Machine Learning Blog.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates and news that you might have missed:

AI & Sports: How AWS & the NFL are Changing the Game – Over the last 5 years, AWS has partnered with the National Football League (NFL), helping fans better understand the game, helping broadcasters tell better stories, and helping teams use data to improve operations and player safety. Watch AWS CEO, Adam Selipsky, former NFL All-Pro Larry Fitzgerald, and the NFL Network’s Cynthia Frelund during their earlier livestream discussing the intersection of artificial intelligence and machine learning in sports.

Amazon Bedrock Story from Amazon Science – This is a good article explaining the benefits of using Amazon Bedrock to build and scale generative AI applications with leading foundation models, including Amazon’s Titan FMs, which focus on responsible AI to avoid toxic content.

Amazon EC2 Flexibility Score – This is an open source tool developed by AWS to assess any configuration used to launch instances through an Auto Scaling Group (ASG) against the recommended EC2 best practices. It converts the best practice adoption into a “flexibility score” that can be used to identify, improve, and monitor the configurations.

To learn more open-source news and updates, see this newsletter curated by my colleague Ricardo to bring you the latest open source projects, posts, events, and more.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

AWS re:InventAWS re:Invent 2023Ready to start planning your re:Invent? Browse the session catalog now. Join us to hear the latest from AWS, learn from experts, and connect with the global cloud community.

AWS Global SummitsAWS Summits – The last in-person AWS Summit will be held in Johannesburg on Sept. 26.

AWS Community Days AWS Community Day– Join a community-led conference run by AWS user group leaders in your region: Aotearoa (Sept. 6), Lebanon (Sept. 9), Munich (Sept. 14), Argentina (Sept. 16), Spain (Sept. 23), and Chile (Sept. 30). Visit the landing page to check out all the upcoming AWS Community Days.

CDK Day – A community-led fully virtual event on Sept. 29 with tracks in English and Spanish about CDK and related projects. Learn more at the website.

You can browse all upcoming AWS-led in-person and virtual events, and developer-focused events such as AWS DevDay.

Channy

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

AWS Weekly Roundup – AWS Dedicated Zones, Events and More – August 28, 2023

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-dedicated-zones-events-and-more-august-28-2023/

This week, I will meet our customers and partners at the AWS Summit Mexico. If you are around, please come say hi at the community lounge and at the F1 Game Day where I will spend most of my time. I would love to discuss your developer experience on AWS and listen to your stories about building on AWS.

Last Week’s Launches
I am amazed at how quickly service teams are deploying services to the new il-central-1 Region, aka AWS Israel (Tel-Aviv) Region. I counted no fewer than 25 new service announcements since we opened the Region on August 1, including ten just for last week!

In addition to these developments in the new Region, here are some launches that got my attention during the previous week.

AWS Dedicated Local Zones – Just like Local Zones, Dedicated Local Zones are a type of AWS infrastructure that is fully managed by AWS. Unlike Local Zones, they are built for exclusive use by you or your community and placed in a location or data center specified by you to help comply with regulatory requirements. I think about them as a portion of AWS infrastructure dedicated to my exclusive usage.

Enhanced search on AWS re:Post – AWS re:Post is a cloud knowledge service. The enhanced search experience helps you locate answers and discover articles more quickly. Search results are now presenting a consolidated view of all AWS knowledge on re:Post. The view shows AWS Knowledge Center articles, question and answers, and community articles that are relevant to the user’s search query.

Amazon QuickSight supports scheduled programmatic export to Microsoft ExcelAmazon QuickSight now supports scheduled generation of Excel workbooks by selecting multiple tables and pivot table visuals from any sheet of a dashboard. Snapshot Export APIs will now also support programmatic export to Excel format, in addition to Paginated PDF and CSV.

Amazon WorkSpaces announced a new client to support Ubuntu 20.04 and 22.04 – The new client, powered by WorkSpaces Streaming Protocol (WSP), improves the remote desktop experience by offering enhanced web conferencing functionality, better multi-monitor support, and a more user-friendly interface. To get started, simply download the new Linux client versions from Amazon WorkSpaces client download website.

Amazon Sagemaker CPU/GPU profiler – We launched the preview of Amazon SageMaker Profiler, an advanced observability tool for large deep learning workloads. With this new capability, you are able to access granular compute hardware-related profiling insights for optimizing model training performance.

Amazon Sagemaker rolling deployments strategy – You can now update your Amazon SageMaker Endpoints using a rolling deployment strategy. Rolling deployment makes it easier for you to update fully-scaled endpoints that are deployed on hundreds of popular accelerated compute instances.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates and news that you might have missed:

On-demand Container Loading in AWS Lambda – This one is not new from this week, but I spotted it while I was taking a few days of holidays. Marc Brooker and team were awarded Best Paper by USENIX Association for On-demand Container Loading in AWS Lambda (pdf). They explained in detail the challenges of loading (huge) container images in AWS Lambda. A must-read if you’re curious how Lambda functions work behind the scenes (pdf).

The Official AWS Podcast – Listen each week for updates on the latest AWS news and deep dives into exciting use cases. There are also official AWS podcasts in several languages. Check out the ones in FrenchGermanItalian, and Spanish.

AWS Open Source News and Updates – This is a newsletter curated by my colleague Ricardo to bring you the latest open source projects, posts, events, and more.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

AWS Hybrid Cloud & Edge Day (August 30) – Join a free-to-attend one-day virtual event to hear the latest hybrid cloud and edge computing trends, emerging technologies, and learn best practices from AWS leaders, customers, and industry analysts. To learn more, see the detail agenda and register now.

AWS Global SummitsAWS Summits – The 2023 AWS Summits season is almost ending with the last two in-person events in Mexico City (August 30) and Johannesburg (September 26).

AWS re:Invent – But don’t worry because re:Invent season (November 27–December 1) is coming closer. Join us to hear the latest from AWS, learn from experts, and connect with the global cloud community. Registration is now open.

AWS Community Days AWS Community Day– Join a community-led conference run by AWS user group leaders in your region: Aotearoa (September 6), Lebanon (September 9), Munich (September 14), Argentina (September 16), Spain (September 23), and Chile (September 30). Visit the landing page to check out all the upcoming AWS Community Days.

CDK Day (September 29) – A community-led fully virtual event with tracks in English and Spanish about CDK and related projects. Learn more at the website.

That’s all for this week. Check back next Monday for another Week in Review!

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

— seb

Improving medical imaging workflows with AWS HealthImaging and SageMaker

Post Syndicated from Sukhomoy Basak original https://aws.amazon.com/blogs/architecture/improving-medical-imaging-workflows-with-aws-healthimaging-and-sagemaker/

Medical imaging plays a critical role in patient diagnosis and treatment planning in healthcare. However, healthcare providers face several challenges when it comes to managing, storing, and analyzing medical images. The process can be time-consuming, error-prone, and costly.

There’s also a radiologist shortage across regions and healthcare systems, making the demand for this specialty increases due to an aging population, advances in imaging technology, and the growing importance of diagnostic imaging in healthcare.

As the demand for imaging studies continues to rise, the limited number of available radiologists results in delays in available appointments and timely diagnoses. And while technology enables healthcare delivery improvements for clinicians and patients, hospitals seek additional tools to solve their most pressing challenges, including:

  • Professional burnout due to an increasing demand for imaging and diagnostic services
  • Labor-intensive tasks, such as volume measurement or structural segmentation of images
  • Increasing expectations from patients expecting high-quality healthcare experiences that match retail and technology in terms of convenience, ease, and personalization

To improve clinician and patient experiences, run your picture archiving and communication system (PACS) with an artificial intelligence (AI)-enabled diagnostic imaging cloud solution to securely gain critical insights and improve access to care.

AI helps reduce the radiologist burndown rate through automation. For example, AI saves radiologist chest x-ray interpretation time. It is also a powerful tool to identify areas that need closer inspection, and helps capture secondary findings that weren’t initially identified. The advancement of interoperability and analytics gives radiologist a 360-degree, longitudinal view of patient health records to provide better healthcare at potentially lower costs.

AWS offers services to address these challenges. This blog post discusses AWS HealthImaging (AWS AHI) and Amazon SageMaker, and how they are used together to improve healthcare providers’ medical imaging workflows. This ultimately accelerates imaging diagnostics and increases radiology productivity. AWS AHI enables developers to deliver performance, security, and scale to cloud-native medical imaging applications. It allows ingestion of Digital Imaging and Communication in Medicine (DICOM) images. Amazon SageMaker provides end-to-end solution for AI and machine learning.

Let’s explore an example use case involving X-rays after an auto accident. In this diagnostic medical imaging workflow, a patient is in the emergency room. From there:

  • The patient undergoes an X-ray to check for fractures.
  • The scanned acquisition device images flow to the PACS system.
  • The radiologist reviews the information gathered from this procedure and authors the report.
  • The patient workflow continues as the reports are made available to the referring physician.

Next-generation imaging solutions and workflows

Healthcare providers can use AWS AHI and Amazon SageMaker together to enable next-generation imaging solutions and improve medical imaging workflows. The following architecture illustrates this example.

X-ray images are sent to AWS HealthImaging and an Amazon SageMaker endpoint extracts insights.

Figure 1: X-ray images are sent to AWS HealthImaging and an Amazon SageMaker endpoint extracts insights.

Let’s review the architecture and the key components:

1. Imaging Scanner: Captures the images from a patient’s body. Depending on the modality, this can be an X-ray detector; a series of detectors in a CT scanner; a magnetic field and radio frequency coils in an MRI scanner; or an ultrasound transducer. This example uses an X-ray device.

2. Amazon SQS message queue: Consumes event from S3 bucket and triggers an AWS Step Functions workflow orchestration.

3. AWS Step Functions runs the transform and import jobs to further process and import the images into AWS AHI data store instance.

4. The final diagnostic image—along with any relevant patient information and metadata—is stored in the AWS AHI datastore. This allows for efficient imaging date retrieval and management. It also enables medical imaging data access with sub-second image retrieval latencies at scale, powered by cloud-native APIs and applications from AWS partners.

5. Radiologists responsible for ground truth for ML images perform medical image annotations using Amazon SageMaker Ground Truth. They visualize and label DICOM images using a custom data labeling workflow—a fully managed data labeling service that supports built-in or custom data labeling workflows. They also leverage tools like 3D Slicer for interactive medical image annotations.

6. Data scientists build or leverage built-in deep learning models using the annotated images on Amazon SageMaker. SageMaker offers a range of deployment options that vary from low latency and high throughput to long-running inference jobs. These options include considerations for batch, real-time, or near real-time inference.

7. Healthcare providers use AWS AHI and Amazon SageMaker to run AI-assisted detection and interpretation workflow. This workflow is used to identify hard-to-see fractures, dislocations, or soft tissue injuries to allow surgeons and radiologist to be more confident in their treatment choices.

8. Finally, the image stored in AWS AHI is displayed on a monitor or other visual output device where it can be analyzed and interpreted by a radiologist or other medical professional.

  • The Open Health Imaging Foundation (OHIF) Viewer is an open source, web-based, medical imaging platform. It provides a core framework for building complex imaging applications.
  • Radical Imaging or Arterys are AWS partners that provide OHIF-based medical imaging viewer.

Each of these components plays a critical role in the overall performance and accuracy of the medical imaging system as well as ongoing research and development focused on improving diagnostic outcomes and patient care. AWS AHI uses efficient metadata encoding, lossless compression, and progressive resolution data access to provide industry leading performance for loading images. Efficient metadata encoding enables image viewers and AI algorithms to understand the contents of a DICOM study without having to load the image data.

Security

The AWS shared responsibility model applies to data protection in AWS AHI and Amazon SageMaker.

Amazon SageMaker is HIPAA-eligible and can operate with data containing Protected Health Information (PHI). Encryption of data in transit is provided by SSL/TLS and is used when communicating both with the front-end interface of Amazon SageMaker (to the Notebook) and whenever Amazon SageMaker interacts with any other AWS services.

AWS AHI is also HIPAA-eligible service and provides access control at the metadata level, ensuring that each user and application can only see the images and metadata fields that are required based upon their role. This prevents the proliferation of Patient PHI. All access to AWS AHI APIs is logged in detail in AWS CloudTrail.

Both of these services leverage AWS Key Management service (AWS KMS) to satisfy the requirement that PHI data is encrypted at rest.

Conclusion

In this post, we reviewed a common use case for early detection and treatment of conditions, resulting in better patient outcomes. We also covered an architecture that can transform the radiology field by leveraging the power of technology to improve accuracy, efficiency, and accessibility of medical imaging.

Further reading

Content Repository for Unstructured Data with Multilingual Semantic Search: Part 2

Post Syndicated from Patrik Nagel original https://aws.amazon.com/blogs/architecture/content-repository-for-unstructured-data-with-multilingual-semantic-search-part-2/

Leveraging vast unstructured data poses challenges, particularly for global businesses needing cross-language data search. In Part 1 of this blog series, we built the architectural foundation for the content repository. The key component of Part 1 was the dynamic access control-based logic with a web UI to upload documents.

In Part 2, we extend the content repository with multilingual semantic search capabilities while maintaining the access control logic from Part 1. This allows users to ingest documents in content repository across multiple languages and then run search queries to get reference to semantically similar documents.

Solution overview

Building on the architectural foundation from Part 1, we introduce four new building blocks to extend the search functionality.

Optical character recognition (OCR) workflow: To automatically identify, understand, and extract text from ingested documents, we use Amazon Textract and a sample review dataset of .png format documents (Figure 1). We use Amazon Textract synchronous application programming interfaces (APIs) to capture key-value pairs for the reviewid and reviewBody attributes. Based on your specific requirements, you can choose to capture either the complete extracted text or parts the text.

Sample document for ingestion

Figure 1. Sample document for ingestion

Embedding generation: To capture the semantic relationship between the text, we use a machine learning (ML) model that maps words and sentences to high-dimensional vector embeddings. You can use Amazon SageMaker, a fully-managed ML service, to build, train, and deploy your ML models to production-ready hosted environments. You can also deploy ready-to-use pre-trained models from multiple avenues such as SageMaker JumpStart. For this blog post, we use the open-source pre-trained universal-sentence-encoder-multilingual model from TensorFlow Hub. The model inference endpoint deployed to a SageMaker endpoint generates embeddings for the document text and the search query. Figure 2 is an example of n-dimensional vector that is generated as the output of the reviewBody attribute text provided to the embeddings model.

Sample embedding representation of the value of reviewBody

Figure 2. Sample embedding representation of the value of reviewBody

Embedding ingestion: To make the embeddings searchable for the content repository users, you can use the k-Nearest Neighbor (k-NN) search feature of Amazon OpenSearch Service. The OpenSearch k-NN plugin provides different methods. For this blog post, we use the Approximate k-NN search approach, based on the Hierarchical Navigable Small World (HNSW) algorithm. HNSW uses a hierarchical set of proximity graphs in multiple layers to improve performance when searching large datasets to find the “nearest neighbors” for the search query text embeddings.

Semantic search: We make the search service accessible as an additional backend logic on Amazon API Gateway. Authenticated content repository users send their search query using the frontend to receive the matching documents. The solution maintains end-to-end access control logic by using the user’s enriched Amazon Cognito provided identity (ID) token claim with the department attribute to compare it with the ingested documents.

Technical architecture

The technical architecture includes two parts:

  1. Implementing multilingual semantic search functionality: Describes the processing workflow for the document that the user uploads; makes the document searchable.
  2. Running input search query: Covers the search workflow for the input query; finds and returns the nearest neighbors of the input text query to the user.

Part 1. Implementing multilingual semantic search functionality

Our previous blog post discussed blocks A through D (Figure 3), including user authentication, ID token enrichment, Amazon Simple Storage Service (Amazon S3) object tags for dynamic access control, and document upload to the source S3 bucket. In the following section, we cover blocks E through H. The overall workflow describes how an unstructured document is ingested in the content repository, run through the backend OCR and embeddings generation process and finally the resulting vector embedding are stored in OpenSearch service.

Technical architecture for implementing multi-lingual semantic search functionality

Figure 3. Technical architecture for implementing multilingual semantic search functionality

  1. The OCR workflow extracts text from your uploaded documents.
    • The source S3 bucket sends an event notification to Amazon Simple Queue Service (Amazon SQS).
    • The document transformation AWS Lambda function subscribed to the Amazon SQS queue invokes an Amazon Textract API call to extract the text.
  2. The document transformation Lambda function makes an inference request to the encoder model hosted on SageMaker. In this example, the Lambda function submits the reviewBody attribute to the encoder model to generate the embedding.
  3. The document transformation Lambda function writes an output file in the transformed S3 bucket. The text file consists of:
    • The reviewid and reviewBody attributes extracted from Step 1
    • An additional reviewBody_embeddings attribute from Step 2
      Note: The workflow tags the output file with the same S3 object tags as the source document for downstream access control.
  4. The transformed S3 bucket sends an event notification to invoke the indexing Lambda function.
  5. The indexing Lambda function reads the text file content. Then indexing Lambda function makes an OpenSearch index API call along with source document tag as one of the indexing attributes for access control.

Part 2. Running user-initiated search query

Next, we describe how the user’s request produces query results (Figure 4).

Search query lifecycle

Figure 4. Search query lifecycle

  1. The user enters a search string in the web UI to retrieve relevant documents.
  2. Based on the active sign-in session, the UI passes the user’s ID token to the search endpoint of the API Gateway.
  3. The API Gateway uses Amazon Cognito integration to authorize the search API request.
  4. Once validated, the search API endpoint request invokes the search document Lambda function.
  5. The search document function sends the search query string as the inference request to the encoder model to receive the embedding as the inference response.
  6. The search document function uses the embedding response to build an OpenSearch k-NN search query. The HNSW algorithm is configured with the Lucene engine and its filter option to maintain the access control logic based on the custom department claim from the user’s ID token. The OpenSearch query returns the following to the query embeddings:
    • Top three Approximate k-NN
    • Other attributes, such as reviewid and reviewBody
  7. The workflow sends the relevant query result attributes back to the UI.

Prerequisites

You must have the following prerequisites for this solution:

Walkthrough

Setup

The following steps deploy two AWS CDK stacks into your AWS account:

  • content-repo-search-stack (blog-content-repo-search-stack.ts) creates the environment detailed in Figure 3, except for the SageMaker endpoint, which you create in a spearate step.
  • demo-data-stack (userpool-demo-data-stack.ts) deploys sample users, groups, and role mappings.

To continue setup, use the following commands:

  1. Clone the project Git repository:
    git clone https://github.com/aws-samples/content-repository-with-multilingual-search content-repository
  2. Install the necessary dependencies:
    cd content-repository/backend-cdk 
    npm install
  3. Configure environment variables:
    export CDK_DEFAULT_ACCOUNT=$(aws sts get-caller-identity --query 'Account' --output text)
    export CDK_DEFAULT_REGION=$(aws configure get region)
  4. Bootstrap your account for AWS CDK usage:
    cdk bootstrap aws://$CDK_DEFAULT_ACCOUNT/$CDK_DEFAULT_REGION
  5. Deploy the code to your AWS account:
    cdk deploy --all

The complete stack set-up may take up to 20 minutes.

Creation of SageMaker endpoint

Follow below steps to create the SageMaker endpoint in the same AWS Region where you deployed the AWS CDK stack.

    1. Sign in to the SageMaker console.
    2. In the navigation menu, select Notebook, then Notebook instances.
    3. Choose Create notebook instance.
    4. Under the Notebook instance settings, enter content-repo-notebook as the notebook instance name, and leave other defaults as-is.
    5. Under the Permissions and encryption section (Figure 5), you need to set the IAM role section to the role with the prefix content-repo-search-stack. In case you don’t see this role automatically populated, select it from the drop-down. Leave the rest of the defaults, and choose Create notebook instance.

      Notebook permissions

      Figure 5. Notebook permissions

    6. The notebook creation status changes to Pending before it’s available for use within 3-4 minutes.
    7. Once the notebook is in the Available status, choose Open Jupyter.
    8. Choose the Upload button and upload the create-sagemaker-endpoint.ipynb file in the backend-cdk folder of the root of the blog repository.
    9. Open the create-sagemaker-endpoint.ipynb notebook. Select the option Run All from the Cell menu (Figure 6). This might take up to 10 minutes.

      Run create-sagemaker-endpoint notebook cells

      Figure 6. Run create-sagemaker-endpoint notebook cells

    10. After all the cells have successfully run, verify that the AWS Systems Manager parameter sagemaker-endpoint is updated with the value of the SageMaker endpoint name. An example of value as the output of the cell is in Figure 7. In case you don’t see the output, check if the preceding steps were run correctly.

      SSM parameter updated with SageMaker endpoint

      Figure 7. SSM parameter updated with SageMaker endpoint

    11. Verify in the SageMaker console that the inference endpoint with the prefix tensorflow-inference has been deployed and is set to status InService.
    12. Upload sample data to the content repository:
      • Update the S3_BUCKET_NAME variable in the upload_documents_to_S3.sh script in the root folder of the blog repository with the s3SourceBucketName from the AWS CDK output of the content-repo-search-stack.
      • Run upload_documents_to_S3.sh script to upload 150 sample documents to the content repository. This takes 5-6 minutes. During this process, the uploaded document triggers the workflow described in the Implementing multilingual semantic search functionality.

Using the search service

At this stage, you have deployed all the building blocks for the content repository in your AWS account. Next, as part of the upload sample data to the content repository, you pushed a limited corpus of 150 sample documents (.png format). Each document is in one of the four different languages – English, German, Spanish and French. With the added multilingual search capability, you can query in one language and receive semantically similar results across different languages while maintaining the access control logic.

  1. Access the frontend application:
    • Copy the amplifyHostedAppUrl value of the AWS CDK output from the content-repo-search-stack shown in the terminal.
    • Enter the URL in your web browser to access the frontend application.
    • A temporary page displays until the automated build and deployment of the React application completes after 4-5 minutes.
  2. Sign into the application:
    • The content repository provides two demo users with credentials as part of the demo-data-stack in the AWS CDK output. Copy the password from the terminal associated with the sales-user, which belongs to the sales department.
    • Follow the prompts from the React webpage to sign in with the sales-user and change the temporary password.
  3. Enter search queries and verify results. The search action invokes the workflow described in Running input search query. For example:
    • Enter works well as the search query. Note the multilingual output and the semantically similar results (Figure 8).

      Positive sentiment multi-lingual search result for the sales-user

        Figure 8. Positive sentiment multilingual search result for the sales-user

    • Enter bad quality as the search query. Note the multilingual output and the semantically similar results (Figure 9).

      Negative sentiment multi-lingual search result for the sales-user

      Figure 9. Negative sentiment multi-lingual search result for the sales-user

  4. Sign out as the sales-user with the Log Out button on the webpage.
  5. Sign in using the marketing-user credentials to verify access control:
    • Follow the sign in procedure in step 2 but with the marketing-user.
    • This time with works well as search query, you find different output. This is because the access control only allows marketing-user to search for the documents that belong to the marketing department (Figure 10).

      Positive sentiment multi-lingual search result for the marketing-user

      Figure 10. Positive sentiment multilingual search result for the marketing-user

Cleanup

In the backend-cdk subdirectory of the cloned repository, delete the deployed resources: cdk destroy --all.

Additionally, you need to access the Amazon SageMaker console to delete the SageMaker endpoint and notebook instance created as part of the Walkthrough setup section.

Conclusion

In this blog, we enriched the content repository with multi-lingual semantic search features while maintaining the access control fundamentals that we implemented in Part 1. The building blocks of the semantic search for unstructured documents—Amazon Textract, Amazon SageMaker, and Amazon OpenSearch Service—set a foundation for you to customize and enhance the search capabilities for your specific use case. For example, you can leverage the fast developments in Large Language Models (LLM) to enhance the semantic search experience. You can replace the encoder model with an LLM capable of generating multilingual embeddings while still maintaining the OpenSearch service to store and index data and perform vector search.