Tag Archives: artificial intelligence

Expanded resource awareness in Amazon Q Developer

2024-11-20 Brendan Jenkins

Post Syndicated from Brendan Jenkins original https://aws.amazon.com/blogs/devops/expanded-resource-awareness-in-amazon-q-developer/

Recently, Amazon Q Developer announced expanded support for account resource awareness with Amazon Q in the AWS Management Console, along with the general availability of Amazon Q Developer in AWS Chatbot, which lets you ask questions from Microsoft Teams or Slack. Additionally, Amazon Q now provides context-aware assistance for questions about resources in your account, depending on where you are in the console. Account resource awareness, which launched in preview on April 30, 2024, lets you use natural language in the Amazon Q Developer chat to list resources in your AWS account, get specific resource details, and ask about related resources.

In this blog, I will highlight the new expanded functionality of this feature in Amazon Q Developer including understanding relationships between account resources, context-awareness, and the general availability of the AWS Chatbot integration with Microsoft Teams and Slack.

Expanded account resource awareness with Amazon Q Developer

Prior to the launch of the expanded support, you could ask Amazon Q Developer to list resources in your AWS Account with prompts such as “List all my EC2 instances in us-east-1” and the service would list all your Amazon Elastic Compute Cloud (Amazon EC2) instances. Now, with the expanded support, you can ask more complex questions about your AWS account resources. I will show a few examples in this section of this post.

For our first example, imagine that you’re a developer who is responsible for maintaining code as part of the software development lifecycle (SDLC), and you frequently use AWS Lambda for development and Amazon Relational Database Service (Amazon RDS) in the backend. With this new update, you could open a new Amazon Q chat in the AWS Management Console and enter a prompt such as: “Which RDS clusters are due for an update?”

Figure 1: Amazon Q Developer listing RDS clusters needing an update

As a result, the Amazon Q Developer console chat will return a list of all your Amazon RDS clusters that have available updates as shown in Figure 1 above.

Now, for another example, suppose you want to update any Lambda functions in your AWS account that have an Amazon Simple Notification Service (Amazon SNS) topic as a trigger, because you’re moving to a new SNS topic that you recently created. To identify which SNS topics are still being used, you could enter a prompt such as “List all the SNS topics that trigger a lambda function.”

Figure 2: Amazon Q listing SNS topics that are lambda triggers

As shown in Figure 2, Amazon Q Developer identified the SNS topics, listed by Amazon Resource Name (ARN), that are set to trigger a Lambda function in the AWS account.

Additionally, you can ask a follow-up question in the same chat to investigate further. For example, you can send a prompt such as “Which lambda function uses the arn:aws:sns:us-east-1:76859XXXX:FailoverHealthcheck SNS topic?”

Figure 3: Asking Q Developer a follow-up question about a resource

From Figure 3 above, you can see that there is a Lambda function/endpoint associated with the SNS topic resource that Amazon Q Developer was able to identify.
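
For comparison, the same question can be answered by hand against the Amazon SNS API. The following is a minimal sketch, not how Amazon Q works internally: it assumes an SNS client object (for example, one created with boto3) is passed in, and the function names are hypothetical. It pages through ListSubscriptions and keeps the subscriptions whose protocol is lambda.

```python
from typing import Any, Dict, Iterable, List


def list_all_subscriptions(sns_client: Any) -> List[Dict[str, Any]]:
    """Collect every SNS subscription in the Region by following NextToken."""
    subscriptions: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {}
    while True:
        page = sns_client.list_subscriptions(**kwargs)
        subscriptions.extend(page.get("Subscriptions", []))
        token = page.get("NextToken")
        if not token:
            return subscriptions
        kwargs = {"NextToken": token}


def topics_triggering_lambda(
    subscriptions: Iterable[Dict[str, Any]],
) -> Dict[str, List[str]]:
    """Map each SNS topic ARN to the Lambda function ARNs subscribed to it."""
    triggers: Dict[str, List[str]] = {}
    for sub in subscriptions:
        # Lambda subscriptions use the "lambda" protocol, and the endpoint
        # is the ARN of the subscribed function.
        if sub.get("Protocol") == "lambda":
            triggers.setdefault(sub["TopicArn"], []).append(sub["Endpoint"])
    return triggers
```

With boto3 installed, you could call `topics_triggering_lambda(list_all_subscriptions(boto3.client("sns")))`. The natural language prompt saves you from writing and maintaining this kind of glue code for every question.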

Outside of the examples above, here are some other prompts/examples that can be explored for the expanded support:

– “Do I have any ECS clusters with pending tasks?”

– “Are there any ECS clusters in my account with services in DRAINING status?”

Amazon Q Developer understands where you are in the console

Amazon Q Developer in the AWS Management Console now provides context-aware assistance for your questions about resources in your account. This feature allows you to ask questions directly related to the console page you’re viewing, eliminating the need to specify the service or resource in your query. Q Developer uses the current page as additional context to provide more accurate and relevant responses, streamlining your interaction with AWS services and resources.

Prior to the update, a user would have to prompt, “What is the public IPv4 address of my instance i-08ccXXXXXX?” Now, if you are viewing an EC2 instance in the console and prompt Amazon Q, “What is the public IPv4 address of my instance?” you will not need to specify the instance you are referring to.

Figure 4: Asking Amazon Q about an EC2 instance being viewed

In Figure 4, Amazon Q’s console chat used its context awareness to return the IPv4 address of the instance shown on the console page where I was working, even though I didn’t specify which instance I was referring to.

AWS Chatbot can now answer questions about AWS resources in Microsoft Teams and Slack

Recently, we announced the general availability of Amazon Q Developer in AWS Chatbot, which provides answers to customers’ AWS resource related queries in Microsoft Teams and Slack. This gives teams the ability to quickly find relevant resources to troubleshoot issues using natural language queries in the chat channels of Microsoft Teams or Slack.

For example, you could integrate the AWS Chatbot Service with Amazon Q Developer to allow you to enter a prompt in Slack such as “@aws show EC2 instances in running state in us-east-1”.

Figure 5: Amazon Q listing all EC2 resources in Slack

As shown in Figure 5, Amazon Q listed the EC2 resources in a Slack channel, demonstrating the functionality in action.

Conclusion

Amazon Q Developer has enhanced its cloud resource management capabilities, enabling more intuitive and intelligent interactions with AWS resources. The new features allow developers to ask complex, context-aware questions about their cloud infrastructure directly through the AWS Management Console, Microsoft Teams, and Slack. Users can now easily discover new details about specific resources with natural language queries that provide precise, contextual information. These improvements represent a significant step forward in simplifying cloud resource management, making it faster and more user-friendly for development teams to understand, track, and maintain their AWS environments. To learn more about chatting with your AWS resources, check out Console documentation and AWS Chatbot documentation.

Securing the RAG ingestion pipeline: Filtering mechanisms

2024-11-19 Laura Verghote

Post Syndicated from Laura Verghote original https://aws.amazon.com/blogs/security/securing-the-rag-ingestion-pipeline-filtering-mechanisms/

Retrieval-Augmented Generation (RAG) applications enhance the responses retrieved from large language models (LLMs) by integrating external data such as downloaded files, web scrapings, and user-contributed data pools. This integration improves the models’ performance by adding relevant context to the prompt.

While RAG applications are a powerful way to dynamically add additional context to an LLM’s prompt and make model responses more relevant, incorporating data from external sources can pose security risks.

For example, let’s assume you crawl a public website and ingest the data into your knowledge base. Because it’s public data, you risk also ingesting malicious content that was injected into that website by threat actors with the goal of exploiting the knowledge base component of the RAG application. Through this mechanism, threat actors can intentionally change the model’s behavior.

Risks like these emphasize the need for security measures in the design and deployment of RAG systems in general. Security measures should be applied not only at inference time (that is, filtering model outputs), but also when ingesting external data into the knowledge base of the RAG application.

In this post, we explore some of the potential security risks of ingesting external data or documents into the knowledge base of your RAG application. We propose practical steps and architecture patterns that you can implement to help mitigate these risks.

Overview of security of the RAG ingestion workflow

Before diving into specifics of mitigating risk in the ingestion pipeline, let’s have a look at a generic RAG workflow and which aspects you should keep in mind when it comes to securing a RAG application. For this post, let’s assume that you’re using Amazon Bedrock Knowledge Bases to build a RAG application. Amazon Bedrock Knowledge Bases offers built-in, robust security controls for data protection, access control, network security, logging and monitoring, and input/output validation that help mitigate many of the security risks.

In a RAG workflow with Amazon Bedrock Knowledge Bases, you have the following environments:

– An Amazon Bedrock service account, which is managed by the Amazon Bedrock service team.
– An AWS account where you can store your RAG data (if you’re using an AWS service as your vector store).
– A possible external environment, depending on the vector database you’ve chosen to store vector embeddings of your ingested content. If you choose Pinecone or Redis Enterprise Cloud for your vector database, you will use an environment external to AWS.

Figure 1: Visual representation of the knowledge base data ingestion flow

Looking at the workflow shown in Figure 1 for the ingestion of data into a knowledge base, an ingestion request is started by invoking the StartIngestionJob Bedrock API. From that point:

1. If this request has the correct IAM permissions associated with it, it’s sent to the Bedrock API endpoint.
2. This request is then passed to the knowledge base service component.
3. The metadata collected related to the request is stored in the metadata Amazon DynamoDB database. This database is used solely to enumerate and characterize the data sources and their sync status. The API call includes metadata for the Amazon Simple Storage Service (Amazon S3) source location of the data to ingest, in addition to the vector store that will be used to store the embeddings.
4. The process will begin to ingest customer-provided data from Amazon S3. If this data was encrypted using customer managed KMS keys, then these keys will be used to decrypt the data.
5. As data is read from Amazon S3, chunks will be sent internally to invoke the chosen embedding model in Amazon Bedrock. A chunk refers to an excerpt from a data source that’s returned when the vector store that it’s stored in is queried. Using knowledge bases, you can chunk either with a fixed size (standard chunking), hierarchical chunking, semantic chunking, advanced parsing options for parsing non-textual information, or custom transformations. More information about chunking for knowledge bases can be found in How content chunking and parsing works for knowledge bases.
6. The embeddings model in Amazon Bedrock will create the embeddings, which are then sent to your chosen vector store. Amazon Bedrock Knowledge Bases supports popular databases for vector storage, including the vector engine for Amazon OpenSearch Serverless, Pinecone, Redis Enterprise Cloud, Amazon Aurora, and MongoDB. If you don’t have an existing vector database, Amazon Bedrock creates an OpenSearch Serverless vector store for you. This option is only available through the console, not through the SDK or CLI.
7. If credentials or secrets are required to access the vector store, they can be stored in AWS Secrets Manager where they will be automatically retrieved and used. Afterwards, the embeddings will be inserted into (or updated in) the configured vector store.
8. Checkpoints for the in-progress ingestion jobs will be temporarily stored in a transient S3 bucket, encrypted with customer managed AWS Key Management Service (AWS KMS) keys. These checkpoints allow you to resume interrupted ingestion jobs from a previous successful checkpoint. Both the Aurora database and the Amazon OpenSearch Serverless database can be configured as public or private, and of course we recommend private databases. Changes in your ingestion data bucket (for example, uploading new files or new versions of files) will be reflected after the data source is synchronized; this synchronization is done incrementally. After the completion of an ingestion job, the data is automatically purged and deleted after a maximum of 8 days.
9. The ingestion DynamoDB table stores information required for syncing the vector store. It stores metadata related to the chunks needed to keep track of data in the underlying vector database. The table is used so that the service can identify which chunks need to be inserted, updated, or deleted between one ingestion job and another.
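
The request that starts this flow can be sketched as follows. This is a minimal example under stated assumptions, not a definitive implementation: it assumes a bedrock-agent client (for example, from boto3) is passed in, the knowledge base and data source IDs are placeholders you supply, and the set of terminal status names reflects my understanding of the StartIngestionJob and GetIngestionJob responses. The function name and the injectable sleep parameter are hypothetical.

```python
import time
from typing import Any, Callable

# Status values at which polling stops (assumed from the API's job statuses).
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def run_ingestion_job(
    bedrock_agent: Any,
    knowledge_base_id: str,
    data_source_id: str,
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start an ingestion job and poll until it reaches a terminal status."""
    job = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    while job["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        job = bedrock_agent.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]

    return job["status"]
```

With boto3 installed, the client would be `boto3.client("bedrock-agent")`. The caller's IAM identity needs permission to start and read ingestion jobs on that knowledge base.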

When it comes to encryption at rest for the different environments:

– Customer AWS accounts: The resources in these can be encrypted using customer managed KMS keys.
– External environments: Redis Enterprise Cloud and Pinecone have their own encryption features.
– Amazon Bedrock service accounts: The S3 bucket (step 8) can be encrypted using customer managed KMS keys, but in the context of Amazon Bedrock, the DynamoDB tables of steps 3 and 9 can only be encrypted with AWS owned keys. However, the tables managed by Amazon Bedrock don’t contain personally identifiable information (PII) or customer-identifiable data.

Throughout the RAG ingestion workflow, data is encrypted in transit. Amazon Bedrock Knowledge Bases uses TLS encryption for communication with third-party vector stores where the provider permits and supports TLS encryption in transit. Customer data is not persistently stored in the Amazon Bedrock service accounts.

For identity and access management, it’s important to follow the principle of least privilege while creating the custom service role for Amazon Bedrock Knowledge Bases. As part of the role’s permissions, you create a trust relationship that allows Amazon Bedrock to assume this role and create and manage knowledge bases. For more information about the necessary permissions, see Providing secure access, usage, and implementation to generative AI RAG techniques.
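
As an illustration of that trust relationship, the following sketch builds a trust policy document that lets the Amazon Bedrock service assume the role, scoped to knowledge bases in one account and Region. The account ID and Region are placeholders, the function name is hypothetical, and the condition keys shown are a common way to narrow the trust. Treat this as a starting point to check against the linked guidance rather than a complete policy.

```python
import json
from typing import Any, Dict


def knowledge_base_trust_policy(account_id: str, region: str) -> Dict[str, Any]:
    """Build a trust policy allowing Amazon Bedrock to assume the service role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
                # Restrict who can use the role: only knowledge bases that
                # belong to this account and Region.
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {
                        "aws:SourceArn": (
                            f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/*"
                        )
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and Region, for illustration only.
    print(json.dumps(knowledge_base_trust_policy("111122223333", "us-east-1"), indent=2))
```

The permissions policy attached to the same role should likewise grant access only to the specific S3 bucket, embedding model, and vector store that the knowledge base uses.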

Security risks of the RAG data ingestion pipeline and the need for ingest time filtering

RAG applications inherently rely on foundation models, introducing additional security considerations beyond the traditional application safeguards. Foundation models can analyze complex linguistic patterns and provide responses depending on the input context, and can be subject to malicious events such as jailbreaking, data poisoning, and inversion. Some of these LLM-specific risks are mapped out in documents such as the OWASP Top 10 for LLM Applications and MITRE ATLAS.

A risk that’s particularly relevant for the RAG ingestion pipeline, and one of the most common risks we see nowadays, is prompt injection. In prompt injection attacks, threat actors manipulate generative AI applications by feeding them malicious inputs disguised as legitimate user prompts. There are two forms of prompt injection: direct and indirect.

Direct prompt injections occur when a threat actor overwrites the underlying system prompt. This might allow them to probe backend systems by interacting with insecure functions and data stores accessible through the LLM. When it comes to securing generative AI applications against prompt injection, this type tends to be the one that customers focus on the most. To mitigate risks, you can use tools such as Amazon Bedrock Guardrails to set up inference-time filtering of the LLM’s completions.

Indirect prompt injections occur when an LLM accepts input from external sources that can be controlled by a threat actor, such as websites or files. This injection type is particularly important when you consider the ingestion pipeline of RAG applications, where a threat actor might embed a prompt injection in external content which is ingested into the database. This can enable the threat actor to manipulate additional systems that the LLM can access or return a different answer to the user. Additionally, indirect prompt injections might not be recognizable by humans. Security issues can result not only from the LLM’s responses based on its training data, but also from the data sources the RAG application has access to from its knowledge base. To mitigate these risks, you should focus on the intersection of the LLM, knowledge base, and external content ingested into the RAG application.

To give you a better idea of indirect prompt injection, let’s first discuss an example.

External data source ingestion risk: Examples of indirect prompt injection

Let’s say a threat actor crafts a document or injects content into a website. This content is designed to manipulate an LLM to generate incorrect responses. To a human, such a document could be indistinguishable from legitimate ones. However, the document could contain an invisible sequence, which, when used as a reference source for RAG, could manipulate the LLM into generating an undesirable response.

For example, let’s assume you have a file describing the process for downloading a company’s software. This file is ingested into a knowledge base for an LLM-powered chatbot. A user can ask the chatbot where to find the correct link to download software packages and then download the package by clicking on the link.

A threat actor could include a second link in the document using white text on a white background. This text is invisible to a human reader, including anyone at the company who downloads the document to store it in their knowledge base. However, it’s visible when parsed by the document parser and saved in the knowledge base. This could result in the LLM returning the hidden link, which could lead the user to download malware hosted by the threat actor on a site they manage, rather than legitimate software from the expected site.

If your application is connected to plugins or agents so that it can call APIs or execute code, the model could be manipulated to run code, open URLs chosen by the threat actor, and more.

If you look at Figure 2 that follows, you can see what the typical RAG workflow is and how an indirect prompt injection attack can happen (this example uses Amazon Bedrock Knowledge Bases).

Figure 2: Visual representation of the RAG workflow with both a generic file and a malicious file that looks identical to the generic one

As shown in Figure 2, for data ingestion (starting at the bottom right), File 1, the legitimate and unmodified file, is saved in the data source (typically an S3 bucket). During ingestion, the document is parsed by a document parser, split into chunks, converted into embeddings, and then saved in the vector store. When a user (top left) asks a question about the file, information from this file will be added as context to the user prompt. However, you might have a malicious File 2 instead, that looks exactly the same to a human reader but contains an invisible character sequence. After this sequence is inserted into the prompt sent to the LLM, it can influence the overall response of the environment.

Threat actors might analyze the following three aspects in the RAG workflow to create and place a malicious sequence:

– The document parser is software designed to read and interpret the contents of a document. It analyzes the text and extracts relevant information based on predefined rules or patterns. By analyzing the document parser, threat actors can determine how they might inject invisible content into different document formats.
– The text splitter (or chunker) splits text based on the subject matter of the content. Threat actors will analyze the text splitters to locate a proper injection position for their invisible sequence. Section-based splitters divide content according to tags that label different sections, which threat actors can use to place their invisible sequences within these delineated chunks. Length-based splitters split the content into fixed-length chunks with overlap (to help keep context between chunks).
– The prompt template is a predefined structure that is used to generate specific outputs or guide interactions with LLMs. Prompt templates determine how the content retrieved from the vector database is organized alongside the user’s original prompt to form the augmented prompt. The template is crucial, because it impacts the overall performance of RAG-based applications. If threat actors are aware of the prompt template used in your application, they can take that into account when constructing their threat sequence.
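
To make the length-based splitter concrete, here is a minimal sketch of fixed-length chunking with overlap. It counts characters for simplicity, whereas production splitters typically count tokens, and the function name and parameters are hypothetical. It also shows why chunk boundaries matter to a threat actor: a sequence placed inside the overlap region lands in two consecutive chunks.

```python
from typing import List


def split_fixed_length(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into chunks of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(split_fixed_length("abcdefghij", chunk_size=4, overlap=1))
    # ['abcd', 'defg', 'ghij']
```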

Potential mitigations

Threat actors can release documents containing well-constructed and well-placed invisible sequences onto the internet, thereby posing a threat to RAG applications that ingest this external content. Therefore, whenever possible, only ingest data from trusted sources. However, if your application requires you to ingest data from untrusted sources, we recommend processing it carefully to mitigate risks such as indirect prompt injection. To harden your RAG ingestion pipeline, you can use the following mitigation techniques, either individually or together.

– Configure your application to display the source content underlying its responses, allowing users to cross-reference the content with the response. This is possible using Amazon Bedrock Knowledge Bases by using citations. However, this method isn’t a prevention technique. Also, it might be less effective with complex content because it can require that users invest a lot of time in verification to be effective.
– Establish trust boundaries between the LLM, external sources, and extensible functionality (for example, plugins, agents, or downstream functions). Treat the LLM as an untrusted actor and maintain final user control on decision-making processes. This comes back to the principle of least privilege. Make sure your LLM has access only to data sources that it needs to have access to and be especially careful when connecting it to external plugins or APIs.
– Continuous evaluation plays a vital role in maintaining the accuracy and reliability of your RAG system. When evaluating RAG applications, you can use labeled datasets containing prompts and target answers. However, frameworks such as RAGAS propose automated metrics that enable reference-free evaluation, alleviating the need for human-annotated ground truth answers. Implementing a mechanism for RAG evaluation can help you discover irregularities in your model responses and in the data retrieved from your knowledge base. If you want to explore how to evaluate your RAG application in greater depth, see Evaluate the reliability of Retrieval Augmented Generation applications using Amazon, which provides further insights and guidance on this topic.
– You can manually monitor content that you intend to ingest into your vector database, especially when the data includes external content such as websites and files. A human in the loop could potentially protect against less sophisticated, visible threat sequences.
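
For the citation-based mitigation, the following sketch pulls the source locations out of a knowledge base response so that your application can show them next to the answer. It assumes the response has the shape I understand the RetrieveAndGenerate API to return (a citations list whose retrievedReferences carry an s3Location URI), and the function name is hypothetical. Verify the field names against the API reference before relying on them.

```python
from typing import Any, Dict, List


def cited_sources(response: Dict[str, Any]) -> List[str]:
    """Return the distinct S3 URIs cited in a knowledge base response, in order."""
    uris: List[str] = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return uris
```

Displaying these URIs lets a user open the underlying document and compare it with the generated answer, which supports detection but doesn't prevent an injected sequence from influencing the response.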

For more advice on mitigating risks in generative AI applications, see the mitigations listed in the OWASP Top 10 for LLMs and MITRE ATLAS.

Architectural pattern 1: Using format breakers and Amazon Textract as document filters

Figure 3: Visual representation of a potential workflow to remove threat sequences from your files using a format breaker and Amazon Textract

One potential workflow to remove potential threat sequences from your ingest files is to use a format breaker and Amazon Textract. This workflow specifically focuses on invisible threat vectors. The preceding Figure 3 shows a potential setup using AWS services that allows you to automate this.

1. Let’s say you use an S3 bucket to ingest your files. Whichever file you want to upload into your knowledge base is initially uploaded in this bucket. The upload action in Amazon S3 automatically starts a workflow that will take care of the format break.
2. A format break is a process used to sanitize and secure documents, by transforming them in a way that strips out potentially harmful elements such as macros, scripts, embedded objects, and other non-text content that could carry security risks. The format break in the ingest-time filter involves converting text content into PDF format and then to OCR format. To start, convert the text to PDF format. One of the options is to use an AWS Lambda function to convert text to PDF format. As an example, you can create such a function by putting the file renderers and PDF generator from LibreOffice into a Lambda function. This step is necessary to process the file using Amazon Textract because the service currently supports only PNG, JPEG, TIFF, and PDF formats.
3. After the data is put into PDF format, you can save it into an S3 bucket. This upload to S3 can, in turn, trigger the next step in the format break: converting the PDF content to OCR format.
4. You can process the PDF content using Amazon Textract, which will convert the text content to OCR format. Amazon Textract will render the PDF as an image. This involves extracting the text from the PDF, essentially creating a plain text version of the document. The OCR format makes sure that non-text elements, such as images or embedded files, aren’t carried over to the final document. Only the readable text is extracted, which significantly reduces the risk of hidden malicious content. This also removes white text on white backgrounds because that text is invisible when the PDF is rendered as an image before OCR conversion is performed. To use Amazon Textract to convert text to OCR format, create a Lambda function that will trigger Amazon Textract and input your PDF that was saved in Amazon S3.
5. You can use Amazon Textract to process multipage documents in PDF format and detect printed and handwritten text from the Standard English alphabet and ASCII symbols. The service will extract printed text, forms, and tables in English, German, French, Spanish, Italian and Portuguese. This means that non-visible threat vectors won’t be detected or recognized by Amazon Textract and are automatically removed from the input. Amazon Textract operations return a Block object in the API response to the Lambda function.
6. To ingest the information into a knowledge base, you need to transform the Amazon Textract output into a format that’s supported by your knowledge base. In this case, you would use code in your Lambda function to transform the Amazon Textract output into a plain text (.txt) file.
7. The plain text file is then saved into an S3 bucket. This S3 bucket can then be used as a source for your knowledge base.
8. You can automate the reflection of changes in your S3 bucket to your knowledge base by either having your Lambda function that created the Amazon S3 file run a start_ingestion_job() API call or use an Amazon S3 event trigger on the destination bucket to configure a new Lambda function to run when a file is uploaded to this S3 bucket. Synchronization is incremental, so changes from the previous synchronization are incorporated. More info on managing your data sources can be found in Connect to your data repository for your knowledge base.
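
The transformation from Amazon Textract output to plain text can be sketched as follows. This is a minimal example that assumes you already have the list of Block objects from a text detection response: it keeps only the LINE blocks, which hold the text that was visible when the page was rendered. The function name is hypothetical, and a real Lambda function would also need to page through the results of asynchronous jobs for multipage PDFs.

```python
from typing import Any, Dict, List


def textract_blocks_to_text(blocks: List[Dict[str, Any]]) -> str:
    """Join the text of LINE blocks into a plain text document, one line per block."""
    lines = [
        block["Text"]
        for block in blocks
        # PAGE and WORD blocks are skipped: LINE blocks already contain
        # the words of each detected line, in reading order.
        if block.get("BlockType") == "LINE" and "Text" in block
    ]
    return "\n".join(lines)
```

The resulting string can be written as a .txt object to the bucket that serves as the data source for your knowledge base.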

In addition to invisible sequences, threat actors can add sophisticated threat sequences that are difficult to classify or filter. Manually checking each document for unusual content isn’t feasible at scale, and creating a filter or model that accurately detects misleading information in such documents is challenging.

One powerful characteristic of LLMs is that they can analyze complex linguistic patterns. An optional pathway is to add a filtering LLM to your knowledge base ingest pipeline to detect malicious or misleading content, susceptible code, or unrelated context that might mislead your model.

Again, it’s important to note that threat actors might deliberately choose content that’s difficult to classify or filter and that resembles normal content. More capable, general-purpose LLMs provide a larger surface for threat actors, because they aren’t tuned to detect these specific attempts. The question is: can we train models to be robust against a wide variety of threats? Currently, there’s no definitive answer, and it remains a highly researched topic. However, some models address specific use cases. For example, Llama Guard, a fine-tuned version of Meta’s Llama model, predicts safety labels in 14 categories such as elections, privacy, and defamation. It can classify content in both LLM inputs (prompt classification) and LLM responses (response classification).

For document classification, relevant for filtering ingest data, even a small model like BERT can be used. BERT is an encoder-only language model with a bi-directional attention mechanism, making it strong in tasks requiring deep contextual understanding, such as text classification, named entity recognition (NER), and question answering (QA). It’s open source and can be fine-tuned for various applications. This includes use cases in cybersecurity, such as phishing detection in email messages or detecting prompt injection attacks. If you have the resources in-house and work on critical applications that need advanced filtering for specific threats, consider fine-tuning a model like BERT to classify documents that might contain undesirable material.

In addition to natural-language text, threat actors might use data encoding techniques to obfuscate or conceal undesirable payloads within documents. These techniques include encoded scripts, malware, or other harmful content disguised using methods like base64 encoding, hexadecimal encoding, Morse code, uuencode, ASCII art, and more.

An effective way to detect such sequences is by using the Amazon Comprehend DetectDominantLanguage API. If a document is written entirely in a supported language, DetectDominantLanguage will return a high confidence score, indicating the absence of encoded data. Conversely, if a document contains encoded strings, such as base64, the API will struggle to categorize this text, resulting in a low confidence score. To automate the detection process, you can route documents to a human review stage if the confidence score falls below a certain threshold (for example, 85 percent). This reduces the need for manual checks for potentially malicious encoded data.
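
The threshold check described above can be sketched as follows. It assumes an Amazon Comprehend client (for example, from boto3) is passed in and uses the 85 percent threshold from the example. The function name is hypothetical, and the right threshold for your documents is something to tune on your own data.

```python
from typing import Any


def needs_human_review(comprehend: Any, text: str, threshold: float = 0.85) -> bool:
    """Flag text when no language is detected with at least `threshold` confidence."""
    languages = comprehend.detect_dominant_language(Text=text)["Languages"]
    # Each entry has a LanguageCode and a Score between 0 and 1.
    # An empty list is treated as zero confidence.
    top_score = max((language["Score"] for language in languages), default=0.0)
    return top_score < threshold
```

With boto3 installed, the client would be `boto3.client("comprehend")`. Because the API limits the size of each request, longer documents need to be checked in pieces, which also helps surface an encoded block hidden inside otherwise ordinary text.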

Additionally, the encoding and decoding capabilities of LLMs can assist in decoding encoded data. Various LLMs understand encoding schemes and can interpret encoded data within documents or files. For example, Anthropic’s Claude 3 Haiku can decode a base64 encoded string such as TGVhcm5pbmcgaG93IHRvIGNhbGwgU2FnZU1ha2VyIGVuZHBvaW50cyBmcm9tIExhbWJkYSBpcyB2ZXJ5IHVzZWZ1bC4 into its original plaintext form: “Learning how to call SageMaker endpoints from Lambda is very useful.” While this example is benign, it demonstrates the ability of LLMs to detect and decode encoded data, which can then be stripped before ingestion into your vector store.
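
For well-known encodings such as base64, you can also decode a suspected string deterministically before (or instead of) sending it to a model. The sketch below uses only the Python standard library. The function name is hypothetical, and it restores missing padding first because the example string above is written without its trailing "=" character.

```python
import base64
import binascii
from typing import Optional


def try_decode_base64(candidate: str) -> Optional[str]:
    """Return the decoded UTF-8 text, or None if the input isn't valid base64 text."""
    # Base64 input must be a multiple of 4 characters long; add any missing "=".
    padded = candidate + "=" * (-len(candidate) % 4)
    try:
        raw = base64.b64decode(padded, validate=True)
        return raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


if __name__ == "__main__":
    encoded = (
        "TGVhcm5pbmcgaG93IHRvIGNhbGwgU2FnZU1ha2VyIGVuZHBvaW50cyBmcm9t"
        "IExhbWJkYSBpcyB2ZXJ5IHVzZWZ1bC4"
    )
    print(try_decode_base64(encoded))
    # Learning how to call SageMaker endpoints from Lambda is very useful.
```

A deterministic decoder only covers the encodings you anticipate, which is where the broader pattern recognition of an LLM can complement it.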

Figure 4: Visual representation of a potential workflow to trigger a human in the loop review in case threat sequences are detected in your ingest files

In the preceding Figure 4, you can see a workflow that shows how you can integrate the above features into your document processing workflow to detect malicious content in ingest documents:

As your ingestion point, you can use an S3 bucket. Files that you want to upload into your knowledge base are first uploaded into this bucket. In this diagram, the files are assumed to be .txt files.
The upload action in Amazon S3 automatically starts an AWS Step Functions workflow.
Amazon EventBridge is used to trigger the Step Functions workflow.
The first Lambda function in the workflow calls the Amazon Comprehend DetectDominantLanguage API, which flags documents if the confidence score of the language is below a certain threshold, indicating that the text might contain encoded data or data in other formats (such as a language Amazon Comprehend doesn’t recognize) that might be malicious.
If this is the case, the document is sent to a foundation model in Amazon Bedrock that can translate or decode the data.
Next, another Lambda function is triggered. This function invokes a SageMaker endpoint, where you can deploy a model, such as a fine-tuned version of BERT, to classify documents as suspicious or not.
If no suspicious content is detected, the content in the bucket is left unchanged (avoiding the cost of an unnecessary overwrite) and the workflow ends. If undesirable content is detected, the document is stored in a second S3 bucket for human review.
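The decision logic of this workflow can be sketched in plain Python, independent of Step Functions. This is an illustration under stated assumptions: the three callables stand in for the Amazon Comprehend call, the Amazon Bedrock decoding step, and the SageMaker classifier, and their names and the 0.85 threshold are hypothetical.

```python
from typing import Callable


def process_document(
    text: str,
    language_confidence: Callable[[str], float],
    decode: Callable[[str], str],
    is_suspicious: Callable[[str], bool],
    threshold: float = 0.85,
) -> str:
    """Return 'review' if the document should go to human review, else 'accept'."""
    # Low language confidence hints at encoded or unrecognized content.
    if language_confidence(text) < threshold:
        # Ask a model to translate or decode the text before classification.
        text = decode(text)
    # Classify the (possibly decoded) text.
    return "review" if is_suspicious(text) else "accept"
```

Passing the service calls in as functions keeps the branching testable with simple stubs before you wire it to real endpoints.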

Additional considerations for RAG data ingestion pipeline security

In previous sections, we focused on filtering patterns and current recommendations to secure the RAG ingestion pipeline. However, content filters that address indirect prompt injection aren’t the only mitigation to keep in mind when building a secure RAG application. To effectively secure generative AI-powered applications, responsible AI considerations and traditional security recommendations are still crucial.

To moderate content in your ingest pipeline, you might want to remove toxic language and PII data from your ingest documents. Amazon Comprehend offers built-in features for toxic content detection and PII detection in text documents. The Toxicity Detection API can identify content in categories such as hate speech, insults, and sexual content. This feature is particularly useful for making sure that harmful or inappropriate content isn’t ingested into your system. You can use the Toxicity Detection API to analyze up to 10 text segments at a time, each with a size limit of 1 KB. You might need to split larger documents into smaller segments before processing. For detailed guidance on using Amazon Comprehend toxicity detection, see Amazon Comprehend Toxicity Detection. For more information on PII detection and redaction with Amazon Comprehend, we recommend Detecting and redacting PII using Amazon Comprehend.
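Because of the segment limits mentioned above, larger documents need to be split before they are analyzed. The following is a minimal sketch of one way to do that; the function names are hypothetical, and the split is a simple character-level cut that can fall in the middle of a word, which a production pipeline might refine to sentence boundaries.

```python
MAX_SEGMENT_BYTES = 1024    # 1 KB per segment, per the limit described above
MAX_SEGMENTS_PER_CALL = 10  # segments per request, per the limit described above


def split_into_segments(text: str, max_bytes: int = MAX_SEGMENT_BYTES) -> list[str]:
    """Split text into pieces of at most max_bytes UTF-8 bytes each."""
    segments, current, size = [], [], 0
    for ch in text:
        ch_size = len(ch.encode("utf-8"))
        # Start a new segment when adding this character would exceed the limit.
        if current and size + ch_size > max_bytes:
            segments.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += ch_size
    if current:
        segments.append("".join(current))
    return segments


def batch_segments(
    segments: list[str], batch_size: int = MAX_SEGMENTS_PER_CALL
) -> list[list[str]]:
    """Group segments into batches small enough for a single request."""
    return [segments[i:i + batch_size] for i in range(0, len(segments), batch_size)]
```

Each batch could then be sent in one request, for example with the boto3 Comprehend client's detect_toxic_content operation, passing the segments as a list of {"Text": segment} entries.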

Keep the principle of least privilege in mind for your RAG application. Think about which permissions your application has, and give it only the permissions it needs to successfully function. Your application sends data in the context or orchestrates tools on behalf of the LLM, so it’s important that these permissions are limited. If you want to dive deep into achieving least privilege at scale, we recommend Strategies for achieving least privilege at scale. This is especially important when your RAG application involves agents that might call APIs or databases. Make sure you carefully grant permissions to prevent potential security issues such as an SQL injection attack on your database.

Develop a threat model for your RAG application. It’s recommended that you document potential security risks in your application and have mitigation strategies for each risk. This session from re:Invent 2023 gives an overview of how to approach threat modeling a generative AI workload. In addition, you can use the Threat Composer tool, which comes with a sample generative AI application, to help you in threat modeling your applications.

Lastly, when deciding what data to ingest into your RAG application, make sure to ask the right questions about the origin of the content, such as “who has access and edit rights to this content?” For example, anyone can edit a Wikipedia page. In addition, assess what the scope of your application is. Can the RAG application run code? Can it query a database? If so, this poses additional risks, so external data in your vector database should be carefully filtered.

Conclusion

In this blog post, you read about some of the security risks of RAG applications, with a specific focus on the RAG ingestion pipeline. Threat actors might engineer sophisticated methods to embed invisible content within websites or files, which makes these threats challenging to spot when reviewing content. Without a filtering or evaluation mechanism, such content might cause the LLM to generate incorrect information, or worse, depending on the capabilities of the application (such as executing code or querying a database).

You learned about some strategies and architectural patterns with filtering mechanisms to mitigate these risks. It’s important to note that the filtering mechanisms might not catch all undesirable content that should be removed from a file (for example, PII, base64 encoded data, and other undesirable sequences). Therefore, an evaluation mechanism and a human in the loop are crucial, because at this time there’s no model trained specifically to detect the sequences used in techniques like indirect prompt injection. There are models trained to detect impolite language, but they don’t cover all possible cases.

Although there is currently no way to completely mitigate threats like injection attacks, these strategies and architectural patterns are a first step and form part of a layered approach to securing your application. In addition to these, make sure to evaluate your data regularly, consider having a human in the loop, and stay up to date on advancements in this space such as the OWASP Top 10 for LLM Applications or MITRE ATLAS.

If you have feedback about this post, submit comments in the Comments section below.

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

2024-11-19 Karim Akhnoukh

Post Syndicated from Karim Akhnoukh original https://aws.amazon.com/blogs/big-data/manage-access-controls-in-generative-ai-powered-search-applications-using-amazon-opensearch-service-and-aws-cognito/

Organizations of all sizes and types are using generative AI to create products and solutions. A common adoption pattern is to introduce document search tools to internal teams, especially advanced document searches based on semantic search. In semantic search, documents are stored as vectors, a numeric representation of the document content, in a vector database such as Amazon OpenSearch Service, and are retrieved by performing similarity search with a vector representation of the search query.

In a real-world scenario, organizations want to make sure their users access only documents they are entitled to access. They are looking for a reliable and scalable solution to implement robust access controls to make sure these documents are only accessible to individuals who have a legitimate business need and the appropriate level of authorization. The permission mechanism has to be secure, built on top of built-in security features, and scalable for manageability when the user base scales out. Maintaining proper access controls for these sensitive assets is paramount, because unauthorized access could lead to severe consequences, such as data breaches, compliance violations, and reputational damage.

In this post, we show you how to manage user access to enterprise documents in generative AI-powered tools according to the access you assign to each persona.

Common use cases

The following are industry-specific use cases for document access management across different departments:

In R&D and engineering, access to product design documents evolves from restricted to broader as development progresses
HR maintains open access to general policies while limiting access to sensitive employee information
Finance and accounting documents require varying levels of access for auditing and executive decision-making
Sales and marketing teams carefully manage customer data and strategies, implementing tiered access for different roles and departments

These examples demonstrate the need for dynamic, role-based access control to balance information sharing with confidentiality in various business contexts.

Solution overview

By combining the powerful vector search capabilities of OpenSearch Service with the access control features provided by Amazon Cognito, this solution enables organizations to manage access controls based on custom user attributes and document metadata.

This approach simplifies the management of access rights, making sure only authorized users can access and interact with specific documents based on their roles, departments, and other relevant attributes. Following this approach, you can manage the access to your organization’s documents at scale. The following diagram depicts the solution architecture.

The solution workflow consists of the following steps:

The user accesses a smart search portal and lands on a web interface deployed on AWS Amplify.
The user authenticates through an Amazon Cognito user pool and an access token is returned to the client. This access token will be used to retrieve the custom attributes, stored as key-value pairs, that are assigned to the user. In our case, we created two custom attributes (custom:department and custom:access_level).
For each user query, an API is invoked on Amazon API Gateway to process the request. Each invocation includes the user access token in the header.
The API is integrated with AWS Lambda, which processes the user query and generates the answers based on available documents and user access using retrieval augmented generation (RAG). The process starts by creating a vector based on the question (embedding) by invoking the embedding model.
A query is sent to OpenSearch Service that includes the following:
1. The embedding vector generated.
2. User custom attributes retrieved by Lambda based on their access token, by calling the Amazon Cognito GetUser API.
3. The query relies on the support of an efficient k-NN filter in OpenSearch Service to perform the search.
Pre-filtered documents that relate to the user query are included in the prompt of the large language model (LLM) that summarizes the answer. Then, Lambda replies back to the web interface with the LLM completion (reply).
If the user’s access needs to be modified (assigned attributes), an API call is made through API Gateway to a Lambda function that processes the request to add or update the custom attributes’ value for a specific user.
New attributes are reflected in the user’s profile in Amazon Cognito.
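As an illustration of the attribute lookup in the steps above, the following sketch turns a GetUser response into the attribute dictionary used for filtering. It is not the code from the GitHub repo: the function name is hypothetical, and storing multiple values as a comma-separated string is an assumption made for this example. The response shape (a UserAttributes list of Name and Value pairs) is the one returned by the Amazon Cognito GetUser API.

```python
def extract_custom_attributes(get_user_response: dict) -> dict[str, list[str]]:
    """Map Cognito 'custom:' attributes to {field: [values]} for filtering."""
    prefix = "custom:"
    attributes: dict[str, list[str]] = {}
    for item in get_user_response.get("UserAttributes", []):
        name, value = item["Name"], item["Value"]
        if name.startswith(prefix):
            # Assumption: multiple values are stored as a comma-separated string.
            values = [v.strip() for v in value.split(",") if v.strip()]
            attributes[name[len(prefix):]] = values
    return attributes


# In Lambda, the response could come from the boto3 "cognito-idp" client:
#   response = client.get_user(AccessToken=access_token)
```

Stripping the custom: prefix lets the resulting keys line up with the document metadata fields, such as department and access_level.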

Our solution is implemented and wrapped within AWS Cloud Development Kit (AWS CDK) stacks, which are available in the GitHub repo.

Our sample documents assume a fictional manufacturing company called Unicorn Robotics Factory, which develops robotic unicorns. The dataset contains over 900 documents that are a mix of engineering, roadmap, and business reporting documents. The following is an example of a document’s content:

**CONFIDENTIAL - UNICORNS ROBOTICS INTERNAL DOCUMENT**

**Project: "Galactic Unicorn"**

Unicorns Robotics is proud to announce the development of our latest project, the "Galactic Unicorn". 
This top-secret project aims to create a robotic unicorn that can travel through space and time, bringing magic and joy to children and adults alike.....

The associated metadata file for this document consists of the following:

{ "department": "research", "access_level": "confidential" }

Our solution in the GitHub repo takes care of loading the documents with associated metadata tags. For illustration purposes, we used the following mapping for the users and document access.

This solution is meant to delegate access management to the application tier, to simplify the implementation of use cases like generative AI-powered document search tools. However, if your use case requires a stricter approach to control document access, like multi-tenant environments or field-level security, you might want to use the fine-grained access control feature in OpenSearch Service. In our solution, we manage the access on the document level according to the assigned metadata.

Prerequisites

To deploy the solution, you need the following prerequisites:

An AWS account. If you don’t already have an AWS account, you can create one.
Your access to the AWS account must have AWS Identity and Access Management (IAM) permissions to launch AWS CloudFormation templates that create IAM roles.
The AWS Command Line Interface (AWS CLI) installed.
Node.js and npm installed for the frontend.
Docker installed.
The AWS CDK configured. For more information, see Getting started with the AWS CDK.
In case of LLM inference based on Amazon SageMaker, a sufficient service limit to deploy an ml.g5.12xlarge instance for the SageMaker endpoint. If needed, you can initiate a quota increase request. Refer to Service Quotas for more details.

Deploy the solution

To deploy the solution to your AWS account, refer to the Readme file in our GitHub repo.

Query documents with different personas

Now let’s test the application using different personas. In this example, we use the same users with their corresponding custom attributes as illustrated in the solution overview.

To start, let’s log in using the researcher account and run the search around a confidential document.

We ask, “What is the projected profit margin of the Galactic Unicorn project?” and get the result as shown in the following screenshot.

The question invokes a query to OpenSearch Service using the custom attributes assigned to the researcher. The following code illustrates how the query is structured:

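# user_attributes maps each custom attribute to the values assigned to the
# user; query_vector is the embedding of the user's question.
must_conditions = []
for attr, values in user_attributes.items():
    # One clause per attribute: the document's metadata field must equal
    # at least one of the user's values for that attribute.
    must_conditions.append(
        {
            "bool": {
                "should": [{"term": {attr: value}} for value in values],
                "minimum_should_match": 1,
            }
        }
    )

# k-NN query with a filter: only documents that satisfy every clause are
# considered during the vector search.
query = {
    "size": 5,
    "query": {
        "knn": {
            "doc_embedding": {
                "vector": query_vector,
                "k": 10,
                "filter": {"bool": {"must": must_conditions}},
            }
        }
    },
}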
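To reason about which documents a persona can see, it can help to mirror the filter semantics locally. The following is a minimal sketch, not part of the solution code: the function name is hypothetical, and the engineer attribute values are made up for illustration.

```python
def is_visible(metadata: dict[str, str], user_attributes: dict[str, list[str]]) -> bool:
    """Mirror the filter: every attribute must match at least one allowed value."""
    return all(
        metadata.get(attr) in values for attr, values in user_attributes.items()
    )


document = {"department": "research", "access_level": "confidential"}
researcher = {"department": ["research"], "access_level": ["confidential"]}
engineer = {"department": ["engineering"], "access_level": ["support"]}  # hypothetical

print(is_visible(document, researcher))
print(is_visible(document, engineer))
```

Note that in this sketch an empty attribute dictionary produces no conditions and therefore matches every document, so an application might want to reject requests from users who have no attributes assigned rather than run an unfiltered search.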

Let’s sign out and log in again with an engineer profile to test the same query. Based on the assigned attributes and document metadata, the result should look like that in the following screenshot.

If you query some support documents, you get the desired answer, as shown in the following screenshot.

Modify user access

As depicted in the solution diagram, we’ve added a feature in the web interface to allow you to modify user access, which you could use to perform further tests. To do so, log in as a tool admin and choose Manage Attributes. Then modify the custom attribute value for a given user, as shown in the following screenshot.

Clean up

When you delete a stack, most resources are deleted with it, but not all. By default, the Amazon Simple Storage Service (Amazon S3) bucket, Amazon Cognito user pool, and OpenSearch Service domain would be retained. However, our AWS CDK code alters this default behavior by setting the RemovalPolicy to DESTROY for these resources. If you want to retain them, you can adjust the RemovalPolicy in the AWS CDK code for the respective resources.

You can use the following command to clean up the resources deployed to your AWS account:

make destroy

Conclusion

This post illustrated how to build a document search RAG solution that makes sure only authorized users can access and interact with specific documents based on their roles, departments, and other relevant attributes. It combines OpenSearch Service and Amazon Cognito custom attributes into a tag-based access control mechanism that is straightforward to manage at scale.

For demonstration purposes, the following points weren’t included in the AWS CDK code. However, they’re still applicable and you might want to work on them before deploying for production purposes:

OpenSearch Service best practices, such as instance sizing and using primary nodes
Advanced document chunking strategies for RAG implementations, such as recursive or semantic chunking

About the Authors

Karim Akhnoukh is a Solutions Architect at AWS working with manufacturing customers in Germany. He is passionate about applying machine learning and generative AI to solve customers’ business challenges. Besides work, he enjoys playing sports, aimless walks, and good quality coffee.

Ahmed Ewis is a Senior Solutions Architect at AWS GenAI Labs. He helps customers build generative AI-based solutions to solve business problems. When not collaborating with customers, he enjoys playing with his kids and cooking.

Fortune Hui is a Solutions Architect at AWS Hong Kong, working with conglomerate customers. He helps customers and partners build big data platform and generative AI applications. In his free time, he plays badminton and enjoys whisky.

AWS Weekly Roundup: AWS BuilderCards at re:Invent 2024, AWS Community Day, Amazon Bedrock, vector databases, and more (Nov 18, 2024)

2024-11-18 Elizabeth Fuentes

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-buildercards-at-reinvent-2024-aws-community-day-amazon-bedrock-vector-databases-and-more-nov-18-2024/

This week, we wrapped up the final Latin America Amazon Web Services (AWS) Community Days of 2024 in Brazil, with multiple parallel events taking place. In Goiânia, we had Marcelo Palladino, senior developer advocate, and Marcelo Paiva, AWS Community Builder, as keynote speakers. Florianópolis featured Ana Cunha, senior developer advocate, and in Santiago de Chile, I had the honor to share the stage with Rossana Suarez, AWS Container Hero, as keynote speakers. These events, organized by communities for communities, provide opportunities to network, learn something new, and immerse yourself in the community. In a community, everyone grows together, and no one is left behind.

AWS Lambda, the service that introduced me to AWS and remains my favorite, celebrates its 10th anniversary. Born from customer needs, it revolutionized cloud computing by allowing code execution without server management. Since its inception, documented in this LinkedIn post by Dr. Werner Vogels, Chief Technology Officer at Amazon.com, through the original PR/FAQ document, the service has grown significantly, introducing features such as 1ms billing precision and support for 10GB memory. Thank you AWS Lambda, here’s to many more anniversaries.

Amazon invests $110 million to support AI research at universities using Trainium chips. The initiative provides computing resources using AWS Trainium chips, enabling researchers to develop new AI architectures and machine learning innovations that will be open-sourced for broader advancement. Check out the LinkedIn post by Matt Garman, CEO at AWS.

Last week’s launches
AWS BuilderCards second edition at re:Invent 2024 – Jeff Barr announced the launch of the second edition of AWS BuilderCards at re:Invent 2024. It includes improvements to the design and game mechanics, plus a new add-on pack on generative AI. Over 15,000 sets have been distributed at previous events, with excellent user feedback. They’ll be available for online purchase after re:Invent.

Amazon EventBridge announces up to 94% improvement in end-to-end latency for Event Buses – Amazon EventBridge has improved end-to-end latency for Event Buses by up to 94%, reducing P99 latency from 2235.23ms (measured in January 2023) to 129.33ms (measured in August 2024). This enhancement enables faster processing for time-sensitive applications such as fraud detection, industrial automation, and gaming across all AWS Regions where Amazon EventBridge is available, including the AWS GovCloud (US) Regions, at no additional cost to you.

Introducing resource control policies (RCPs), a new type of authorization policy in AWS Organizations – Resource control policies (RCPs) are a new type of authorization policy in AWS Organizations. RCPs allow centralized control over maximum permissions granted to resources, complementing service control policies (SCPs) that control permissions for principals. RCPs can restrict external access to resources like Amazon Simple Storage Service (Amazon S3) buckets, enforcing a data perimeter across the organization.

Replicate changes from databases to Apache Iceberg tables using Amazon Data Firehose (in preview) – A new preview capability in Amazon Data Firehose captures and replicates database changes to Apache Iceberg tables on Amazon S3. This feature supports PostgreSQL and MySQL databases, providing a simple solution to stream database updates without impacting performance. It automatically handles data partitioning and schema evolution, eliminating the need for complex ETL processes.

Amazon S3 now supports up to 1 million buckets per AWS account – Amazon S3 has increased its default bucket quota from 100 to 10,000 per AWS account. Customers can now request increases up to 1 million buckets. The first 2,000 buckets are free, with a small monthly fee applying thereafter for additional buckets.

Amazon Keyspaces (for Apache Cassandra) reduces prices by up to 75% – Amazon Keyspaces (for Apache Cassandra) announces significant price reductions of up to 75%. The service reduces on-demand mode pricing by up to 56% for single-region and 65% for multi-region usage. Time-to-live (TTL) delete prices are also reduced by 75%.

Centrally managing root access for customers using AWS Organizations – AWS Identity and Access Management (IAM) launches a new capability for centrally managing root access in AWS Organizations. This feature allows security teams to remove long-term root credentials from member accounts and use temporary, task-scoped root sessions for specific actions. The solution enhances security by eliminating permanent root credentials while maintaining the ability to perform necessary privileged operations.

Amazon DynamoDB reduces prices for on-demand throughput and global tables – Amazon DynamoDB announces significant price reductions, cutting on-demand mode throughput costs by 50% and global tables by up to 67%. Multi-region replicated writes now match single-region pricing. These changes make on-demand mode the recommended choice for most DynamoDB workloads.

Amazon Q Developer plugins for Datadog and Wiz now generally available – Amazon Q Developer now offers plugins for Datadog and Wiz services, allowing users to access these partners’ features directly through the AWS Console. Users can query information using natural language commands like @datadog or @wiz to get real-time updates and security insights.

Other AWS blog posts
Here are some additional projects and blog posts that you might find interesting:

Introducing Stable Diffusion 3.5 Large in Amazon SageMaker JumpStart – This powerful 8.1 billion parameter model enables high-quality, photorealistic image generation from text prompts. Customers can seamlessly deploy and use the model in Amazon SageMaker JumpStart, benefiting from Amazon SageMaker security and machine learning operations (MLOps) capabilities.

Transcribe, translate, and summarize live streams in your browser with AWS AI and generative AI services – This blog post explains how we developed a Chrome extension that uses AI services to enhance live streaming experiences. The extension uses Amazon Transcribe, Amazon Translate, and Amazon Bedrock to provide real-time transcription, translation, and summarization of live streams directly in the browser. It supports over 50 languages for transcription and 75 for translation, making content globally accessible.

Simplify automotive damage processing with Amazon Bedrock and vector databases – This blog post presents a solution combining Amazon Bedrock and vector databases to streamline automotive damage assessment. The system uses AI to analyze vehicle damage images, provide cost estimates, and match with similar cases from existing datasets. It uses Anthropic’s Claude 3 and Amazon Titan Multimodal Embeddings for efficient, accurate processing.

Revolutionize trip planning with Amazon Bedrock and Amazon Location Service – This blog post presents a solution that combines Amazon Bedrock and Amazon Location Service to assist with trip planning.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Community Days – Join community-led conferences featuring technical discussions, workshops, and hands-on labs driven by expert AWS users and industry leaders from around the world. Upcoming AWS Community Days are scheduled for November 23 in Indonesia, and on December 14 in Kochi, India.

AWS re:Invent 2024 – Join us in Las Vegas to learn all things AWS. Our annual conference is the best and fastest way to grow your skills. If you can’t join us in person, you can attend virtually by registering at Watch re:Invent online.

Browse all upcoming AWS-led in-person and virtual events and developer-focused events.

Create your AWS Builder ID and reserve your alias. Builder ID is a universal login credential that gives users access to AWS tools and resources, including over 600 free training courses, community features, and developer tools such as Amazon Q Developer beyond the AWS Management Console.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Thanks to Odina Jacobs for the AWS Community Chile photo.

— Eli

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

AI Industry is Trying to Subvert the Definition of “Open Source AI”

2024-11-08 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/11/ai-industry-is-trying-to-subvert-the-definition-of-open-source-ai.html

The Open Source Initiative has published (news article here) its definition of “open source AI,” and it’s terrible. It allows for secret training data and mechanisms. It allows for development to be done in secret. Since for a neural network, the training data is the source code—it’s how the model gets programmed—the definition makes no sense.

And it’s confusing; most “open source” AI models—like LLAMA—are open source in name only. But the OSI seems to have been co-opted by industry players that want both corporate secrecy and the “open source” label. (Here’s one rebuttal to the definition.)

This is worth fighting for. We need a public AI option, and open source—real open source—is a necessary component of that.

But while open source should mean open source, there are some partially open models that need some sort of definition. There is a big research field of privacy-preserving, federated methods of ML model training and I think that is a good thing. And OSI has a point here:

Why do you allow the exclusion of some training data?

Because we want Open Source AI to exist also in fields where data cannot be legally shared, for example medical AI. Laws that permit training on data often limit the resharing of that same data to protect copyright or other interests. Privacy rules also give a person the rightful ability to control their most sensitive information like decisions about their health. Similarly, much of the world’s Indigenous knowledge is protected through mechanisms that are not compatible with later-developed frameworks for rights exclusivity and sharing.

How about we call this “open weights” and not open source?

Prompt Injection Defenses Against LLM Cyberattacks

2024-11-07 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/11/prompt-injection-defenses-against-llm-cyberattacks.html

Interesting research: “Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks”:

Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense strategy tailored to counter LLM-driven cyberattacks. We introduce Mantis, a defensive framework that exploits LLMs’ susceptibility to adversarial inputs to undermine malicious operations. Upon detecting an automated cyberattack, Mantis plants carefully crafted inputs into system responses, leading the attacker’s LLM to disrupt their own operations (passive defense) or even compromise the attacker’s machine (active defense). By deploying purposefully vulnerable decoy services to attract the attacker and using dynamic prompt injections for the attacker’s LLM, Mantis can autonomously hack back the attacker. In our experiments, Mantis consistently achieved over 95% effectiveness against automated LLM-driven attacks. To foster further research and collaboration, Mantis is available as an open-source tool: this https URL.

This isn’t the solution, of course. But this sort of thing could be part of a solution.

Subverting LLM Coders

2024-11-07 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/11/subverting-llm-coders.html

Really interesting research: “An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection”:

Abstract: Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and backdoor attacks can covertly alter the model outputs. To address this critical security challenge, we introduce CODEBREAKER, a pioneering LLM-assisted backdoor attack framework on code completion models. Unlike recent attacks that embed malicious payloads in detectable or irrelevant sections of the code (e.g., comments), CODEBREAKER leverages LLMs (e.g., GPT-4) for sophisticated payload transformation (without affecting functionalities), ensuring that both the poisoned data for fine-tuning and generated code can evade strong vulnerability detection. CODEBREAKER stands out with its comprehensive coverage of vulnerabilities, making it the first to provide such an extensive set for evaluation. Our extensive experimental evaluations and user studies underline the strong attack performance of CODEBREAKER across various settings, validating its superiority over existing approaches. By integrating malicious payloads directly into the source code with minimal transformation, CODEBREAKER challenges current security measures, underscoring the critical need for more robust defenses for code completion.

Clever attack, and yet another illustration of why trusted AI is essential.

Let’s Architect! Modern data architectures

2024-11-06 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-modern-data-architectures-2/

Data is the fuel for AI; modern data is even more important for generative AI and advanced data analytics, producing more accurate, relevant, and impactful results. Modern data comes in various forms: real-time, unstructured, or user-generated. Each form requires a different solution. AWS’s data journey began with Amazon Simple Storage Service (Amazon S3) in 2006, marking the start of cloud-based data storage at scale. Since then, AWS has expanded its data offerings to cover the entire data lifecycle, offering a comprehensive ecosystem of services designed to harness the full potential of modern data, from ingestion and storage to processing and analysis, supporting the entire lifecycle of AI-driven innovation.

In this blog post, we will cover some AWS use cases for modern data architectures, showing how AWS enables organizations to leverage the power of data and generative AI technologies.

Key considerations when choosing a database for your generative AI applications

This blog focuses on selecting the right database for generative AI applications and provides knowledge that can enhance your understanding, guide your decision making, and ultimately lead to more successful AI projects. Selecting the right database for generative AI applications is not just about storage; it significantly impacts performance, scalability, ease of integration, and overall effectiveness of the AI solution.

Figure 1. Diagram that shows the key steps in a RAG workflow

Take me to this blog

Strategies for building a data mesh-based enterprise solution on AWS

Adopting a data mesh architecture can enhance an organization’s ability to manage data effectively, leading to improved performance, innovation, and overall business success. In this guidance, you will discover some strategies to build data mesh solutions on AWS.

Screenshot showing the AWS Prescriptive Guidance data mesh strategies page

Figure 2. The data mesh organizes data into domains, where data are seen as quality products to expose for consumption

Take me to this guidance

Optimizing storage price and performance with Amazon S3

Amazon S3 is an object storage service that supports multiple use cases, including data architectures. Big data pipelines can use Amazon S3 to store input, output, and intermediate results. Machine learning systems use Amazon S3 to process application logs and build the datasets both for experimentation and for production model training. Given the importance of the service and the number of use cases that a foundational storage service can support, we want to share best practices, performance optimization, and cost optimization strategies to work with Amazon S3. This video shows how Anthropic designs its data architecture around Amazon S3.

Storage class comparison chart showing classes of Amazon S3 options

Figure 3. Workloads with predictable patterns often have low retrieval rates for long periods afterward, so we can design them to use cheaper storage classes

Take me to this video

If you are curious about the underlying architecture of Amazon S3 and want to drill down into its internal design, you can watch the re:Invent video Dive deep on Amazon S3.
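One way to act on the pattern in Figure 3 is an S3 Lifecycle configuration that moves objects to cheaper storage classes as they age. The following is a minimal sketch, not something taken from the video: the prefix, the day thresholds, and the choice of storage classes are illustrative assumptions, and the helper names are hypothetical. Only the last function talks to AWS, through the boto3 put_bucket_lifecycle_configuration call.

```python
def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days):
    """Build an S3 Lifecycle configuration that tiers aging objects down.

    The thresholds are hypothetical; derive real ones from your access patterns.
    """
    if ia_after_days >= glacier_after_days:
        raise ValueError("objects must reach STANDARD_IA before GLACIER")
    return {
        "Rules": [
            {
                "ID": f"tier-down-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket, configuration):
    """Apply the configuration to a bucket (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without the SDK

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )


# Example: move objects under logs/ to infrequent access after 30 days
# and to an archive class after 90 days.
config = build_lifecycle_configuration("logs/", 30, 90)
```

Keeping the rule builder separate from the AWS call makes it easy to review or unit test the policy before it touches a bucket.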

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

This is an AWS case study on how HPE Aruba Supply Chain successfully re-architected and deployed their data solution by adopting a modern data architecture on AWS. The new solution has helped Aruba integrate data from multiple sources, along with optimizing their cost, performance, and scalability. This has also allowed the Aruba Supply Chain leadership to receive in-depth and timely insights for better decision-making, thereby elevating the customer experience.

Figure 4. Reference architecture diagram showing HPE Aruba Supply Chain’s architecture, featuring Amazon S3

Take me to this blog

AWS Modern Data Architecture Immersion Day

This workshop highlights the advantages of adopting a modern data architecture on AWS. By integrating the flexibility of a data lake with specialized analytics services, organizations can significantly enhance their data-driven decision-making capabilities. We encourage everyone to explore how this architecture can streamline their analytics processes and support diverse use cases, from real-time insights to advanced machine learning. It’s an excellent opportunity to leverage modern data architecture.

Diagram showing AWS services in a flywheel

Figure 5. Data architectures are fundamental to power use cases ranging from analytics to machine learning

Take me to this workshop

See you next time!

Thanks for reading! In the next blog, we will cover some tips on how to get the best out of your developer experience on AWS. To revisit any of our previous posts or explore the entire series, visit the Let’s Architect! page.

AIs Discovering Vulnerabilities

2024-11-05 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/11/ais-discovering-vulnerabilities.html

I’ve been writing about the possibility of AIs automatically discovering code vulnerabilities since at least 2018. This is an ongoing area of research: AIs doing source code scanning, AIs finding zero-days in the wild, and everything in between. The AIs aren’t very good at it yet, but they’re getting better.

Here’s some anecdotal data from this summer:

Since July 2024, ZeroPath is taking a novel approach combining deep program analysis with adversarial AI agents for validation. Our methodology has uncovered numerous critical vulnerabilities in production systems, including several that traditional Static Application Security Testing (SAST) tools were ill-equipped to find. This post provides a technical deep-dive into our research methodology and a living summary of the bugs found in popular open-source tools.

Expect lots of developments in this area over the next few years.

This is what I said in a recent interview:

Let’s stick with software. Imagine that we have an AI that finds software vulnerabilities. Yes, the attackers can use those AIs to break into systems. But the defenders can use the same AIs to find software vulnerabilities and then patch them. This capability, once it exists, will probably be built into the standard suite of software development tools. We can imagine a future where all the easily findable vulnerabilities (not all the vulnerabilities; there are lots of theoretical results about that) are removed in software before shipping.

When that day comes, all legacy code would be vulnerable. But all new code would be secure. And, eventually, those software vulnerabilities will be a thing of the past. In my head, some future programmer shakes their head and says, “Remember the early decades of this century when software was full of vulnerabilities? That’s before the AIs found them all. Wow, that was a crazy time.” We’re not there yet. We’re not even remotely there yet. But it’s a reasonable extrapolation.

EDITED TO ADD: And Google’s LLM just discovered an exploitable zero-day.

Fine-tuning for Anthropic’s Claude 3 Haiku model in Amazon Bedrock is now generally available

2024-11-01 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/fine-tuning-for-anthropics-claude-3-haiku-model-in-amazon-bedrock-is-now-generally-available/

Today, we are announcing the general availability of fine-tuning for Anthropic’s Claude 3 Haiku model in Amazon Bedrock in the US West (Oregon) AWS Region. Amazon Bedrock is the only fully managed service that provides you with the ability to fine-tune Claude models. You can now fine-tune and customize the Claude 3 Haiku model with your own task-specific training dataset to boost model accuracy, quality, and consistency to further tailor generative AI for your business.

Fine-tuning is a technique where a pre-trained large language model (LLM) is customized for a specific task by updating the weights and tuning hyperparameters like learning rate and batch size for optimal results.

Anthropic’s Claude 3 Haiku model is the fastest and most compact model in the Claude 3 model family. Fine-tuning Claude 3 Haiku offers significant advantages for businesses:

Customization – You can customize models that excel in areas crucial to your business compared to more general models by encoding company and domain knowledge.
Specialized performance – You can generate higher quality results and create unique user experiences that reflect your company’s proprietary information, brand, products, and more.
Task-specific optimization – You can enhance performance for domain-specific actions such as classification, interactions with custom APIs, or industry-specific data interpretation.
Data security – You can fine-tune with peace of mind in your secure AWS environment. Amazon Bedrock makes a separate copy of the base foundation model that is accessible only by you and trains this private copy of the model.

You can now optimize performance for specific business use cases by providing domain-specific labeled data to fine-tune the Claude 3 Haiku model in Amazon Bedrock.

In early 2024, we started to engage customers with a team of experts from the AWS Generative AI Innovation Center to help fine-tune Anthropic’s Claude models with their proprietary data sources. I’m happy to share that you can now fine-tune Anthropic’s Claude 3 Haiku model in Amazon Bedrock directly in the Amazon Bedrock console.

Get started with fine-tuning for Anthropic’s Claude 3 Haiku model in Amazon Bedrock
I will demonstrate how to easily fine-tune the Claude 3 Haiku model in Amazon Bedrock. To learn more about the fine-tuning workflow in detail, visit the AWS Machine Learning Blog post, Fine-tune Anthropic’s Claude 3 Haiku in Amazon Bedrock to boost model accuracy and quality.

To create a simple fine-tuning job in the Amazon Bedrock console, go to the Foundation models section in the navigation pane and select Custom models. In the Models section, select the Customize model button and then select Create Fine-tuning job.

Next, choose the model that you want to customize with your own data, give your resulting model a name, and optionally add encryption keys and any tags to associate with the model in the Model details section. Enter a name for the job and optionally add any tags for the job in the Job configuration section.

You can select the Amazon S3 location of the training dataset file and, if applicable, the validation dataset file in the Input data section.

Each dataset can be created using a JSON Lines (JSONL) file with multiple JSON lines using the single-turn or multi-turn messaging format. Each JSON line is a sample containing a system prompt and messages, an array of message objects.

For this example, I prepared a small dataset with some of the latest information about Anthropic’s Claude 3.5 Sonnet model in Amazon Bedrock. To learn more, visit Prepare the datasets in the Amazon Bedrock documentation.

{"system": "You are a helpful assistant.","messages":[{"role": "user", "content": "What are the latest models supported in Amazon Bedrock?"},{"role": "assistant", "content": "Amazon Bedrock is a fully managed service that supports Anthropic’s Claude 3 Sonnet, Haiku, and Opus models and the latest Claude 3.5 Sonnet model."}]}
{"system": "You are a helpful assistant.","messages":[{"role": "user", "content": "How long is the context window of Claude 3.5?"},{"role": "assistant", "content": "Anthropic's Claude 3 models have a 200,000 token context window enabling you to relay a large volume of information to Claude. This translates to roughly 150,000 words, or over 500 pages of material."}]}
{"system": "You are a helpful assistant.","messages":[{"role": "user", "content": "Is Claude 3.5 available in Bedrock?"},{"role": "assistant", "content": "Anthropic’s most intelligent and advanced model, Claude 3.5 Sonnet, demonstrates exceptional capabilities across a diverse range of tasks and evaluations while also outperforming Claude 3 Opus."}]}
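Before uploading a file like this to Amazon S3, it can help to check each line locally. The following is a minimal sketch of such a check. The rule that turns alternate between user and assistant, starting with the user and ending with the assistant, is my assumption about the messaging format, and validate_sample is a hypothetical helper, so confirm the details against the Prepare the datasets documentation.

```python
import json


def validate_sample(line):
    """Return a list of problems found in one JSONL training sample."""
    try:
        sample = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(sample, dict):
        return ["sample must be a JSON object"]

    problems = []
    if "system" in sample and not isinstance(sample["system"], str):
        problems.append("'system' must be a string")

    messages = sample.get("messages")
    if not isinstance(messages, list) or len(messages) < 2:
        return problems + ["'messages' must be a list with at least two entries"]

    # Assumed rule: turns alternate user/assistant, starting with the user.
    expected = "user"
    for index, message in enumerate(messages):
        role = message.get("role") if isinstance(message, dict) else None
        if role != expected:
            problems.append(f"message {index}: expected role '{expected}', got {role!r}")
        if not isinstance(message, dict) or not isinstance(message.get("content"), str):
            problems.append(f"message {index}: 'content' must be a string")
        expected = "assistant" if expected == "user" else "user"

    if len(messages) % 2 != 0:
        problems.append("conversation should end with an assistant message")
    return problems
```

Running validate_sample over every line of the file and printing any non-empty result gives a quick pre-flight report before you start a job.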

In the Hyperparameters section, enter values for hyperparameters to use in training, such as epochs, batch size, and learning rate multiplier. If you’ve included a validation dataset, you can enable Early stopping, a technique used to prevent overfitting and stop the training process when the validation loss stops improving. You can set an early stopping threshold and patience value.
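To make the threshold and patience settings concrete, here is a minimal sketch of one common way early stopping is defined. It illustrates the general technique and is not a description of how Amazon Bedrock implements it; the function name and the sample loss values are made up for the example.

```python
def early_stopping_epoch(validation_losses, threshold, patience):
    """Return the 1-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if it lowers the best validation
    loss seen so far by more than `threshold`. Training stops once `patience`
    consecutive epochs pass without such an improvement.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if best - loss > threshold:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation losses, one per epoch.
losses = [0.90, 0.70, 0.60, 0.599, 0.60, 0.61]
print(early_stopping_epoch(losses, threshold=0.01, patience=2))  # prints 5
```

In this example the loss stops improving meaningfully after epoch 3, so with a patience of 2 the job would end at epoch 5 instead of running every configured epoch.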

You can also select the output location where Amazon Bedrock should save the output of the job in the Output data section. Choose an AWS Identity and Access Management (IAM) custom service role with the appropriate permissions in the Service access section. To learn more, see Create a service role for model customization in the Amazon Bedrock documentation.

Finally, choose Create Fine-tuning job and wait for your fine-tuning job to start.

You can track its progress or stop it in the Jobs tab in the Custom models section.

After a model customization job is complete, you can analyze the results of the training process by looking at the files in the output Amazon Simple Storage Service (Amazon S3) folder that you specified when you submitted the job, or you can view details about the model.

Before using a customized model, you need to purchase Provisioned Throughput for Amazon Bedrock and then use the resulting provisioned model for inference. When you purchase Provisioned Throughput, you can select a commitment term, choose a number of model units, and see estimated hourly, daily, and monthly costs. To learn more about the custom model pricing for the Claude 3 Haiku model, visit Amazon Bedrock Pricing.

Now, you can test your custom model in the console playground. I choose my custom model and ask whether Anthropic’s Claude 3.5 Sonnet model is available in Amazon Bedrock.

I receive the answer:

Yes. You can use Anthropic’s most intelligent and advanced model, Claude 3.5 Sonnet in the Amazon Bedrock. You can demonstrate exceptional capabilities across a diverse range of tasks and evaluations while also outperforming Claude 3 Opus.
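The same question can also be sent programmatically once you have a provisioned model. The following is a minimal sketch that uses the Anthropic Messages request format with the boto3 bedrock-runtime client. The provisioned model ARN is a placeholder you would supply yourself, the helper names are hypothetical, and the Region default simply mirrors the Region used in this post.

```python
import json


def build_messages_request(question, system_prompt="You are a helpful assistant.",
                           max_tokens=512):
    """Build an Anthropic Messages request body for Amazon Bedrock."""
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system_prompt,
        "messages": [{"role": "user", "content": question}],
    }


def ask_custom_model(provisioned_model_arn, question, region="us-west-2"):
    """Send the question to a provisioned custom model (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK

    runtime = boto3.client("bedrock-runtime", region_name=region)
    response = runtime.invoke_model(
        modelId=provisioned_model_arn,
        body=json.dumps(build_messages_request(question)),
    )
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]
```

Using the same system prompt as in the training data keeps inference consistent with what the model saw during fine-tuning.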

You can complete this job using AWS APIs, AWS SDKs, or AWS Command Line Interface (AWS CLI). To learn more about using AWS CLI, visit Code samples for model customization in the AWS documentation.
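As a rough idea of what the SDK route looks like, here is a minimal sketch built on the boto3 create_model_customization_job call. The base model identifier, the hyperparameter key names, and the default values are my assumptions and should be checked against the Amazon Bedrock documentation. The job name, role ARN, and S3 URIs are placeholders, and the helper names are hypothetical.

```python
def build_fine_tuning_job_request(job_name, model_name, role_arn, training_s3_uri,
                                  output_s3_uri, validation_s3_uri=None,
                                  epochs=2, batch_size=32, learning_rate_multiplier=1.0):
    """Assemble keyword arguments for create_model_customization_job."""
    request = {
        "jobName": job_name,
        "customModelName": model_name,
        "roleArn": role_arn,
        # Assumed identifier for the fine-tunable Claude 3 Haiku base model.
        "baseModelIdentifier": "anthropic.claude-3-haiku-20240307-v1:0:200k",
        "customizationType": "FINE_TUNING",
        "trainingDataConfig": {"s3Uri": training_s3_uri},
        "outputDataConfig": {"s3Uri": output_s3_uri},
        # Hyperparameter values are passed as strings; key names are assumed.
        "hyperParameters": {
            "epochCount": str(epochs),
            "batchSize": str(batch_size),
            "learningRateMultiplier": str(learning_rate_multiplier),
        },
    }
    if validation_s3_uri:
        request["validationDataConfig"] = {"validators": [{"s3Uri": validation_s3_uri}]}
    return request


def submit_fine_tuning_job(request, region="us-west-2"):
    """Submit the job and return its ARN (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without the SDK

    bedrock = boto3.client("bedrock", region_name=region)
    return bedrock.create_model_customization_job(**request)["jobArn"]
```

Building the request as a plain dictionary first makes it easy to log or review the exact configuration before the job is submitted.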

If you are using Jupyter Notebook, visit the GitHub repository and follow a hands-on guide for custom models. To build a production-level operation, I recommend reading Streamline custom model creation and deployment for Amazon Bedrock with Provisioned Throughput using Terraform on the AWS Machine Learning Blog.

Datasets and parameters
When fine-tuning Claude 3 Haiku, the first thing you should do is look at your datasets. Two datasets are involved in training Haiku: the training dataset and the validation dataset. Both must meet specific requirements for training to succeed, which are outlined in the following table.

	Training data	Validation data
File format	JSONL	JSONL
File size	<= 10 GB	<= 1 GB
Line count	32 – 10,000 lines	32 – 1,000 lines
Combined line count	Training + validation sum <= 10,000 lines
Token limit	< 32,000 tokens per entry	< 32,000 tokens per entry
Reserved keywords	Avoid “\nHuman:” or “\nAssistant:” in prompts	Avoid “\nHuman:” or “\nAssistant:” in prompts
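Some of the limits in the table are easy to check before you upload anything. The following is a minimal sketch that covers the line counts and the reserved keywords only. It does not check file size or token counts, since counting tokens would need the model's tokenizer, and check_dataset_limits is a hypothetical helper rather than an AWS API.

```python
import json

RESERVED_KEYWORDS = ("\nHuman:", "\nAssistant:")


def check_dataset_limits(training_lines, validation_lines=()):
    """Check line-count limits and reserved keywords; return a list of problems."""
    problems = []
    n_train, n_val = len(training_lines), len(validation_lines)
    if not 32 <= n_train <= 10_000:
        problems.append(f"training set has {n_train} lines; expected 32 to 10,000")
    if n_val and not 32 <= n_val <= 1_000:
        problems.append(f"validation set has {n_val} lines; expected 32 to 1,000")
    if n_train + n_val > 10_000:
        problems.append("training + validation lines exceed 10,000")

    datasets = (("training", training_lines), ("validation", validation_lines))
    for name, lines in datasets:
        for number, line in enumerate(lines, start=1):
            sample = json.loads(line)
            texts = [sample.get("system", "")]
            texts += [message["content"] for message in sample["messages"]]
            if any(keyword in text for text in texts for keyword in RESERVED_KEYWORDS):
                problems.append(f"{name} line {number}: contains a reserved keyword")
    return problems
```

An empty list means the dataset passed these particular checks, not that the job is guaranteed to succeed.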

When you prepare the datasets, start with a small high-quality dataset and iterate based on tuning results. You can consider using larger models from Anthropic like Claude 3 Opus or Claude 3.5 Sonnet to help refine and improve your training data. You can also use them to generate training data for fine-tuning the Claude 3 Haiku model, which can be very effective if the larger models already perform well on your target task.

For more guidance on selecting the proper hyperparameters and preparing the datasets, read the AWS Machine Learning Blog post, Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku in Amazon Bedrock.

Demo video
Check out this deep dive demo video for a step-by-step walkthrough that will help you get started with fine-tuning Anthropic’s Claude 3 Haiku model in Amazon Bedrock.

Now available
Fine-tuning for Anthropic’s Claude 3 Haiku model in Amazon Bedrock is now generally available in the US West (Oregon) AWS Region; check the full Region list for future updates. To learn more, visit Custom models in the Amazon Bedrock documentation.

Give fine-tuning for the Claude 3 Haiku model a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

I look forward to seeing what you build when you put this new technology to work for your business.

— Channy

Teaching about AI in schools: Take part in our Research and Educator Community Symposium

2024-10-31 Jane Waite

Post Syndicated from Jane Waite original https://www.raspberrypi.org/blog/teaching-about-ai-in-schools-research-and-educator-community-symposium/

Worldwide, the use of generative AI systems and related technologies is transforming our lives. From marketing and social media to education and industry, these technologies are being used everywhere, even if it isn’t obvious. Yet, despite the growing availability and use of generative AI tools, governments are still working out how and when to regulate such technologies to ensure they don’t cause unforeseen negative consequences.

How, then, do we equip our young people to deal with the opportunities and challenges that they are faced with from generative AI applications and associated systems? Teaching them about AI technologies seems an important first step. But what should we teach, when, and how?

A teacher aids children in the classroom

Researching AI curriculum design

The researchers at the Raspberry Pi Foundation have been looking at research that will help inform curriculum design and resource development to teach about AI in school. As part of this work, a number of research themes have been established, which we would like to explore with educators at a face-to-face symposium.

These research themes include the SEAME model, a simple way to analyse learning experiences about AI technology, as well as anthropomorphisation and how this might influence the formation of mental models about AI products. These research themes have become the cornerstone of the Experience AI resources we’ve co-developed with Google DeepMind. We will be using these materials to exemplify how the research themes can be used in practice as we review the recently published UNESCO AI competencies.

Most importantly, we will also review how we can help teachers and learners move from a rule-based view of problem solving to a data-driven view, from computational thinking 1.0 to computational thinking 2.0.

A call for teacher input on the AI curriculum

Over ten years ago, teachers in England experienced a large-scale change in what they needed to teach in computing lessons when programming was more formally added to the curriculum. As we enter a similar period of change — this time to introduce teaching about AI technologies — we want to hear from teachers as we collectively start to rethink our subject and curricula.

We think it is imperative that educators’ voices are heard as we reimagine computer science and add data-driven technologies into an already densely packed learning context.

Join our Research and Educator Community Symposium

On Saturday, 1 February 2025, we are running a Research and Educator Community Symposium in collaboration with the Raspberry Pi Computing Education Research Centre.

In this symposium, we will bring together UK educators and researchers to review research themes, competency frameworks, and early international AI curricula and to reflect on how to advance approaches to teaching about AI. This will be a practical day of collaboration to produce suggested key concepts and pedagogical approaches and highlight research needs.

This symposium focuses on teaching about AI technologies, so we will not be looking at which AI tools might be used in general teaching and learning or how they may change teacher productivity.

It is vitally important for young people to learn how to use AI technologies in their daily lives so they can become discerning consumers of AI applications. But how should we teach them? Please help us start to consider the best approach by signing up for our Research and Educator Community Symposium by 9 December 2024.

Information at a glance

When: Saturday, 1 February 2025 (10am to 5pm)

Where: Raspberry Pi Foundation Offices, Cambridge

Who: If you have started teaching about AI, are creating related resources, are providing professional development about AI technologies, or if you are planning to do so, please apply to attend our symposium. Travel funding is available for teachers in England.

Apply by 9 December 2024

Please note we expect to be oversubscribed, so book early and tell us about why you are interested in taking part. We will notify all applicants of the outcome of their application by 11 December.

The post Teaching about AI in schools: Take part in our Research and Educator Community Symposium appeared first on Raspberry Pi Foundation.

Introducing new artificial intelligence and machine learning projects for Code Clubs

2024-10-29 Pete Bell

Post Syndicated from Pete Bell original https://www.raspberrypi.org/blog/artificial-intelligence-projects-for-kids/

We’re pleased to share a new collection of Code Club projects designed to introduce creators to the fascinating world of artificial intelligence (AI) and machine learning (ML). These projects bring the latest technology to your Code Club in fun and inspiring ways, making AI and ML engaging and accessible for young people. We’d like to thank Amazon Future Engineer for supporting the development of this collection.

A man on a blue background, with question marks over his head, surrounded by various objects and animals, such as apples, planets, mice, a dinosaur and a shark.

The value of learning about AI and ML

By engaging with AI and ML at a young age, creators gain a clearer understanding of the capabilities and limitations of these technologies, helping them to challenge misconceptions. This early exposure also builds foundational skills that are increasingly important in various fields, preparing creators for future educational and career opportunities. Additionally, as AI and ML become more integrated into educational standards, having a strong base in these concepts will make it easier for creators to grasp more advanced topics later on.

What’s included in this collection

We’re excited to offer a range of AI and ML projects that feature both video tutorials and step-by-step written guides. The video tutorials are designed to guide creators through each activity at their own pace and are captioned to improve accessibility. The step-by-step written guides support creators who prefer learning through reading.

The projects are crafted to be flexible and engaging. The main part of each project can be completed in just a few minutes, leaving lots of time for customisation and exploration. This setup allows for short, enjoyable sessions that can easily be incorporated into Code Club activities.

The collection is organised into two distinct paths, each offering a unique approach to learning about AI and ML:

Machine learning with Scratch introduces foundational concepts of ML through creative and interactive projects. Creators will train models to recognise patterns and make predictions, and explore how these models can be improved with additional data.

The AI Toolkit introduces various AI applications and technologies through hands-on projects using different platforms and tools. Creators will work with voice recognition, facial recognition, and other AI technologies, gaining a broad understanding of how AI can be applied in different contexts.

Inclusivity is a key aspect of this collection. The projects cater to various skill levels and are offered alongside an unplugged activity, ensuring that everyone can participate, regardless of available resources. Creators will also have the opportunity to stretch themselves — they can explore advanced technologies like Adobe Firefly and practical tools for managing Ollama and Stable Diffusion models on Raspberry Pi computers.

Project examples

A piece of cheese is displayed on a screen. There are multiple mice around the screen.

One of the highlights of our new collection is Chomp the cheese, which uses Scratch Lab’s experimental face recognition technology to create a game students can play with their mouth! This project offers a playful introduction to facial recognition while keeping the experience interactive and fun.

A big orange fish on a dark blue background, with green leaves surrounding the fish.

Fish food uses Machine Learning for Kids, with creators training a model to control a fish using voice commands.

An illustration of a pink brain is displayed on a screen. There are two hands next to the screen playing the 'Rock paper scissors' game.

In Teach a machine, creators train a computer to recognise different objects such as fingers or food items. This project introduces classification in a straightforward way using the Teachable Machine platform, making the concept easy to grasp.

Two men on a blue background, surrounded by question marks, a big green apple and a red tomato.

Apple vs tomato also uses Teachable Machine, but this time creators are challenged to train a model to differentiate between apples and tomatoes. Initially, the model exhibits bias due to limited data, prompting discussions on the importance of data diversity and ethical AI practices.

Three people on a light blue background, surrounded by music notes and a microbit.

Dance detector allows creators to use accelerometer data from a micro:bit to train a model to recognise dance moves like Floss or Disco. This project combines physical computing with AI, helping creators explore movement recognition technology they may have experienced in familiar contexts such as video games.

A green dinosaur in a forest is being observed by a person hiding in the bush holding the binoculars.

Dinosaur decision tree is an unplugged activity where creators use a paper-based branching chart to classify different types of dinosaurs. This hands-on project introduces the concept of decision-making structures, where each branch of the chart represents a choice or question leading to a different outcome. By constructing their own decision tree, creators gain a tactile understanding of how these models are used in ML to analyse data and make predictions.
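For clubs that want to carry the paper chart over to code, the same branching idea can be written as a small nested structure. This is a minimal sketch: the questions and the dinosaurs are illustrative choices of ours and are not taken from the project itself.

```python
# Each inner node asks a yes/no question; each leaf is a dinosaur name.
TREE = {
    "question": "Does it walk on two legs?",
    "yes": {
        "question": "Does it eat meat?",
        "yes": "Tyrannosaurus",
        "no": "Iguanodon",
    },
    "no": {
        "question": "Does it have horns?",
        "yes": "Triceratops",
        "no": "Diplodocus",
    },
}


def classify(tree, answers):
    """Follow the branches until a leaf (a plain string) is reached."""
    node = tree
    while isinstance(node, dict):
        node = node["yes"] if answers[node["question"]] else node["no"]
    return node


answers = {"Does it walk on two legs?": False, "Does it have horns?": True}
print(classify(TREE, answers))  # prints Triceratops
```

Each dictionary plays the role of a branch point on the paper chart, and each string is one of the outcomes at the end of a branch.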

These AI projects are designed to support young people to get hands-on with AI technologies in Code Clubs and other non-formal learning environments. Creators can also enter one of their projects into Coolest Projects by taking a short video showing their project and any code used to make it. Their creation will then be showcased in the online gallery for people all over the world to see.

The post Introducing new artificial intelligence and machine learning projects for Code Clubs appeared first on Raspberry Pi Foundation.

Watermark for LLM-Generated Text

2024-10-25 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/10/watermark-for-llm-generated-text.html

Researchers at Google have developed a watermark for LLM-generated text. The basics are pretty obvious: the LLM chooses between tokens partly based on a cryptographic key, and someone with knowledge of the key can detect those choices. What makes this hard is (1) how much text is required for the watermark to work, and (2) how robust the watermark is to post-generation editing. Google’s version looks pretty good: it’s detectable in text as small as 200 tokens.
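To make the basic idea concrete, here is a toy sketch of a keyed watermark. It is not Google's scheme: it is a much simpler "green list" style bias that I'm substituting for illustration, with a made-up 50-word vocabulary and a seeded random generator standing in for the language model. Anyone holding the key can count how many tokens landed on the key-dependent "green" side; without the key, the text looks like a coin flip.

```python
import hashlib
import hmac
import random

VOCAB = [f"w{i}" for i in range(50)]  # stand-in vocabulary


def is_green(key, previous, token):
    """Keyed pseudo-random coin flip for a (previous token, token) pair."""
    digest = hmac.new(key, f"{previous}|{token}".encode(), hashlib.sha256).digest()
    return digest[0] % 2 == 0


def generate(key, length, candidates=4, seed=0):
    """Pick each token from a few candidates, preferring 'green' ones."""
    rng = random.Random(seed)  # stands in for the model's own sampling
    tokens = ["<s>"]
    for _ in range(length):
        options = rng.sample(VOCAB, candidates)
        green = [t for t in options if is_green(key, tokens[-1], t)]
        tokens.append(green[0] if green else options[0])
    return tokens[1:]


def green_fraction(key, tokens):
    """Fraction of tokens that are 'green' under the given key."""
    previous = ["<s>"] + tokens[:-1]
    hits = sum(is_green(key, p, t) for p, t in zip(previous, tokens))
    return hits / len(tokens)


text = generate(b"secret", 200)
print(green_fraction(b"secret", text))  # well above 0.5 with the right key
print(green_fraction(b"other", text))   # close to 0.5 with the wrong key
```

The two hard parts named above show up even in the toy: shorter texts give a noisier fraction, and every edited token pulls the fraction back toward one half.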

Upgraded Claude 3.5 Sonnet from Anthropic (available now), computer use (public beta), and Claude 3.5 Haiku (coming soon) in Amazon Bedrock

2024-10-22 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/upgraded-claude-3-5-sonnet-from-anthropic-available-now-computer-use-public-beta-and-claude-3-5-haiku-coming-soon-in-amazon-bedrock/

Four months ago, we introduced Anthropic’s Claude 3.5 in Amazon Bedrock, raising the industry bar for AI model intelligence while maintaining the speed and cost of Claude 3 Sonnet.

Today, I am excited to announce three new capabilities for the Claude 3.5 model family in Amazon Bedrock:

Upgraded Claude 3.5 Sonnet – You now have access to an upgraded Claude 3.5 Sonnet model that builds upon its predecessor’s strengths, offering even more intelligence at the same cost. Claude 3.5 Sonnet continues to improve its capability to solve real-world software engineering tasks and follow complex, agentic workflows. The upgraded Claude 3.5 Sonnet helps across the entire software development lifecycle, from initial design to bug fixes, maintenance, and optimizations. With these capabilities, the upgraded Claude 3.5 Sonnet model can help build more advanced chatbots with a warm, human-like tone. Other use cases in which the upgraded model excels include knowledge Q&A platforms, data extraction from visuals like charts and diagrams, and automation of repetitive tasks and operations.

Computer use – Claude 3.5 Sonnet now offers computer use capabilities in Amazon Bedrock in public beta, allowing Claude to perceive and interact with computer interfaces. Developers can direct Claude to use computers the way people do: by looking at a screen, moving a cursor, clicking buttons, and typing text. This works by giving the model access to integrated tools that can return computer actions, like keystrokes and mouse clicks, editing text files, and running shell commands. Software developers can integrate computer use in their solutions by building an action-execution layer and granting screen access to Claude 3.5 Sonnet. In this way, software developers can build applications with the ability to perform computer actions, follow multiple steps, and check their results. Computer use opens new possibilities for AI-powered applications. For example, it can help automate software testing and back office tasks and implement more advanced software assistants that can interact with applications. Given this technology is early, developers are encouraged to explore lower-risk tasks and use it in a sandbox environment.

Claude 3.5 Haiku – The new Claude 3.5 Haiku is coming soon and combines rapid response times with improved reasoning capabilities, making it ideal for tasks that require both speed and intelligence. Claude 3.5 Haiku improves on its predecessor and matches the performance of Claude 3 Opus (previously Claude’s largest model) at the speed and cost of Claude 3 Haiku. Claude 3.5 Haiku can help with use cases such as fast and accurate code suggestions, highly interactive chatbots that need rapid response times for customer service, e-commerce solutions, and educational platforms. For customers dealing with large volumes of unstructured data in finance, healthcare, research, and more, Claude 3.5 Haiku can help efficiently process and categorize information.

According to Anthropic, the upgraded Claude 3.5 Sonnet delivers across-the-board improvements over its predecessor, with significant gains in coding, an area where it already excelled. The upgraded Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks. On coding, it improves performance on SWE-bench Verified from 33% to 49%, scoring higher than all publicly available models. It also improves performance on TAU-bench, an agentic tool use task, from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the airline domain. The following table includes the model evaluations provided by Anthropic.

Computer use, a new frontier in AI interaction
Instead of restricting the model to use APIs, Claude has been trained on general computer skills, allowing it to use a wide range of standard tools and software programs. In this way, applications can use Claude to perceive and interact with computer interfaces. Software developers can integrate this API to enable Claude to translate prompts (for example, “find me a hotel in Rome”) into specific computer commands (open a browser, navigate this website, and so on).

More specifically, when invoking the model, software developers now have access to three new integrated tools that provide a virtual set of hands to operate a computer:

Computer tool – This tool receives a screenshot and a goal as input and returns a description of the mouse and keyboard actions that should be performed to achieve that goal. For example, this tool can ask to move the cursor to a specific position, click, type, and take screenshots.
Text editor tool – Using this tool, the model can ask to perform operations like viewing file contents, creating new files, replacing text, and undoing edits.
Bash tool – This tool returns commands that can be run on a computer system to interact at a lower level as a user typing in a terminal.

These tools open up a world of possibilities for automating complex tasks, from data analysis and software testing to content creation and system administration. Imagine an application powered by Claude 3.5 Sonnet interacting with the computer just as a human would: navigating through multiple desktop tools, including terminals, text editors, and internet browsers, filling out forms, and even debugging code.

We’re excited to help software developers explore these new capabilities with Amazon Bedrock. We expect this capability to improve rapidly in the coming months, but Claude’s current ability to use computers has limits. Some actions, such as scrolling, dragging, or zooming, can present challenges for Claude, so we encourage you to start by exploring low-risk tasks.

On OSWorld, a benchmark for multimodal agents in real computer environments, the upgraded Claude 3.5 Sonnet currently scores 14.9%. While human-level skill is far ahead at about 70-75%, this result is much better than the 7.7% obtained by the next-best model in the same category.

Using the upgraded Claude 3.5 Sonnet in the Amazon Bedrock console
To get started with the upgraded Claude 3.5 Sonnet, I navigate to the Amazon Bedrock console and choose Model access in the navigation pane. There, I request access for the new Claude 3.5 Sonnet V2 model.

To test the new vision capability, I open another browser tab and download the Wind power generation chart in PNG format from the Our World in Data website.

Back in the Amazon Bedrock console, I choose Chat/text under Playgrounds in the navigation pane. For the model, I select Anthropic as the model provider and then Claude 3.5 Sonnet V2.

I use the three vertical dots in the input section of the chat to upload the image file from my computer. Then I enter this prompt:

Which are the top countries for wind power generation? Answer only in JSON.

The result follows my instructions and returns the list extracting the information from the image.

Using the upgraded Claude 3.5 Sonnet with AWS CLI and SDKs
Here’s a sample AWS Command Line Interface (AWS CLI) command using the Amazon Bedrock Converse API. I use the --query parameter of the CLI to filter the result and only show the text content of the output message:

aws bedrock-runtime converse \
    --model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
    --messages '[{ "role": "user", "content": [ { "text": "What do you throw out when you want to use it, but take in when you do not want to use it?" } ] }]' \
    --query 'output.message.content[*].text' \
    --output text

The response contains this text:

An anchor! You throw an anchor out when you want to use it to stop a boat, but you take it in (pull it up) when you don't want to use it and want to move the boat.

The AWS SDKs implement a similar interface. For example, you can use the AWS SDK for Python (Boto3) to analyze the same image as in the console example:

import boto3

MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"
IMAGE_NAME = "wind-generation.png"

bedrock_runtime = boto3.client("bedrock-runtime")

with open(IMAGE_NAME, "rb") as f:
    image = f.read()

user_message = "Which are the top countries for wind power generation? Answer only in JSON."

messages = [
    {
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image}}},
            {"text": user_message},
        ],
    }
]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
)
response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)
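The snippet above reads only the first content block of the response. As a minimal sketch, assuming the response follows the Converse shape used above, a small helper can join every text block (mirroring the CLI filter output.message.content[*].text) and then parse the JSON that the prompt asks for. The helper names extract_text and parse_json_answer are hypothetical, and stripping a Markdown code fence is an assumption about how the model might wrap its answer, not documented behavior.

```python
import json

# Three backticks, built at runtime so this listing stays easy to embed.
FENCE = "`" * 3


def extract_text(response: dict) -> str:
    """Join the text of every text block in a Converse-style response."""
    blocks = response["output"]["message"]["content"]
    return "".join(block["text"] for block in blocks if "text" in block)


def parse_json_answer(text: str):
    """Parse a JSON answer, tolerating an optional Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        cleaned = "\n".join(lines)
    return json.loads(cleaned)


if __name__ == "__main__":
    # Made-up placeholder response, not real model output.
    sample = {
        "output": {
            "message": {"content": [{"text": '{"top_countries": ["A", "B"]}'}]}
        }
    }
    print(parse_json_answer(extract_text(sample)))
```

With the script above, this would be called as parse_json_answer(extract_text(response)). If the model returns something that is not valid JSON, json.loads raises json.JSONDecodeError, which the caller should handle.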

Integrating computer use with your application
Let’s see how computer use works in practice. First, I take a screenshot of the desktop of an Ubuntu system:

This screenshot is the starting point for the steps that will be implemented by computer use. To see how that works, I run a Python script that passes the screenshot image and this prompt to the model as input:

Find me a hotel in Rome.

This script invokes the upgraded Claude 3.5 Sonnet in Amazon Bedrock using the new syntax required for computer use:

import base64
import json
import boto3

MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"

IMAGE_NAME = "ubuntu-screenshot.png"

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

with open(IMAGE_NAME, "rb") as f:
    image = f.read()

image_base64 = base64.b64encode(image).decode("utf-8")

prompt = "Find me a hotel in Rome."

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.5,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",  # the screenshot file is a PNG
                        "data": image_base64,
                    },
                },
            ],
        }
    ],
    "tools": [
        { # new
            "type": "computer_20241022", # literal / constant
            "name": "computer", # literal / constant
            "display_height_px": 1280, # min=1, no max
            "display_width_px": 800, # min=1, no max
            "display_number": 0 # min=0, max=N, default=None
        },
        { # new
            "type": "bash_20241022", # literal / constant
            "name": "bash", # literal / constant
        },
        { # new
            "type": "text_editor_20241022", # literal / constant
            "name": "str_replace_editor", # literal / constant
        }
    ],
    "anthropic_beta": ["computer-use-2024-10-22"],
}

# Convert the native request to JSON.
request = json.dumps(body)

try:
    # Invoke the model with the request.
    response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=request)

except Exception as e:
    print(f"ERROR: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())
print(model_response)

The body of the request includes new options:

anthropic_beta with value ["computer-use-2024-10-22"] to enable computer use.
The tools section supports a new type option (set to custom for the tools you configure).
Note that the computer tool needs to know the resolution of the screen (display_height_px and display_width_px).
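Because the computer tool needs the screen resolution, one option is to read it from the screenshot itself. The following is a minimal sketch, assuming the screenshot is a PNG captured at the full resolution of the display. The function names png_size and build_computer_use_tools are hypothetical. The tool types and names are copied from the request body above.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR stores width and height as big-endian 32-bit unsigned integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def build_computer_use_tools(width_px: int, height_px: int, display_number=None) -> list:
    """Build the three tool definitions used in the request body above."""
    if width_px < 1 or height_px < 1:
        raise ValueError("display dimensions must be at least 1 pixel")
    computer = {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": width_px,
        "display_height_px": height_px,
    }
    if display_number is not None:
        computer["display_number"] = display_number
    return [
        computer,
        {"type": "bash_20241022", "name": "bash"},
        {"type": "text_editor_20241022", "name": "str_replace_editor"},
    ]
```

In the script above, this could replace the hard-coded values with body["tools"] = build_computer_use_tools(*png_size(image)), so the declared resolution stays in step with the screenshot that is actually sent.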

To follow my instructions with computer use, the model provides actions that operate on the desktop described by the input screenshot.

The response from the model includes a tool_use section from the computer tool that provides the first step. The model has found in the screenshot the Firefox browser icon and the position of the mouse arrow. Because of that, it now asks to move the mouse to specific coordinates to start the browser.

{
    "id": "msg_bdrk_01WjPCKnd2LCvVeiV6wJ4mm3",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-5-sonnet-20241022",
    "content": [
        {
            "type": "text",
            "text": "I'll help you search for a hotel in Rome. I see Firefox browser on the desktop, so I'll use that to access a travel website.",
        },
        {
            "type": "tool_use",
            "id": "toolu_bdrk_01CgfQ2bmQsPFMaqxXtYuyiJ",
            "name": "computer",
            "input": {"action": "mouse_move", "coordinate": [35, 65]},
        },
    ],
    "stop_reason": "tool_use",
    "stop_sequence": None,
    "usage": {"input_tokens": 3443, "output_tokens": 106},
}

This is just the first step. As with usual tool use requests, the script should reply with the result of using the tool (moving the mouse in this case). Based on the initial request to find a hotel, there would be a loop of tool use interactions that ask to click on the icon, type a URL in the browser, and so on until a hotel has been found.
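One iteration of that loop can be sketched as follows, assuming the response has the shape shown above and that tool results are sent back as tool_result content blocks in a user message, as with other tool use requests. The execute callback is hypothetical: it stands in for whatever code actually moves the mouse or runs a command, and it returns a short text description of the outcome. A real integration would likely also send a fresh screenshot back to the model.

```python
def find_tool_uses(model_response: dict) -> list:
    """Return the tool_use blocks from a model response."""
    return [
        block
        for block in model_response.get("content", [])
        if block.get("type") == "tool_use"
    ]


def next_messages(messages: list, model_response: dict, execute) -> list:
    """Return a new message list extended with the assistant turn and tool results.

    execute(name, tool_input) is a hypothetical callback that performs the
    requested action and returns a text description of the result.
    """
    tool_results = [
        {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": [
                {"type": "text", "text": execute(block["name"], block["input"])}
            ],
        }
        for block in find_tool_uses(model_response)
    ]
    return messages + [
        {"role": "assistant", "content": model_response["content"]},
        {"role": "user", "content": tool_results},
    ]
```

The script would call next_messages while model_response["stop_reason"] equals "tool_use", put the returned list in body["messages"], and invoke the model again. The loop ends when the model stops asking for tools.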

A more complete example is available in this repository shared by Anthropic.

Things to know
The upgraded Claude 3.5 Sonnet is available today in Amazon Bedrock in the US West (Oregon) AWS Region and is offered at the same cost as the original Claude 3.5 Sonnet. For up-to-date information on regional availability, refer to the Amazon Bedrock documentation. For detailed cost information for each Claude model, visit the Amazon Bedrock pricing page.

In addition to the greater intelligence of the upgraded model, software developers can now integrate computer use (available in public beta) in their applications to automate complex desktop workflows, enhance software testing processes, and create more sophisticated AI-powered applications.

Claude 3.5 Haiku will be released in the coming weeks, initially as a text-only model and later with image input.

You can see how computer use can help with coding in this video with Alex Albert, Head of Developer Relations at Anthropic.

This other video describes computer use for automating operations.

To learn more about these new features, visit the Claude models section of the Amazon Bedrock documentation. Give the upgraded Claude 3.5 Sonnet a try in the Amazon Bedrock console today, and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with these new capabilities!

– Danilo

AWS Weekly Roundup: Agentic workflows, Amazon Transcribe, AWS Lambda insights, and more (October 21, 2024)

2024-10-21 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-agentic-workflows-amazon-transcribe-aws-lambda-insights-and-more-october-21-2024/

Agentic workflows are quickly becoming a cornerstone of AI innovation, enabling intelligent systems to autonomously handle and refine complex tasks in a way that mirrors human problem-solving. Last week, we launched Serverless Agentic Workflows with Amazon Bedrock, a new short course developed in collaboration with Dr. Andrew Ng and DeepLearning.AI.

This hands-on course, taught by my colleague Mike Chambers, teaches how to build serverless agents that can handle complex tasks without the hassle of managing infrastructure. You will learn everything you need to know about integrating tools, automating workflows, and deploying responsible agents with built-in guardrails on Amazon Web Services (AWS) with Amazon Bedrock. The hands-on labs provided with the course let you apply your knowledge directly in an AWS environment, hosted by AWS Partner Vocareum. Find more information and enroll for free on the DeepLearning.AI course page.

Now, let’s turn our attention to other exciting news in the AWS universe from last week.

Last week’s launches
Here are some launches that got my attention:

Amazon Transcribe now supports streaming transcription in 30 additional languages – Amazon Transcribe has expanded its support to include 30 additional languages, bringing the total number of supported languages to 54. This enhancement helps you reach a broader global audience and improves accessibility across various industries, including contact centers, broadcasting, and e-learning. The expanded language support allows for more efficient content moderation, improved agent productivity, and automatic subtitling for live events and meetings.

AWS Lambda console now surfaces key function insights and supports real-time log analytics – The AWS Lambda console now features a built-in Amazon CloudWatch Metrics Insights dashboard and supports CloudWatch Logs Live Tail, providing instant visibility into critical function metrics and real-time log streaming. You can now identify and troubleshoot errors or performance issues for your Lambda functions without leaving the console, as well as view and analyze logs in real time as they become available. You can reduce context switching and accelerate the development and troubleshooting processes for serverless applications. Check out the launch post for more details.

Amazon Bedrock Model Evaluation now supports evaluating custom model import models – You can now evaluate custom models you’ve imported to Amazon Bedrock using the model evaluation feature. This helps you to complete the full cycle of selecting, customizing, and evaluating models before deploying them. To evaluate an imported model, select the custom model from the list of models to evaluate in the model selector tool when creating an evaluation job.

Amazon Q in AWS Supply Chain – You can now use Amazon Q, an interactive AI assistant, to analyze your supply chain data in AWS Supply Chain and get insights to operate your supply chain more efficiently. Amazon Q can answer your supply chain questions by diving into your data. This reduces the time spent searching for information and streamlines finding answers to improve your supply chain operations.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional news items and posts that you might find interesting:

New Amazon OpenSearch Service YouTube channel – The channel offers bite-sized tutorials, curated content, and organized playlists on topics such as log analytics, semantic search, vector databases, and operational best practices. You can also provide feedback to influence future channel content and the OpenSearch Service roadmap. Check out the launch post for more details and subscribe to the Amazon OpenSearch Service YouTube channel.

Deploying Generative AI Applications with NVIDIA NIM Microservices on Amazon Elastic Kubernetes Service (Amazon EKS) – This post shows you how to use Amazon EKS to orchestrate the deployment of pods containing NVIDIA NIM microservices, to enable quick-to-setup and optimized large-scale large language model (LLM) inference on Amazon EC2 G5 instances. It also demonstrates how to scale (both pod and cluster) by monitoring for custom metrics through Prometheus, and how you can load balance using an Application Load Balancer.

Instant Well-Architected CDK Resources with Solutions Constructs Factories – You can now create well-architected AWS resources such as Amazon Simple Storage Service (Amazon S3) buckets and AWS Step Functions state machines with a single function call using the new AWS Solutions Constructs Factories. These factories handle all the best practices configuration for you while still allowing customization. Try using a Constructs factory the next time you need to deploy one of the supported resources.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS GenAI Lofts – AWS GenAI Lofts are about more than just the tech; they bring together startups, developers, investors, and industry experts. Whether you’re looking to gain deep insights or get your questions answered by generative AI pros, our GenAI Lofts have you covered and provide everything you need to start building your next innovation. Join events in London (through October 25), Seoul (October 30–November 6), São Paulo (through November 20), and Paris (through November 25).

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Malta (November 8), Chile (November 9), and Kochi, India (December 14).

AWS re:Invent – Registration is now open for the annual tech extravaganza, taking place December 2–6 in Las Vegas. At re:Invent 2024, you’ll get a front row seat to hear real stories from customers and AWS leaders about navigating pressing topics, such as generative AI. Learn about new product launches, watch demos, and get behind-the-scenes insights during five headline-making keynotes.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Antje

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

AI and the SEC Whistleblower Program

2024-10-21 B. Schneier

Post Syndicated from B. Schneier original https://www.schneier.com/blog/archives/2024/10/ai-and-the-sec-whistleblower-program.html

Tax farming is the practice of licensing tax collection to private contractors. Used heavily in ancient Rome, it’s largely fallen out of practice because of the obvious conflict of interest between the state and the contractor. Because tax farmers are primarily interested in short-term revenue, they have no problem abusing taxpayers and making things worse for them in the long term. Today, the U.S. Securities and Exchange Commission (SEC) is engaged in a modern-day version of tax farming. And the potential for abuse will grow when the farmers start using artificial intelligence.

In 2009, after Bernie Madoff’s $65 billion Ponzi scheme was exposed, Congress authorized the SEC to award bounties from civil penalties recovered from securities law violators. It worked in a big way. In 2012, when the program started, the agency received more than 3,000 tips. By 2020, it had more than doubled, and it more than doubled again by 2023. The SEC now receives more than 50 tips per day, and the program has paid out a staggering $2 billion in bounty awards. According to the agency’s 2023 financial report, the SEC paid out nearly $600 million to whistleblowers last year.

The appeal of the whistleblower program is that it alerts the SEC to violations it may not otherwise uncover, without any additional staff. And since payouts are a percentage of fines collected, it costs the government little to implement.

Unfortunately, the program has resulted in a new industry of private de facto regulatory enforcers. Legal scholar Alexander Platt has shown how the SEC’s whistleblower program has effectively privatized a huge portion of financial regulatory enforcement. There is a role for publicly sourced information in securities regulatory enforcement, just as there has been in litigation for antitrust and other areas of the law. But the SEC program, and a similar one at the U.S. Commodity Futures Trading Commission, has created a market distortion replete with perverse incentives. Like the tax farmers of history, the interests of the whistleblowers don’t match those of the government.

First, while the blockbuster awards paid out to whistleblowers draw attention to the SEC’s successes, they obscure the fact that its staffing level has slightly declined during a period of tremendous market growth. In one case, the SEC’s largest ever, it paid $279 million to an individual whistleblower. That single award was nearly one-third of the funding of the SEC’s entire enforcement division last year. Congress gets to pat itself on the back for spinning up a program that pays for itself (by law, the SEC awards 10 to 30 percent of its penalty collections over $1 million to qualifying whistleblowers), when it should be talking about whether or not it’s given the agency enough resources to fulfill its mission to “maintain fair, orderly, and efficient markets.”

Second, while the stated purpose of the whistleblower program is to incentivize individuals to come forward with information about potential violations of securities law, this hasn’t actually led to increases in enforcement actions. Instead of legitimate whistleblowers bringing the most credible information to the SEC, the agency now seems to be deluged by tips that are not highly actionable.

But the biggest problem is that uncovering corporate malfeasance is now a legitimate business model, resulting in powerful firms and misaligned incentives. A single law practice led by former SEC assistant director Jordan Thomas captured about 20 percent of all the SEC’s whistleblower awards through 2022, at which point Thomas left to open up a new firm focused exclusively on whistleblowers. We can admire Thomas and his team’s impact on making those guilty of white-collar crimes pay, and also question whether hundreds of millions of dollars of penalties should be funneled through the hands of an SEC insider turned for-profit business mogul.

Whistleblower tips can be used as weapons of corporate warfare. SEC whistleblower complaints are not required to come from inside a company, or even to rely on insider information. They can be filed on the basis of public data, as long as the whistleblower brings original analysis. Companies might dig up dirt on their competitors and submit tips to the SEC. Ransomware groups have used the threat of SEC whistleblower tips as a tactic to pressure the companies they’ve infiltrated into paying ransoms.

The rise of whistleblower firms could lead to them taking particular “assignments” for a fee. Can a company hire one of these firms to investigate its competitors? Can an industry lobbying group under scrutiny (perhaps in cryptocurrencies) pay firms to look at other industries instead and tie up SEC resources? When a firm finds a potential regulatory violation, do they approach the company at fault and offer to cease their research for a “kill fee”? The lack of transparency and accountability of the program means that the whistleblowing firms can get away with practices like these, which would be wholly unacceptable if perpetrated by the SEC itself.

Whistleblowing firms can also use the information they uncover to guide market investments by activist short sellers. Since 2006, the investigative reporting site Sharesleuth claims to have tanked dozens of stocks and instigated at least eight SEC cases against companies in pharma, energy, logistics, and other industries, all after its investors shorted the stocks in question. More recently, a new investigative reporting site called Hunterbrook Media and its partner hedge fund Hunterbrook Capital have churned out 18 investigative reports in their first five months of operation and disclosed short sales and other actions alongside each. In at least one report, Hunterbrook says they filed an SEC whistleblower tip.

Short sellers carry an important disciplining function in markets. But combined with whistleblower awards, the same profit-hungry incentives can emerge. Properly staffed regulatory agencies don’t have the same potential pitfalls.

AI will affect every aspect of this dynamic. AI’s ability to extract information from large document troves will help whistleblowers provide more information to the SEC faster, lowering the bar for reporting potential violations and opening a floodgate of new tips. Right now, there is no cost to the whistleblower to report minor or frivolous claims; there is only cost to the SEC. While AI automation will also help SEC staff process tips more efficiently, it could exponentially increase the number of tips the agency has to deal with, further decreasing the efficiency of the program.

AI could be a triple windfall for those law firms engaged in this business: lowering their costs, increasing their scale, and increasing the SEC’s reliance on a few seasoned, trusted firms. The SEC already, as Platt documented, relies on a few firms to prioritize their investigative agenda. Experienced firms like Thomas’s might wield AI automation to the greatest advantage. SEC staff struggling to keep pace with tips might have less capacity to look beyond the ones seemingly pre-vetted by familiar sources.

But the real effects will be on the conflicts of interest between whistleblowing firms and the SEC. The ability to automate whistleblower reporting will open new competitive strategies that could disrupt business practices and market dynamics.

An AI-assisted data analyst could dig up potential violations faster, for a greater scale of competitor firms, and consider a greater scope of potential violations than any unassisted human could. The AI doesn’t have to be that smart to be effective here. Complaints are not required to be accurate; claims based on insufficient evidence could be filed against competitors, at scale.

Even more cynically, firms might use AI to help cover up their own violations. If a company can deluge the SEC with legitimate, if minor, tips about potential wrongdoing throughout the industry, it might lower the chances that the agency will get around to investigating the company’s own liabilities. Some companies might even use the strategy of submitting minor claims about their own conduct to obscure more significant claims the SEC might otherwise focus on.

Many of these ideas are not so new. There are decades of precedent for using algorithms to detect fraudulent financial activity, with lots of current-day application of the latest large language models and other AI tools. In 2019, legal scholar Dimitrios Kafteranis, research coordinator for the European Whistleblowing Institute, proposed using AI to automate corporate whistleblowing.

And not all the impacts specific to AI are bad. The most optimistic possible outcome is that AI will allow a broader base of potential tipsters to file, providing assistive support that levels the playing field for the little guy.

But more realistically, AI will supercharge the for-profit whistleblowing industry. The risks remain as long as submitting whistleblower complaints to the SEC is a viable business model. Like tax farming, the interests of the institutional whistleblower diverge from the interests of the state, and no amount of tweaking around the edges will make it otherwise.

Ultimately, AI is not the cause of or solution to the problems created by the runaway growth of the SEC whistleblower program. But it should give policymakers pause to consider the incentive structure that such programs create, and to reconsider the balance of public and private ownership of regulatory enforcement.

This essay was written with Nathan Sanders, and originally appeared in The American Prospect.

More on My AI and Democracy Book

2024-10-11 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/10/more-on-my-ai-and-democracy-book.html

In July, I wrote about my new book project on AI and democracy, to be published by MIT Press in fall 2025. My co-author and collaborator Nathan Sanders and I are hard at work writing.

At this point, we would like feedback on titles. Here are four possibilities:

Rewiring the Republic: How AI Will Transform our Politics, Government, and Citizenship
The Thinking State: How AI Can Improve Democracy
Better Run: How AI Can Make our Politics, Government, Citizenship More Efficient, Effective and Fair
AI and the New Future of Democracy: Changes in Politics, Government, and Citizenship

What we want out of the title is that it convey (1) that it is a book about AI, (2) that it is a book about democracy writ large (and not just deepfakes), and (3) that it is largely optimistic.

What do you like? Feel free to do some mixing and matching: swapping “Will Transform” for “Will Improve” for “Can Transform” for “Can Improve,” for example. Or “Democracy” for “the Republic.” Remember, the goal here is for a title that will make a potential reader pick the book up off a shelf, or read the blurb text on a webpage. It needs to be something that will catch the reader’s attention. (Other title ideas are here).

Also, FYI, this is the current table of contents:

Introduction
1. Introduction: How AI will Change Democracy
2. Core AI Capabilities
3. Democracy as an Information System

Part I: AI-Assisted Politics
4. Background: Making Mistakes
5. Talking to Voters
6. Conducting Polls
7. Organizing a Political Campaign
8. Fundraising for Politics
9. Being a Politician

Part II: AI-Assisted Legislators
10. Background: Explaining Itself
11. Background: Who’s to Blame?
12. Listening to Constituents
13. Writing Laws
14. Writing More Complex Laws
15. Writing Laws that Empower Machines
16. Negotiating Legislation

Part III: The AI-Assisted Administration
17. Background: Exhibiting Values and Bias
18. Background: Augmenting Versus Replacing People
19. Serving People
20. Operating Government
21. Enforcing Regulations

Part IV: The AI-Assisted Court
22. Background: Being Fair
23. Background: Getting Hacked
24. Acting as a Lawyer
25. Arbitrating Disputes
26. Enforcing the Law
27. Reshaping Legislative Intent
28. Being a Judge

Part V: AI-Assisted Citizens
29. Background: AI and Power
30. Background: AI and Trust
31. Explaining the News
32. Watching the Government
33. Moderating, Facilitating, and Building Consensus
34. Acting as Your Personal Advocate
35. Acting as Your Personal Political Proxy

Part VI: Ensuring That AI Benefits Democracy
36. Why AI is Not Yet Good for Democracy
37. How to Ensure AI is Good for Democracy
38. What We Need to Do Now
39. Conclusion

Everything is subject to change, of course. The manuscript isn’t due to the publisher until the end of March, and who knows what AI developments will happen between now and then.

EDITED: The title under consideration is “Rewiring the Republic,” and not “Rewiring Democracy.” Although, I suppose, both are really under consideration.

Convert AWS console actions to reusable code with AWS Console-to-Code, now generally available

2024-10-10 Abhishek Gupta

Post Syndicated from Abhishek Gupta original https://aws.amazon.com/blogs/aws/convert-aws-console-actions-to-reusable-code-with-aws-console-to-code-now-generally-available/

Today, we are announcing the general availability (GA) of AWS Console-to-Code, which makes it easy to convert AWS console actions to reusable code. You can use AWS Console-to-Code to record your actions and workflows in the console, such as launching an Amazon Elastic Compute Cloud (Amazon EC2) instance, and review the AWS Command Line Interface (AWS CLI) commands for your console actions. With just a few clicks, Amazon Q can generate code for you using the infrastructure-as-code (IaC) format of your choice, including AWS CloudFormation templates (YAML or JSON) and AWS Cloud Development Kit (AWS CDK) code (TypeScript, Python, or Java). This can be used as a starting point for infrastructure automation and further customized for your production workloads, included in pipelines, and more.

Since we announced the preview last year, AWS Console-to-Code has garnered a positive response from customers. It has now been improved further in this GA version, because we have continued to work backwards from customer feedback.

New features in GA

Support for more services – During preview, the only supported service was Amazon EC2. At GA, AWS Console-to-Code has extended support to include Amazon Relational Database Service (Amazon RDS) and Amazon Virtual Private Cloud (Amazon VPC).
Simplified experience – The new user experience makes it easier for customers to manage the prototyping, recording and code generation workflows.
Preview code – The launch wizards for EC2 instances and Auto Scaling groups have been updated to allow customers to generate code for these resources without actually creating them.
Advanced code generation – AWS CDK and CloudFormation code generation is powered by Amazon Q machine learning models.

Getting started with AWS Console-to-Code
Let’s begin with a simple scenario of launching an Amazon EC2 instance. Start by accessing the Amazon EC2 console. Locate the AWS Console-to-Code widget on the right and choose Start recording to initiate the recording.

Now, launch an Amazon EC2 instance using the launch instance wizard in the Amazon EC2 console. After the instance is launched, choose Stop to complete the recording.

In the Recorded actions table, review the actions that were recorded. Use the Type dropdown list to filter by write actions (Write). Choose the RunInstances action. Select Copy CLI to copy the corresponding AWS CLI command.

This is the CLI command that I got from AWS Console-to-Code:

aws ec2 run-instances \
  --image-id "ami-066784287e358dad1" \
  --instance-type "t2.micro" \
  --network-interfaces '{"AssociatePublicIpAddress":true,"DeviceIndex":0,"Groups":["sg-1z1c11zzz1c11zzz1"]}' \
  --credit-specification '{"CpuCredits":"standard"}' \
  --tag-specifications '{"ResourceType":"instance","Tags":[{"Key":"Name","Value":"c2c-demo"}]}' \
  --metadata-options '{"HttpEndpoint":"enabled","HttpPutResponseHopLimit":2,"HttpTokens":"required"}' \
  --private-dns-name-options '{"HostnameType":"ip-name","EnableResourceNameDnsARecord":true,"EnableResourceNameDnsAAAARecord":false}' \
  --count "1"

This command can be easily modified. For this example, I updated it to launch two instances (--count 2) of type t3.micro (--instance-type). This is a simplified example, but the same technique can be applied to other workflows.
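The modification described above can also be scripted. The following is a minimal sketch, assuming the recorded command has been split into an argument list; the helper name override_option is hypothetical and is not part of AWS Console-to-Code or the AWS CLI, and the list keeps only a few of the recorded options for brevity.

```python
import shlex


def override_option(args, flag, value):
    """Return a copy of args with the value after `flag` replaced.

    If the flag is absent, the flag and value are appended instead.
    Hypothetical helper for illustration only.
    """
    updated = list(args)
    if flag in updated:
        idx = updated.index(flag)
        if idx + 1 >= len(updated):
            raise ValueError(f"{flag} has no value to replace")
        updated[idx + 1] = value
    else:
        updated.extend([flag, value])
    return updated


# A shortened form of the recorded command (JSON-valued options omitted).
recorded = [
    "aws", "ec2", "run-instances",
    "--image-id", "ami-066784287e358dad1",
    "--instance-type", "t2.micro",
    "--count", "1",
]

# Apply the same two changes made by hand in the text.
modified = override_option(recorded, "--instance-type", "t3.micro")
modified = override_option(modified, "--count", "2")

# shlex.join quotes arguments where needed so the result is shell-safe.
print(shlex.join(modified))
```

Working on a list of arguments rather than on the raw command string avoids quoting problems with the JSON-valued options in the full command.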

I executed the command using AWS CloudShell and it worked as expected, launching two t3.micro EC2 instances:

The single-click CLI code generation experience is based on the API commands that were used when actions were executed (while launching the EC2 instance). It’s interesting to note that the companion screen surfaces recorded actions as you complete them in the console. And thanks to the interactive UI with start and stop functionality, it’s easy to clearly scope actions for prototyping.

IaC generation using AWS CDK
AWS CDK is an open-source framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. With AWS Console-to-Code, you can generate AWS CDK code (currently in Java, Python and TypeScript) for your infrastructure workflows.

Let’s continue with the EC2 launch instance use case. If you haven’t done it already, in the Amazon EC2 console, locate the AWS Console-to-Code widget on the right, choose Start recording, and launch an EC2 instance. After the instance is launched, choose Stop to complete the recording and choose the RunInstances action from the Recorded actions table.

To generate AWS CDK Python code, choose the Generate CDK Python button from the dropdown list.

You can use the code as a starting point, customizing it to make it production-ready for your specific use case.

I already had the AWS CDK installed, so I created a new Python CDK project:

mkdir c2c_cdk_demo
cd c2c_cdk_demo
cdk init app --language python

Then, I plugged the generated code into the Python CDK project. For this example, I refactored the code into an AWS CDK Stack, changed the EC2 instance type, and made other minor changes to ensure that the code was correct. I successfully deployed it using cdk deploy.

I was able to go from the console action to launch an EC2 instance and then all the way to AWS CDK to reproduce the same result.

from aws_cdk import (
    Stack,
    aws_ec2 as ec2,
)
from constructs import Construct

class MyProjectStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

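        # Look up the account's default VPC. Vpc.from_lookup needs the stack
        # to be created with an explicit env (account and Region).
        existing_vpc = ec2.Vpc.from_lookup(self, "ExistingVPC",
            is_default=True
        )

        # Launch a t3.micro instance in a public subnet of that VPC.
        instance = ec2.Instance(self, "Instance",
            instance_type=ec2.InstanceType("t3.micro"),
            machine_image=ec2.AmazonLinuxImage(),
            vpc=existing_vpc,
            vpc_subnets=ec2.SubnetSelection(
                subnet_type=ec2.SubnetType.PUBLIC
            )
        )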

You can also generate a CloudFormation template in YAML or JSON format.
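To give a sense of what a JSON template for this scenario might contain, here is a minimal hand-written sketch. It is not the output of AWS Console-to-Code. The function name build_instance_template and the logical ID DemoInstance are hypothetical; the AMI ID and Name tag are taken from the CLI command recorded earlier.

```python
import json


def build_instance_template(image_id, instance_type, name_tag):
    """Build a minimal CloudFormation template with one EC2 instance.

    Hypothetical helper for illustration; a generated template would
    likely carry more properties (networking, metadata options, etc.).
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch of a single EC2 instance",
        "Resources": {
            "DemoInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": image_id,
                    "InstanceType": instance_type,
                    "Tags": [{"Key": "Name", "Value": name_tag}],
                },
            }
        },
    }


template = build_instance_template(
    "ami-066784287e358dad1", "t3.micro", "c2c-demo"
)

# Serialize to the JSON form of the template.
print(json.dumps(template, indent=2))
```

As with the generated code, a template like this would still need to be reviewed and tested before it is deployed.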

Preview code
You can also access AWS Console-to-Code directly from the Preview code feature in the Amazon EC2 and Amazon EC2 Auto Scaling group launch experiences. This means that you don’t have to actually create the resource in order to get the infrastructure code.

To try this out, follow the steps to create an Auto Scaling group using a launch template. However, instead of Create Auto Scaling group, click Preview code. You should now see the options to generate infrastructure code or copy the AWS CLI command.

Things to know
Here are a few things you should consider while using AWS Console-to-Code:

Anyone can use AWS Console-to-Code to generate AWS CLI commands for their infrastructure workflows. The code generation feature for AWS CDK and CloudFormation formats has a free quota of 25 generations per month, after which you will need an Amazon Q Developer subscription.
It’s recommended that you test and verify the generated IaC code before deployment.
At GA, AWS Console-to-Code only records actions in the Amazon EC2, Amazon VPC, and Amazon RDS consoles.
The Recorded actions table in AWS Console-to-Code only displays actions taken during the current session within the specific browser tab, and it does not retain actions from previous sessions or other tabs. Note that refreshing the browser tab will result in the loss of all recorded actions.

Now available
AWS Console-to-Code is available in all commercial Regions. You can learn more about it in the Amazon EC2 documentation. Give it a try in the Amazon EC2 console and send feedback to AWS re:Post for Amazon EC2 or through your usual AWS Support contacts.

Deebot Robot Vacuums Are Using Photos and Audio to Train Their AI

2024-10-10 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/10/deebot-robot-vacuums-are-using-photos-and-audio-to-train-their-ai.html

An Australian news agency is reporting that Deebot robot vacuum cleaners from the Chinese company Ecovacs are surreptitiously taking photos and recording audio, and sending that data back to the vendor to train their AIs.

Ecovacs’s privacy policy—available elsewhere in the app—allows for blanket collection of user data for research purposes, including:

The 2D or 3D map of the user’s house generated by the device
Voice recordings from the device’s microphone
Photos or videos recorded by the device’s camera

It also states that voice recordings, videos and photos that are deleted via the app may continue to be held and used by Ecovacs.

No word on whether the recorded audio is being used to train the vacuum in some way, or whether it is being used to train an LLM.

Slashdot thread.

AWS Weekly Roundup: HIPAA eligible with Amazon Q Business, Amazon DCV, AWS re:Post Agent, and more (Oct 07, 2024)

2024-10-07 Betty Zheng (郑予彬)

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-q-business-is-hipaa-eligible-amazon-dcv-aws-repost-agent-and-more-oct-07-2024/

Last Friday, I had the privilege of attending China Engineer’s Day 2024 (CED 2024) in Hangzhou as the Amazon Web Services (AWS) speaker. The event was organized by the China Computer Federation (CCF), one of the most influential professional developer communities in China.

At CED 2024, I spoke about how AI development tools can improve developer productivity. I was honored to receive a certificate of excellence from CCF, and Amazon Q garnered significant attention from the attendees.

Now, let’s turn to other exciting news in the AWS universe from last week.

Last week’s launches
Here are some launches that got my attention:

Amazon Q Business is now HIPAA eligible – Amazon Q Business is now Health Insurance Portability and Accountability Act (HIPAA) eligible. This means healthcare and life sciences organizations such as health insurance companies and healthcare providers can now use Amazon Q Business to run sensitive workloads regulated under the US HIPAA law.

NICE DCV renames to Amazon DCV – NICE DCV is rebranded to Amazon DCV. This high performance remote display protocol allows secure delivery of remote desktops and application streaming from any cloud or data center to any device, even over varying network conditions. Amazon DCV supports both Windows and major Linux distributions on the server side. Clients can use the native DCV client for Windows, Linux, or macOS, as well as web browsers, to receive desktops and application streams. The DCV server and client only transfer encrypted pixels, not data, ensuring no confidential information is downloaded. When using Amazon DCV on AWS with Amazon Elastic Compute Cloud (Amazon EC2), you can take advantage of the 108 AWS Availability Zones across 33 geographic Regions and 31 Local Zones. The 2024.0 release now supports the latest Ubuntu 24.04 LTS. For more details, check out Sébastien Stormacq’s new launch blog post.

AWS re:Post launches re:Post Agent – AWS re:Post provides access to curated knowledge and a vibrant community that helps users become even more successful on AWS. re:Post Agent is a generative AI assistant designed to provide rapid, intelligent responses to questions in the re:Post community. It expands the available AWS knowledge base, and community experts will earn reputation points by reviewing the AI-generated answers.

Advanced configuration with Amazon Timestream for InfluxDB – This new launch introduces a feature that allows users to monitor instance CPU, memory, and disk utilization metrics directly from the AWS Management Console.

A new stop ingestion API of Amazon Bedrock Knowledge Bases – This new API allows users to halt ongoing ingestion jobs at will. It provides greater control over data ingestion workflows, so users can quickly stop accidental or unwanted ingestion processes without waiting for completion. By using the new StopIngestionJob API, you can respond rapidly to evolving needs and potentially reduce costs. This capability is available across all AWS Regions where Amazon Bedrock Knowledge Bases are offered.

Higher storage limit of Amazon AppStream 2.0 – Amazon AppStream 2.0 has expanded the default size limit for application settings persistence from 1 GB to 5 GB. This increase allows end users to store more application data and settings without manual intervention and without affecting performance or session setup time.
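As an illustration of the StopIngestionJob API mentioned in the launches above, here is a minimal sketch in Python. It assumes the boto3 bedrock-agent client and its stop_ingestion_job operation; the wrapper name stop_ingestion, the stand-in client class, and all IDs are hypothetical, and the stand-in exists only so the sketch runs without AWS credentials.

```python
def stop_ingestion(client, knowledge_base_id, data_source_id, ingestion_job_id):
    """Ask Amazon Bedrock Knowledge Bases to stop a running ingestion job.

    `client` is expected to behave like boto3.client("bedrock-agent").
    Returns the job status reported in the response.
    """
    response = client.stop_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        ingestionJobId=ingestion_job_id,
    )
    return response["ingestionJob"]["status"]


class FakeBedrockAgentClient:
    """Stand-in for the real client (assumption: same call shape)."""

    def stop_ingestion_job(self, knowledgeBaseId, dataSourceId, ingestionJobId):
        # Echo the identifiers back with a canned status value.
        return {
            "ingestionJob": {
                "knowledgeBaseId": knowledgeBaseId,
                "dataSourceId": dataSourceId,
                "ingestionJobId": ingestionJobId,
                "status": "STOPPING",
            }
        }


# Placeholder IDs; replace with values from your own knowledge base.
status = stop_ingestion(
    FakeBedrockAgentClient(), "KB_ID_PLACEHOLDER", "DS_ID_PLACEHOLDER", "JOB_ID_PLACEHOLDER"
)
print(status)
```

Against a real account, you would pass boto3.client("bedrock-agent") instead of the stand-in and check the returned status before assuming the job has fully stopped.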

There were over 40 launches and releases last week. It was difficult for me to select the important ones. In addition to those already mentioned, here’s a list of potentially important feature updates:

For a full list of AWS announcements, be sure to keep an eye on AWS’s What’s New Feed page.

Other AWS news
Here are some other noteworthy items from last week.

Amazon WorkSpaces Thin Client – Amazon WorkSpaces Thin Client inventory is now available to purchase in the UK on Amazon Business, in addition to the US, France, Germany, Italy, and Spain. It’s a sleek, cost-effective device that brings secure access to AWS end user computing services right to your fingertips. This nifty gadget is like a digital fortress, preventing unauthorized data storage and applications, while giving IT admins the tools to manage and monitor their fleet of thin clients with ease.

Helping communities impacted by Hurricane Helene – AWS Disaster Response team is working closely with local partners and humanitarian organizations to deliver critical supplies to those in need in the Southeast. We’re also deploying AWS technology to help with re-connectivity, aid relief operations on the ground, and support food distribution needs in the region.

The life of a prescription at Amazon Pharmacy – Read how Amazon Pharmacy uses AI to remove complexity from the process of dispensing medications and improve patients’ experiences. The system transcribes raw prescription data into standardized formats, transforms medical abbreviations into full-text equivalents, and validates medication details against an industry database. This automated process, followed by pharmacist review, has reduced potential medication errors by 50 percent and improved processing speed by up to 90 percent, allowing pharmacists to focus on critical tasks and personalized care.

A thought leadership article on generative AI in WIRED magazine – Read Antje’s news column in WIRED. It discusses how AWS opens the transformative power of AI to organizations of any size and level of experience. I recommend it to all AI enthusiasts and business innovators. AWS is on a mission to bring generative AI magic to businesses of all sizes, offering a buffet of AI tools for tech wizards and newcomers alike. Whether you’re a startup with big dreams or a corporate giant looking to stay ahead, AWS is rolling out the red carpet to the AI revolution. Don’t miss this chance to turn your wildest tech fantasies into reality!

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS re:Invent 2024 – Registration is now open for the annual tech extravaganza, taking place December 2 – 6 in Las Vegas. I’m eager to learn about the new launches and excited to contribute to two chalk talks focusing on security topics (Dev311 – Enhance code security with generative AI and SEC228 – Navigate multi-level protection scheme compliance in AWS China Regions).

AWS Innovate Migrate, Modernize, and Build – Whether you are new to the cloud or an experienced user, you will learn something new at AWS Innovate. This is a free online conference. Register for the session whose time and region suit you: North America (October 15) or Europe, Middle East & Africa (October 24).

AWS Community Days – Join community-led conferences featuring technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world. Don’t miss out on the AWS Community Days happening on October 12 in Sofia and October 19 in Vadodara, Spain, and Guatemala.

Browse more upcoming AWS-led in-person and virtual events and developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Betty

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

The collective thoughts of the interwebz