Tag Archives: artificial intelligence

Decoding the Social Effects Of Media with Machine Learning

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/decoding-the-social-effects-of-media-with-machine-learning/

What if media were optimized to benefit people? This thought-provoking question is at the core of Harmony Labs’ mission. A nonprofit organization headquartered in New York City, Harmony Labs strives to better understand the impact of media on society, and build communities and tools to reform and transform media systems.

As Brian Waniewski, Executive Director at Harmony Labs, puts it: “The media systems that we have now, for better or worse, have become outrage machines and sorting machines that put people into groups of like minds. The business incentive structures of these systems are such that the more outrage there is, the more profit there is. Political events across the world in recent years have borne out what these media systems produce, and it’s really pretty toxic, and pretty hard to get anything done within. There are all kinds of natural divisions between people, but these media systems tend to reinforce these divisions. So, the first question that we’re asking is, What’s the scope of this problem? And then, What can we do to solve it?”

Harmony Labs use data science and machine learning to answer these questions. Starting from user surveys and media data, they developed advanced natural language processing pipelines that can identify how social issues are represented in media, how they are consumed by different audiences, and what kind of influence that consumption has.

So how do you get that data in the first place? Brian says: “All the media data that we need lives inside private companies. We knew that data sharing would be central to our mission, and this is why we’re structured as a nonprofit. We are working in the public’s interest in a non-partisan way. We’ve done big data sharing deals with large companies, as well as startups scraping different corners of the media ecosystem that we’re interested in: internet TV, internet radio, and so on. We have about 10 data partners at the moment, and we’re always looking to expand.”

Thanks to their data partners, Harmony Labs has collected over 50 terabytes of diverse media: TV, web, mobile, song lyrics, closed captions, social media, and more. This definitely fits the definition of big data (volume, velocity, and variety). Working with languages like Golang, Python, and R, the Harmony Labs data science and engineering teams rely on AWS services such as Amazon Aurora, Amazon Athena, AWS Glue, and Amazon Elastic Kubernetes Service (EKS) to build their data ingestion and processing workflows.

Once the data is in-house, Harmony Labs make it safe, secure, and accessible to a network of academic researchers who use it to investigate the influence of media systems on politics, society, and culture. Laura Edelson is one of these researchers. A Ph.D. candidate in Computer Science at NYU’s Tandon School of Engineering, she studies online political communication and develops methods to identify inauthentic content and activity. Harmony Labs supported her on the Ad Observatory project, an exploration of political ads on Facebook.

Harmony Labs also work on their own projects, such as the Narrative Observatory. “A narrative is a story pattern that recurs across different kinds of stories and media. You’ll find them in song lyrics, TV shows, news articles, and more,” says Brian. The Narrative Observatory helps identify narratives on particular topics and track them over long periods of time and across different media types.

With initial funding from the Bill & Melinda Gates Foundation, Harmony Labs studied narratives linked to the topic of poverty and economic mobility in the United States. Collecting millions of documents (online news, social media, music), they first identified the main narratives present in media. Then, using segmentation techniques, behavioral data on over 50,000 Americans and surveys, Harmony Labs defined four audiences, as well as their dominant narrative, their core values, and their views on specific social issues. Finally, Harmony Labs studied how each audience consumed narratives.

To enable funders, partners, and media companies to gain a deeper understanding of the cultural spaces their audiences occupy, they built a fascinating website, obiaudiences.org, where you can pick an audience and view the associated media feed. In other words, you can literally see the world through someone else’s eyes: what issues they care most about, what media they read most, and so on. This helps to understand the perceptions that different people have on certain issues, and as Brian puts it: “If you’re trying to reach people, it’s important to understand the media world they inhabit, and what can be actually relevant to that world.”

Narrative Observatory

Recently, Harmony Labs led a project funded by the Mozilla Foundation on defining a healthy narrative for artificial intelligence (AI). Studying the TV consumption habits of over 80,000 US adults, and connecting them with closed-caption transcripts and ads, they identified and named the main media narratives on AI. Each narrative includes a definition of AI, the emotion that it creates in people, and whether they think that AI will lead to a happy or an unhappy ending.

Harmony Labs identified four main narratives on AI. Two are extremely negative and fear-inducing. “Tool of Tyranny” says that AI will be used by governments to oppress people. “Robot Overlords” says that we’ll never be able to control AI, and it will end up ruling us. At the other end of the spectrum, the “Wishes Granted” narrative is extremely positive: sure, we don’t understand AI, but it’s a magic wand that will solve all our issues. The last narrative, “Augmented Intelligence”, is more balanced: yes, AI is a great opportunity to improve our daily lives, but it’s also capable of being unfair and even dangerous. We are responsible for designing it, controlling it, and making sure it’s used to help us, not to hurt us.

Harmony Labs found that the “Wishes Granted” narrative was the most prominent (67%). It shines a positive light on AI, but its naive and over-optimistic vision can hide the legitimate questions that AI raises. Still, it’s a good starting point to engage audiences, educate them with the “Augmented Intelligence” narrative, and increase their awareness of both opportunities and challenges.

Closing this post, I’m wondering which AI narrative I’ve actually promoted here, willingly or not! What do you think? One thing is certain: Harmony Labs is using AI to help us understand how media influences us every day, and how we can create a more democratic society. This is important work, and we’re humbled that they picked AWS to help them reach their goals.

For more information on Harmony Labs, please visit harmonylabs.org and harmonylabs.medium.com.

– Julien

Improve the performance of Lambda applications with Amazon CodeGuru Profiler

Post Syndicated from Vishaal Thanawala original https://aws.amazon.com/blogs/devops/improve-performance-of-lambda-applications-amazon-codeguru-profiler/

As businesses expand applications to reach more users and devices, they risk higher latencies that could degrade the customer experience. Slow applications can frustrate or even turn away customers. Developers therefore increasingly face the challenge of ensuring that their code runs as efficiently as possible, since application performance can make or break businesses.

Amazon CodeGuru Profiler helps developers improve application performance by analyzing the application’s runtime behavior. CodeGuru Profiler analyzes single- and multi-threaded applications and then generates visualizations to help developers understand the sources of latency. Afterwards, CodeGuru Profiler provides recommendations to help resolve the root cause.

CodeGuru Profiler recently began providing recommendations for applications written in Python. Additionally, the new automated onboarding process for AWS Lambda functions makes it even easier to use CodeGuru Profiler with serverless applications built on AWS Lambda.

This post highlights these new features by explaining how to set up and utilize CodeGuru Profiler on an AWS Lambda function written in Python.

Prerequisites

This post focuses on improving the performance of an application built with AWS Lambda, so it’s important to understand which Lambda functions work best with CodeGuru Profiler. You will get the most out of CodeGuru Profiler on long-duration Lambda functions (>10 seconds) or frequently invoked shorter-duration Lambda functions (~100 milliseconds). Because CodeGuru Profiler requires five minutes of runtime data before the Lambda container is recycled, very short-duration Lambda functions with an execution time of 1-10 milliseconds may not provide sufficient data for CodeGuru Profiler to generate meaningful results.

The automated CodeGuru Profiler onboarding process, which automatically creates the profiling group for you, supports Lambda functions running on Java 8 (Amazon Corretto), Java 11, and Python 3.8 runtimes. Additional runtimes, without the automated onboarding process, are supported and can be found in the Java documentation and the Python documentation.

Getting Started

Let’s quickly demonstrate the new Lambda onboarding process and the new Python recommendations. This example assumes you have already created a Lambda function, so we will just walk through the process of turning on CodeGuru Profiler and viewing results. If you don’t already have a Lambda function created, you can create one by following these setup instructions. If you would like to replicate this example, the code we used can be found on GitHub.

  1. On the AWS Lambda Console page, open your Lambda function. For this example, we’re using a function with a Python 3.8 runtime.

 This image shows the Lambda console page for the Lambda function we are referencing.

2. Navigate to the Configuration tab, go to the Monitoring and operations tools page, and click Edit on the right side of the page.

This image shows the instructions to open the page to turn on profiling

3. Scroll down to “Amazon CodeGuru Profiler” and click the button next to “Code profiling” to turn it on. After enabling Code profiling, click Save.

This image shows how to turn on CodeGuru Profiler

4. Verify that CodeGuru Profiler has been turned on within the Monitoring and operations tools page.

This image shows how to validate Code profiling has been turned on

That’s it! You can now navigate to CodeGuru Profiler within the AWS console and begin viewing results.

Viewing your results

CodeGuru Profiler requires 5 minutes of Lambda runtime data to generate results. After your Lambda function provides this runtime data, which may require multiple invocations if your Lambda function has a short runtime, the results will display within the “Profiling group” page in the CodeGuru Profiler console. The profiling group will be given a default name (i.e., aws-lambda-<lambda-function-name>), and it will take approximately 15 minutes after CodeGuru Profiler receives the runtime data before it appears on this page.

This image shows where you can see the profiling group created after turning on CodeGuru Profiler on your Lambda function

After the profile appears, customers can view their profiling results by analyzing the flame graphs. Additionally, after approximately 1 hour, customers will receive their first recommendation set (if applicable). For more information on how to read the CodeGuru Profiler results, see Investigating performance issues with Amazon CodeGuru Profiler.

The images below show the two flame graphs (CPU Utilization and Latency) generated from profiling the Lambda function. Note that the highlighted horizontal bar (also referred to as a frame) in both images corresponds to one of the three frames that generates a recommendation. We’ll dive into more details on the recommendation in the following sections.

CPU Utilization Flame Graph:

This image shows the CPU Utilization flame graph for the Lambda function

Latency Flame Graph:

This image shows the Latency flame graph for the Lambda function

Here are the three recommendations generated from the above Lambda function:

This image shows the 3 recommendations generated by CodeGuru Profiler

Addressing a recommendation

Let’s dive deeper into an example recommendation. The first recommendation above notices that the Lambda function is spending more than the normal amount of runnable time (6.8% vs. <1%) creating AWS SDK service clients. It recommends ensuring that the function doesn’t unnecessarily create AWS SDK service clients, which wastes CPU time.

This image shows the detailed recommendation for 'Recreation of AWS SDK service clients'

Based on the suggested resolution step, we made a quick and easy code change, moving the client creation outside of the Lambda handler function. This ensures that we don’t create unnecessary AWS SDK clients. The code change below shows how we would resolve the issue.

import boto3
...

# Create the AWS SDK clients once, outside the handler, so they are reused
# across warm invocations of the same Lambda execution environment.
s3_client = boto3.client('s3')
cw_client = boto3.client('cloudwatch')

def lambda_handler(event, context):
    ...

Reviewing your savings

After making each of the three changes recommended above by CodeGuru Profiler, look at the new flame graphs to see how the changes impacted the application’s profile. You’ll notice below that we no longer see the previously wide frames for the boto3 clients, put_metric_data, or the logger in the S3 API call.

CPU Utilization Flame Graph:

This image shows the CPU Utilization flame graph after making the recommended changes

Latency Flame Graph:

This image shows the Latency flame graph after making the recommended changes

Moreover, we can run the Lambda function for one day (1439 invocations) and review the results in Lambda Insights to understand our total savings. After every recommendation was addressed, this Lambda function, configured with 128 MB of memory and a 10-second timeout, used 10% less CPU time and showed lower maximum memory usage and network IO, leading to a 30% drop in GB-seconds (GB-s). Decreasing GB-s leads to a 30% lower duration charge, as explained in AWS Lambda Pricing.
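
As a rough illustration of how GB-seconds drive the duration charge, the short sketch below computes billed GB-seconds before and after an optimization. The invocation count matches the one-day test above, but the per-invocation durations are hypothetical values chosen only to show the arithmetic, not measurements from this function.

# Hypothetical illustration of the Lambda duration charge (GB-seconds).
memory_gb = 128 / 1024      # 128 MB expressed in GB
invocations = 1439          # one day of traffic, as in the test above

def gb_seconds(avg_duration_s):
    return memory_gb * avg_duration_s * invocations

before = gb_seconds(avg_duration_s=2.0)   # assumed average duration before the changes
after = gb_seconds(avg_duration_s=1.4)    # assumed average duration after the changes

print(f"before: {before:.1f} GB-s, after: {after:.1f} GB-s, "
      f"reduction: {(1 - after / before) * 100:.0f}%")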

Latency (before and after CodeGuru Profiler):

The graph below displays the duration change while running the Lambda function for 1 day.

This image shows the duration change while running the Lambda function for 1 day.

Cost (before and after CodeGuru Profiler):

The graph below displays the function cost change while running the Lambda function for 1 day.

This image shows the function cost change while running the lambda function for 1 day.

Conclusion

This post showed how developers can easily onboard onto CodeGuru Profiler in order to improve the performance of their serverless applications built on AWS Lambda. Get started with CodeGuru Profiler by visiting the CodeGuru console.

All across Amazon, numerous teams have utilized CodeGuru Profiler’s technology to generate performance optimizations for customers. It has also reduced infrastructure costs, saving millions of dollars annually.

Onboard your Python applications onto CodeGuru Profiler by following the instructions in the documentation. If you’re interested in the Python agent, note that it is open source on GitHub, and the repository also includes additional demo applications that use the agent.

About the authors

Headshot of Vishaal

Vishaal is a Product Manager at Amazon Web Services. He enjoys getting to know his customers’ pain points and transforming them into innovative product solutions.

Headshot of Mirela

Mirela is a Software Development Engineer at Amazon Web Services on the CodeGuru Profiler team in London. Previously, she worked on Amazon.com teams, and she enjoys working on products that help customers improve their code and infrastructure.

Headshot of Parag

Parag is a Software Development Engineer at Amazon Web Services on the CodeGuru Profiler team in Seattle. Previously, he worked on Amazon.com’s catalog and selection team. He enjoys working on large-scale distributed systems and products that help customers build scalable and cost-effective applications.

Automate Document Processing in Logistics using AI

Post Syndicated from Manikanth Pasumarti original https://aws.amazon.com/blogs/architecture/automate-document-processing-in-logistics-using-ai/

Multi-modal transportation is one of the biggest developments in the logistics industry. There has been a successful collaboration across different transportation partners in supply chain freight forwarding for many decades. But there’s still a considerable overhead of paperwork processing for each leg of the trip. Tens of billions of documents are processed in ocean freight forwarding alone. Using manual labor to process these documents (purchase orders, invoices, bills of lading, delivery receipts, and more) is both expensive and error-prone.

In this blog post, we’ll address how to automate the document processing in the logistics industry. We’ll also show you how to integrate it with a centralized workflow management.

Automated document processing architecture

Figure 1. Architecture of document processing workflow

The solution workflow shown in Figure 1 is as follows:

  1. Documents that belong to the same transaction are collected in an S3 bucket
  2. The document processing workflow is initiated
  3. The workflow orchestration is as follows:
    • Document is processed via automation
    • Relevant entities are extracted
    • Extracted data is reviewed
    • Order data is consolidated

This architecture uses Amazon Simple Storage Service (S3) for document storage, and Amazon Simple Queue Service (SQS) for workflow initiation. Amazon Textract is used for text extraction, Amazon Comprehend for entity extraction, and Amazon Augmented AI (A2I) for human review. Human review helps ensure correct results in cases of low-confidence predictions.

We use AWS Step Functions to orchestrate the document processing workflow. Step Functions also helps improve application resiliency with less code.

AWS Lambda functions are used to do the following (a minimal sketch of the first two tasks follows the list):

  • Detect if all required documents for a given transaction are available in Amazon S3
  • Kick off the process by creating an Amazon SQS message
  • Detect a new processing job from a generated SQS message
  • Extract text from PDFs using a Step Function
  • Extract entities from generated text using a Step Function
  • Control data completeness and accuracy
  • Initiate a human loop when needed using a Step Function
  • Consolidate the data collected from documents
  • Store the data into the database
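
As an illustration of the first two tasks above, here is a minimal sketch of a Lambda handler that checks whether all documents for a transaction are present in S3 and, if so, queues a processing job in SQS. The bucket layout, required document names, and queue URL are hypothetical placeholders, not the exact implementation used in this solution.

import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

REQUIRED_DOCS = {"invoice.pdf", "customs_authorization.pdf"}  # hypothetical names
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/doc-processing"  # placeholder

def lambda_handler(event, context):
    # S3 event for a newly uploaded document; objects are keyed by transaction ID.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    transaction_id = record["object"]["key"].split("/")[0]

    # List everything uploaded so far for this transaction.
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=f"{transaction_id}/")
    uploaded = {obj["Key"].split("/")[-1] for obj in listing.get("Contents", [])}

    # Only queue a processing job once all required documents are present.
    if REQUIRED_DOCS.issubset(uploaded):
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": bucket, "transaction_id": transaction_id}),
        )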

Document ingestion and classification

There are several data ingestion options available such as AWS Transfer Family, AWS DataSync, and Amazon Kinesis Data Firehose. Choose the appropriate ingestion blueprints based on the type of data sources. Typical real-time ingestion blueprints include AWS Lambda processing and an Amazon CloudWatch event. The batch pipeline can leverage AWS Step Functions. This can be used to orchestrate the Lambda function that initiates the document processing workflow.

Here are some things to consider when building your document ingestion and storage solution:

  • Choose your bucket strategy. Amazon S3 is an object store. Analyze your data pipeline ingestion carefully and choose the correct S3 bucket strategy for each document type (bills, supplier invoices, and others.)
  • Organize your data. The data is organized in S3 buckets by layer: Raw, Staging, and Processed. Each layer has its own bucket policy and access control.
  • Build a creation tool. This is an automated data lake bucket/folder structure tool, based on your data ingestion requirements. You can use this same structure for user-created data.
  • Define data security requirements. Do this before you begin the ingestion process. Before ingesting new or current data sources into AWS, secure access to the data.
  • Review the security credentials needed for access. After copying these credentials into AWS Systems Manager (SSM) Parameter Store, apply an AWS Key Management Service (KMS) key to encrypt them. The encrypted value is stored in SSM and used for authentication (see the sketch below).
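
As a sketch of that last step, the snippet below stores a data source credential as an encrypted SecureString parameter in SSM Parameter Store and reads it back with decryption. The parameter name, value, and KMS key alias are placeholders.

import boto3

ssm = boto3.client("ssm")

# Store the data source credential encrypted with a customer-managed KMS key.
# The parameter name and key alias are placeholders.
ssm.put_parameter(
    Name="/ingestion/source-a/api-key",
    Value="example-secret-value",
    Type="SecureString",
    KeyId="alias/ingestion-kms-key",
    Overwrite=True,
)

# The pipeline later reads and decrypts the value at ingestion time.
secret = ssm.get_parameter(Name="/ingestion/source-a/api-key", WithDecryption=True)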

Document processing workflow

Overview

The workflow checks the input buckets until it detects all the document types necessary for a complete dataset. In our case, these are the invoice document and the customs authorization form. Once both are detected, it generates a job request as a message in Amazon SQS. A Lambda function then processes the message and kicks off the Step Functions flow (see Figure 2). The state machine then initiates the document processing, text extraction, and optional human review steps. AWS Step Functions is well suited to our use case due to its ability to manage long-running workflows.

Figure 2. Visual workflow of document processing in AWS Step Functions

Entity extraction

For each document, entities are extracted using Amazon Textract and Amazon Comprehend. These entities can include date, company, address, bill of materials, total cost, and invoice number.

Following is a sample invoice document that is fed to Amazon Textract, which extracts the form data and creates key-value pairs.

Figure 3. Highlighted different entities in the sample invoice document

See Figure 4 for an example of the key-value pairs extracted for the sample invoice. The keys here represent the form labels (“SHIP TO”) and the values represent form values (shipping address).

Figure 4. Key-value pairs of the invoice data, extracted by Amazon Textract

Amazon Textract also generates a raw text output that contains the entire text, as shown in Figure 5 following.

Figure 5. Raw text output of the invoice data extracted by Amazon Textract
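
Both the key-value pairs in Figure 4 and the raw text in Figure 5 can be obtained programmatically from Amazon Textract. The sketch below shows the general call shape for a single-page image stored in S3; the bucket and object names are placeholders, multi-page PDFs typically go through the asynchronous StartDocumentAnalysis API instead, and the block-relationship parsing needed to pair keys with values is omitted for brevity.

import boto3

textract = boto3.client("textract")

# Analyze a single-page invoice image stored in S3 (names are placeholders).
response = textract.analyze_document(
    Document={"S3Object": {"Bucket": "example-doc-bucket", "Name": "invoice.png"}},
    FeatureTypes=["FORMS"],
)

# KEY_VALUE_SET blocks hold the form fields shown in Figure 4;
# LINE blocks hold the raw text shown in Figure 5.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block["Text"])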

To achieve a higher degree of confidence, Amazon Comprehend is used to identify and extract the custom entities. Amazon Comprehend is a natural language processing (NLP) service that uses machine learning (ML) to identify and extract insights and entities from text data. You can train Amazon Comprehend to identify entities relevant to your organization. These can be product names, part numbers, department names, or other entities. You can also train Amazon Comprehend to categorize documents or assign relevant labels to text.

An Amazon Comprehend entity recognizer comes with a set of pre-built entity types, and you can add custom entities to match specific business needs. Some of the entities we want to identify are addresses and company names, so we trained a custom recognizer to detect company names and addresses (see Figure 6).

Figure 6. Training details of custom entity recognizer

Figure 7 shows the resulting output from Amazon Comprehend:

Figure 7. Amazon Comprehend entity recognition output

The sample invoice in Figure 3 is processed top to bottom and left to right, so we know that the first company and address belong to the billing company, and the second set belongs to the shipment recipient. Along with detecting custom entities, Amazon Comprehend also outputs the confidence score of each extracted result.

Confidence scores can vary depending on how close the training data is to the actual data. In the preceding example, the first company entity came back with a score of 0.941. Let’s assume that we have set a minimum confidence score of 0.95. Anything below that threshold should be reviewed by a human. The following section describes this last step of our workflow.

Human review

Amazon Augmented AI (A2I) allows you to create and manage human loops. A human loop is a manual review task that gets assigned to a workforce. The workforce can be public, such as Amazon Mechanical Turk, or private, such as an internal team or a paid contractor. In our example, we created a private workforce to review the entities we were not confident about. Figure 8 shows an example of the user interface that the reviewers use to assign entities to the proper text sections.
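
When an entity’s confidence score falls below the chosen threshold, a human loop can be started programmatically. The sketch below assumes a flow definition has already been created in Amazon A2I; the loop name, flow definition ARN, and input structure are placeholders rather than the exact ones used in this solution.

import json
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

CONFIDENCE_THRESHOLD = 0.95

def maybe_start_review(entity, transaction_id):
    # Route only low-confidence entities to the private workforce.
    if entity["Score"] >= CONFIDENCE_THRESHOLD:
        return None
    return a2i.start_human_loop(
        HumanLoopName=f"entity-review-{transaction_id}",
        FlowDefinitionArn="arn:aws:sagemaker:us-east-1:123456789012:flow-definition/entity-review",
        HumanLoopInput={"InputContent": json.dumps({"entity": entity})},
    )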

Figure 8. Manual review interface of Amazon A2I

Review tasks can be automatically submitted to the workforce based on dynamic criteria, after both AI-related steps are completed. It can be used to review the text detected by Amazon Textract when key data elements are missing (such as order amount or quantity). It can also review entities after invoking Amazon Comprehend.

Figure 9. Consolidated dataset of processed invoice and customs authorization data

After the manual review step, data can be consolidated (as shown in Figure 9) and stored into a relational database. It can also be shared with other business units such as Accounting or Customer Services. You can apply the same process to other document types such as custom forms, which are linked to the same transaction. This allows us to process and combine information that comes from disparate paper sources more efficiently.

Conclusion

This post demonstrates how document processing can be automated to process business documentation by using Amazon Textract, Amazon Comprehend and Amazon Augmented AI.

Deploying an automated solution in the logistics industry takes away the undifferentiated heavy lifting involved in manual document processing. This helps cut down delivery delays and track missed deliveries. By providing a comprehensive view of the shipment, it increases the efficiency of back-office processing. It can also further simplify data collection for audit purposes.

To learn more:

Using AI to Scale Spear Phishing

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/08/using-ai-to-scale-spear-phishing.html

The problem with spear phishing is that it takes time and creativity to create individualized enticing phishing emails. Researchers are using GPT-3 to attempt to solve that problem:

The researchers used OpenAI’s GPT-3 platform in conjunction with other AI-as-a-service products focused on personality analysis to generate phishing emails tailored to their colleagues’ backgrounds and traits. Machine learning focused on personality analysis aims to predict a person’s proclivities and mentality based on behavioral inputs. By running the outputs through multiple services, the researchers were able to develop a pipeline that groomed and refined the emails before sending them out. They say that the results sounded “weirdly human” and that the platforms automatically supplied surprising specifics, like mentioning a Singaporean law when instructed to generate content for people living in Singapore.

While they were impressed by the quality of the synthetic messages and how many clicks they garnered from colleagues versus the human-composed ones, the researchers note that the experiment was just a first step. The sample size was relatively small and the target pool was fairly homogenous in terms of employment and geographic region. Plus, both the human-generated messages and those generated by the AI-as-a-service pipeline were created by office insiders rather than outside attackers trying to strike the right tone from afar.

It’s just a matter of time before this is really effective. Combine it with voice and video synthesis, and you have some pretty scary scenarios. The real risk isn’t that AI-generated phishing emails are as good as human-generated ones, it’s that they can be generated at much greater scale.

Defcon presentation and slides. Another news article.

Field Notes: Building an automated scene detection pipeline for Autonomous Driving – ADAS Workflow

Post Syndicated from Kevin Soucy original https://aws.amazon.com/blogs/architecture/field-notes-building-an-automated-scene-detection-pipeline-for-autonomous-driving/

This Field Notes blog post from 2020 explains how to build an Autonomous Driving Data Lake using this Reference Architecture. Many organizations face the challenge of ingesting, transforming, labeling, and cataloging massive amounts of data to develop automated driving systems. In this re:Invent session, we explored an architecture to solve this problem using Amazon EMR, Amazon S3, Amazon SageMaker Ground Truth, and more. You’ll learn how BMW Group collects over 1 billion km of anonymized perception data from its worldwide connected fleet of customer vehicles to develop safe and performant automated driving systems.

Architecture Overview

The objective of this post is to describe how to design and build an end-to-end Scene Detection pipeline.

This architecture integrates an event-driven ROS bag ingestion pipeline running Docker containers on Amazon Elastic Container Service (ECS). It includes a scalable batch processing pipeline based on Amazon EMR and Spark. The solution also leverages AWS Fargate, Spot Instances, Amazon Elastic File System (EFS), AWS Glue, Amazon S3, and Amazon Athena.

Figure 1 – Architecture Showing how to build an automated scene detection pipeline for Autonomous Driving

The data included in this demo was produced by one vehicle across four different drives in the United States. Because the ROS bag files produced by the vehicle’s on-board software contain very complex data, such as Lidar point clouds, the files are usually very large (1+ TB files are not uncommon).

These files usually need to be split into smaller chunks before being processed, as is the case in this demo. These files also may need to have post-processing algorithms applied to them, like lane detection or object detection.

In our case, the ROS bag files are split into approximately 10GB chunks and include topics for post-processed lane detections before they land in our S3 bucket. Our scene detection algorithm assumes the post processing has already been completed. The bag files include object detections with bounding boxes, and lane points representing the detected outline of the lanes.

Prerequisites

This post uses an AWS Cloud Development Kit (CDK) stack written in Python. You should follow the instructions in the AWS CDK Getting Started guide to set up your environment so you are ready to begin.

You can also use the config.json to customize the names of your infrastructure items, to set the sizing of your EMR cluster, and to customize the ROS bag topics to be extracted.

You will also need to be authenticated into an AWS account with permissions to deploy resources before executing the deploy script.

Deployment

The full pipeline can be deployed with one command: `bash deploy.sh deploy true`. The progress of the deployment can be followed on the command line, but also in the CloudFormation section of the AWS console. Once deployed, the user must upload two or more bag files to the rosbag-ingest bucket to initiate the pipeline.

The default configuration requires two bag files to be processed before an EMR pipeline is initiated. You also have to manually initiate the AWS Glue Crawler to be able to explore the parquet data with tools like Athena or QuickSight.

ROS bag ingestion with ECS Tasks, Fargate, and EFS

This solution provides an end-to-end scene detection pipeline for ROS bag files, ingesting the ROS bag files from S3, and transforming the topic data to perform scene detection in PySpark on EMR. This then exposes scene descriptions via DynamoDB to downstream consumers.

The pipeline starts with an S3 bucket (Figure 1 – #1) where incoming ROS bag files can be uploaded from local copy stations as needed. We recommend using AWS Direct Connect for a private, high-throughput connection to the cloud.

This ingestion bucket is configured to initiate S3 notifications each time an object with the suffix “.bag” is created. An AWS Lambda function then initiates a Step Function for orchestrating the ECS task, and the bucket and bag file prefix are passed to the ECS task as environment variables in the container.
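
A minimal sketch of that trigger Lambda might look like the following; the state machine ARN comes from an environment variable here, and the real pipeline forwards the bucket and key to the ECS task as container environment variables.

import json
import os
import urllib.parse
import boto3

sfn = boto3.client("stepfunctions")

def lambda_handler(event, context):
    # Parse the S3 notification for the newly uploaded ".bag" object.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    # Start the state machine that orchestrates the ECS extraction task.
    sfn.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],
        input=json.dumps({"bucket": bucket, "bag_key": key}),
    )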

The ECS task (Figure 1 – #2) runs serverless, leveraging Fargate as the capacity provider. This avoids the need to provision and autoscale EC2 instances in the ECS cluster. Each ECS task processes exactly one bag file. We use Amazon Elastic File System (EFS) to provide virtually unlimited file storage to the container, in order to easily work with larger bag files. The container uses the open-source bagpy Python library to extract structured topic data (for example, GPS, detections, and inertial measurement data). The topic data is uploaded as parquet files to S3, partitioned by topic and source bag file. The application writes metadata about each file, such as the topic names found in the file and the number of messages per topic, to a DynamoDB table (Figure 1 – #4).
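
As an illustration of the extraction step, here is a minimal sketch that uses bagpy to pull one topic into a pandas DataFrame and write it out as parquet. The bag path, topic name, bucket, and key layout are placeholders, writing parquet requires pyarrow or fastparquet, and the real task also records per-topic metadata in DynamoDB.

import boto3
import pandas as pd
from bagpy import bagreader

s3 = boto3.client("s3")

# Bag file downloaded to the EFS mount by the ECS task (path is a placeholder).
bag = bagreader("/mnt/efs/drive-0001.bag")

# bagpy extracts the topic to a CSV file and returns its path.
csv_path = bag.message_by_topic("/vehicle/gps/fix")
df = pd.read_csv(csv_path)

# Write the topic as parquet, partitioned by topic and source bag file.
local_parquet = "/tmp/gps_fix.parquet"
df.to_parquet(local_parquet)
s3.upload_file(local_parquet, "example-topic-bucket",
               "topic=vehicle_gps_fix/bag=drive-0001/data.parquet")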

This module deploys an AWS Glue Crawler configured to crawl this bucket of topic parquet files. These files populate the AWS Glue Data Catalog with the schemas of each topic table and make this data accessible in Athena, Glue jobs, QuickSight, and Spark on EMR. We use the AWS Glue Data Catalog (Figure 1 – #5) as a permanent Hive Metastore.

Figure 2 – Glue Data Catalog of parquet datasets on S3

Figure 3 – Run ad-hoc queries against the Glue tables using Amazon Athena

The topic parquet bucket also has an S3 notification configured for all newly created objects, which is consumed by an EMR-trigger Lambda function (Figure 1 – #5). This Lambda function is responsible for keeping track of bag files and their respective parquet files in DynamoDB (Figure 1 – #6). Once in DynamoDB, bag files are assigned to batches, initiating the EMR batch processing step function. Metadata about each batch, including the step function execution ARN, is stored in DynamoDB.

Figure 4 – EMR pipeline orchestration with AWS Step Functions

The EMR batch processing step function (Figure 1 – #7) orchestrates the entire EMR pipeline, from provisioning an EMR cluster using the open-source EMR Launch CDK library, to submitting PySpark steps to the cluster, to terminating the cluster and handling failures.

Batch Scene Analytics with Spark on EMR

There are two PySpark applications running on our cluster. The first performs synchronization of ROS bag topics for each bag file. As the various sensors in the vehicle have different frequencies, we resample their signals to a uniform frequency of one signal per 100 ms per sensor. This makes it easier to work with the data.

We compute the minimum and maximum timestamp in each bag file and construct a unified timeline. For each 100 ms interval, we take the most recent signal per sensor and assign it to that timestamp. After this is performed, the data looks more like a normal relational table and is easier to query and analyze.
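
The following PySpark sketch shows that resampling idea, assuming a flattened topic table with one row per signal and a timestamp column in milliseconds; the S3 paths and column names are placeholders, not the exact schema used in this pipeline.

from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("topic-sync").getOrCreate()

# One row per signal: (bag_file, sensor, timestamp_ms, payload) -- placeholder schema.
df = spark.read.parquet("s3://example-topic-bucket/topics/")

# Assign each signal to a 100 ms bucket on the unified timeline.
bucketed = df.withColumn("bucket_ms", (F.col("timestamp_ms") / 100).cast("long") * 100)

# Keep only the most recent signal per sensor within each bucket.
w = Window.partitionBy("bag_file", "sensor", "bucket_ms").orderBy(F.col("timestamp_ms").desc())
latest = (bucketed
          .withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn") == 1)
          .drop("rn"))

# Pivot so each 100 ms timestamp becomes one row with a column per sensor.
synchronized = latest.groupBy("bag_file", "bucket_ms").pivot("sensor").agg(F.first("payload"))
synchronized.write.mode("overwrite").parquet("s3://example-topic-bucket/synchronized/")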

Figure 5 – Batch Scene Analytics with Spark on EMR

Scene Detection and Labeling in PySpark

The second Spark application enriches the synchronized topic dataset (Figure 1 – #8), analyzing the detected lane points and the object detections. The goal is to perform a simple lane assignment algorithm for objects detected by the on-board ML models, and to save this enriched dataset (Figure 1 – #9) back to S3 for easy access by analysts and data scientists.

Figure 9 – Object Lane Assignment example

Figure 9 – Synchronized topics enriched with object lane assignments

Finally, the last step takes this enriched dataset (Figure 1 – #9) to summarize specific scenes or sequences where a person was identified as being in a lane. The output of this pipeline includes two new tables as parquet files on S3 – the synchronized topic dataset (Figure 1 – #8) and the synchronized topic dataset enriched with object lane assignments (Figure 1 – #9), as well as a DynamoDB table with scene metadata for all person-in-lane scenarios (Figure 1 – #10).

Scene Metadata

The Scene Metadata DynamoDB table (Figure 1 – #10) can be queried directly to find sequences of events, as will be covered in a follow up post for visually debugging scene detection algorithms using WebViz/RViz. Using WebViz, we were able to detect that the on-board object detection model labels Crosswalks and Walking Signs as “person” even when a person is not crossing the street, for example:

Example DynamoDB item from the Scene Metadata table

Example DynamoDB item from the Scene Metadata table

Figure 10 – Example DynamoDB item from the Scene Metadata table

These scene descriptions can also be converted to OpenSCENARIO format and pushed to an Elasticsearch cluster to support more complex scenario-based searches, for example for downstream simulation use cases or for visualization in QuickSight. An example of syncing DynamoDB tables to Elasticsearch using DynamoDB Streams and Lambda can be found here (https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/). As DynamoDB is a NoSQL data store, we can enrich the Scene Metadata table with scene parameters, for example the maximum or minimum speed of the car during the identified event sequence, without worrying about breaking schema changes. It is also straightforward to save a DataFrame from PySpark to DynamoDB using open-source libraries.

As a final note, the modules are built to be exactly that, modular. The three modules that are easily isolated are:

  1. the ECS Task pipeline for extracting ROS bag topic data to parquet files
  2. the EMR Trigger Lambda for tracking incoming files, creating batches, and initiating a batch processing step function
  3. the EMR Pipeline for running PySpark applications leveraging Step Functions and EMR Launch

Clean Up

To clean up the deployment, you can run `bash deploy.sh destroy false`. Some resources like S3 buckets and DynamoDB tables may have to be manually emptied and deleted via the console to be fully removed.

Limitations

The bagpy library used in this pipeline does not yet support complex or non-structured data types like images or LIDAR data. Therefore, its usage is limited to data that can be stored in a tabular CSV format before being converted to parquet.

Conclusion

In this post, we showed how to build an end-to-end Scene Detection pipeline at scale on AWS to perform scene analytics and scenario detection with Spark on EMR from raw vehicle sensor data. In a subsequent blog post, we will cover how to extract and catalog images from ROS bag files, create a labeling job with Amazon SageMaker Ground Truth, and then train a machine learning model to detect cars.

Recommended Reading: Field Notes: Building an Autonomous Driving and ADAS Data Lake on AWS

Extract Insights From Customer Conversations with Amazon Transcribe Call Analytics

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/extract-insights-from-customer-conversations-with-amazon-transcribe-call-analytics/

In 2017, we launched Amazon Transcribe, an automatic speech recognition (ASR) service that makes it easy to add speech-to-text capabilities to any application. Today, I’m very happy to announce the availability of Amazon Transcribe Call Analytics, a new feature that lets you easily extract valuable insights from customer conversations with a single API call.

Each discussion with potential or existing customers is an opportunity to learn about their needs and expectations. For example, it’s important for customer service teams to figure out the main reasons why customers are calling them, and measure customer satisfaction during these calls. Likewise, salespeople try to gauge customer interest, and their reaction to a particular sales pitch.

Thus, many customers and partners would like to add call analytics capabilities to different applications, regardless of their contact center provider. They often need to analyze more than phone calls, for example web-based audio and video calls. So far, they’ve typically done this by stitching AI services and dedicated ML models together, and they’ve asked us for a simpler solution.

We got to work and built Amazon Transcribe Call Analytics, a new addition to Transcribe and a key enhancement to AWS Contact Center Intelligence. If you can’t wait to try it, feel free to jump now to the AWS console. If you’d like to learn more, read on!

Introducing Amazon Transcribe Call Analytics
Based on ASR implemented in Transcribe, Transcribe Call Analytics adds natural language processing (NLP) capabilities specifically trained on customer calls, and optimized to provide highly accurate call transcripts and actionable insights. With a simple API call, developers can now easily add call analytics to any application, and extract customer insights from conversations without having to build AI pipelines and train custom ML models.

Key features of Transcribe Call Analytics include:

  • Timestamped turn-by-turn call transcription in 21 languages.
  • Issue detection, which picks up the shortest set of contiguous words in a conversation turn that represents the reason why the customer is calling. This works out of the box without any configuration or training.
  • Call categorization based on conversational characteristics:
    • Matching specific words and phrases,
    • Detecting non-talk time,
    • Detecting interruptions,
    • Analyzing sentiment for the customer and the agent.
  • Call characteristics such as:
    • How quickly and loudly a customer or agent is speaking,
    • Detecting non-talk time,
    • Detecting interruptions.
  • Redaction of sensitive data from the text transcript and the corresponding audio file.

For example, you can create rules to flag calls where customers interrupt the agent, exhibit negative sentiment, and say “I want to speak with the manager”. These calls certainly did not go well, and are worth analyzing in detail! You can also look for calls where agents don’t use pre-defined greetings (“Welcome to ACME Support, how can I help you today?”) within the first 15 seconds, to measure script compliance and help supervisors identify agent coaching opportunities. Another popular scenario is to create rules that flag mentions of your specific products and services (“Your ACME Turbo 2000 vacuum cleaner isn’t working like it should”), in order to pick up any emerging trends you’d need to be aware of.
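
Categories like these can also be created programmatically. The sketch below uses a hypothetical category name and phrase to flag calls where the customer interrupts the agent, expresses negative sentiment, and asks for a manager; it follows the CreateCallAnalyticsCategory rule structure, but the exact rules you need will differ.

import boto3

transcribe = boto3.client("transcribe")

transcribe.create_call_analytics_category(
    CategoryName="supervisor-escalation-demo",  # hypothetical name
    Rules=[
        # The customer interrupted the agent at some point during the call.
        {"InterruptionFilter": {"ParticipantRole": "CUSTOMER"}},
        # The customer expressed negative sentiment.
        {"SentimentFilter": {"Sentiments": ["NEGATIVE"], "ParticipantRole": "CUSTOMER"}},
        # The customer asked for a manager.
        {"TranscriptFilter": {
            "TranscriptFilterType": "EXACT",
            "ParticipantRole": "CUSTOMER",
            "Targets": ["I want to speak with the manager"],
        }},
    ],
)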

Last but not least, you can further process the text transcript with other AI services such as Amazon Translate, or with custom NLP models built with Amazon SageMaker.

Now, let’s do a quick demo.

Extracting Insights with Amazon Transcribe Call Analytics
Here’s a fictitious support call, where a lady calls her bank to report that she’s lost her credit and debit cards. The sound file is a stereo WAV file (16-bit, 8 kHz).

Transcribe Call Analytics requires that the agent and the customer are recorded on separate channels. We also need to tell the service which channel is the agent channel. In a stereo file, the left channel is usually the first channel (channel #0), and the right channel is the second one (channel #1). This is the case for this call.

If you’re not sure which is which, you can easily use the versatile ffmpeg open source tool to extract each channel to a separate audio file.

$ ffmpeg -i demo-call.wav -map_channel 0.0.0 channel0.wav -map_channel 0.0.1 channel1.wav

You can use the same technique to extract audio channels from other file types, such as video files, and recombine them to a stereo audio file. You’ll find more information in the ffmpeg documentation.

Now that I’m sure that the agent is in channel #1, I use the AWS CLI to upload the audio file to an S3 bucket.

$ aws s3 cp launch-call.wav s3://jsimon-transcribe-useast1/demo-call.wav --region us-east-1

Opening the Transcribe Call Analytics console, I see that call category templates are available.

Call categories

I decide to create one for supervisor escalations. Then, with a couple of clicks, I create a custom call category named welcome-message, to check if the agent starts the call with an appropriate welcome. I could add several phrases to check for if needed. We recommend that you use short sentences to minimize the chance of filler words popping up (‘hmm’, ‘err’, and so on).

Call category

Then, I create a call analytics job using the general model available in Transcribe. I also enable automatic language detection.

Creating a job

Then, I define the location of the audio file in S3, flagging channel #1 as the agent channel.

Creating a job

I decide to store the transcript in the default S3 bucket created by Transcribe in my account. I could also use my own bucket if needed. Then, I pick an AWS Identity and Access Management (IAM) role with sufficient permissions, and I launch the job.
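
The same job can also be started with a single API call instead of the console. This is a minimal sketch with placeholder bucket, job, and role names, flagging channel #1 as the agent as in the walkthrough above.

import boto3

transcribe = boto3.client("transcribe")

transcribe.start_call_analytics_job(
    CallAnalyticsJobName="demo-call-analytics-job",  # placeholder name
    Media={"MediaFileUri": "s3://example-bucket/demo-call.wav"},
    OutputLocation="s3://example-bucket/analytics-output/",
    DataAccessRoleArn="arn:aws:iam::123456789012:role/TranscribeCallAnalyticsRole",
    ChannelDefinitions=[
        {"ChannelId": 0, "ParticipantRole": "CUSTOMER"},
        {"ChannelId": 1, "ParticipantRole": "AGENT"},
    ],
)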

A minute or so later, the job is complete. The console contains a preview of the text transcript, as well as a link to the full JSON transcript.

Viewing the transcript

As the agent used the proper welcome sentence in the first 15 seconds, the call is tagged with the category I created earlier.

Call categories

In the downloaded JSON transcript, each sentence in the conversation is enriched with metadata such as per-word loudness, measured on a 0-100 range with 100 being extremely loud. Here’s the first sentence:

"BeginOffsetMillis":440,"EndOffsetMillis":4960,
"Sentiment":"NEUTRAL",
"ParticipantRole":"AGENT",
"LoudnessScores":[78.68,80.4,81.91,78.95,82.34],
"Content":"Hello and thank you for calling the bank. This is Ashley speaking, how may I help you today?"

Looking at the next sentence, I see that Transcribe Call Analytics automatically detected what the customer issue is. The corresponding text is in bold:

"Content": "Hi um uh you just need to cancel my card. Um I have a debit card and a credit card.",
"IssuesDetected":[{"UnredactedCharacterOffsets":{"Begin": 26,"End": 40}}. . .

At the end of the transcript, I see global call statistics (duration, talk time, words per minute, matched categories). Transcribe also gives me overall sentiment information, measured from -5 (extremely negative) to +5 (extremely positive), as well as a breakdown into four quarters.

"Sentiment":{"OverallSentiment":{"AGENT":2.6,"CUSTOMER":0.2},
"SentimentByPeriod":{"QUARTER":
{"AGENT":[
{"Score":1.9,"BeginOffsetMillis":0,"EndOffsetMillis":68457},
{"Score":-0.7,"BeginOffsetMillis":68457,"EndOffsetMillis":136915},
{"Score":5.0,"BeginOffsetMillis":136915,"EndOffsetMillis":205372},
{"Score":3.0,"BeginOffsetMillis":205372,"EndOffsetMillis":273830}],
"CUSTOMER":[
{"Score":-1.7,"BeginOffsetMillis":0,"EndOffsetMillis":68165},
{"Score":0.0,"BeginOffsetMillis":68165,"EndOffsetMillis":136330},
{"Score":0.0,"BeginOffsetMillis":136330,"EndOffsetMillis":204495},
{"Score":2.1,"BeginOffsetMillis":204495,"EndOffsetMillis":272660}]}}}

We can see that the customer started the call with negative sentiment, moving quickly to neutral sentiment, and ending the call with positive sentiment. This is a good sign that the call was handled satisfactorily, and that the customer problem was solved.
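
To spot the same trend across many calls, a short script can pull the per-quarter customer sentiment out of each transcript. This is a minimal sketch assuming the sentiment block shown above; the filename is a placeholder, and the block’s exact location in the JSON may differ, so adjust the lookup to your transcript.

import json

def summarize_sentiment(sentiment):
    # 'sentiment' is the "Sentiment" block shown above.
    print("Overall:", sentiment["OverallSentiment"])
    for quarter in sentiment["SentimentByPeriod"]["QUARTER"]["CUSTOMER"]:
        start_s = quarter["BeginOffsetMillis"] / 1000
        end_s = quarter["EndOffsetMillis"] / 1000
        print(f"Customer sentiment {quarter['Score']:+.1f} from {start_s:.0f}s to {end_s:.0f}s")

# Load a downloaded transcript (placeholder filename) and locate the sentiment block.
with open("demo-call-analytics.json") as f:
    doc = json.load(f)
summarize_sentiment(doc.get("ConversationCharacteristics", doc)["Sentiment"])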

If you’d like to convert the transcript to a Word document with additional visualizations, my colleague Andrew Kane has built a nice tool and made it available on GitHub. Here’s a sample report produced by his tool.

Andrew's tool

AWS Customers and Partners Are Using Amazon Transcribe Call Analytics

Ben Rigby, the SVP, Global Head of Product & Engineering, Artificial Intelligence, Automation, and Workforce at Talkdesk, told us, “Our customers are processing millions of customer service calls in their contact centers a year and have a critical need to extract actionable conversation insights to ensure positive business outcomes. As an AWS Contact Center Intelligence partner, we further enhanced our call transcription capabilities with Amazon Transcribe. With the launch of Amazon Transcribe Call Analytics, we’re excited to add even more AI capabilities to our Speech Analytics and QM Assist products. These deeper insights can provide agents and supervisors with the data they need to improve the speed and quality of their customer service while boosting workforce productivity.”

Praphul Kumar, the Chief Product Officer of SuccessKPI, adds, “Amazon Transcribe Call Analytics API enables us to add ML-based capabilities to our platform faster and at a lower cost. This new API removes the need to integrate multiple AI services together and develop custom machine learning models in certain areas. With Transcribe Call Analytics, we will be able to provide conversation insights such as sentiment, non-talk time, and call categories to gauge agent performance. This helps to drive better call outcomes, reduce agent turnover, uncover agent coaching opportunities, and measure call script compliance. Combining AWS services into SuccessKPI’s experience analytics platform was a no brainer. We are looking forward to bringing this valuable capability into the hands of large enterprises and government agencies.”

Getting Started
A single API call is all it takes to extract rich insights from your customer conversations. You can start using Amazon Transcribe Call Analytics today in the following regions:

  • US West (Oregon), US East (N. Virginia),
  • Canada (Central),
  • Europe (London), Europe (Frankfurt),
  • Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Seoul), Asia Pacific (Tokyo), Asia Pacific (Sydney).

Please give this new feature a try in the AWS console, and let us know what you think. We always look forward to your feedback! You can send it through your usual AWS Support contacts or post it on the AWS Forum for Amazon Transcribe.

One last thing: if you’re looking for an easy-to-use omnichannel cloud contact center, you should definitely take a look at Amazon Connect and its ML-powered analytics, Contact Lens.

– Julien

Educating young people in AI, machine learning, and data science: new seminar series

Post Syndicated from Sue Sentance original https://www.raspberrypi.org/blog/ai-machine-learning-data-science-education-seminars/

A recent Forbes article reported that over the last four years, the use of artificial intelligence (AI) tools in many business sectors has grown by 270%. AI has a history dating back to Alan Turing’s work in the 1940s, and we can define AI as the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.

A woman explains a graph on a computer screen to two men.
Recent advances in computing technology have accelerated the rate at which AI and data science tools are coming to be used.

Four key areas of AI are machine learning, robotics, computer vision, and natural language processing. Other advances in computing technology mean we can now store and efficiently analyse colossal amounts of data (big data); consequently, data science was formed as an interdisciplinary field combining mathematics, statistics, and computer science. Data science is often presented as intertwined with machine learning, as data scientists commonly use machine learning techniques in their analysis.

Venn diagram showing the overlaps between computer science, AI, machine learning, statistics, and data science.
Computer science, AI, statistics, machine learning, and data science are overlapping fields. (Diagram from our forthcoming free online course about machine learning for educators)

AI impacts everyone, so we need to teach young people about it

AI and data science have recently received huge amounts of attention in the media, as machine learning systems are now used to make decisions in areas such as healthcare, finance, and employment. These AI technologies cause many ethical issues, for example as explored in the film Coded Bias. This film describes the fallout of researcher Joy Buolamwini’s discovery that facial recognition systems do not identify dark-skinned faces accurately, and her journey to push for the first-ever piece of legislation in the USA to govern against bias in the algorithms that impact our lives. Many other ethical issues concerning AI exist and, as highlighted by UNESCO’s examples of AI’s ethical dilemmas, they impact each and every one of us.

Three female teenagers and a teacher use a computer together.
We need to make sure that young people understand AI technologies and how they impact society and individuals.

So how do such advances in technology impact the education of young people? In the UK, a recent Royal Society report on machine learning recommended that schools should “ensure that key concepts in machine learning are taught to those who will be users, developers, and citizens” — in other words, every child. The AI Roadmap published by the UK AI Council in 2020 declared that “a comprehensive programme aimed at all teachers and with a clear deadline for completion would enable every teacher confidently to get to grips with AI concepts in ways that are relevant to their own teaching.” As of yet, very few countries have incorporated any study of AI and data science in their school curricula or computing programmes of study.

A teacher and a student work on a coding task at a laptop.
Our seminar speakers will share findings on how teachers can help their learners get to grips with AI concepts.

Partnering with The Alan Turing Institute for a new seminar series

Here at the Raspberry Pi Foundation, AI, machine learning, and data science are important topics both in our learning resources for young people and educators, and in our programme of research. So we are delighted to announce that starting this autumn we are hosting six free, online seminars on the topic of AI, machine learning, and data science education, in partnership with The Alan Turing Institute.

A woman teacher presents to an audience in a classroom.
Everyone with an interest in computing education research is welcome at our seminars, from researchers to educators and students!

The Alan Turing Institute is the UK’s national institute for data science and artificial intelligence and does pioneering work in data science research and education. The Institute conducts many different strands of research in this area and has a special interest group focused on data science education. As such, our partnership around the seminar series enables us to explore our mutual interest in the needs of young people relating to these technologies.

This promises to be an outstanding series drawing from international experts who will share examples of pedagogic best practice […].

Dr Matt Forshaw, The Alan Turing Institute

Dr Matt Forshaw, National Skills Lead at The Alan Turing Institute and Senior Lecturer in Data Science at Newcastle University, says: “We are delighted to partner with the Raspberry Pi Foundation to bring you this seminar series on AI, machine learning, and data science. This promises to be an outstanding series drawing from international experts who will share examples of pedagogic best practice and cover critical topics in education, highlighting ethical, fair, and safe use of these emerging technologies.”

Our free seminar series about AI, machine learning, and data science

At our computing education research seminars, we hear from a range of experts in the field and build an international community of researchers, practitioners, and educators interested in this important area. Our new free series of seminars runs from September 2021 to February 2022, with some excellent and inspirational speakers:

  • Tues 7 September: Dr Mhairi Aitken from The Alan Turing Institute will share a talk about AI ethics, setting out key ethical principles and how they apply to AI before discussing the ways in which these relate to children and young people.
  • Tues 5 October: Professor Carsten Schulte, Yannik Fleischer, and Lukas Höper from Paderborn University in Germany will use a series of examples from their ProDaBi programme to explore whether and how AI and machine learning should be taught differently from other topics in the computer science curriculum at school. The speakers will suggest that these topics require a paradigm shift for some teachers, and that this shift has to do with the changed role of algorithms and data, and of the societal context.
  • Tues 3 November: Professor Matti Tedre and Dr Henriikka Vartiainen from the University of Eastern Finland will focus on machine learning in the school curriculum. Their talk will map the emerging trajectories in educational practice, theory, and technology related to teaching machine learning in K-12 education.
  • Tues 7 December: Professor Rose Luckin from University College London will be looking at the breadth of issues impacting the teaching and learning of AI.
  • Tues 11 January: We’re delighted that Dr Dave Touretzky and Dr Fred Martin (Carnegie Mellon University and University of Massachusetts Lowell, respectively) from the AI4K12 Initiative in the USA will present some of the key insights into AI that the researchers hope children will acquire, and how they see K-12 AI education evolving over the next few years.
  • Tues 1 February: Speaker to be confirmed

How you can join our online seminars

All seminars start at 17:00 UK time (18:00 Central European Time, 12 noon Eastern Time, 9:00 Pacific Time) and take place in an online format, with a presentation, breakout discussion groups, and a whole-group Q&A.

Sign up now and we’ll send you the link to join on the day of each seminar — don’t forget to put the dates in your diary!

In the meantime, you can explore some of our educational resources related to machine learning and data science.

The post Educating young people in AI, machine learning, and data science: new seminar series appeared first on Raspberry Pi.

Hiding Malware in ML Models

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/07/hiding-malware-in-ml-models.html

Interesting research: “EvilModel: Hiding Malware Inside of Neural Network Models”.

Abstract: Delivering malware covertly and detection-evadingly is critical to advanced malware campaigns. In this paper, we present a method that delivers malware covertly and detection-evadingly through neural network models. Neural network models are poorly explainable and have a good generalization ability. By embedding malware into the neurons, malware can be delivered covertly with minor or even no impact on the performance of neural networks. Meanwhile, since the structure of the neural network models remains unchanged, they can pass the security scan of antivirus engines. Experiments show that 36.9MB of malware can be embedded into a 178MB-AlexNet model within 1% accuracy loss, and no suspicious are raised by antivirus engines in VirusTotal, which verifies the feasibility of this method. With the widespread application of artificial intelligence, utilizing neural networks becomes a forwarding trend of malware. We hope this work could provide a referenceable scenario for the defense on neural network-assisted attacks.

News article.

Paging Doctor Cloud! Amazon HealthLake Is Now Generally Available

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/paging-doctor-cloud-amazon-healthlake-is-now-generally-available/

At AWS re:Invent 2020, we previewed Amazon HealthLake, a fully managed, HIPAA-eligible service that allows healthcare and life sciences customers to aggregate their health information from different silos and formats into a structured, centralized AWS data lake, and extract insights from that data with analytics and machine learning (ML). Today, I’m very happy to announce that Amazon HealthLake is generally available to all AWS customers.

The ability to store, transform, and analyze health data quickly and at any scale is critical in driving high-quality health decisions. In their daily practice, doctors need a complete chronological view of patient history to identify the best course of action. During an emergency, giving medical teams the right information at the right time can dramatically improve patient outcomes. Likewise, healthcare and life sciences researchers need high-quality, normalized data that they can analyze and build models with, to identify population health trends or drug trial recipients.

Traditionally, most health data has been locked in unstructured text such as clinical notes, and stored in IT silos. Heterogeneous applications, infrastructure, and data formats have made it difficult for practitioners to access patient data, and extract insights from it. We built Amazon HealthLake to solve that problem.

If you can’t wait to get started, you can jump to the AWS console for Amazon HealthLake now. If you’d like to learn more, read on!

Introducing Amazon HealthLake
Amazon HealthLake is backed by fully-managed AWS infrastructure. You won’t have to procure, provision, or manage a single piece of IT equipment. All you have to do is create a new data store, which only takes a few minutes. Once the data store is ready, you can immediately create, read, update, delete, and query your data. HealthLake exposes a simple REST Application Programming Interface (API) available in the most popular languages, which customers and partners can easily integrate in their business applications.

Security is job zero at AWS. By default, HealthLake encrypts data at rest with AWS Key Management Service (KMS). You can use an AWS-managed key or your own key. KMS is designed so that no one, including AWS employees, can retrieve your plaintext keys from the service. For data in transit, HealthLake uses industry-standard TLS 1.2 encryption end to end.

At launch, HealthLake supports both structured and unstructured text data typically found in clinical notes, lab reports, insurance claims, and so on. The service stores this data in the Fast Healthcare Interoperability Resource (FHIR, pronounced ‘fire’) format, a standard designed to enable exchange of health data. HealthLake is compatible with the latest revision (R4) and currently supports 71 FHIR resource types, with additional resources to follow.

If your data is already in FHIR format, great! If not, you can convert it yourself, or rely on partner solutions available in AWS Marketplace. At launch, HealthLake includes validated connectors for Redox, HealthLX, Diameter Health, and InterSystems applications. They make it easy to convert your HL7v2, CCDA, and flat file data to FHIR, and to upload it to HealthLake.

As data is uploaded, HealthLake uses integrated natural language processing to extract entities present in your documents and stores the corresponding metadata. These entities include anatomy, medical conditions, medications, protected health information, tests, treatments, and procedures. They are also matched to industry-standard ICD-10-CM and RxNorm codes.

After you’ve uploaded your data, you can start querying it, by assigning parameter values to FHIR resources and extracted entities. Whether you need to access information on a single patient, or want to export many documents to build a research dataset, all it takes is a single API call.

Let’s do a quick demo.

Querying FHIR Data in Amazon HealthLake
Opening the AWS console for HealthLake, I click on ‘Create a Data Store’. Then, I simply pick a name for my data store, and decide to encrypt it with an AWS managed key. I also tick the box that preloads sample synthetic data, which is a great way to quickly kick the tires of the service without having to upload my own data.

Creating a data store
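If you prefer to script this step, the data store can also be created programmatically. Below is a minimal sketch using the AWS SDK for Python (boto3); the data store name is an assumption for illustration, and you should check the HealthLake API reference for the full set of options (such as customer-managed KMS keys).

import boto3

healthlake = boto3.client("healthlake")

# Create an R4 FHIR data store preloaded with synthetic (Synthea) sample data,
# mirroring the 'preload sample data' checkbox in the console.
response = healthlake.create_fhir_datastore(
    DatastoreName="my-demo-datastore",  # assumed name, for illustration only
    DatastoreTypeVersion="R4",
    PreloadDataConfig={"PreloadDataType": "SYNTHEA"},
)

datastore_id = response["DatastoreId"]

# The data store takes a few minutes to become ACTIVE; check its status.
status = healthlake.describe_fhir_datastore(DatastoreId=datastore_id)
print(status["DatastoreProperties"]["DatastoreStatus"])

The same DescribeFHIRDatastore response also includes the data store’s HTTPS endpoint, which is what the queries below are sent to.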

After a few minutes, the data store is active, and I can send queries to its HTTPS endpoint. In the example below, I look for clinical notes (and clinical notes only) that contain the ICD-10-CM entity for ‘hypertension’ with a confidence score of 99% or more. Under the hood, the AWS console is sending an HTTP GET request to the endpoint. I highlighted the corresponding query string.

Querying HealthLake

The query runs in seconds. Examining the JSON response in my browser, I see that it contains two documents. For each one, I can see lots of information: when it was created, which organization owns it, who the author is, and more. I can also see that HealthLake has automatically extracted a long list of entities, with names, descriptions, and confidence scores, and added them to the document.

HealthLake entities

The document is attached in the response in base64 format.

HealthLake document

Saving the string to a text file, and decoding it with a command-line tool, I see the following:

Mr Nesser is a 52 year old Caucasian male with an extensive past medical history that includes coronary artery disease , atrial fibrillation , hypertension , hyperlipidemia , presented to North ED with complaints of chills , nausea , acute left flank pain and some numbness in his left leg

This document is spot on. As you can see, it’s really easy to query and retrieve data stored in Amazon HealthLake.
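The same kind of search can be sent programmatically. The sketch below signs a FHIR search request with SigV4 using botocore and sends it with the requests library; the endpoint URL, region, and query parameters are placeholders for illustration, and the exact search syntax for filtering on extracted entities should be checked against the HealthLake documentation.

import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

region = "us-east-1"  # assumed region for this sketch
credentials = boto3.Session().get_credentials()

# Placeholder endpoint; DescribeFHIRDatastore returns the real one for your data store.
endpoint = "https://healthlake.us-east-1.amazonaws.com/datastore/EXAMPLE_DATASTORE_ID/r4/"

# Search DocumentReference resources; the parameters here are illustrative only.
url = endpoint + "DocumentReference"
params = {"_count": "10"}

request = AWSRequest(method="GET", url=url, params=params)
SigV4Auth(credentials, "healthlake", region).add_auth(request)

response = requests.get(url, params=params, headers=dict(request.headers))
print(response.status_code)
print(response.json())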

Analyzing Data Stored in Amazon HealthLake
You can export data from HealthLake, store it in an Amazon Simple Storage Service (Amazon S3) bucket and use it for analytics and ML tasks. For example, you could transform your data with AWS Glue, query it with Amazon Athena, and visualize it with Amazon QuickSight. You could also use this data to build, train and deploy ML models on Amazon SageMaker.

The following blog posts show you end-to-end analytics and ML workflows based on data stored in HealthLake:

Last but not least, this self-paced workshop will show you how to import and export data with HealthLake, process it with AWS Glue and Amazon Athena, and build an Amazon QuickSight dashboard.

Now, let’s see what our customers are building with HealthLake.

Customers Are Already Using Amazon HealthLake
Based in Chicago, Rush University Medical Center is an early adopter of HealthLake. They used it to build a public health analytics platform on behalf of the Chicago Department of Public Health. The platform aggregates, combines, and analyzes multi-hospital data related to patient admissions, discharges and transfers, electronic lab reporting, hospital capacity, and clinical care documents for COVID-19 patients who are receiving care in and across Chicago hospitals. 17 of the 32 hospitals in Chicago are currently submitting data, and Rush plans to integrate all 32 hospitals by this summer. You can learn more in this blog post.

Recently, Rush launched another project to identify communities that are most exposed to high blood pressure risks, understand the social determinants of health, and improve healthcare access. For this purpose, they collect all sorts of data, such as clinical notes, ambulatory blood pressure measurements from the community, and Medicare claims data. This data is then ingested into HealthLake and stored in FHIR format for further analysis.

Dr. Hota

Says Dr. Bala Hota, Vice President and Chief Analytics Officer at Rush University Medical Center: “We don’t have to spend time building extraneous items or reinventing something that already exists. This allows us to move to the analytics phase much quicker. Amazon HealthLake really accelerates the sort of insights that we need to deliver results for the population. We don’t want to be spending all our time building infrastructure. We want to deliver the insights.


Cortica is on a mission to revolutionize healthcare for children with autism and other developmental differences. Today, Cortica uses HealthLake to store all patient data in a standardized, secured, and compliant manner. Building ML models with that data, they can track the progress of their patients with sentiment analysis, and they can share with parents the progress their children are making in speech development and motor skills. Cortica can also validate the effectiveness of treatment models and optimize medication regimens.

Ernesto DiMarinoErnesto DiMarino, Head of Enterprise Applications and Data at Cortica told us: “In a matter of weeks rather than months, Amazon HealthLake empowered us to create a centralized platform that securely stores patients’ medical history, medication history, behavioral assessments, and lab reports. This platform gives our clinical team deeper insight into the care progression of our patients. Using predefined notebooks in Amazon SageMaker with data from Amazon HealthLake, we can apply machine learning models to track and prognosticate each patient’s progression toward their goals in ways not otherwise possible. Through this technology, we can also share HIPAA-compliant data with our patients, researchers, and healthcare partners in an interoperable manner, furthering important research into autism treatment.

MEDHOST provides products and services to more than 1,000 healthcare facilities of all types and sizes. These customers want to develop solutions to standardize patient data in FHIR format and build dashboards and advanced analytics to improve patient care, but that is difficult and time consuming today.

Says Pandian Velayutham, Sr. Director Of Engineering at MEDHOST: “With Amazon HealthLake we can meet our customers’ needs by creating a compliant FHIR data store in just days rather than weeks with integrated natural language processing and analytics to improve hospital operational efficiency and provide better patient care.


Getting Started
Amazon HealthLake is available today in the US East (N. Virginia), US East (Ohio), and US West (Oregon) Regions.

Give our self-paced workshop a try, and let us know what you think. As always, we look forward to your feedback. You can send it through your usual AWS Support contacts, or post it on the AWS Forums.

– Julien

AI-Piloted Fighter Jets

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/06/ai-piloted-fighter-jets.html

News from Georgetown’s Center for Security and Emerging Technology:

China Claims Its AI Can Beat Human Pilots in Battle: Chinese state media reported that an AI system had successfully defeated human pilots during simulated dogfights. According to the Global Times report, the system had shot down several PLA pilots during a handful of virtual exercises in recent years. Observers outside China noted that while reports coming out of state-controlled media outlets should be taken with a grain of salt, the capabilities described in the report are not outside the realm of possibility. Last year, for example, an AI agent defeated a U.S. Air Force F-16 pilot five times out of five as part of DARPA’s AlphaDogfight Trial (which we covered at the time). While the Global Times report indicated plans to incorporate AI into future fighter planes, it is not clear how far away the system is from real-world testing. At the moment, the system appears to be used only for training human pilots. DARPA, for its part, is aiming to test dogfights with AI-piloted subscale jets later this year and with full-scale jets in 2023 and 2024.

Amazon CodeGuru Reviewer Updates: New Java Detectors and CI/CD Integration with GitHub Actions

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/amazon_codeguru_reviewer_updates_new_java_detectors_and_cicd_integration_with_github_actions/

Amazon CodeGuru allows you to automate code reviews and improve code quality, and thanks to the new pricing model announced in April you can get started with a lower and fixed monthly rate based on the size of your repository (up to 90% less expensive). CodeGuru Reviewer helps you detect potential defects and bugs that are hard to find in your Java and Python applications, using the AWS Management Console, AWS SDKs, and AWS CLI.

Today, I’m happy to announce that CodeGuru Reviewer natively integrates with the tools that you use every day to package and deploy your code. This new CI/CD experience allows you to trigger code quality and security analysis as a step in your build process using GitHub Actions.

Although the CodeGuru Reviewer console still serves as an analysis hub for all your onboarded repositories, the new CI/CD experience allows you to integrate CodeGuru Reviewer more deeply with your favorite source code management and CI/CD tools.

And that’s not all! Today we’re also releasing 20 new security detectors for Java to help you identify even more issues related to security and AWS best practices.

A New CI/CD Experience for CodeGuru Reviewer
As a developer or development team, you push new code every day and want to identify security vulnerabilities early in the development cycle, ideally at every push. During a pull-request (PR) review, all the CodeGuru recommendations will appear as a comment, as if you had another pair of eyes on the PR. These comments include useful links to help you resolve the problem.

When you push new code or schedule a code review, recommendations will appear in the Security > Code scanning alerts tab on GitHub.

Let’s see how to integrate CodeGuru Reviewer with GitHub Actions.

First of all, create a .yml file in your repository under .github/workflows/ (or update an existing action). This file will contain all of your action’s steps. Let’s go through the individual steps.

The first step is configuring your AWS credentials. You want to do this securely, without storing any credentials in your repository’s code, using the Configure AWS Credentials action. This action allows you to configure an IAM role that GitHub will use to interact with AWS services. This role will require a few permissions related to CodeGuru Reviewer and Amazon S3. You can attach the AmazonCodeGuruReviewerFullAccess managed policy to the action role, in addition to s3:GetObject, s3:PutObject and s3:ListBucket.

This first step will look as follows:

- name: Configure AWS Credentials
  uses: aws-actions/configure-aws-credentials@v1
  with:
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    aws-region: eu-west-1

The access key and secret key correspond to your IAM role and will be used to interact with CodeGuru Reviewer and Amazon S3.

Next, you add the CodeGuru Reviewer action and a final step to upload the results:

- name: Amazon CodeGuru Reviewer Scanner
  uses: aws-actions/codeguru-reviewer
  if: ${{ always() }} 
  with:
    build_path: target # build artifact(s) directory
    s3_bucket: 'codeguru-reviewer-myactions-bucket'  # S3 Bucket starting with "codeguru-reviewer-*"
- name: Upload review result
  if: ${{ always() }}
  uses: github/codeql-action/upload-sarif@v1
  with:
    sarif_file: codeguru-results.sarif.json

The CodeGuru Reviewer action requires two input parameters:

  • build_path: Where your build artifacts are in the repository.
  • s3_bucket: The name of an S3 bucket that you’ve created previously, used to upload the build artifacts and analysis results. It’s a customer-owned bucket so you have full control over access and permissions, in case you need to share its content with other systems.

Now, let’s put all the pieces together.

Your .yml file should look like this:

name: CodeGuru Reviewer GitHub Actions Integration
on: [pull_request, push, schedule]
jobs:
  CodeGuru-Reviewer-Actions:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-2
      - name: Amazon CodeGuru Reviewer Scanner
        uses: aws-actions/codeguru-reviewer
        if: ${{ always() }} 
        with:
          build_path: target # build artifact(s) directory
          s3_bucket: 'codeguru-reviewer-myactions-bucket'  # S3 Bucket starting with "codeguru-reviewer-*"
      - name: Upload review result
        if: ${{ always() }}
        uses: github/codeql-action/upload-sarif@v1
        with:
          sarif_file: codeguru-results.sarif.json

It’s important to remember that the S3 bucket name needs to start with codeguru-reviewer- and that these actions can be configured to run with the pull_request, push, or schedule triggers (check out the GitHub Actions documentation for the full list of events that trigger workflows). Also keep in mind that there are minor differences in how you configure GitHub-hosted runners and self-hosted runners, mainly in the credentials configuration step. For example, if you run your GitHub Actions in a self-hosted runner that already has access to AWS credentials, such as an EC2 instance, then you don’t need to provide any credentials to this action (check out the full documentation for self-hosted runners).

Now, when you push a change or open a PR, CodeGuru Reviewer will comment on your code changes with a few recommendations.

Or you can schedule a daily or weekly repository scan and check out the recommendations in the Security > Code scanning alerts tab.

New Security Detectors for Java
In December last year, we launched the Java Security Detectors for CodeGuru Reviewer to help you find and remediate potential security issues in your Java applications. These detectors are built with machine learning and automated reasoning techniques, trained on over 100,000 Amazon and open-source code repositories, and based on the decades of expertise of the AWS Application Security (AppSec) team.

For example, some of these detectors will look at potential leaks of sensitive information or credentials through excessively verbose logging, exception handling, and storing passwords in plaintext in memory. The security detectors also help you identify several web application vulnerabilities such as command injection, weak cryptography, weak hashing, LDAP injection, path traversal, secure cookie flag, SQL injection, XPATH injection, and XSS (cross-site scripting).

The new security detectors for Java can identify security issues with the Java Servlet APIs and web frameworks such as Spring. Some of the new detectors will also help you with security best practices for AWS APIs when using services such as Amazon S3, IAM, and AWS Lambda, as well as libraries and utilities such as Apache ActiveMQ, LDAP servers, SAML parsers, and password encoders.

Available Today at No Additional Cost
The new CI/CD integration and security detectors for Java are available today at no additional cost, excluding the storage on S3, which can be estimated based on the size of your build artifacts and the frequency of code reviews. Check out the CodeGuru Reviewer Action in the GitHub Marketplace and the Amazon CodeGuru pricing page to find pricing examples based on the new pricing model we launched last month.

We’re looking forward to hearing your feedback, launching more detectors to help you identify potential issues, and integrating with even more CI/CD tools in the future.

You can learn more about the CI/CD experience and configuration in the technical documentation.

Alex

Amazon SageMaker Named as the Outright Leader in Enterprise MLOps Platforms

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-sagemaker-named-as-the-outright-leader-in-enterprise-mlops-platforms/

Over the last few years, Machine Learning (ML) has proven its worth in helping organizations increase efficiency and foster innovation. As ML matures, the focus naturally shifts from experimentation to production. ML processes need to be streamlined, standardized, and automated to build, train, deploy, and manage models in a consistent and reliable way. Perennial IT concerns such as security, high availability, scaling, monitoring, and automation also become critical. Great ML models are not going to do much good if they can’t serve fast and accurate predictions to business applications, 24/7 and at any scale.

In November 2017, we launched Amazon SageMaker to help ML Engineers and Data Scientists not only build the best models, but also operate them efficiently. Striving to give our customers the most comprehensive service, we’ve since then added hundreds of features covering every step of the ML lifecycle, such as data labeling, data preparation, feature engineering, bias detection, AutoML, training, tuning, hosting, explainability, monitoring, and automation. We’ve also integrated these features in our web-based development environment, Amazon SageMaker Studio.

Thanks to the extensive ML capabilities available in SageMaker, tens of thousands of AWS customers across all industry segments have adopted ML to accelerate business processes, create innovative user experiences, improve revenue, and reduce costs. Examples include Engie (energy), Deliveroo (food delivery), SNCF (railways), Nerdwallet (financial services), Autodesk (computer-aided design), Formula 1 (auto racing), as well as our very own Amazon Fulfillment Technologies and Amazon Robotics.

Today, we’re happy to announce that in his latest report on Enterprise MLOps Platforms, Bradley Shimmin, Chief Analyst at Omdia, paid SageMaker this compliment: “AWS is the outright leader in the Omdia comparative review of enterprise MLOps platforms. Across almost every measure, the company significantly outscored its rivals, delivering consistent value across the entire ML lifecycle. AWS delivers highly differentiated functionality that targets highly impactful areas of concern for enterprise AI practitioners seeking to not just operationalize but also scale AI across the business.

OMDIA

You can download the full report to learn more.

Getting Started
Curious about Amazon SageMaker? The developer guide will show you how to set it up and start running your notebooks in minutes.

As always, we look forward to your feedback. You can send it through your usual AWS Support contacts or post it on the AWS Forum for Amazon SageMaker.

– Julien

Amazon Redshift ML Is Now Generally Available – Use SQL to Create Machine Learning Models and Make Predictions from Your Data

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-redshift-ml-is-now-generally-available-use-sql-to-create-machine-learning-models-and-make-predictions-from-your-data/

With Amazon Redshift, you can use SQL to query and combine exabytes of structured and semi-structured data across your data warehouse, operational databases, and data lake. Now that AQUA (Advanced Query Accelerator) is generally available, you can improve the performance of your queries by up to 10 times with no additional costs and no code changes. In fact, Amazon Redshift provides up to three times better price/performance than other cloud data warehouses.

But what if you want to go a step further and process this data to train machine learning (ML) models and use these models to generate insights from data in your warehouse? For example, to implement use cases such as forecasting revenue, predicting customer churn, and detecting anomalies? In the past, you would need to export the training data from Amazon Redshift to an Amazon Simple Storage Service (Amazon S3) bucket, and then configure and start a machine learning training process (for example, using Amazon SageMaker). This process required many different skills and usually more than one person to complete. Can we make it easier?

Today, Amazon Redshift ML is generally available to help you create, train, and deploy machine learning models directly from your Amazon Redshift cluster. To create a machine learning model, you use a simple SQL query to specify the data you want to use to train your model, and the output value you want to predict. For example, to create a model that predicts the success rate for your marketing activities, you define your inputs by selecting the columns (in one or more tables) that include customer profiles and results from previous marketing campaigns, and the output column you want to predict. In this example, the output column could be one that shows whether a customer has shown interest in a campaign.

After you run the SQL command to create the model, Redshift ML securely exports the specified data from Amazon Redshift to your S3 bucket and calls Amazon SageMaker Autopilot to prepare the data (pre-processing and feature engineering), select the appropriate pre-built algorithm, and apply the algorithm for model training. You can optionally specify the algorithm to use, for example XGBoost.

Architectural diagram.

Redshift ML handles all of the interactions between Amazon Redshift, S3, and SageMaker, including all the steps involved in training and compilation. When the model has been trained, Redshift ML uses Amazon SageMaker Neo to optimize the model for deployment and makes it available as a SQL function. You can use the SQL function to apply the machine learning model to your data in queries, reports, and dashboards.

Redshift ML now includes many new features that were not available during the preview, including Amazon Virtual Private Cloud (VPC) support. For example:

Architectural diagram.

  • You can also create SQL functions that use existing SageMaker endpoints to make predictions (remote inference). In this case, Redshift ML batches calls to the endpoint to speed up processing.

Before looking into how to use these new capabilities in practice, let’s see the difference between Redshift ML and similar features in AWS databases and analytics services.

  • Amazon Redshift ML – Data: data warehouse, federated relational databases, and S3 data lake (with Redshift Spectrum). Training from SQL: yes, using Amazon SageMaker Autopilot. Predictions using SQL functions: yes; a model can be imported and executed inside the Amazon Redshift cluster, or invoked using a SageMaker endpoint.
  • Amazon Aurora ML – Data: relational database (compatible with MySQL or PostgreSQL). Training from SQL: no. Predictions using SQL functions: yes, using a SageMaker endpoint. A native integration with Amazon Comprehend for sentiment analysis is also available.
  • Amazon Athena ML – Data: S3 data lake; other data sources can be used through Athena Federated Query. Training from SQL: no. Predictions using SQL functions: yes, using a SageMaker endpoint.

Building a Machine Learning Model with Redshift ML
Let’s build a model that predicts if customers will accept or decline a marketing offer.

To manage the interactions with S3 and SageMaker, Redshift ML needs permissions to access those resources. I create an AWS Identity and Access Management (IAM) role as described in the documentation. I use RedshiftML for the role name. Note that the trust policy of the role allows both Amazon Redshift and SageMaker to assume the role to interact with other AWS services.

From the Amazon Redshift console, I create a cluster. In the cluster permissions, I associate the RedshiftML IAM role. When the cluster is available, I load the same dataset used in this super interesting blog post that my colleague Julien wrote when SageMaker Autopilot was announced.

The file I am using (bank-additional-full.csv) is in CSV format. Each line describes a direct marketing activity with a customer. The last column (y) describes the outcome of the activity (if the customer subscribed to a service that was marketed to them).

Here are the first few lines of the file. The first line contains the headers.

age,job,marital,education,default,housing,loan,contact,month,day_of_week,duration,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y
56,housemaid,married,basic.4y,no,no,no,telephone,may,mon,261,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
57,services,married,high.school,unknown,no,no,telephone,may,mon,149,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
37,services,married,high.school,no,yes,no,telephone,may,mon,226,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
40,admin.,married,basic.6y,no,no,no,telephone,may,mon,151,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no

I store the file in one of my S3 buckets. The S3 bucket is used to unload data and store SageMaker training artifacts.

Then, using the Amazon Redshift query editor in the console, I create a table to load the data.

CREATE TABLE direct_marketing (
	age DECIMAL NOT NULL, 
	job VARCHAR NOT NULL, 
	marital VARCHAR NOT NULL, 
	education VARCHAR NOT NULL, 
	credit_default VARCHAR NOT NULL, 
	housing VARCHAR NOT NULL, 
	loan VARCHAR NOT NULL, 
	contact VARCHAR NOT NULL, 
	month VARCHAR NOT NULL, 
	day_of_week VARCHAR NOT NULL, 
	duration DECIMAL NOT NULL, 
	campaign DECIMAL NOT NULL, 
	pdays DECIMAL NOT NULL, 
	previous DECIMAL NOT NULL, 
	poutcome VARCHAR NOT NULL, 
	emp_var_rate DECIMAL NOT NULL, 
	cons_price_idx DECIMAL NOT NULL, 
	cons_conf_idx DECIMAL NOT NULL, 
	euribor3m DECIMAL NOT NULL, 
	nr_employed DECIMAL NOT NULL, 
	y BOOLEAN NOT NULL
);

I load the data into the table using the COPY command. I can use the same IAM role I created earlier (RedshiftML) because I am using the same S3 bucket to import and export the data.

COPY direct_marketing 
FROM 's3://my-bucket/direct_marketing/bank-additional-full.csv' 
DELIMITER ',' IGNOREHEADER 1
IAM_ROLE 'arn:aws:iam::123412341234:role/RedshiftML'
REGION 'us-east-1';

Now, I create the model straight from the SQL interface using the new CREATE MODEL statement:

CREATE MODEL direct_marketing
FROM direct_marketing
TARGET y
FUNCTION predict_direct_marketing
IAM_ROLE 'arn:aws:iam::123412341234:role/RedshiftML'
SETTINGS (
  S3_BUCKET 'my-bucket'
);

In this SQL command, I specify the parameters required to create the model:

  • FROM – I select all the rows in the direct_marketing table, but I can replace the name of the table with a nested query (see example below).
  • TARGET – This is the column that I want to predict (in this case, y).
  • FUNCTION – The name of the SQL function to make predictions.
  • IAM_ROLE – The IAM role assumed by Amazon Redshift and SageMaker to create, train, and deploy the model.
  • S3_BUCKET – The S3 bucket where the training data is temporarily stored, and where model artifacts are stored if you choose to retain a copy of them.

Here I am using a simple syntax for the CREATE MODEL statement. For more advanced users, other options are available, such as the following (a sketch combining them appears after this list):

  • MODEL_TYPE – To use a specific model type for training, such as XGBoost or multilayer perceptron (MLP). If I don’t specify this parameter, SageMaker Autopilot selects the appropriate model class to use.
  • PROBLEM_TYPE – To define the type of problem to solve: regression, binary classification, or multiclass classification. If I don’t specify this parameter, the problem type is discovered during training, based on my data.
  • OBJECTIVE – The objective metric used to measure the quality of the model. This metric is optimized during training to provide the best estimate from data. If I don’t specify a metric, the default behavior is to use mean squared error (MSE) for regression, the F1 score for binary classification, and accuracy for multiclass classification. Other available options are F1Macro (to apply F1 scoring to multiclass classification) and area under the curve (AUC). More information on objective metrics is available in the SageMaker documentation.
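As a sketch of how these options fit together, the statement below combines MODEL_TYPE, PROBLEM_TYPE, and OBJECTIVE, and submits the SQL through the Amazon Redshift Data API with boto3. The cluster identifier, database, and user are placeholders, and the chosen option values are just one reasonable combination for this dataset, not the settings used elsewhere in this post.

import boto3

redshift_data = boto3.client("redshift-data")

# An advanced variant of the earlier CREATE MODEL statement. XGBOOST,
# BINARY_CLASSIFICATION, and 'f1' are illustrative choices; SageMaker Autopilot
# picks sensible defaults when they are omitted.
create_model_sql = """
CREATE MODEL tuned_direct_marketing
FROM direct_marketing
TARGET y
FUNCTION predict_tuned_direct_marketing
IAM_ROLE 'arn:aws:iam::123412341234:role/RedshiftML'
MODEL_TYPE XGBOOST
PROBLEM_TYPE BINARY_CLASSIFICATION
OBJECTIVE 'f1'
SETTINGS (
  S3_BUCKET 'my-bucket'
);
"""

response = redshift_data.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # placeholder
    Database="dev",                           # placeholder
    DbUser="awsuser",                         # placeholder
    Sql=create_model_sql,
)
print(response["Id"])  # statement ID; the model training itself runs asynchronously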

Depending on the complexity of the model and the amount of data, it can take some time for the model to be available. I use the SHOW MODEL command to see when it is available:

SHOW MODEL direct_marketing

When I execute this command using the query editor in the console, I get the following output:

Console screenshot.

As expected, the model is currently in the TRAINING state.

When I created this model, I selected all the columns in the table as input parameters. I wonder what happens if I create a model that uses fewer input parameters? I am in the cloud and I am not slowed down by limited resources, so I create another model using a subset of the columns in the table:

CREATE MODEL simple_direct_marketing
FROM (
        SELECT age, job, marital, education, housing, contact, month, day_of_week, y
 	  FROM direct_marketing
)
TARGET y
FUNCTION predict_simple_direct_marketing
IAM_ROLE 'arn:aws:iam::123412341234:role/RedshiftML'
SETTINGS (
  S3_BUCKET 'my-bucket'
);

After some time, my first model is ready, and I get this output from SHOW MODEL. The actual output in the console spans multiple pages; I merged the results here to make it easier to follow:

Console screenshot.

From the output, I see that the model has been correctly recognized as BinaryClassification, and F1 has been selected as the objective. The F1 score is a metric that considers both precision and recall. It returns a value between 0 (lowest possible score) and 1 (perfect precision and recall). The final score for the model (validation:f1) is 0.79. In this table I also find the name of the SQL function (predict_direct_marketing) that has been created for the model, its parameters and their types, and an estimate of the training cost.
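As a quick reminder of what that number means, F1 is the harmonic mean of precision and recall. Here is a tiny sketch, with made-up precision and recall values just to illustrate the arithmetic:

def f1_score(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall: 1.0 is perfect, 0.0 is the worst.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example with illustrative values only (not taken from the model output above).
print(round(f1_score(0.82, 0.76), 2))  # 0.79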

When the second model is ready, I compare the F1 scores. The F1 score of the second model (0.66) is lower than that of the first (0.79). However, with fewer parameters, the SQL function is easier to apply to new data. As is often the case with machine learning, I have to find the right balance between complexity and usability.

Using Redshift ML to Make Predictions
Now that the two models are ready, I can make predictions using SQL functions. Using the first model, I check how many false positives (wrong positive predictions) and false negatives (wrong negative predictions) I get when applying the model on the same data used for training:

SELECT predict_direct_marketing, y, COUNT(*)
  FROM (SELECT predict_direct_marketing(
                   age, job, marital, education, credit_default, housing,
                   loan, contact, month, day_of_week, duration, campaign,
                   pdays, previous, poutcome, emp_var_rate, cons_price_idx,
                   cons_conf_idx, euribor3m, nr_employed), y
          FROM direct_marketing)
 GROUP BY predict_direct_marketing, y;

The result of the query shows that the model is better at predicting negative rather than positive outcomes. In fact, even though the number of true negatives is much larger than the number of true positives, there are many more false positives than false negatives. I added some comments in green and red to the following screenshot to clarify the meaning of the results.

Console screenshot.

Using the second model, I see how many customers might be interested in a marketing campaign. Ideally, I should run this query on new customer data, not the same data I used for training.

SELECT COUNT(*)
  FROM direct_marketing
 WHERE predict_simple_direct_marketing(
           age, job, marital, education, housing,
           contact, month, day_of_week) = true;

Wow, looking at the results, there are more than 7,000 prospects!

Console screenshot.

Availability and Pricing
Redshift ML is available today in the following AWS Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), US West (N. California), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (Paris), Europe (Stockholm), Asia Pacific (Hong Kong), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), and South America (São Paulo). For more information, see the AWS Regional Services list.

With Redshift ML, you pay only for what you use. When training a new model, you pay for the Amazon SageMaker Autopilot and S3 resources used by Redshift ML. When making predictions, there is no additional cost for models imported into your Amazon Redshift cluster, as in the example I used in this post.

Redshift ML also allows you to use existing Amazon SageMaker endpoints for inference. In that case, the usual SageMaker pricing for real-time inference applies. Here you can find a few tips on how to control your costs with Redshift ML.

To learn more, you can see this blog post from when Redshift ML was announced in preview and the documentation.

Start getting better insights from your data with Redshift ML.

Danilo

Forwarding emails automatically based on content with Amazon Simple Email Service

Post Syndicated from Murat Balkan original https://aws.amazon.com/blogs/messaging-and-targeting/forwarding-emails-automatically-based-on-content-with-amazon-simple-email-service/

Introduction

Email is one of the most popular channels consumers use to interact with support organizations. In its most basic form, consumers send their email to a catch-all email address, where it is further dispatched to the correct support group. Often, this requires a person to inspect content manually. Some IT organizations even have a dedicated support group that handles triaging the incoming emails before assigning them to specialized support teams. Triaging each email can be challenging, and delays in email routing and support processes can reduce customer satisfaction. By using Amazon Simple Email Service’s deep integration with Amazon S3, AWS Lambda, and other AWS services, you can automate the task of categorizing and routing emails. This automation results in increased operational efficiency and reduced costs.

This blog post shows you how a serverless application receives emails with Amazon SES and delivers them to an Amazon S3 bucket. The application uses Amazon Comprehend to identify the dominant language of the message body. It then looks up that language in an Amazon DynamoDB table to find the email address of the support group that specializes in it. As the last step, it forwards the email via Amazon SES to its destination. Archiving incoming emails to Amazon S3 also enables further processing or auditing.

Architecture

By completing the steps in this post, you will create a system that uses the architecture illustrated in the following image:

Architecture showing how to forward emails by content using Amazon SES

The flow of events starts when a customer sends an email to the generic support email address, for example info@YOUR_DOMAIN_NAME_HERE. Amazon SES receives this email through a receipt rule. As per the rule, incoming messages are written to a specified Amazon S3 bucket with a given prefix.

This bucket and prefix are configured with S3 Events to trigger a Lambda function on object creation events. The Lambda function reads the email object, parses the contents, and sends them to Amazon Comprehend for language detection.

The Lambda function then looks up the detected language code in an Amazon DynamoDB table, which contains the mappings between language codes and support group email addresses for these languages. One support group could answer English emails, while another support group answers French emails. The Lambda function determines the destination address and forwards the same email to it. If the lookup does not return a destination address, or if the language could not be detected, the email is forwarded to a catch-all email address specified during the application deployment.
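To make the flow concrete, here is a heavily simplified sketch of such a Lambda handler. It is not the BlogEmailForwarder code from the repository; it assumes the table is named language-lookup and that the from and catch-all addresses are passed in as environment variables, and it skips the header rewriting a production forwarder would need.

import email
import os

import boto3

s3 = boto3.client("s3")
comprehend = boto3.client("comprehend")
dynamodb = boto3.resource("dynamodb")
ses = boto3.client("ses")

TABLE_NAME = os.environ.get("TABLE_NAME", "language-lookup")
FROM_ADDRESS = os.environ["FROM_EMAIL_ADDRESS"]    # e.g. info@YOUR_DOMAIN_NAME_HERE
CATCH_ALL = os.environ["CATCH_ALL_EMAIL_ADDRESS"]  # e.g. catchall@YOUR_DOMAIN_NAME_HERE


def handler(event, context):
    # Triggered by the S3 object-created event for a new inbound email.
    record = event["Records"][0]["s3"]
    obj = s3.get_object(Bucket=record["bucket"]["name"], Key=record["object"]["key"])
    raw = obj["Body"].read()

    message = email.message_from_bytes(raw)
    body = first_text_part(message)

    # Detect the dominant language of the message body.
    language_code = None
    if body.strip():
        languages = comprehend.detect_dominant_language(Text=body[:5000])["Languages"]
        if languages:
            language_code = languages[0]["LanguageCode"]

    # Look up the support group for that language; fall back to the catch-all address.
    destination = CATCH_ALL
    if language_code:
        item = dynamodb.Table(TABLE_NAME).get_item(Key={"language": language_code}).get("Item")
        if item:
            destination = item["destination"]

    # Forward the original message. A production version would rewrite the
    # From/Reply-To headers to satisfy SES sending requirements.
    ses.send_raw_email(Source=FROM_ADDRESS, Destinations=[destination], RawMessage={"Data": raw})


def first_text_part(message):
    # Return the first text/plain part of the message, or an empty string.
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            return part.get_payload(decode=True).decode(errors="ignore")
    return ""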

In this example, Amazon SES hosts the destination email addresses used for forwarding, but this is not a requirement. External email servers can also receive the forwarded emails.

Prerequisites

To use Amazon SES for receiving email messages, you need to verify a domain that you own. Refer to the documentation to verify your domain in the Amazon SES console. If you do not have a domain name, you can register one with Amazon Route 53.

Deploying the Sample Application

Clone this GitHub repository to your local machine and install and configure AWS SAM with a test AWS Identity and Access Management (IAM) user.

You will use AWS SAM to deploy the remaining parts of this serverless architecture.

The AWS SAM template creates the following resources:

  • An Amazon DynamoDB mapping table (language-lookup) that contains information about language codes and associates them with destination email addresses.
  • An AWS Lambda function (BlogEmailForwarder) that reads the email content, parses it, detects the language, looks up the forwarding destination email address, and sends the email.
  • An Amazon S3 bucket, which will store the incoming emails.
  • IAM roles and policies.

To start the AWS SAM deployment, navigate to the root directory of the repository you downloaded, where the template.yaml AWS SAM template resides. AWS SAM also requires you to specify an Amazon Simple Storage Service (Amazon S3) bucket to hold the deployment artifacts. If you haven’t already created a bucket for this purpose, create one now. Refer to the documentation to learn how to create an Amazon S3 bucket. The bucket should be readable and writable by an AWS Identity and Access Management (IAM) user.

At the command line, enter the following command to package the application:

sam package --template template.yaml --output-template-file output_template.yaml --s3-bucket BUCKET_NAME_HERE

In the preceding command, replace BUCKET_NAME_HERE with the name of the Amazon S3 bucket that should hold the deployment artifacts.

AWS SAM packages the application and copies it into this Amazon S3 bucket.

When the AWS SAM package command finishes running, enter the following command to deploy the package:

sam deploy --template-file output_template.yaml --stack-name blogstack --capabilities CAPABILITY_IAM --parameter-overrides FromEmailAddress=info@YOUR_DOMAIN_NAME_HERE CatchAllEmailAddress=catchall@YOUR_DOMAIN_NAME_HERE

In the preceding command, replace YOUR_DOMAIN_NAME_HERE with the domain name you verified with Amazon SES. The same domain also applies to other commands and configurations introduced later.

This example uses "blogstack" as the stack name; you can change it to any other name you want. When you run this command, AWS SAM shows the progress of the deployment.

Configure the Sample Application

Now that you have deployed the application, you will configure it.

Configuring Receipt Rules

To deliver incoming messages to the Amazon S3 bucket, you need to create a rule set and a receipt rule under it.

Note: This blog uses Amazon SES console to create the rule sets. To create the rule sets with AWS CloudFormation, refer to the documentation.

  1. Navigate to the Amazon SES console. From the left navigation choose Rule Sets.
  2. Choose the Create a Receipt Rule button in the right pane.
  3. Add info@YOUR_DOMAIN_NAME_HERE as the first recipient address by entering it into the text box and choosing Add Recipient.


Choose the Next Step button to move on to the next step.

  4. On the Actions page, select S3 from the Add action drop-down to reveal the S3 action’s details. Select the S3 bucket that was created by the AWS SAM template. It is in the format your_stack_name-inboxbucket-randomstring. You can find the exact name in the outputs section of the AWS SAM deployment under the key name InboxBucket, or by visiting the AWS CloudFormation console. Set the Object key prefix to info/. This tells Amazon SES to add this prefix to all messages destined for this recipient address. This way, you can re-use the same bucket for different recipients.

Choose the Next Step button to move on to the next step.

On the Rule Details page, give this rule a name in the Rule name field. This example uses the name info-recipient-rule. Leave the rest of the fields with their default values.

Choose the Next Step button to move on to the next step.

  5. Review your settings on the Review page and finalize rule creation by choosing Create Rule.

  6. In this example, you will be hosting the destination email addresses in Amazon SES rather than forwarding the messages to an external email server. This way, you will be able to see the forwarded messages in your Amazon S3 bucket under different prefixes. To host the destination email addresses, you need to create different rules under the default rule set. Create three additional rules for the catchall@YOUR_DOMAIN_NAME_HERE, english@YOUR_DOMAIN_NAME_HERE, and french@YOUR_DOMAIN_NAME_HERE email addresses by repeating steps 2 to 5. For the Amazon S3 prefixes, use catchall/, english/, and french/ respectively.


Configuring Amazon DynamoDB Table

To configure the Amazon DynamoDB table that is used by the sample application

  1. Navigate to the Amazon DynamoDB console and open the Tables view. Inspect the table created by the AWS SAM application.

The language-lookup table is where languages and their support group mappings are kept. You need to create an item for each language, and an item that holds the default destination email address to use when no language match is found. Amazon Comprehend supports more than 60 different languages. You can check the documentation for the supported languages and add their language codes to this lookup table to enhance this application.

  2. To start inserting items, choose the language-lookup table to open the table overview page.
  3. Select the Items tab and choose Create item. From the dropdown, select Text. Add the following JSON content and choose Save to create your first mapping object. While adding the following object, replace the destination attribute’s value with an email address you own. The email messages will be forwarded to that address.

{
  "language": "en",
  "destination": "english@YOUR_DOMAIN_NAME_HERE"
}

Lastly, create an item for French language support.

{
  "language": "fr",
  "destination": "french@YOUR_DOMAIN_NAME_HERE"
}
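If you prefer to insert these items with a script instead of the console, a minimal boto3 sketch follows. It assumes the deployed table is reachable under the name language-lookup; if your stack generated a different physical table name, use that instead.

import boto3

table = boto3.resource("dynamodb").Table("language-lookup")

# One item per supported language; each destination must be an address you own.
table.put_item(Item={"language": "en", "destination": "english@YOUR_DOMAIN_NAME_HERE"})
table.put_item(Item={"language": "fr", "destination": "french@YOUR_DOMAIN_NAME_HERE"})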

Testing

Now that the application is deployed and configured, you will test it.

  1. Use your favorite email client to send the following email to the info@YOUR_DOMAIN_NAME_HERE address.

Subject: I need help

Body:

Hello, I’d like to return the shoes I bought from your online store. How can I do this?

After the email is sent, navigate to the Amazon S3 console to inspect the contents of the Amazon S3 bucket that is backing the Amazon SES rule sets. You can also check the AWS Lambda logs in the Amazon CloudWatch console to confirm that the Lambda function was triggered and ran successfully. You should receive an email with the same content at the address you defined for the English language.

  2. Next, send another email with similar content, this time in French.

Subject: j’ai besoin d’aide

Body:

Bonjour, je souhaite retourner les chaussures que j’ai achetées dans votre boutique en ligne. Comment puis-je faire ceci?


If a message is not matched to a language in the lookup table, the Lambda function forwards it to the catch-all email address that you provided during the AWS SAM deployment.

You can inspect the new email objects under the english/, french/, and catchall/ prefixes to observe the forwarding behavior.

Continue experimenting with the sample application by sending emails with different contents to the info@YOUR_DOMAIN_NAME_HERE address, or by adding other language codes and email address combinations to the mapping table. You can find the available languages and their codes in the documentation. When adding support for a new language, don’t forget to associate a new email address and Amazon S3 bucket prefix by defining a new rule.

Cleanup

To clean up the resources you used in your account,

  1. Navigate to the Amazon S3 console and delete the inbox bucket’s contents. You can find the name of this bucket in the outputs section of the AWS SAM deployment under the key name InboxBucket, or by visiting the AWS CloudFormation console.
  2. Navigate to AWS CloudFormation console and delete the stack named “blogstack”.
  3. After the stack is deleted, remove the domain from Amazon SES. To do this, navigate to the Amazon SES console and choose Domains from the left navigation. Select the domain you want to remove and choose the Remove button to remove it from Amazon SES.
  4. From the Amazon SES console, navigate to Rule Sets from the left navigation. In the Active Rule Set section, choose the View Active Rule Set button and delete all the rules you have created by selecting each rule and choosing Action, Delete.
  5. On the Rule Sets page, choose the Disable Active Rule Set button to stop listening for incoming email messages.
  6. On the Rule Sets page, in the Inactive Rule Sets section, delete the only rule set by selecting it and choosing Action, Delete.
  7. Navigate to the CloudWatch console and from the left navigation choose Logs, Log groups. Find the log group that belongs to the BlogEmailForwarderFunction resource and delete it by selecting it and choosing Actions, Delete log group(s).
  8. Finally, delete the Amazon S3 bucket you used for packaging and deploying the AWS SAM application.


Conclusion

This solution shows how to use Amazon SES to classify email messages by the dominant content language and forward them to the respective support groups. You can use the same techniques to implement similar scenarios. For example, you can forward emails based on custom key entities, like product codes, or you can remove PII information from emails before forwarding them, with the help of Amazon Comprehend.

With its native integrations with AWS services, Amazon SES allows you to enhance your email applications with different AWS Cloud capabilities easily.

To learn more about email forwarding with Amazon SES, see the documentation and AWS blogs.

AWS DeepRacer League’s 2021 Season Launches With New Open and Pro Divisions

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-deepracer-leagues-2021-season-launches-with-new-open-and-pro-divisions/

As a developer, I have been hearing a lot of stories lately about how companies have solved their business problems using machine learning (ML), so one of my goals for 2021 is to learn more about it.

For the last few years, I have been using artificial intelligence (AI) services such as Amazon Rekognition, Amazon Comprehend, and others extensively. AI services provide a simple API to solve common ML problems such as image recognition, text to speech, and sentiment analysis of text. When using these high-level APIs, you don’t need to understand how the underlying ML model works, nor do you have to train it or maintain it in any way.

Even though those services are great and I can solve most of my business cases with them, I want to understand how ML algorithms work, and that is how I started tinkering with AWS DeepRacer.

AWS DeepRacer, a service that helps you learn reinforcement learning (RL), has been around since 2018. RL is an advanced ML technique that takes a very different approach to training models than other ML methods. Basically, it can learn very complex behavior without requiring any labeled training data, and it can make short-term decisions while optimizing for a long-term goal.

AWS DeepRacer is an autonomous 1/18th scale race car designed to test RL models by racing virtually in the AWS DeepRacer console or physically on a track at AWS and customer events. AWS DeepRacer is for developers of all skill levels, even if you don’t have any ML experience. When learning RL using AWS DeepRacer, you can take part in the AWS DeepRacer League where you get experience with machine learning in a fun and competitive environment.

Over the past year, the AWS DeepRacer League’s races have gone completely virtual and participants have competed for different kinds of prizes. However, the competition has become dominated by experts and newcomers haven’t had much of a chance to win.

The 2021 season introduces new skill-based Open and Pro racing divisions, where racers of all skill levels have five times more opportunities to win rewards than in previous seasons.

Image of the leagues in the console

How the New AWS DeepRacer Racing Divisions Work

The 2021 AWS DeepRacer league runs from March 1 through the end of October. When it kicks off, all participants will enter the Open division, a place to have fun and develop your RL knowledge with other community members.

At the end of every month, the top 10% of the Open division leaderboard will advance to the Pro division for the remainder of the season; they’ll also receive a Pro Welcome kit full of AWS DeepRacer swag. Pro division racers can win DeepRacer Evo cars and AWS DeepRacer merchandise such as hats and T-shirts.

At the end of every month, the top 16 racers in the Pro division will compete against each other in a live race in the console. That race will determine who will advance that month to the 2021 Championship Cup at re:Invent 2021.

The monthly Pro division winner gets an expenses-paid trip to re:Invent 2021 and participates in the Championship Cup to get a chance to win a Machine Learning education sponsorship worth $20k.

In both divisions, you can collect digital rewards, including vehicle customizations and accessories which will be released to participants once the winners are announced each month. 

You can start racing in the Open division any time during the 2021 season. Get started here!

New Racer Profiles Increase the Fun

At the end of March, you will be able to create a new racer profile with an avatar and show the world which country you are representing.

I hope to see you in the new AWS DeepRacer season, where I’ll start in the Open division as MaVi.

Start racing today and train your first model for free! 

Marcia

Improving the CPU and latency performance of Amazon applications using AWS CodeGuru Profiler

Post Syndicated from Neha Gupta original https://aws.amazon.com/blogs/devops/improving-the-cpu-and-latency-performance-of-amazon-applications-using-aws-codeguru-profiler/

Amazon CodeGuru Profiler is a developer tool powered by machine learning (ML) that helps identify an application’s most expensive lines of code and provides intelligent recommendations to optimize them. You can use it to identify application performance issues and troubleshoot latency and CPU utilization problems in your application.

You can use CodeGuru Profiler to optimize performance for any application running on AWS Lambda, Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), AWS Fargate, or AWS Elastic Beanstalk, and on premises.

This post gives a high-level overview of how CodeGuru Profiler has reduced CPU usage and latency by approximately 50% and saved around $100,000 a year for a particular Amazon retail service.

Technical and business value of CodeGuru Profiler

CodeGuru Profiler is simple to use: just turn it on and let it run in the background, then review its findings and implement the relevant changes.
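For a Java service, one way to turn the profiler on is to start the agent from your application code. The following is a minimal sketch, assuming the codeguru-profiler-java-agent library is on the classpath; the profiling group name is a placeholder, and the exact package and builder names should be checked against the current CodeGuru Profiler documentation.

import software.amazon.codeguruprofilerjavaagent.Profiler;

public class Application {

    public static void main(String[] args) {
        // Start the CodeGuru Profiler agent in the background; it samples the JVM
        // and submits profiles to the named profiling group.
        Profiler.builder()
                .profilingGroupName("MyProfilingGroup") // placeholder name
                .build()
                .start();

        // ... the rest of the application runs as usual ...
    }
}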

It’s fairly low cost, and unlike traditional profiling tools that take up a lot of CPU and RAM, CodeGuru Profiler adds less than 1% CPU overhead to applications and typically uses no more than 100 MB of memory.

You can run it in a pre-production environment to verify that changes have no impact on your application’s key metrics.

It automatically detects performance anomalies in application stack traces that start consuming more CPU or showing increased latency. It also provides visualizations and recommendations on how to fix performance issues, along with the estimated cost of running inefficient code. Detecting anomalies early prevents issues from escalating in production, and helps you prioritize remediation by giving you enough time to fix the issue before it impacts your service’s availability and your customers’ experience.

How we used CodeGuru Profiler at Amazon

Amazon has onboarded many of its applications to CodeGuru Profiler, which has resulted in annual savings of millions of dollars and in latency improvements. In this post, we discuss how we used CodeGuru Profiler on an Amazon Prime service, where a simple code change resulted in savings of around $100,000 for the year.

Opportunity to improve

After a change to one of our data sources caused its payload size to increase, we expected a slight increase in our service latency, but what we saw was higher than expected. Because CodeGuru Profiler is easy to integrate, we were able to quickly make and deploy the changes needed to get it running in our production environment.

After loading the profile in Amazon CodeGuru Profiler, it was immediately apparent from the visualization that a very large portion of the service’s CPU time was being taken up by Jackson deserialization (37% across the two call sites). It was also interesting that most of the blocking calls in the program (shown in blue) were happening in the jackson.databind method _createAndCacheValueDeserializer.

Flame graphs represent the relative amount of time that the CPU spends at each point in the call graph. The wider a frame is, the more CPU usage it corresponds to.

The following flame graph is from before the performance improvements were implemented.

The Flame Graph before the deployment

Looking at the source for _createAndCacheValueDeserializer confirmed that there was a synchronized block. From within it, _createAndCache2 was called, which performed the actual addition to the cache. Adding to the cache was guarded by a boolean condition, with a comment indicating that caching would only be enabled for custom deserializers if @JsonCachable was set.

Solution

Checking the documentation for @JsonCachable confirmed that this annotation looked like the correct solution for this performance issue. After we deployed a quick change to add @JsonCachable to our four custom deserializers, we observed that no visible time was spent in _createAndCacheValueDeserializer.
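The post doesn’t show the deserializers themselves, so the following is only an illustrative sketch of the shape of the fix; the deserializer and its target type are hypothetical. With Jackson 1.x the opt-in is the @JsonCachable class annotation described above, while in jackson-databind 2.x the equivalent opt-in is overriding isCachable(), which is what the sketch shows.

import java.io.IOException;
import java.time.Instant;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;

// Hypothetical custom deserializer, standing in for the four custom deserializers in the service.
public class EpochMillisInstantDeserializer extends JsonDeserializer<Instant> {

    @Override
    public Instant deserialize(JsonParser parser, DeserializationContext context) throws IOException {
        return Instant.ofEpochMilli(parser.getValueAsLong());
    }

    // Opt in to caching so Jackson reuses this deserializer instead of rebuilding it
    // (and contending on the lock in _createAndCacheValueDeserializer) on every call.
    @Override
    public boolean isCachable() {
        return true;
    }
}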

Results

Adding a one-line annotation in four different places made the code run twice as fast. Previously, the code held a lock while it recreated the same deserializers for every call, which allowed only one of the four CPU cores to be used and caused latency and inefficiency. Reusing the deserializers avoided the repeated work and saved us a lot of resources.

After the CodeGuru Profiler recommendations were implemented, the share of CPU time spent in Jackson dropped from 37% to 5% across the two call paths, and there was no visible blocking. With the blocking removed, we could run a higher load on our hosts and reduce the fleet size, saving approximately $100,000 a year in Amazon EC2 costs.

The following flame graph shows performance after the deployment.

The Flame Graph after the deployment

Metrics

The following graph shows that CPU usage was reduced by almost 50%. The blue line shows the CPU usage the week before we implemented the CodeGuru Profiler recommendations, and the green line shows the reduced usage after deployment. We could later safely scale down the fleet to reduce costs, while still having better performance than before the change.

Average Fleet CPU Utilization

 

The following graph shows the server latency, which also dropped by almost 50%, from 100 milliseconds to 50 milliseconds, as depicted in the initial portion of the graph. The orange line depicts p99, green p99.9, and blue p50 (median latency).

Server Latency

 

Conclusion

With a few changed lines of code and a half-hour investigation, we removed the bottleneck, which lowered resource utilization and allowed us to decrease the fleet size. We have seen many similar cases; in one instance, a change of literally six characters of inefficient code reduced CPU usage from 99% to 5%.

Across Amazon, CodeGuru Profiler has been used internally by various teams and has resulted in millions of dollars of savings and performance improvements. You can use CodeGuru Profiler for quick insights into performance issues in your application. The more efficient your code and application are, the less costly they are to run. You can find potential savings for any application running in production and significantly reduce infrastructure costs using CodeGuru Profiler. Reducing fleet size, latency, and CPU usage is a major win.

About the Authors

Neha Gupta

Neha Gupta is a Solutions Architect at AWS and has 16 years of experience as a database architect/DBA. Apart from work, she’s outdoorsy and loves to dance.

Ian Clark

Ian is a Senior Software Engineer with the Last Mile organization at Amazon. In his spare time, he enjoys exploring the Vancouver area with his family.

Machine learning and depth estimation using Raspberry Pi

Post Syndicated from David Plowman original https://www.raspberrypi.org/blog/machine-learning-and-depth-estimation-using-raspberry-pi/

One of our engineers, David Plowman, describes machine learning and shares news of a Raspberry Pi depth estimation challenge run by ETH Zürich (Swiss Federal Institute of Technology).

Spoiler alert – it’s all happening virtually, so you can definitely make the trip and attend, or maybe even enter yourself.

What is Machine Learning?

Machine Learning (ML) and Artificial Intelligence (AI) are some of the top engineering-related buzzwords of the moment, and foremost among current ML paradigms is probably the Artificial Neural Network (ANN).

ANNs involve millions of tiny calculations, merged together in a giant, biologically inspired network – hence the name. These networks typically have millions of parameters that control each calculation, and they must be optimised for every different task at hand.

This process of optimising the parameters so that a given set of inputs correctly produces a known set of outputs is known as training, and is what gives rise to the sense that the network is “learning”.

A popular type of ANN used for processing images is the Convolutional Neural Network. Many small calculations are performed on groups of input pixels to produce each output pixel.

Machine Learning frameworks

A number of well known companies produce free ML frameworks that you can download and use on your own computer. The network training procedure runs best on machines with powerful CPUs and GPUs, but even using one of these pre-trained networks (known as inference) can be quite expensive.

One of the most popular frameworks is Google’s TensorFlow (TF), and since this is rather resource intensive, they also produce a cut-down version optimised for less powerful platforms. This is TensorFlow Lite (TFLite), which can be run effectively on Raspberry Pi.

Depth estimation

ANNs have proven very adept at a wide variety of image processing tasks, most notably object classification and detection, but also depth estimation. This is the process of taking one or more images and working out how far away every part of the scene is from the camera, producing a depth map.

Here’s an example:

Depth estimation example using a truck

The image on the right shows, by the brightness of each pixel, how far away the objects in the original (left-hand) image are from the camera (darker = nearer).

We distinguish between stereo depth estimation, which starts with a stereo pair of images (taken from marginally different viewpoints; here, parallax can be used to inform the algorithm), and monocular depth estimation, working from just a single image.

The applications of such techniques should be clear, ranging from robots that need to understand and navigate their environments, to the fake bokeh effects beloved of many modern smartphone cameras.

Depth Estimation Challenge

CVPR conference logo

We were very interested then to learn that, as part of the CVPR (Computer Vision and Pattern Recognition) 2021 conference, Andrey Ignatov and Radu Timofte of ETH Zürich were planning to run a Monocular Depth Estimation Challenge. They are specifically targeting the Raspberry Pi 4 platform running TFLite, and we are delighted to support this effort.

For more information, or indeed if any technically minded readers are interested in entering the challenge, please visit:

The conference and workshops are all taking place virtually in June, and we’ll be sure to update our blog with some of the results and models produced for Raspberry Pi 4 by the competing teams. We wish them all good luck!

The post Machine learning and depth estimation using Raspberry Pi appeared first on Raspberry Pi.

Improving AWS Java applications with Amazon CodeGuru Reviewer

Post Syndicated from Rajdeep Mukherjee original https://aws.amazon.com/blogs/devops/improving-aws-java-applications-with-amazon-codeguru-reviewer/

Amazon CodeGuru Reviewer is a machine learning (ML)-based AWS service that provides automated code review comments on your Java and Python applications. Powered by program analysis and ML, CodeGuru Reviewer detects hard-to-find bugs and inefficiencies in your code and leverages best practices learned from across millions of lines of open-source and Amazon code. You can start analyzing your code through pull requests and full repository analysis (for more information, see Automating code reviews and application profiling with Amazon CodeGuru).

The recommendations generated by CodeGuru Reviewer for Java fall into the following categories:

  • AWS best practices
  • Concurrency
  • Security
  • Resource leaks
  • Other specialized categories such as sensitive information leaks, input validation, and code clones
  • General best practices on data structures, control flow, exception handling, and more

We expect the recommendations to benefit beginners as well as expert Java programmers.

In this post, we showcase CodeGuru Reviewer recommendations related to using the AWS SDK for Java. For in-depth discussion of other specialized topics, see our posts on concurrency, security, and resource leaks. For Python applications, see Raising Python code quality using Amazon CodeGuru.

The AWS SDK for Java simplifies the use of AWS services by providing a set of features that are consistent and familiar for Java developers. The SDK has more than 250 AWS service clients, which are available on GitHub. Service clients include services like Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, Amazon Kinesis, Amazon Elastic Compute Cloud (Amazon EC2), AWS IoT, and Amazon SageMaker. These service clients provide more than 6,000 operations that you can use to access AWS services. With such rich and diverse services and APIs, developers may not always be aware of the nuances of AWS API usage. These nuances may not be important at the beginning, but they become critical as scale increases and the application evolves or diversifies. This is why CodeGuru Reviewer has an AWS best practices category of recommendations, which makes you aware of certain features of AWS APIs so your code can be more correct and performant.

The first part of this post focuses on the key features of the AWS SDK for Java as well as API patterns in AWS services. The second part of this post demonstrates using CodeGuru Reviewer to improve code quality for Java applications that use the AWS SDK for Java.

AWS SDK for Java

The AWS SDK for Java supports higher-level abstractions for simplified development and provides support for cross-cutting concerns such as credential management, retries, data marshaling, and serialization. In this section, we describe a few key features that are supported in the AWS SDK for Java. Additionally, we discuss key API patterns in AWS services, such as batching and pagination.

The AWS SDK for Java has the following features:

  • Waiters – Waiters are utility methods that make it easy to wait for a resource to transition into a desired state. Waiters make it easier to abstract the polling logic into a simple API call. The waiters interface provides a custom delay strategy to control the sleep time between retries, as well as a custom condition on whether polling of a resource should be retried. The AWS SDK for Java also offers an async variant of waiters.
  • Exceptions – The AWS SDK for Java uses runtime (or unchecked) exceptions instead of checked exceptions in order to give you fine-grained control over the errors you want to handle and to prevent scalability issues inherent with checked exceptions in large applications. Broadly, the AWS SDK for Java has two types of exceptions (see the sketch after this list):
    • AmazonClientException – Indicates that a problem occurred inside the Java client code, either while trying to send a request to AWS or while trying to parse a response from AWS. For example, the AWS SDK for Java throws an AmazonClientException if no network connection is available when you try to call an operation on one of the clients.
    • AmazonServiceException – Represents an error response from an AWS service. For example, if you try to terminate an EC2 instance that doesn’t exist, Amazon EC2 returns an error response, and all the details of that response are included in the AmazonServiceException that’s thrown. In some cases, a subclass of AmazonServiceException is thrown to allow you fine-grained control over handling error cases through catch blocks.
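The following is a minimal sketch of how these two exception types are typically distinguished in application code. The EC2 call is only an example, and the client construction and error handling are illustrative rather than prescriptive.

import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.DescribeInstancesResult;

public class Ec2ExceptionHandlingExample {

    public static void main(String[] args) {
        AmazonEC2 ec2Client = AmazonEC2ClientBuilder.defaultClient();
        try {
            DescribeInstancesResult result = ec2Client.describeInstances();
            System.out.println("Reservations returned: " + result.getReservations().size());
        } catch (AmazonServiceException e) {
            // The request reached EC2, but the service returned an error response.
            System.err.println("Service error: " + e.getErrorCode() + " - " + e.getErrorMessage());
        } catch (AmazonClientException e) {
            // The request never produced a valid response, for example because
            // no network connection was available.
            System.err.println("Client error: " + e.getMessage());
        }
    }
}

Because AmazonServiceException extends AmazonClientException, the more specific catch block comes first.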

The API has the following patterns:

  • Batching – A batch operation provides you with the ability to perform a single CRUD operation (create, read, update, delete) on multiple resources, for example sending several messages to an Amazon SQS queue in a single sendMessageBatch call.
  • Pagination – Many AWS operations return paginated results when the response object is too large to return in a single response. To enable you to perform pagination, the request and response objects for many service clients in the SDK provide a continuation token (typically named NextToken) to indicate additional results.

AWS best practices

Now that we have summarized the SDK-specific features and API patterns, let’s look at the CodeGuru Reviewer recommendations on AWS API use.

The CodeGuru Reviewer recommendations for the AWS SDK for Java range from detecting outdated or deprecated APIs to warning about API misuse, missing pagination, authentication and exception scenarios, and using efficient API alternatives. In this section, we discuss a few examples patterned after real code.

Handling pagination

Over 1,000 APIs from more than 150 AWS services have pagination operations. The pagination best practice rule in CodeGuru covers all the pagination operations. In particular, the pagination rule checks if the Java application correctly fetches all the results of the pagination operation.

The response of a pagination operation in AWS SDK for Java 1.0 contains a token that has to be used to retrieve the next page of results. In the following code snippet, you make a call to listTables(), a DynamoDB ListTables operation, which can only return up to 100 table names per page. This code might not produce complete results because the operation returns paginated results instead of all results.

public void getDynamoDbTable() {
        AmazonDynamoDBClient client = new AmazonDynamoDBClient();
        List<String> tables = client.listTables().getTableNames();
        System.out.println(tables);
}

CodeGuru Reviewer detects the missing pagination in the code snippet and makes the following recommendation to add another call to check for additional results.

Screenshot of recommendations for introducing pagination checks

You can accept the recommendation and add logic to get the next page of table names by checking whether a token (LastEvaluatedTableName in the ListTablesResult) is included in each response page. If such a token is present, it’s used in a subsequent request to fetch the next page of results. See the following code:

public void getDynamoDbTable() {
        AmazonDynamoDBClient client = new AmazonDynamoDBClient();
        ListTablesRequest listTablesRequest = new ListTablesRequest();
        boolean done = false;
        while (!done) {
            ListTablesResult listTablesResult = client.listTables(listTablesRequest);
            System.out.println(listTablesResult.getTableNames());
            String lastEvaluatedTableName = listTablesResult.getLastEvaluatedTableName();
            if (lastEvaluatedTableName == null) {
                done = true;
            } else {
                listTablesRequest.setExclusiveStartTableName(lastEvaluatedTableName);
            }
        }
}

Handling failures in batch operation calls

Batch operations are common with many AWS services that process bulk requests. Batch operations can succeed without throwing exceptions even if some items in the request fail. Therefore, a recommended practice is to explicitly check for any failures in the result of the batch APIs. Over 40 APIs from more than 20 AWS services have batch operations. The best practice rule in CodeGuru Reviewer covers all the batch operations. In the following code snippet, you make a call to sendMessageBatch, a batch operation from Amazon SQS, but it doesn’t handle any errors returned by that batch operation:

public void flush(final String sqsEndPoint,
                     final List<SendMessageBatchRequestEntry> batch) {
    final AmazonSQS sqsClient = AmazonSQSClientBuilder.standard().build();
    if (batch.isEmpty()) {
        return;
    }
    sqsClient.sendMessageBatch(sqsEndPoint, batch);
}

CodeGuru Reviewer detects this issue and makes the following recommendation to check the return value for failures.

Screenshot of recommendations for batch operations

You can accept this recommendation and add logging for the complete list of messages that failed to send, in addition to throwing an SQSUpdateException. See the following code:

public void flush(final String sqsEndPoint,
                     final List<SendMessageBatchRequestEntry> batch) {
    final AmazonSQS sqsClient = AmazonSQSClientBuilder.standard().build();
    if (batch.isEmpty()) {
        return;
    }
    SendMessageBatchResult result = sqsClient.sendMessageBatch(sqsEndPoint, batch);
    final List<BatchResultErrorEntry> failed = result.getFailed();
    if (!failed.isEmpty()) {
           final String failedMessage = failed.stream()
                         .map(batchResultErrorEntry ->
                            String.format("…", batchResultErrorEntry.getId(),
                            batchResultErrorEntry.getMessage()))
                         .collect(Collectors.joining(","));
           throw new SQSUpdateException("Error occurred while sending messages to SQS::" + failedMessage);
    }
}

Exception handling best practices

Amazon S3 is one of the most popular AWS services with our customers. A frequent operation with this service is to upload a stream-based object through an Amazon S3 client. Stream-based uploads might encounter occasional network connectivity or timeout issues, and the best practice to address such a scenario is to properly handle the corresponding ResetException error. ResetException extends SdkClientException, which subsequently extends AmazonClientException. Consider the following code snippet, which lacks such exception handling:

private void uploadInputStreamToS3(String bucketName, 
                                   InputStream input, 
                                   String key, ObjectMetadata metadata) 
                         throws SdkClientException {
    final AmazonS3 amazonS3Client = AmazonS3ClientBuilder.defaultClient();
    PutObjectRequest putObjectRequest =
          new PutObjectRequest(bucketName, key, input, metadata);
    amazonS3Client.putObject(putObjectRequest);
}

In this case, CodeGuru Reviewer correctly detects the missing handling of the ResetException error and suggests possible solutions.

Screenshot of recommendations for handling exceptions

This recommendation is rich in that it provides alternatives to suit different use cases. The most common handling uses File or FileInputStream objects, but in other cases explicit handling of mark and reset operations is necessary to reliably avoid a ResetException.

You can fix the code by explicitly setting a predefined read limit using the setReadLimit method of RequestClientOptions. Its default value is 128 KB. Setting the read limit to one byte greater than the size of the stream reliably avoids a ResetException.

For example, if the maximum expected size of a stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes. The mark and reset always work for 100,000 bytes or less. However, this might cause some streams to buffer that number of bytes into memory.

The fix reliably avoids ResetException when uploading an object of type InputStream to Amazon S3:

private void uploadInputStreamToS3(String bucketName, InputStream input, 
                                   String key, ObjectMetadata metadata) 
                             throws SdkClientException {
        final AmazonS3 amazonS3Client = AmazonS3ClientBuilder.defaultClient();
        // Read limit = maximum expected stream size (100,000 bytes) plus 1 byte
        final int READ_LIMIT = 100001;
        PutObjectRequest putObjectRequest =
                new PutObjectRequest(bucketName, key, input, metadata);
        putObjectRequest.getRequestClientOptions().setReadLimit(READ_LIMIT);
        amazonS3Client.putObject(putObjectRequest);
}

Replacing custom polling with waiters

A common activity when you’re working with services that are eventually consistent (such as DynamoDB) or that have a lead time for creating resources (such as Amazon EC2) is to wait for a resource to transition into a desired state. The AWS SDK provides the Waiters API, a convenient and efficient feature that abstracts the polling logic into a simple API call. If you’re not aware of this feature, you might come up with custom, and potentially inefficient, polling logic to determine whether a particular resource has transitioned into a desired state.

The following code appears to be waiting for the status of EC2 instances to change to shutting-down or terminated inside a while (true) loop:

private boolean terminateInstance(final String instanceId, final AmazonEC2 ec2Client)
    throws InterruptedException {
    long start = System.currentTimeMillis();
    while (true) {
        try {
            DescribeInstanceStatusResult describeInstanceStatusResult = 
                            ec2Client.describeInstanceStatus(new DescribeInstanceStatusRequest()
                            .withInstanceIds(instanceId).withIncludeAllInstances(true));
            List<InstanceStatus> instanceStatusList = 
                       describeInstanceStatusResult.getInstanceStatuses();
            long finish = System.currentTimeMillis();
            long timeElapsed = finish - start;
            if (timeElapsed > INSTANCE_TERMINATION_TIMEOUT) {
                break;
            }
            if (instanceStatusList.size() < 1) {
                Thread.sleep(WAIT_FOR_TRANSITION_INTERVAL);
                continue;
            }
            String currentState = instanceStatusList.get(0).getInstanceState().getName();
            if ("shutting-down".equals(currentState) || "terminated".equals(currentState)) {
                return true;
             } else {
                 Thread.sleep(WAIT_FOR_TRANSITION_INTERVAL);
             }
        } catch (AmazonServiceException ex) {
            throw ex;
        }
        …
 }

CodeGuru Reviewer detects the polling scenario and recommends that you use the waiters feature to improve the efficiency of such programs.

Screenshot of recommendations for introducing waiters feature

Based on the recommendation, the following code uses the waiters feature available in the AWS SDK for Java. The custom polling logic is replaced with a waiter obtained from ec2Client.waiters(), which is then run with a call to waiter.run(…) that accepts the request and an optional custom polling strategy. The run function polls synchronously until the resource transitions into the desired state or the retries are exhausted. The SDK throws a WaiterTimedOutException if the resource doesn’t transition into the desired state even after a certain number of retries. The fixed code is simpler and more efficient, abstracting the polling logic into a single API call:

public void terminateInstance(final String instanceId, final AmazonEC2 ec2Client)
    throws InterruptedException {
    Waiter<DescribeInstancesRequest> waiter = ec2Client.waiters().instanceTerminated();
    ec2Client.terminateInstances(new TerminateInstancesRequest().withInstanceIds(instanceId));
    try {
        waiter.run(new WaiterParameters()
              .withRequest(new DescribeInstancesRequest()
              .withInstanceIds(instanceId))
              .withPollingStrategy(new PollingStrategy(new MaxAttemptsRetryStrategy(60), 
                    new FixedDelayStrategy(5))));
    } catch (WaiterTimedOutException e) {
        List<InstanceStatus> instanceStatusList = ec2Client.describeInstanceStatus(
               new DescribeInstanceStatusRequest()
                        .withInstanceIds(instanceId)
                        .withIncludeAllInstances(true))
                        .getInstanceStatuses();
        String state;
        if (instanceStatusList != null && instanceStatusList.size() > 0) {
            state = instanceStatusList.get(0).getInstanceState().getName();
        }
    }
}

Service-specific best practice recommendations

In addition to the SDK operation-specific recommendations for the AWS SDK for Java discussed above, there are various AWS service-specific best practice recommendations for services such as Amazon S3, Amazon EC2, DynamoDB, and more, where CodeGuru Reviewer can help improve Java applications that use AWS service clients. For example, CodeGuru can detect the following:

  • Resource leaks in Java applications that use high-level libraries, such as the Amazon S3 TransferManager
  • Deprecated methods in various AWS services
  • Missing null checks on the response of the GetItem API call in DynamoDB
  • Missing error handling in the output of the PutRecords API call in Kinesis
  • Anti-patterns such as binding the SNS subscribe or createTopic operation with the Publish operation

Conclusion

This post introduced how to use CodeGuru Reviewer to improve the use of the AWS SDK in Java applications. CodeGuru is now available for you to try. For pricing information, see Amazon CodeGuru pricing.

Amazon Lex Introduces an Enhanced Console Experience and New V2 APIs

Post Syndicated from Martin Beeby original https://aws.amazon.com/blogs/aws/amazon-lex-enhanced-console-experience/

Today, the Amazon Lex team has released a new console experience that makes it easier to build, deploy, and manage conversational experiences. Along with the new console, we have also introduced new V2 APIs, including continuous streaming capability. These improvements allow you to reach new audiences, have more natural conversations, and develop and iterate faster.

The new Lex console and V2 APIs make it easier to build and manage bots, focusing on three main benefits. First, you can add a new language to a bot at any time and manage all the languages through the lifecycle of design, test, and deployment as a single resource. The new console experience allows you to quickly move between different languages to compare and refine your conversations. I’ll demonstrate later how easy it was to add French to my English bot.

Second, V2 APIs simplify versioning. The new Lex console and V2 APIs provide a simple information architecture where the bot intents and slot types are scoped to a specific language. Versioning is performed at the bot level so that resources such as intents and slot types do not have to be versioned individually. All resources within the bot (language, intents, and slot types) are archived as part of the bot version creation. This new way of working makes it easier to manage bots.

Lastly, you have additional builder productivity tools and capabilities that give you more flexibility and control over your bot design process. You can now save partially completed work as you develop different bot elements and as you script, test, and tune your configuration. This gives you more flexibility as you iterate through bot development. For example, you can save a slot that refers to a deleted slot type. In addition to saving partially completed work, you can quickly navigate across the configuration without getting lost. The new Conversation flow capability allows you to maintain your orientation as you move across the different intents and slot types.

In addition to the enhanced console and APIs, we are providing a new streaming conversation API. Natural conversations are punctuated with pauses and interruptions. For example, a customer may ask to pause the conversation or hold the line while looking up information, such as retrieving credit card details before making a bill payment. With streaming conversation APIs, you can handle such pauses and interruptions directly as you configure the bot. Overall, the design and implementation of the conversation is simplified and easier to manage. The bot builder can quickly enhance the conversational capability of virtual contact center agents or smart assistants.
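To make the V2 runtime APIs concrete, here is a minimal sketch of sending a text utterance to a V2 bot with the renamed text API, RecognizeText (see Things to know below), using the AWS SDK for Java 2.x. The bot ID, alias ID, session ID, and utterance are placeholders, and the builder method names follow the parameters of the V2 runtime API.

import software.amazon.awssdk.services.lexruntimev2.LexRuntimeV2Client;
import software.amazon.awssdk.services.lexruntimev2.model.RecognizeTextRequest;
import software.amazon.awssdk.services.lexruntimev2.model.RecognizeTextResponse;

public class TalkReviewClient {

    public static void main(String[] args) {
        LexRuntimeV2Client lexClient = LexRuntimeV2Client.create();

        // Bot ID, alias ID, and session ID are placeholders for your own bot's values.
        RecognizeTextRequest request = RecognizeTextRequest.builder()
                .botId("BOT_ID")
                .botAliasId("BOT_ALIAS_ID")
                .localeId("en_GB")
                .sessionId("user-session-1")
                .text("I would like to book a talk review")
                .build();

        RecognizeTextResponse response = lexClient.recognizeText(request);
        response.messages().forEach(message -> System.out.println(message.content()));
    }
}

The new streaming capability is a separate conversation API; this sketch covers only the simple request and response text path.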

Let’s create a new bot and explore how some of Lex’s new console and streaming API features provide an improved bot building experience.

Building a bot
I head over to the new V2 Lex console and click on Create bot to start things off.

I select that I want to Start with an example and select the MakeAppointment example.

Over the years, I have spoken at many conferences, so I now offer to review talks that other community members are producing. Since these speakers are often in different time zones, it can be complicated to organize the various appointments for the different types of reviews that I offer. So I have decided to build a bot to streamline the process. I give my bot the name TalkReview and provide a description. I also select Create a role with basic Amazon Lex permissions and use this as my runtime role.

I must add at least one language to my bot, so I start with English (GB). I also select the text-to-speech voice that I want to use should my bot require voice interaction rather than just text.

During the creation, there is a new button that allows me to Add another language. I click on this to add French (FR) to my bot. You can add languages during creation as I am doing here, or you can add additional languages later on as your bot becomes more popular and needs to work with new audiences.

I can now start defining intents for my bot, and I can begin the iterative process of building and testing my bot. I won’t go into all of the details of how to create a bot or show you all of the intents I added, as we have better tutorials that can show you that step-by-step, but I will point out a few new features that make this new enhanced console really compelling.

The new Conversation flow provides you with a visual flow of the conversation, so you can see the sample utterances you provide and how your conversation might work in the real world. I love this feature because you can click on the various elements, and it will take you to where you can make changes. For example, I can click on the prompt What type of review would you like to schedule, and I am taken to the place where I can edit this prompt.

The new console has a very well thought-out approach to versioning a bot. At any time, on the Bot versions screen, I can click Create version, and it will take a snapshot of the bot’s current configuration. I can then associate that version with an alias. For example, in my application, I have an alias called Production. This Production alias is associated with Version 1, but at any time I could switch it to use a different version or even roll back to a previous version if I discover problems.

The testing experience is now very streamlined. Once I have built the bot, I can click the test button at the bottom right of the screen, start speaking to the bot, and test the experience. You can also expand the Inspect window, which gives you details about the conversation’s state, and you can explore the raw JSON inputs and outputs.

Things to know
Here are a couple of important things to keep in mind when you use the enhanced console:

  • Integration with Amazon Connect – Currently, bots built in the new console cannot be integrated with Amazon Connect contact flows. We plan to provide this integration as part of the near-term roadmap. You can use the current console and existing APIs to create and integrate bots with Amazon Connect.
  • Pricing – You only pay for what you use. The charges remain the same for existing audio and text APIs, renamed as RecognizeUtterance and RecognizeText. For the new Streaming capabilities, please refer to the pricing detail here.
  • All existing APIs and bots will continue to be supported. The newly announced features are only available in the new console and V2 APIs.

Go Build
The enhanced Lex console is available now, and you can start using it today. The enhanced experience and V2 APIs are available in all existing Regions and support all current languages. So, please give this console a try and let us know what you think. To learn more, check out the documentation for the console and the streaming API.

Happy Building!
— Martin

Raspberry Pi LEGO sorter

Post Syndicated from Ashley Whittaker original https://www.raspberrypi.org/blog/raspberry-pi-lego-sorter/

Raspberry Pi is at the heart of this AI-powered, automated sorting machine that is capable of recognising and sorting any LEGO brick.

And its maker Daniel West believes it to be the first of its kind in the world!

Best ever

This mega-machine was two years in the making and is a LEGO creation itself, built from over 10,000 LEGO bricks.

A beast of 10,000 bricks

It can sort any LEGO brick you place in its input bucket into one of 18 output buckets, at the rate of one brick every two seconds.

While Daniel was inspired by previous LEGO sorters, his creation is a huge step up from them: it can recognise absolutely every LEGO brick ever created, even bricks it has never seen before. Hence the ‘universal’ in the name ‘universal LEGO sorting machine’.

Hardware

There we are, tucked away, just doing our job

Software

The artificial intelligence algorithm behind the LEGO sorting is a convolutional neural network, the go-to for image classification.

What makes Daniel’s project a ‘world first’ is that he trained his classifier using 3D model images of LEGO bricks, which is how the machine can classify absolutely any LEGO brick it’s faced with, even if it has never seen it in real life before.

We LOVE a thorough project video, and we love TWO of them even more

Daniel has made a whole extra video (above) explaining how the AI in this project works. He shouts out all the open-source software he used to run the Raspberry Pi Camera Module, access 3D training images, and more at this point in the video.

LEGO brick separation

The vibration plate in action, feeding single parts into the scanner

Daniel needed the input bucket to carefully pick out a single LEGO brick from the mass he chucks in at once.

This is achieved with a primary and secondary belt slowly pushing parts onto a vibration plate. The vibration plate uses a super fast LEGO motor to shake the bricks around so they aren’t sitting on top of each other when they reach the scanner.

Scanning and sorting

A side view of the LEGO sorting machine showing a large white chute built from LEGO bricks
The underside of the beast

A Raspberry Pi Camera Module captures video of each brick, which a Raspberry Pi 3 Model B+ then processes and wirelessly sends to a more powerful computer that runs the neural network classifying the parts.

The classification decision is then sent back to the sorting machine so it can spit the brick, using a series of servo-controlled gates, into the right output bucket.

Extra-credit homework

A front view of the LEGO sorter with the sorting boxes visible underneath
In all its bricky beauty, with the 18 output buckets visible at the bottom

Daniel is such a boss maker that he wrote not one, but two further reading articles for those of you who want to deep-dive into this mega LEGO creation:

The post Raspberry Pi LEGO sorter appeared first on Raspberry Pi.