Tag Archives: machine learning

Hosting Hugging Face models on AWS Lambda for serverless inference

2021-06-29 Chris Munns

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/hosting-hugging-face-models-on-aws-lambda/

This post is written by Eddie Pick, AWS Senior Solutions Architect – Startups and Scott Perry, AWS Senior Specialist Solutions Architect – AI/ML

Hugging Face Transformers is a popular open-source project that provides pre-trained, natural language processing (NLP) models for a wide variety of use cases. Customers with minimal machine learning experience can use pre-trained models to enhance their applications quickly using NLP. This includes tasks such as text classification, language translation, summarization, and question answering – to name a few.

First introduced in 2017, the Transformer is a modern neural network architecture that has quickly become the most popular type of machine learning model applied to NLP tasks. It outperforms previous techniques based on convolutional neural networks (CNNs) or recurrent neural networks (RNNs). The Transformer also offers significant improvements in computational efficiency. Notably, Transformers are more conducive to parallel computation. This means that Transformer-based models can be trained more quickly, and on larger datasets than their predecessors.

The computational efficiency of Transformers provides the opportunity to experiment and improve on the original architecture. Over the past few years, the industry has seen the introduction of larger and more powerful Transformer models. For example, BERT was first published in 2018 and was able to get better benchmark scores on 11 natural language processing tasks using between 110M-340M neural network parameters. In 2019, the T5 model using 11B parameters achieved better results on benchmarks such as summarization, question answering, and text classification. More recently, the GPT-3 model was introduced in 2020 with 175B parameters and in 2021 the Switch Transformers are scaling to over 1T parameters.

One consequence of this trend toward larger and more powerful models is an increased barrier to entry. As the number of model parameters increases, so does the computational infrastructure that is necessary to train such a model. This is where the open-source Hugging Face Transformers project helps.

Hugging Face Transformers provides over 30 pretrained Transformer-based models available via a straightforward Python package. Additionally, there are over 10,000 community-developed models available for download from Hugging Face. This allows users to use modern Transformer models within their applications without requiring model training from scratch.

The Hugging Face Transformers project directly addresses challenges associated with training modern Transformer-based models. Many customers want a zero-administration ML inference solution that allows Hugging Face Transformers models to be hosted in AWS easily. This post introduces a low-touch, cost-effective, and scalable mechanism for hosting Hugging Face models for real-time inference using AWS Lambda.

Overview

Our solution consists of an AWS Cloud Development Kit (AWS CDK) script that automatically provisions container image-based Lambda functions that perform ML inference using pre-trained Hugging Face models. This solution also includes Amazon Elastic File System (EFS) storage that is attached to the Lambda functions to cache the pre-trained models and reduce inference latency.

Solution architecture

In this architectural diagram:

Serverless inference is achieved by using Lambda functions that are based on container images
The container image is stored in an Amazon Elastic Container Registry (ECR) repository within your account
Pre-trained models are automatically downloaded from Hugging Face the first time the function is invoked
Pre-trained models are cached within Amazon Elastic File System storage in order to improve inference latency
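
The download-once, reuse-afterwards behaviour in the last two points can be illustrated with a minimal Python sketch. In the actual solution the Transformers library manages its own cache directory; the `get_model` and `fake_download` helpers below are hypothetical, and a temporary directory stands in for the EFS mount.

```python
import tempfile
from pathlib import Path


def get_model(name, cache_dir, download):
    """Return (model_bytes, was_cached), downloading only on a cache miss."""
    cached_file = Path(cache_dir) / name
    if cached_file.exists():
        return cached_file.read_bytes(), True
    data = download(name)          # slow path: first invocation only
    cached_file.write_bytes(data)  # persist for later invocations
    return data, False


downloads = []


def fake_download(name):
    """Stand-in for fetching weights from Hugging Face; records each call."""
    downloads.append(name)
    return b"weights-for-" + name.encode()


with tempfile.TemporaryDirectory() as cache_dir:  # stands in for the EFS mount
    first = get_model("sentiment", cache_dir, fake_download)
    second = get_model("sentiment", cache_dir, fake_download)

print(first[1], second[1], len(downloads))
```

Only the first call reaches the download function; the second is served from the cache directory, which is why later invocations are faster.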

The solution includes Python scripts for two common NLP use cases:

Sentiment analysis: Identifying if a sentence indicates positive or negative sentiment. It uses a fine-tuned model on sst2, which is a GLUE task.
Summarization: Summarizing a body of text into a shorter, representative text. It uses a Bart model that was fine-tuned on the CNN / Daily Mail dataset.

For simplicity, both of these use cases are implemented using Hugging Face pipelines.

Prerequisites

The following is required to run this example:

git
AWS CDK
Python 3.6+
A virtual env (optional)

Deploying the example application

Clone the project to your development environment:

git clone https://github.com/aws-samples/zero-administration-inference-with-aws-lambda-for-hugging-face.git

Install the required dependencies:
```
pip install -r requirements.txt
```
Bootstrap the CDK. This command provisions the initial resources needed by the CDK to perform deployments:
```
cdk bootstrap
```
Deploy the CDK application to its environment. During the deployment, the toolkit outputs progress indications:
```
$ cdk deploy
```

Testing the application

After deployment, navigate to the AWS Management Console to find and test the Lambda functions. There is one for sentiment analysis and one for summarization.

To test:

Enter “Lambda” in the search bar of the AWS Management Console:
Filter the functions by entering “ServerlessHuggingFace”:
Select the ServerlessHuggingFaceStack-sentimentXXXXX function:
In the Test event, enter the following snippet and then choose Test:

{
   "text": "I'm so happy I could cry!"
}

The first invocation takes approximately one minute to complete. The initial Lambda function environment must be allocated and the pre-trained model must be downloaded from Hugging Face. Subsequent invocations are faster, as the Lambda function is already prepared and the pre-trained model is cached in EFS.

Function test results

The JSON response shows the result of the sentiment analysis:

{
  "statusCode": 200,
  "body": {
    "label": "POSITIVE",
    "score": 0.9997532367706299
  }
}

Understanding the code structure

The code is organized using the following structure:

├── inference
│ ├── Dockerfile
│ ├── sentiment.py
│ └── summarization.py
├── app.py
└── ...

The inference directory contains:

The Dockerfile used to build a custom image to be able to run PyTorch Hugging Face inference using Lambda functions
The Python scripts that perform the actual ML inference

The sentiment.py script shows how to use a Hugging Face Transformers model:

import json
from transformers import pipeline

nlp = pipeline("sentiment-analysis")

def handler(event, context):
    response = {
        "statusCode": 200,
        "body": nlp(event['text'])[0]
    }
    return response

For each Python script in the inference directory, the CDK generates a Lambda function backed by a container image and a Python inference script.
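
To see the request and response shape of such a handler without deploying anything or downloading a model, the pipeline can be swapped for a stub. This is a sketch only: `fake_nlp` and `make_handler` are hypothetical names, and the stub merely mimics the list-of-dicts shape that the sentiment pipeline returns.

```python
# Hypothetical stand-in for transformers.pipeline("sentiment-analysis");
# it mimics the list-of-dicts return shape so no model download is needed.
def fake_nlp(text):
    label = "POSITIVE" if "happy" in text.lower() else "NEGATIVE"
    return [{"label": label, "score": 0.99}]


def make_handler(nlp):
    """Build a Lambda-style handler around any callable with the pipeline's shape."""
    def handler(event, context):
        return {"statusCode": 200, "body": nlp(event["text"])[0]}
    return handler


handler = make_handler(fake_nlp)
result = handler({"text": "I'm so happy I could cry!"}, None)
print(result)
```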

CDK script

The CDK script is named app.py in the solution’s repository. The beginning of the script creates a virtual private cloud (VPC).

vpc = ec2.Vpc(self, 'Vpc', max_azs=2)

Next, it creates the EFS file system and an access point in EFS for the cached models:

        fs = efs.FileSystem(self, 'FileSystem',
                            vpc=vpc,
                            removal_policy=cdk.RemovalPolicy.DESTROY)
        access_point = fs.add_access_point('MLAccessPoint',
                                           create_acl=efs.Acl(
                                               owner_gid='1001', owner_uid='1001', permissions='750'),
                                           path="/export/models",
                                           posix_user=efs.PosixUser(gid="1001", uid="1001"))

It iterates through the Python files in the inference directory:

docker_folder = os.path.dirname(os.path.realpath(__file__)) + "/inference"
pathlist = Path(docker_folder).rglob('*.py')
for path in pathlist:

And then creates the Lambda function that serves the inference requests:

            base = os.path.basename(path)
            filename = os.path.splitext(base)[0]
            # Lambda Function from docker image
            function = lambda_.DockerImageFunction(
                self, filename,
                code=lambda_.DockerImageCode.from_image_asset(docker_folder,
                                                              cmd=[
                                                                  filename+".handler"]
                                                              ),
                memory_size=8096,
                timeout=cdk.Duration.seconds(600),
                vpc=vpc,
                filesystem=lambda_.FileSystem.from_efs_access_point(
                    access_point, '/mnt/hf_models_cache'),
                environment={
                    "TRANSFORMERS_CACHE": "/mnt/hf_models_cache"},
            )
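
The file discovery and handler naming used by the loop above can be tried in isolation. The sketch below reuses the same `rglob` and `splitext` calls against a throwaway directory; the `discover_handlers` helper is hypothetical and no CDK constructs are created.

```python
import os
import tempfile
from pathlib import Path


def discover_handlers(docker_folder):
    """Map each Python script under docker_folder to its Lambda handler string."""
    handlers = {}
    for path in sorted(Path(docker_folder).rglob("*.py")):
        filename = os.path.splitext(os.path.basename(path))[0]
        handlers[filename] = filename + ".handler"
    return handlers


# Demonstrate against a throwaway directory that mirrors the inference folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sentiment.py", "summarization.py", "Dockerfile"):
        Path(tmp, name).write_text("")
    found = discover_handlers(tmp)

print(found)
```

Non-Python files such as the Dockerfile are skipped, so each script in the directory yields exactly one function.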

Adding a translator

Optionally, you can add more models by adding Python scripts in the inference directory. For example, add the following code in a file called translate-en2fr.py:

import json
from transformers import pipeline

en_fr_translator = pipeline('translation_en_to_fr')

def handler(event, context):
    response = {
        "statusCode": 200,
        "body": en_fr_translator(event['text'])[0]
    }
    return response

Then run:

$ cdk synth
$ cdk deploy

This creates a new endpoint to perform English to French translation.

Cleaning up

After you are finished experimenting with this project, run “cdk destroy” to remove all of the associated infrastructure.

Conclusion

This post shows how to perform ML inference for pre-trained Hugging Face models by using Lambda functions. To avoid repeatedly downloading the pre-trained models, this solution uses an EFS-based approach to model caching. This helps to achieve low-latency, near real-time inference. The solution is provided as infrastructure as code using Python and the AWS CDK.

We hope this blog post allows you to prototype quickly and include modern NLP techniques in your own products.

The Future of Machine Learning and Cybersecurity

2021-06-21 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/06/the-future-of-machine-learning-and-cybersecurity.html

The Center for Security and Emerging Technology has a new report: “Machine Learning and Cybersecurity: Hype and Reality.” Here’s the bottom line:

The report offers four conclusions:

Machine learning can help defenders more accurately detect and triage potential attacks. However, in many cases these technologies are elaborations on long-standing methods — not fundamentally new approaches — that bring new attack surfaces of their own.
A wide range of specific tasks could be fully or partially automated with the use of machine learning, including some forms of vulnerability discovery, deception, and attack disruption. But many of the most transformative of these possibilities still require significant machine learning breakthroughs.
Overall, we anticipate that machine learning will provide incremental advances to cyber defenders, but it is unlikely to fundamentally transform the industry barring additional breakthroughs. Some of the most transformative impacts may come from making previously un- or under-utilized defensive strategies available to more organizations.
Although machine learning will be neither predominantly offense-biased nor defense-biased, it may subtly alter the threat landscape by making certain types of strategies more appealing to attackers or defenders.

Building a Hyper Self-Service, Distributed Tracing and Feedback System for Rule & Machine Learning (ML) Predictions

2021-05-24 Grab Tech

Post Syndicated from Grab Tech original https://engineering.grab.com/building-hyper-self-service-distributed-tracing-feedback-system

Introduction

At Grab, Trust, Identity, Safety, and Security (TISS) is a team of software engineers and AI developers working on fraud detection, login identity checks, safety issues, etc. There are many TISS services, like grab-fraud, grab-safety, and grab-id. They make billions of business decisions daily using the Griffin rule engine, which determines if a passenger can book a trip, get a food promotion, or if a driver gets a delivery booking.

There is a natural demand to log all these important business decisions, store them and query them interactively or in batches. Data analysts and scientists need to use the data to train their machine learning models. RiskOps and customer service teams can query the historical data and help consumers.

That’s where Archivist comes in; it is a new tracing, statistics and feedback system for rule and machine learning-based predictions. It is reliable and performant. Its innovative data schema is flexible for storing events from different business scenarios. Finally, it provides a user-friendly UI, which has access control for classified data.

Here are the impacts Archivist has already made:

Currently, there are 2 teams with a total of 5 services and about 50 business scenarios using Archivist. The scenarios include fraud prevention (e.g. DriverBan, PassengerBan), payment checks (e.g. PayoutBlockCheck, PromoCheck), and identity check events like PinTrigger.
It takes only a few minutes to onboard a new business scenario (event type), by using the configuration page on the user portal. Previously, it took at least 1 to 2 days.
Each day, Archivist writes 80 million logs to the ElasticSearch cluster, which is about 200GB of data.
Each week, Customer Experience (CE)/Risk Ops goes to the user portal and checks Archivist logs for about 2,000 distinct customers. They can search based on numerous dimensions such as the Passenger/DriverID, phone number, request ID, booking code and payment fingerprint.

Background

Each day, TISS services make billions of business decisions (predictions), based on the Griffin rule engine and ML models.

After the predictions are made, there are still some tough questions for these services to answer.

A consumer could be banned by a prediction that Risk Ops believes is a false positive. If this happens, how can consumers or Risk Ops quickly feed this information back into the training of new rules and ML models?
When Customer Service or Data Scientists investigate tickets opened due to TISS predictions/decisions, how do they know which rules and data were used? E.g. why the passenger triggered a selfie, or why a booking was blocked.
After Data Analysts/Data Scientists (DA/DS) launch a new rule/model, how can they track the performance in fine-granularity and in real-time? E.g. week-over-week rule performance in a country or city.
How can DA/DS access all prediction data for data analysis or model training?
How can the system keep up with Grab’s business launch speed, with maximum self-service?

Problem

To answer the questions above, TISS services previously used company-wide Kibana to log predictions. For example, a log looks like: PassengerID:123,Scenario:PinTrigger,Decision:Trigger,.... This logging method had some obvious issues:

Logs in plain text don’t have any structure and are not friendly to ML model training as most ML models need processed data to make accurate predictions.
Furthermore, there is no fine-granularity access control for developers in Kibana.
Developers, DA and DS have no access control while CEs have no access at all. So CE cannot easily see the data and DA/DS cannot easily process the data.

To address all the Kibana log issues, we developed ActionTrace, a code library with a well-structured data schema. The logs, also called documents, are stored in a dedicated ElasticSearch cluster with access control implemented. However, after using it for a while, we found that it still needed some improvements.

Each business scenario involves different types of entities and ActionTrace is not fully self-service. This means that a lot of development work was needed to support fast-launching business scenarios. Here are some examples:
- The main entities in the taxi business are Driver and Passenger,
- The main entities in the food business can be Merchant, Driver and Consumer.
All these entities will need to be manually added into the ActionTrace data schema.
Each business scenario may have its own custom information logged. Because there is no overlap, each of them will correspond to a new field in the data schema. For example:
- For any scenario involving payment, a valid payment method and expiration date is logged.
- For the taxi business, the geohash is logged.
To store the log data from ActionTrace, different teams need to set up and manage their own ElasticSearch clusters. This increases hardware and maintenance costs.
There was a simple Web UI created for viewing logs from ActionTrace, but there was still no access control in fine granularity.

Solution

We developed Archivist, a new tracing, statistics, and feedback system for ML/rule-based prediction events. It’s centralised, performant and flexible. It addresses all the issues mentioned above and improves on the earlier solutions, Kibana logging and ActionTrace.

The key improvements are:

User-defined entities and custom fields
- There are no predefined entity types. Users can define up to 5 entity types (E.g. PassengerId, DriverId, PhoneNumber, PaymentMethodId, etc.).
- Similarly, there are a limited number of custom data fields to use, in addition to the common data fields shared by all business scenarios.
A dedicated service shared by all other services
- Each service writes its prediction events to a Kafka stream. Archivist then reads the stream and writes to the ElasticSearch cluster.
- The data writes are buffered, so it is easy to handle traffic surges in peak time.
- Different services share the same Elastic Cloud Enterprise (ECE) cluster, but they create their own daily file indices so the costs can be split fairly.
Better support for data mining, prediction stats and feedback
- Kafka stream data are simultaneously written to AWS S3. DA/DS can use the PrestoDB SQL query engine to mine the data.
- There is an internal web portal for viewing Archivist logs. Customer service teams and Ops can use no-risk data to address CE tickets, while DA, DS and developers can view high-risk data for code/rule debugging.
A reduction of development days to support new business launches
- Previously, it took a week to modify and deploy the ActionTrace data schema. Now, it only takes several minutes to configure event schemas in the user portal.
Saves time in RiskOps/CE investigations
- With the new web UI which has access control in place, the different roles in the company, like Customer service and Data analysts, can access the Archivist events with different levels of permissions.
- It takes only a few clicks for them to find the relevant events that impact the drivers/passengers.
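
The idea of buffering writes so that traffic surges are absorbed can be sketched generically in Python. This is not Archivist's implementation: the `BufferedEventWriter` class, the batch size and the in-memory sink are all assumptions made for illustration, with the sink standing in for the downstream ElasticSearch write.

```python
import json


class BufferedEventWriter:
    """Collects events and hands them to `sink` in batches (hypothetical helper)."""

    def __init__(self, sink, batch_size=3):
        self.sink = sink            # callable that accepts a list of JSON strings
        self.batch_size = batch_size
        self.buffer = []

    def write(self, event):
        self.buffer.append(json.dumps(event))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []  # start a fresh list; the sink keeps the old one


batches = []
writer = BufferedEventWriter(sink=batches.append, batch_size=3)
for i in range(7):
    writer.write({"request_id": str(i), "event_type": "PinTrigger"})
writer.flush()  # drain whatever is left at shutdown
print([len(b) for b in batches])
```

Seven events reach the sink as three batches, so the downstream store sees fewer, larger writes than the producers issue.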

Architecture Details

Archivist’s system architecture is shown in the diagram below.

Different services (like fraud-detection, safety-allocation, etc.) use a simple SDK to write data to a Kafka stream (the left side of the diagram).
In the centre of Archivist is an event processor. It reads data from Kafka, and writes them to ElasticSearch (ES).
The Kafka stream writes to the Amazon S3 data lake, so DA/DS can use the Presto SQL query engine to query them.
The user portal (bottom right) can be used to view the Archivist log and update configurations. It also sends all the web requests to the API Handler in the centre.

The following diagram shows how internal and external users use Archivist as well as the interaction between the Griffin rule engine and Archivist.

Flexible Event Schema

In Archivist, a prediction/decision is called an event. The event schema can be divided into 3 main parts conceptually.

Data partitioning: Fields like service_name and event_type categorise data by services and business scenarios.

Field name	Type	Example	Notes
service_name	string	GrabFraud	Name of the Service
event_type	string	PreRide	PaxBan/SafeAllocation

Business decision making: request_id, decisions, reasons, event_content are used to record the business decision, the reason and the context (E.g. The input features of machine learning algorithms).

Field name	Type	Example	Notes
request_id	string	a16756e8-efe2-472b-b614-ec6ae08a5912	a 32-digit id for web requests
event_content	string		Event context
decisions	[string]	[“NotAllowBook”, “SMS”]	A list
reasons	string		json payload string of the response from engine.

Customisation: Archivist provides user-defined entities and custom fields that we feel are sufficient and flexible for handling different business scenarios.

Field name	Type	Example	Notes
entity_type_1	string	Passenger
entity_id_1	string	12151
entity_type_2	string	Driver
entity_id_2	string	341521-rdxf36767
…	string
entity_id_5	string
custom_field_type_1	string	“MessageToUser”
custom_field_1	string	“please contact Ops”	User defined fields
custom_field_type_2	string	“Prediction rule:”
custom_field_2	string	“ML rule: 123, version:2”
…	string
custom_field_6	string

A User Portal to Support Querying, Prediction Stats and Feedback

DA, DS, Ops and CE can access the internal user portal to see the prediction events, individually and on an aggregated city level.

*A snapshot of the Archivist logs showing the aggregation of the data in each city*

There are graphs on the portal, showing the rule/model performance on individual customers over a period of time.

*Rule performance on a customer over a period of time*

How to Use Archivist for Your Service

If you want to get onboard Archivist, the coding effort is minimal. Here is an example of a code snippet to log an event:
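
A minimal Python sketch of assembling such an event, using the field names from the schema tables above. The `build_event` helper is hypothetical and is not Archivist's SDK; sending the payload to the Kafka stream is left out.

```python
import json
import uuid


def build_event(service_name, event_type, decisions, entities,
                custom_fields=None, event_content="", reasons=""):
    """Assemble an event dict using the field names from the schema tables."""
    if len(entities) > 5:
        raise ValueError("at most 5 entity types are supported")
    event = {
        "service_name": service_name,
        "event_type": event_type,
        "request_id": str(uuid.uuid4()),
        "event_content": event_content,
        "decisions": list(decisions),
        "reasons": reasons,
    }
    # Entities and custom fields are stored in numbered slots.
    for i, (entity_type, entity_id) in enumerate(entities, start=1):
        event[f"entity_type_{i}"] = entity_type
        event[f"entity_id_{i}"] = entity_id
    for i, (field_type, value) in enumerate(custom_fields or [], start=1):
        event[f"custom_field_type_{i}"] = field_type
        event[f"custom_field_{i}"] = value
    return event


event = build_event(
    "GrabFraud", "PreRide", ["NotAllowBook", "SMS"],
    entities=[("Passenger", "12151"), ("Driver", "341521-rdxf36767")],
    custom_fields=[("MessageToUser", "please contact Ops")],
)
payload = json.dumps(event)
print(payload)
```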

Lessons

During the implementation of Archivist, we learnt some things:

A good system needs to support multi-tenancy from the beginning. Originally, we thought we could use just one Kafka stream, and put all the documents from different teams into one ElasticSearch (ES) index. But after one team insisted on keeping their data separate from others, we created more Kafka streams and ES indexes. We realised that this way, it’s easier for us to manage data and share the cost fairly.
Shortly after we launched Archivist, there was an incident where the ES data writes were choked. Because each document write is a goroutine, the number of goroutines increased to 400k and the memory usage reached 100% within minutes. We added a patch (2 lines of code) to limit the maximum number of goroutines in our system. Since then, we haven’t had any more severe incidents in Archivist.
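
The second lesson comes down to capping how many writes run at once. The original service uses goroutines; the sketch below is a Python analogue using a fixed-size thread pool, with `write_document` as a hypothetical stand-in for the ES write and `MAX_IN_FLIGHT` an arbitrary value chosen for the demo.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4  # assumed cap, chosen only for the demo

lock = threading.Lock()
in_flight = 0
peak = 0


def write_document(doc):
    """Stand-in for a slow ElasticSearch write; tracks peak concurrency."""
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.01)  # simulate I/O latency
    with lock:
        in_flight -= 1


# A fixed-size pool means at most MAX_IN_FLIGHT writes run at once,
# no matter how many documents are queued behind them.
with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
    list(pool.map(write_document, range(50)))

print(peak)
```

Without the cap, a stalled downstream store lets the number of concurrent writers grow with the backlog, which is the failure mode described above.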

Join Us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

How We Improved Agent Chat Efficiency with Machine Learning

2021-04-19 Grab Tech

Post Syndicated from Grab Tech original https://engineering.grab.com/how-we-improved-agent-chat-efficiency-with-ml

In previous articles (see Grab’s in-house chat platform, workforce routing), we shared how chat has grown to become one of the primary channels for support in the last few years.

With continuous chat growth and a new in-house tool, helping our agents be more efficient and productive was key to ensuring a faster support time for our users and scaling chat even further.

Starting from the analysis on the usage of another third-party tool as well as some shadowing sessions, we realised that building a template-based feature wouldn’t help. We needed to offer personalisation capabilities, as our consumer support specialists care about their writing style and tone, and using templates often feels robotic.

We decided to build a machine learning model, called SmartChat, which offers contextual suggestions by leveraging several sources of internal data, helping our chat specialists type much faster, and hence serving more consumers.

In this article, we are going to explain the process from problem discovery to design iterations, and share how the model was implemented from both a data science and software engineering perspective.

How SmartChat Works

Diving Deeper into the Problem

Agent productivity became a key part in the process of scaling chat as a channel for support.

After splitting chat time into all its components, we noted that agent typing time represented a big portion of the chat support journey, making it the perfect problem to tackle next.

After some analysis on the usage of the third-party chat tool, we found out that even with functionalities such as canned messages, 85% of the messages were still free typed.

Hours of shadowing sessions also confirmed that the consumer support specialists liked to add their own flair. They would often use the template and adjust it to their style, which took more time than just writing it on the spot. With this in mind, it was obvious that templates wouldn’t be too helpful, unless they provided some degree of personalisation.

We needed something that reduces typing time and also:

Allows some degree of personalisation, so that answers don’t seem robotic and repeated.
Works with multiple languages and nuances, considering that Grab operates in 8 markets and that even the English-speaking markets differ slightly in commonly used words.
Is contextual to the problem and takes into account the user type, issue reported, and even the time of the day.
Ideally doesn’t require any maintenance effort, such as having to keep templates updated whenever there’s a change in policies.

Considering the constraints, this seemed to be the perfect candidate for a machine learning-based functionality, which predicts sentence completion by considering all the context about the user, issue and even the latest messages exchanged.

Usability is Key

To fulfil the hypothesis, there are a few design considerations:

Minimising the learning curve for agents.
Avoiding visual clutter if recommendations are not relevant.

To increase the probability of predicting an agent’s message, one of the design explorations is to allow agents to select the top 3 predictions (Design 1). To onboard agents, we designed a quick tip to activate SmartChat using keyboard shortcuts.

By displaying the top 3 recommendations, we learnt that it slowed agents down as they started to read all options even if the recommendations were not helpful. Besides, by triggering this component upon every recommendable text, it became a distraction as they were forced to pause.

In our next design iteration, we decided to reuse an interaction from a platform that agents already use: Gmail’s Smart Compose. As agents are familiar with Gmail, the learning curve for this feature would be less steep. First-time users see a “Press tab” tooltip, which activates the text recommendation. The tooltip disappears after 5 uses.

To relearn the shortcut, agents can hover over the recommended text.

How We Track Progress

Knowing that this feature would come in multiple iterations, we had to find ways to track how well we were doing progressively, so we decided to measure the different components of chat time.

We realised that the agent typing time is affected by:

Percentage of characters saved. This tells us that the model predicted correctly, and also saved time. This metric should increase as the model improves.
Model’s effectiveness. This is the number of characters the agent writes before getting the right suggestion; it should decrease as the model learns.
Acceptance rate. This tells us how many messages were written with the help of the model. It is a good proxy for feature usage and model capabilities.
Latency. If the suggestion is not shown in about 100-200ms, the agent would not notice the text and keep typing.
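
Two of these metrics are simple ratios and can be sketched in Python. The exact formulas are not given in the post, so the definitions below are assumptions: characters saved is taken as the share of sent characters that came from accepted suggestions, and acceptance rate as the share of messages with at least one accepted suggestion. The sample data is made up.

```python
def chars_saved_pct(messages):
    """messages: list of (typed_chars, accepted_suggestion_chars) per sent message."""
    typed = sum(t for t, _ in messages)
    accepted = sum(a for _, a in messages)
    total = typed + accepted
    return 100.0 * accepted / total if total else 0.0


def acceptance_rate(messages):
    """Share of messages in which at least one suggestion was accepted."""
    if not messages:
        return 0.0
    helped = sum(1 for _, a in messages if a > 0)
    return 100.0 * helped / len(messages)


# Made-up sample: (characters typed by the agent, characters filled in by suggestions)
sample = [(20, 30), (50, 0), (10, 40), (25, 25)]
print(chars_saved_pct(sample), acceptance_rate(sample))
```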

Architecture

The architecture involves support specialists initiating the fetch suggestion request, which is sent for evaluation to the machine learning model through API gateway. This ensures that only authenticated requests are allowed to go through and also ensures that we have proper rate limiting applied.

We have an internal platform called Catwalk, which is a microservice that offers the capability to execute machine learning models as an HTTP service. We used the Presto query engine to calculate and analyse the results from the experiment.

Designing the Machine Learning Model

I am sure all of us can remember an experiment we did in school when we had to catch a falling ruler. For those who have not done this experiment, feel free to try it at home! The purpose of this experiment is to define a ballpark number for typical human reaction time (equations also included in the video link).

Typically, the human reaction time ranges from 100ms to 300ms, with a median of about 250ms (read more here). Hence, we decided to set the upper bound for SmartChat response time to be 200ms while deciding the approach. Otherwise, the experience would be affected as the agents would notice a delay in the suggestions. To achieve this, we had to manage the model’s complexity and ensure that it achieves the optimal time performance.

Taking into consideration network latencies, the machine learning model would need to churn out predictions in less than 100ms, in order for the entire product to achieve a maximum 200ms refresh rate.

As such, a few key components were considered:

Model Tokenisation
- Model input/output tokenisation needs to be implemented along with the model’s core logic so that it is done in one network request.
- Model tokenisation needs to be lightweight and cheap to compute.
Model Architecture
- This is a typical sequence-to-sequence (seq2seq) task so the model needs to be complex enough to account for the auto-regressive nature of seq2seq tasks.
- We could not use pure attention-based models, which are usually state of the art for seq2seq tasks, as they are bulky and computationally expensive.
Model Service
- The model serving platform should be executed on a low-level, highly performant framework.

Our proposed solution considers the points listed above. We have chosen to develop in Tensorflow (TF), which is a well-supported framework for machine learning models and application building.

For Latin-based languages, we used a simple whitespace tokenizer, which is serialisable in the TF graph using the tensorflow-text package.

import tensorflow_text as text

tokenizer = text.WhitespaceTokenizer()

For the model architecture, we considered a few options but eventually settled for a simple recurrent neural network architecture (RNN), in an Encoder-Decoder structure:

Encoder
- Whitespace tokenisation
- Single layered Bi-Directional RNN
- Gated-Recurrent Unit (GRU) Cell
Decoder
- Single layered Uni-Directional RNN
- Gated-Recurrent Unit (GRU) Cell
Optimisation
- Teacher-forcing in training, Greedy decoding in production
- Trained with a cross-entropy loss function
- Using ADAM (Kingma and Ba) optimiser
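
Greedy decoding, as used in production above, picks the single most likely next token at every step. The following Python sketch shows only that loop; the hand-written bigram table is a toy stand-in for the GRU decoder, and all names in it are hypothetical.

```python
def greedy_decode(next_token_scores, start_tokens, end_token="<eos>", max_len=10):
    """Repeatedly append the highest-scoring next token until end_token or max_len."""
    output = list(start_tokens)
    for _ in range(max_len):
        scores = next_token_scores(output)
        best = max(scores, key=scores.get)
        if best == end_token:
            break
        output.append(best)
    return output


# Toy stand-in for the decoder: a hand-written bigram table, not a trained model.
BIGRAMS = {
    "good": {"morning": 0.7, "evening": 0.3},
    "morning": {"how": 0.6, "<eos>": 0.4},
    "how": {"can": 0.9, "<eos>": 0.1},
    "can": {"i": 0.8, "<eos>": 0.2},
    "i": {"help": 0.9, "<eos>": 0.1},
    "help": {"<eos>": 1.0},
}


def toy_scores(tokens):
    """Score candidates from the last token only; unknown tokens end the sequence."""
    return BIGRAMS.get(tokens[-1], {"<eos>": 1.0})


completion = greedy_decode(toy_scores, ["good"])
print(" ".join(completion))
```

Greedy decoding needs one scoring pass per generated token and keeps no alternative hypotheses, which suits a tight latency budget.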

Features

To provide context for the sentence completion tasks, we provided the following features as model inputs:

Past conversations between the chat agent and the user
Time of the day
User type (Driver-partners, Consumers, etc.)
Entrypoint into the chat (e.g. an article on cancelling a food order)

These features give the model the ability to generalise beyond a simple language model, with additional context on the nature of contact for support. This context also makes for a better, more customised user experience.

For example, the model is better aware of the nature of time in addressing “Good {Morning/Afternoon/Evening}” given the time of the day input, as well as being able to interpret meal times in the case of food orders. E.g. “We have contacted the driver, your {breakfast/lunch/dinner} will be arriving shortly”.
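
As a rough sketch of how such context could be packaged for the model, the snippet below derives a coarse time-of-day bucket and bundles it with the other features. The bucket boundaries, the function names, and the dictionary keys are assumptions made for illustration, not the actual feature pipeline.

```python
from typing import Dict, List


def time_of_day_bucket(hour: int) -> str:
    """Map an hour (0-23) to a coarse bucket; the boundaries are assumed."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "evening"


def build_features(history: List[str], hour: int,
                   user_type: str, entrypoint: str) -> Dict[str, object]:
    """Bundle the four context features into one model input record."""
    return {
        "history": history,
        "time_of_day": time_of_day_bucket(hour),
        "user_type": user_type,
        "entrypoint": entrypoint,
    }
```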

Typeahead Solution for the User Interface

With our goal of providing a seamless experience from showing suggestions to accepting them, we decided to implement a typeahead solution in the chat input area. This solution had to be implemented with the ReactJS library, as the internal web-app used by our support specialists for handling chats is built in React.

There were a few ways to achieve this:

Modify the Document Object Model (DOM) using JavaScript to show suggestions by positioning them over the input HTML tag based on the cursor position.
Use a content editable div and have the suggestion span render conditionally.

After evaluating the complexity in both approaches, the second solution seemed to be the better choice, as it is more aligned with the React way of doing things: avoid DOM manipulations as much as possible.

However, when a suggestion is accepted we would still need to update the content editable div through DOM manipulation. The typed content cannot be kept in React’s state, as that creates a laggy typing experience for the user.

Here is a code snippet for the implementation:

import React, { Component } from 'react';
import liveChatInstance from './live-chat';

export class ChatInput extends Component {
 constructor(props) {
   super(props);
   this.state = {
     suggestion: '',
   };
 }

 getCurrentInput = () => {
   const { roomID } = this.props;
   const inputDiv = document.getElementById(`input_content_${roomID}`);
   const suggestionSpan = document.getElementById(
     `suggestion_content_${roomID}`,
   );

   // put the check for extra safety in case suggestion span is accidentally cleared
   if (suggestionSpan) {
     const range = document.createRange();
     range.setStart(inputDiv, 0);
     range.setEndBefore(suggestionSpan);
     return range.toString(); // content before suggestion span in input div
   }
   return inputDiv.textContent;
 };

 handleKeyDown = async e => {
   const { roomID } = this.props;
   // tab or right arrow for accepting suggestion
   if (this.state.suggestion && (e.keyCode === 9 || e.keyCode === 39)) {
     e.preventDefault();
     e.stopPropagation();
     this.insertContent(this.state.suggestion);
     this.setState({ suggestion: '' });
   }
   const parsedValue = this.getCurrentInput();
   // space
   if (e.keyCode === 32 && !this.state.suggestion && parsedValue) {
     // fetch suggestion
     const prediction = await liveChatInstance.getSmartComposePrediction(
       parsedValue.trim(), roomID);
     this.setState({ suggestion: prediction })
   }
 };

 insertContent = content => {
   // insert content behind cursor
   const { roomID } = this.props;
   const inputDiv = document.getElementById(`input_content_${roomID}`);
   if (inputDiv) {
     inputDiv.focus();
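     const sel = window.getSelection();
     // check rangeCount first: getRangeAt(0) throws when no range exists
     if (sel && sel.rangeCount) {
       const range = sel.getRangeAt(0);
       range.insertNode(document.createTextNode(content));
       // collapse to the end so the cursor sits after the inserted text
       range.collapse();
     }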
   }
 };

 render() {
   const { roomID } = this.props;
   return (
     <div className="message_wrapper">
       <div
         id={`input_content_${roomID}`}
         role={'textbox'}
         contentEditable
         spellCheck
         onKeyDown={this.handleKeyDown}
       >
         {!!this.state.suggestion.length && (
           <span
             contentEditable={false}
             id={`suggestion_content_${roomID}`}
           >
             {this.state.suggestion}
           </span>
         )}
       </div>
     </div>
   );
 }
}

The solution uses the spacebar as the trigger for fetching a suggestion from the ML model and stores it in React state. The prediction is then shown in a conditionally rendered span.

We used the window.getSelection() and range APIs to:

Find the current input value
Insert the suggestion
Clear the input to type a new message

The implementation has also considered the following:

Caching. API calls are made on every space character to fetch the prediction. To reduce the number of API calls, we also cached the prediction until it differs from the user input.
Recover placeholder. There are data fields that are specific to the agent and consumer, such as agent name and user phone number, and these data fields are replaced by placeholders for model training. The implementation recovers the placeholders in the prediction before showing it on the UI.
Control rollout. Since rollout is by percentage per country, the implementation has to ensure that only certain users can access predictions from their country chat model.
Aggregate and send metrics. Metrics are gathered and sent for each chat message.
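
The caching behaviour described above can be sketched as follows. The real logic lives in the React frontend; this version is written in Python purely to show the idea, and the `SuggestionCache` class and its method names are hypothetical.

```python
from typing import Optional


class SuggestionCache:
    """Keeps the last prediction until the user's typing diverges from it."""

    def __init__(self) -> None:
        # Text typed at fetch time plus the predicted completion.
        self._expected: Optional[str] = None

    def store(self, typed: str, prediction: str) -> None:
        self._expected = typed + prediction

    def lookup(self, typed: str) -> Optional[str]:
        """Return the remaining suggestion, or None if a new call is needed."""
        if self._expected is not None and self._expected.startswith(typed):
            remaining = self._expected[len(typed):]
            if remaining:
                return remaining
        # The input diverged or the suggestion was fully typed: invalidate.
        self._expected = None
        return None
```

A caller would try `lookup` on each space character and only fall back to the prediction API when it returns `None`.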

Results

The initial experiment results suggested that we managed to save 20% of characters, which improved the efficiency of our agents by 12% as they were able to resolve the queries faster. These numbers exceeded our expectations and as a result, we decided to move forward by rolling SmartChat out regionally.

What’s Next?

In the upcoming iteration, we are going to focus on non-Latin language support, caching, and continuous training.

Non-Latin Language Support and Caching

The current model only works with Latin languages, where sentences consist of space-separated words. We are looking to provide support for non-Latin languages such as Thai and Vietnamese. The result would also be cached in the frontend to reduce the number of API calls, providing the prediction faster for the agents.

Continuous Training

The current machine learning model is built with training data derived from historical chat data. In order to teach the model and improve the metrics mentioned in our goals, we will enhance the model by letting it learn from data gathered in day-to-day chat conversations. Along with this, we are going to train the model to give better responses by providing more context about the conversations.

Seeing how effective this solution has been for our chat agents, we would also like to expose this to the end consumers to help them express their concerns faster and improve their overall chat experience.

Special thanks to Kok Keong Matthew Yeow, who helped to build the architecture and implementation in a scalable way.

Join Us

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Machine learning and depth estimation using Raspberry Pi

2021-02-11 David Plowman

Post Syndicated from David Plowman original https://www.raspberrypi.org/blog/machine-learning-and-depth-estimation-using-raspberry-pi/

One of our engineers, David Plowman, describes machine learning and shares news of a Raspberry Pi depth estimation challenge run by ETH Zürich (Swiss Federal Institute of Technology).

Spoiler alert – it’s all happening virtually, so you can definitely make the trip and attend, or maybe even enter yourself.

What is Machine Learning?

Machine Learning (ML) and Artificial Intelligence (AI) are some of the top engineering-related buzzwords of the moment, and foremost among current ML paradigms is probably the Artificial Neural Network (ANN).

ANNs involve millions of tiny calculations, merged together in a giant biologically inspired network – hence the name. These networks typically have millions of parameters that control each calculation, and they must be optimised for every different task at hand.

This process of optimising the parameters so that a given set of inputs correctly produces a known set of outputs is known as training, and is what gives rise to the sense that the network is “learning”.
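
A toy example makes the idea of training concrete. The sketch below fits just two parameters (a slope and an offset) by gradient descent, the same principle that a real network applies to millions of parameters. The function name, learning rate, and data are illustrative choices.

```python
from typing import Sequence, Tuple


def train_linear(xs: Sequence[float], ys: Sequence[float],
                 lr: float = 0.05, epochs: int = 2000) -> Tuple[float, float]:
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each parameter in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Known inputs and outputs generated from y = 2x + 1.
w, b = train_linear([0, 1, 2, 3], [1, 3, 5, 7])
```

After training, `w` and `b` end up close to 2 and 1, the values used to generate the data.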

A popular type of ANN used for processing images is the Convolutional Neural Network. Many small calculations are performed on groups of input pixels to produce each output pixel.
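
The "small calculation on a group of pixels" can be written out directly. This is a plain Python sketch of a single convolution pass without padding or stride; real frameworks do the same arithmetic far more efficiently.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D convolution (strictly cross-correlation, as used in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Each output pixel is a weighted sum over one group of inputs.
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]
```

In a trained network, the kernel values are among the parameters that the training process optimises.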

Machine Learning frameworks

A number of well-known companies produce free ML frameworks that you can download and use on your own computer. The network training procedure runs best on machines with powerful CPUs and GPUs, but even using a pre-trained network (known as inference) can be quite expensive.

One of the most popular frameworks is Google’s TensorFlow (TF), and since this is rather resource intensive, they also produce a cut-down version optimised for less powerful platforms. This is TensorFlow Lite (TFLite), which can be run effectively on Raspberry Pi.

Depth estimation

ANNs have proven very adept at a wide variety of image processing tasks, most notably object classification and detection, but also depth estimation. This is the process of taking one or more images and working out how far away every part of the scene is from the camera, producing a depth map.

Here’s an example:

The image on the right shows, by the brightness of each pixel, how far away the objects in the original (left-hand) image are from the camera (darker = nearer).

We distinguish between stereo depth estimation, which starts with a stereo pair of images (taken from marginally different viewpoints; here, parallax can be used to inform the algorithm), and monocular depth estimation, working from just a single image.
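
For the stereo case, the role of parallax can be shown with the standard relation for a rectified camera pair: depth equals focal length times baseline divided by disparity. The sketch below assumes an idealised pinhole model; the function and parameter names are illustrative.

```python
def depth_from_disparity(focal_length_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Depth in metres of a point seen by a rectified stereo pair.

    disparity_px is how far the point shifts horizontally between the two
    images: nearby objects shift more, so larger disparity means less depth.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px
```

Monocular depth estimation has no disparity to measure, which is why it leans on a trained network to infer depth from image content alone.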

The applications of such techniques should be clear, ranging from robots that need to understand and navigate their environments, to the fake bokeh effects beloved of many modern smartphone cameras.

Depth Estimation Challenge

We were very interested then to learn that, as part of the CVPR (Computer Vision and Pattern Recognition) 2021 conference, Andrey Ignatov and Radu Timofte of ETH Zürich were planning to run a Monocular Depth Estimation Challenge. They are specifically targeting the Raspberry Pi 4 platform running TFLite, and we are delighted to support this effort.

For more information, or indeed if any technically minded readers are interested in entering the challenge, please visit:

The conference and workshops are all taking place virtually in June, and we’ll be sure to update our blog with some of the results and models produced for Raspberry Pi 4 by the competing teams. We wish them all good luck!

The post Machine learning and depth estimation using Raspberry Pi appeared first on Raspberry Pi.

Resource leak detection in Amazon CodeGuru Reviewer

2021-01-14 Pranav Garg

Post Syndicated from Pranav Garg original https://aws.amazon.com/blogs/devops/resource-leak-detection-in-amazon-codeguru/

This post discusses the resource leak detector for Java in Amazon CodeGuru Reviewer. CodeGuru Reviewer automatically analyzes pull requests (created in supported repositories such as AWS CodeCommit, GitHub, GitHub Enterprise, and Bitbucket) and generates recommendations for improving code quality. For more information, see Automating code reviews and application profiling with Amazon CodeGuru. This blog does not describe the resource leak detector for Python programs that is now available in preview.

What are resource leaks?

Resources are objects with a limited availability within a computing system. These typically include objects managed by the operating system, such as file handles, database connections, and network sockets. Because the number of such resources in a system is limited, they must be released by an application as soon as they are no longer needed. Otherwise, you will run out of resources and you won’t be able to allocate new ones. The paradigm of acquiring a resource and releasing it is also followed by other categories of objects such as metric wrappers and timers.

Resource leaks are bugs that arise when a program doesn’t release the resources it has acquired. Resource leaks can lead to resource exhaustion. In the worst case, they can cause the system to slow down or even crash.

Starting with Java 7, most classes holding resources implement the java.lang.AutoCloseable interface and provide a close() method to release them. However, a close() call in source code doesn’t guarantee that the resource is released along all program execution paths. For example, in the following sample code, resource r is acquired by calling its constructor and is closed along the path corresponding to the if branch, shown using green arrows. To ensure that the acquired resource doesn’t leak, you must also close r along the path corresponding to the else branch (the path shown using red arrows).

A resource must be closed along all execution paths to prevent resource leaks

Often, resource leaks manifest themselves along code paths that aren’t frequently run, or under a heavy system load, or after the system has been running for a long time. As a result, such leaks are latent and can remain dormant in source code for long periods of time before manifesting themselves in production environments. This is the primary reason why resource leak bugs are difficult to detect or replicate during testing, and why automatically detecting these bugs during pull requests and code scans is important.

Detecting resource leaks in CodeGuru Reviewer

For this post, we consider the following Java code snippet. In this code, method getConnection() attempts to create a connection in the connection pool associated with a data source. Typically, a connection pool limits the maximum number of connections that can remain open at any given time. As a result, you must close connections after their use so as to not exhaust this limit.

 1     private Connection getConnection(final BasicDataSource dataSource, ...)
               throws ValidateConnectionException, SQLException {
 2         boolean connectionAcquired = false;
 3         // Retrying three times to get the connection.
 4         for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
 5             Connection connection = dataSource.getConnection();
 6             // validateConnection may throw ValidateConnectionException
 7             if (! validateConnection(connection, ...)) {
 8                 // connection is invalid
 9                 DbUtils.closeQuietly(connection);
10             } else {
11                 // connection is established
12                 connectionAcquired = true;
13                 return connection;
14             }
15         }
16         return null;
17     }

At first glance, it seems that the method getConnection() doesn’t leak connection resources. If a valid connection is established in the connection pool (else branch on line 10 is taken), the method getConnection() returns it to the client for use (line 13). If the connection established is invalid (if branch on line 7 is taken), it’s closed in line 9 before another attempt is made to establish a connection.

However, method validateConnection() at line 7 can throw a ValidateConnectionException. If this exception is thrown after a connection is established at line 5, the connection is neither closed in this method nor is it returned upstream to the client to be closed later. Furthermore, if this exceptional code path runs frequently, for instance, if the validation logic throws on a specific recurring service request, each new request causes a connection to leak in the connection pool. Eventually, the client can’t acquire new connections to the data source, impacting the availability of the service.

A typical recommendation to prevent resource leak bugs is to declare the resource objects in a try-with-resources statement block. However, we can’t use try-with-resources to fix the preceding method because this method is required to return an open connection for use in the upstream client. The CodeGuru Reviewer recommendation for the preceding code snippet is as follows:

“Consider closing the following resource: connection. The resource is referenced at line 7. The resource is closed at line 9. The resource is returned at line 13. There are other execution paths that don’t close the resource or return it, for example, when validateConnection throws an exception. To prevent this resource leak, close connection along these other paths before you exit this method.”

As mentioned in the Reviewer recommendation, to prevent this resource leak, you must close the established connection when method validateConnection() throws an exception. This can be achieved by inserting the validation logic (lines 7–14) in a try block. In the finally block associated with this try, the connection must be closed by calling DbUtils.closeQuietly(connection) if connectionAcquired == false. The method getConnection() after this fix has been applied is as follows:

private Connection getConnection(final BasicDataSource dataSource, ...) 
        throws ValidateConnectionException, SQLException {
    boolean connectionAcquired = false;
    // Retrying three times to get the connection.
    for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
        Connection connection = dataSource.getConnection();
        try {
            // validateConnection may throw ValidateConnectionException
            if (! validateConnection(connection, ...)) {
                // connection is invalid
                DbUtils.closeQuietly(connection);
            } else {
                // connection is established
                connectionAcquired = true;
                return connection;
            }
        } finally {
            if (!connectionAcquired) {
                DbUtils.closeQuietly(connection);
            }
        }
    }
    return null;
}
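
The same pattern, closing in a finally block unless ownership has been handed to the caller, can be sketched in Python so that the exceptional path is easy to exercise. This is an analogue for illustration, not part of the Java fix: `FakeConnection`, `open_connection`, and `validate` are hypothetical stand-ins.

```python
class FakeConnection:
    """Hypothetical stand-in for a pooled connection."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def get_connection(open_connection, validate, retries: int = 3):
    """Return a validated open connection, or None after all retries fail."""
    for _ in range(retries):
        connection = open_connection()
        acquired = False
        try:
            # validate may raise, like validateConnection in the Java code
            if validate(connection):
                acquired = True
                return connection
        finally:
            # Runs on every exit path: invalid, exception, or return.
            if not acquired:
                connection.close()
    return None
```

Because the finally block runs even when `validate` raises, the connection is closed on the exceptional path while a successfully returned connection stays open for the caller.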

As shown in this example, resource leaks in production services can be very disruptive. Furthermore, leaks that manifest along exceptional or less frequently run code paths can be hard to detect or replicate during testing and can remain dormant in the code for long periods of time before manifesting themselves in production environments. With the resource leak detector, you can detect such leaks on objects belonging to a large number of popular Java types such as file streams, database connections, network sockets, timers and metrics, etc.

Combining static code analysis with machine learning for accurate resource leak detection

In this section, we dive deep into the inner workings of the resource leak detector. The resource leak detector in CodeGuru Reviewer uses static analysis algorithms and techniques. Static analysis algorithms perform code analysis without running the code. These algorithms are generally prone to a high rate of false positives (the tool might report correct code as having a bug). If the number of these false positives is high, it can lead to alarm fatigue and low adoption of the tool. As a result, the resource leak detector in CodeGuru Reviewer prioritizes precision over recall: the findings we surface are very likely to be real resource leaks, though CodeGuru Reviewer could potentially miss some resource leak findings.
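
The trade-off between precision and recall follows from their standard definitions, shown here as a short sketch. The counts in the example are invented purely to illustrate the arithmetic and are not CodeGuru Reviewer measurements.

```python
from typing import Tuple


def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> Tuple[float, float]:
    """Precision: share of reported findings that are real bugs.
    Recall: share of real bugs that were reported."""
    reported = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Illustrative numbers only: 9 real leaks reported, 1 false alarm,
# and 9 real leaks that the tool stayed silent about.
precision, recall = precision_recall(9, 1, 9)
```

A detector tuned this way reports few false alarms (high precision) at the cost of staying silent about some real leaks (lower recall).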

The main reason for false positives in static code analysis is incomplete information available to the analysis. CodeGuru Reviewer requires only the Java source files and doesn’t require all dependencies or the build artifacts. Not requiring the external dependencies or the build artifacts reduces the friction to perform automated code reviews. As a result, static analysis only has access to the code in the source repository and doesn’t have access to its external dependencies. The resource leak detector in CodeGuru Reviewer combines static code analysis with a machine learning (ML) model. This ML model is used to reason about external dependencies to provide accurate recommendations.

To understand the use of the ML model, consider again the code above for method getConnection() that had a resource leak. In the code snippet, a connection to the data source is established by calling BasicDataSource.getConnection() method, declared in the Apache Commons library. As mentioned earlier, we don’t require the source code of external dependencies like the Apache library for code analysis during pull requests. Without access to the code of external dependencies, a pure static analysis-driven technique doesn’t know whether the Connection object obtained at line 5 will leak, if not closed. Similarly, it doesn’t know that DbUtils.closeQuietly() is a library function that closes the connection argument passed to it at line 9. Our detector combines static code analysis with ML that learns patterns over such external function calls from a large number of available code repositories. As a result, our resource leak detector knows that the connection doesn’t leak along the following code path:

A connection is established on line 5
Method validateConnection() returns false at line 7
DbUtils.closeQuietly() is called on line 9

This suppresses the possible false warning. At the same time, the detector knows that there is a resource leak when the connection is established at line 5, and validateConnection() throws an exception at line 7 that isn’t caught.

When we run CodeGuru Reviewer on this code snippet, it surfaces only the second leak scenario and makes an appropriate recommendation to fix this bug.
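
The reasoning over code paths can be pictured with a toy model. The sketch below is not the CodeGuru Reviewer algorithm: each path is a hand-written list of events, and the `CLOSING_CALLS` set is an assumed stand-in for what the ML model learns about external library functions.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (kind, name), e.g. ("call", "DbUtils.closeQuietly")

# Assumption: knowledge about which external calls release their argument.
CLOSING_CALLS = {"DbUtils.closeQuietly"}


def leaks(path: List[Event]) -> bool:
    """True if a resource acquired on this path is neither closed nor returned."""
    open_resource = False
    for kind, name in path:
        if kind == "acquire":
            open_resource = True
        elif kind == "call" and name in CLOSING_CALLS:
            open_resource = False
        elif kind == "return" and name == "connection":
            # Ownership passes to the caller, so this is not a leak here.
            open_resource = False
    return open_resource


invalid_path = [("acquire", "dataSource.getConnection"),
                ("call", "validateConnection"),
                ("call", "DbUtils.closeQuietly")]
exception_path = [("acquire", "dataSource.getConnection"),
                  ("call", "validateConnection"),
                  ("throw", "ValidateConnectionException")]
```

Here `leaks(invalid_path)` is false and `leaks(exception_path)` is true, mirroring the two scenarios described above. Without the entry in `CLOSING_CALLS`, the first path would be flagged as a false positive.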

The ML model used in the resource leak detector has been trained on a large number of internal Amazon and GitHub code repositories.

Responses to the resource leak findings

Although closing an open resource in code isn’t difficult, doing so properly along all program paths is important to prevent resource leaks. This can easily be overlooked, especially along exceptional or less frequently run paths. As a result, the resource leak detector in CodeGuru Reviewer has observed resource leaks at a relatively high frequency, and has alerted developers within Amazon to thousands of resource leaks before they hit production.

The resource leak detections have seen a high developer acceptance rate, and developer feedback towards the resource leak detector has been very positive. Some of the feedback from developers includes “Very cool, automated finding,” “Good bot :),” and “Oh man, this is cool.” Developers have also concurred that the findings are important and need to be fixed.

Conclusion

Resource leak bugs are difficult to detect or replicate during testing. They can impact the availability of production services. As a result, it’s important to automatically detect these bugs early on in the software development workflow, such as during pull requests or code scans. The resource leak detector in CodeGuru Reviewer combines static code analysis algorithms with ML to surface only the high confidence leaks. It has a high developer acceptance rate and has alerted developers within Amazon to thousands of leaks before those leaks hit production.

Automating Recommendation Engine Training with Amazon Personalize and AWS Glue

2021-01-13 Alexander Spivak

Post Syndicated from Alexander Spivak original https://aws.amazon.com/blogs/architecture/automating-recommendation-engine-training-with-amazon-personalize-and-aws-glue/

Customers from startups to enterprises observe increased revenue when personalizing customer interactions. Still, many companies are not yet leveraging the power of personalization, or are relying solely on rule-based strategies. Those strategies are effort-intensive to maintain and not effective. Common reasons for not launching machine learning (ML) based personalization projects include the complexity of aggregating and preparing the datasets, gaps in data science expertise, and a lack of trust in the quality of ML recommendations.

This blog post demonstrates an approach for product recommendations to mitigate those concerns using historical datasets. To get started with your personalization journey, you don’t need ML expertise or a data lake. The following serverless end-to-end architecture involves aggregating and transforming the required data, as well as automatically training an ML-based recommendation engine.

I will outline the architectural production-ready setup for personalized product recommendations based on historical datasets. This is of interest to data analysts who look for ways to bring an existing recommendation engine to production, as well as solutions architects new to ML.

Solution Overview

The two core elements to create a proof-of-concept for ML-based product recommendations are:

the recommendation engine, and
the dataset to train the recommendation engine.

Let’s start with the recommendation engine first, and work backwards to the corresponding data needs.

Product recommendation engine

To create the product recommendation engine, we use Amazon Personalize. Amazon Personalize differentiates three types of input data:

interactions (user events such as views, signups, or likes),
item metadata (description of your items: category, genre or availability), and
user metadata (age, gender, or loyalty membership).

An interactions dataset is typically the minimal requirement to build a recommendation system. Providing user and item metadata datasets improves recommendation accuracy, and enables cold starts, item discovery and dynamic recommendation filtering.

Most companies already have existing historical datasets to populate all three input types. In the case of retail companies, the product order history is a good fit for interactions. In the case of the media and entertainment industry, the customer’s consumption history maps to the interaction dataset. The product and media catalogs map to the items dataset and the customer profiles to the user dataset.

Amazon Personalize: from datasets to a recommendation API

The Amazon Personalize Deep Dive Series provides a great introduction into the service and explores the topics of training, inference and operations. There are also multiple blog posts available explaining how to create a recommendation engine with Amazon Personalize and how to select the right metadata for the engine training. Additionally, the Amazon Personalize samples repository in GitHub showcases a variety of topics: from getting started with Amazon Personalize, up to performing a POC in a Box using existing datasets, and, finally, automating the recommendation engine with MLOps. In this post, we focus on getting the data from the historical data sources into the structure required by Amazon Personalize.

Creating the dataset

While manual data exports are a quick way to get started with one-time datasets for experiments, we use AWS Glue to automate this process. The automated approach with AWS Glue speeds up the proof of concept (POC) phase and simplifies the process to production by:

easily reproducing dataset exports from various data sources, which lets you iterate with other feature sets for recommendation engine training.
adding additional data sources and using those to enrich existing datasets
efficiently performing transformation logic like column renaming and fuzzy matching out of the box with code generation support.

AWS Glue is a serverless data integration service that is scalable and simple to use. It provides all of the capabilities needed for data integration and supports a wide variety of data sources: Amazon S3 buckets, JDBC connectors, MongoDB databases, Kafka, and Amazon Redshift, the AWS data warehouse. You can even make use of data sources living outside of your AWS environment, e.g. on-premises data centers or other services outside of your VPC. This enables you to perform a data-driven POC even when the data is not yet in AWS.

Modern application environments usually combine multiple heterogeneous database systems, like operational relational and NoSQL databases, in addition to the BI-powering data warehouses. With AWS Glue, we orchestrate the ETL (extract, transform, and load) jobs to fetch the relevant data from the corresponding data sources. We then bring it into a format that Amazon Personalize understands: CSV files with pre-defined column names hosted in an Amazon S3 bucket.

Each dataset consists of one or multiple CSV files, which can be uniquely identified by an Amazon S3 prefix. Additionally, each dataset must have an associated schema describing the structure of your data. Depending on the dataset type, there are required and pre-defined fields:

USER_ID (string) and one additional metadata field for the users dataset
ITEM_ID (string) and one additional metadata field for the items dataset
USER_ID (string), ITEM_ID (string), TIMESTAMP (long; as Epoch time) for the interactions dataset
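
To make the target format concrete, the sketch below turns order history records into an interactions CSV with the three required columns. The source field names (`customer_id`, `product_id`, `ordered_at`) are hypothetical, and the schema literal reflects the Avro-style JSON that Amazon Personalize expects as far as we know, so verify it against the service documentation before use.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Iterable

# Assumed Avro-style schema for the interactions dataset.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def orders_to_interactions_csv(orders: Iterable[Dict[str, object]]) -> str:
    """Convert order records into CSV text with the required columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["USER_ID", "ITEM_ID", "TIMESTAMP"])
    for order in orders:
        # ordered_at is assumed to be ISO 8601 with an explicit UTC offset.
        ordered_at = datetime.fromisoformat(str(order["ordered_at"]))
        writer.writerow([
            str(order["customer_id"]),
            str(order["product_id"]),
            int(ordered_at.timestamp()),  # Epoch seconds
        ])
    return buffer.getvalue()
```

In the architecture described here, an AWS Glue ETL job would perform the equivalent transformation at scale and write the result to Amazon S3.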

The following graph presents a high-level architecture for a retail customer, who has a heterogeneous data store landscape.

Using AWS Glue to export datasets from heterogeneous data sources to Amazon S3

To understand how AWS Glue connects to the variety of data sources and allows transforming the data into the required format, we need to drill down into the AWS Glue concepts and components.

One of the key components of AWS Glue is the AWS Glue Data Catalog: a persistent metadata store containing table definitions, connection information, and the ETL job definitions.
The tables are metadata definitions representing the structure of the data in the defined data sources. They do not contain any data entries from the sources but solely the structure definition. You can create a table either manually or automatically by using AWS Glue Crawlers.

AWS Glue Crawlers scan the data in the data sources, extract the schema information from it, and store the metadata as tables in the AWS Glue Data Catalog. This is the preferred approach for defining tables. The crawlers use AWS Glue Connections to connect to the data sources. Each connection contains the properties that are required to connect to a particular data store. The connections will also be used later by the ETL jobs to fetch the data from the data sources.

AWS Glue Crawlers also help to overcome a challenge frequently appearing in microservice environments. Microservice architectures are frequently operated by fully independent and autonomous teams. This means that keeping track of changes to the data source format becomes a challenge. Based on a schedule, the crawlers can be triggered to update the metadata for the relevant data sources in the AWS Glue Data Catalog automatically. To detect cases when a schema change would break the ETL job logic, you can combine the CloudWatch Events emitted by AWS Glue on updating the Data Catalog tables with an AWS Lambda function or a notification sent via the Amazon Simple Notification Service (SNS).
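
The check that such a Lambda function might run can be sketched as a comparison between the columns the ETL job relies on and the columns found by the latest crawl. The function name and the dictionary shapes are assumptions; fetching the table definition from the Data Catalog and sending the notification are left out.

```python
from typing import Dict, List


def find_breaking_changes(required_columns: Dict[str, str],
                          crawled_columns: Dict[str, str]) -> List[str]:
    """Compare column name -> type mappings and list breaking differences."""
    problems: List[str] = []
    for name, expected_type in required_columns.items():
        actual_type = crawled_columns.get(name)
        if actual_type is None:
            problems.append(f"missing column: {name}")
        elif actual_type != expected_type:
            problems.append(
                f"type changed: {name} {expected_type} -> {actual_type}")
    return problems
```

Columns that were added but are not required by the job are ignored, since they would not break the existing transformation logic.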

The AWS Glue ETL jobs use the defined connections and the table information from the Data Catalog to extract the data from the described sources, apply the user-defined transformation logic and write the results into a data sink. AWS Glue can automatically generate code for ETL jobs to help perform a variety of useful data transformation tasks. AWS Glue Studio makes the ETL development even simpler by providing an easy-to-use graphical interface that accelerates the development and allows designing jobs without writing any code. If required, the generated code can be fully customized.

AWS Glue supports Apache Spark jobs, written either in Python or in Scala, and Python Shell jobs. Apache Spark jobs are optimized to run in a highly scalable, distributed way dealing with any amount of data and are a perfect fit for data transformation jobs. The Python Shell jobs provide access to the raw Python execution environment, which is less scalable but provides a cost-optimized option for orchestrating AWS SDK calls.

The following diagram visualizes the interaction between the components described.

The basic concepts of populating your Data Catalog and processing ETL dataflow in AWS Glue

For each Amazon Personalize dataset type, we create a separate ETL job. Since those jobs are independent, they also can run in parallel. After all jobs have successfully finished, we can start the recommendation engine training. AWS Glue Workflows allow simplifying data pipelines by visualizing and monitoring complex ETL activities involving multiple crawlers, jobs, and triggers, as a single entity.

The following graph visualizes a typical dataset export workflow for training a recommendation engine, which consists of:

a workflow trigger being either manual or scheduled
a Python Shell job to remove the results of the previous export workflow from S3
a trigger firing when the removal job is finished and initiating the parallel execution of the dataset ETL jobs
the three Apache Spark ETL jobs, one per dataset type
a trigger firing when all three ETL jobs are finished and initiating the training notification job
a Python Shell job to initiate a new dataset import or a full training cycle in Amazon Personalize (e.g. by triggering the MLOps pipeline using the AWS SDK)

AWS Glue workflow for extracting the three datasets and triggering the training workflow of the recommendation engine
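The ordering that the workflow enforces (cleanup first, then the independent ETL jobs in parallel, then the training notification) can be illustrated locally. This is only a simulation of the dependency structure using Python threads; it does not call AWS Glue, and the job callables and file names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_export_workflow(
    cleanup: Callable[[], None],
    etl_jobs: Dict[str, Callable[[], str]],
    notify_training: Callable[[List[str]], None],
) -> List[str]:
    """Run cleanup, then all ETL jobs in parallel, then the training notification."""
    cleanup()  # step 1: remove the results of the previous export
    with ThreadPoolExecutor(max_workers=len(etl_jobs)) as pool:
        # step 2: the dataset jobs are independent, so they can run concurrently
        futures = {name: pool.submit(job) for name, job in etl_jobs.items()}
        # result() re-raises a job's exception, so a failed job stops the workflow here
        outputs = [futures[name].result() for name in sorted(futures)]
    notify_training(outputs)  # step 3: runs only after every ETL job succeeded
    return outputs


log: List[str] = []
outputs = run_export_workflow(
    cleanup=lambda: log.append("cleanup"),
    etl_jobs={
        "interactions": lambda: "interactions.csv",
        "items": lambda: "items.csv",
        "users": lambda: "users.csv",
    },
    notify_training=lambda files: log.append("train on " + ", ".join(files)),
)
print(log)
```

In AWS Glue, the same structure is expressed declaratively through triggers, so no orchestration code like this has to be maintained.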

Combining the data export and the recommendation engine

In the previous sections, we discussed how to create an ML-based recommendation engine and how to create the datasets for training it. In this section, we combine both parts of the solution using an adjusted version of the MLOps pipeline solution available on GitHub. This speeds up iterations on new solution versions by avoiding manual steps. Moreover, automation means new items can be put into production faster.

The MLOps pipeline uses a JSON file hosted in an S3 bucket to describe the training parameters for Amazon Personalize. The creation of a new parameter file version triggers a new training workflow orchestrated in a serverless manner using AWS Step Functions and AWS Lambda.
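As a rough sketch of how a new parameter file version could start the training workflow, the Lambda-style handler below reads the bucket and key from an S3 event notification and passes them to a Step Functions execution. This is not the code of the MLOps pipeline solution: the handler signature with an injectable client, the placeholder state machine ARN, and the shape of the execution input are assumptions.

```python
import json
from urllib.parse import unquote_plus


def parameter_file_location(s3_event: dict) -> dict:
    """Extract the bucket and key of the uploaded parameter file from an S3 event."""
    record = s3_event["Records"][0]
    return {
        "bucket": record["s3"]["bucket"]["name"],
        # Object keys arrive URL-encoded in S3 event notifications.
        "key": unquote_plus(record["s3"]["object"]["key"]),
    }


def handler(event, context, sfn_client=None, state_machine_arn="arn:hypothetical"):
    """Lambda-style entry point that starts the training workflow (hypothetical ARN)."""
    if sfn_client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS access

        sfn_client = boto3.client("stepfunctions")
    location = parameter_file_location(event)
    return sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(location),
    )
```

Passing only the file location, rather than the file contents, keeps the execution input small and lets the first pipeline step read the parameters itself.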

To integrate the Glue data export workflow described in the previous section, we also enable the Glue workflow to trigger the training pipeline. Additionally, we modify the pipeline to read the parameter file as its first step. The resulting architecture enables an automated end-to-end setup from dataset export up to the recommendation engine creation.

End-to-end architecture combining the data export with AWS Glue, the MLOps training workflow, and Amazon Personalize

The architecture for the end-to-end data export and recommendation engine creation solution is completely serverless. This makes it highly scalable, reliable, easy to maintain, and cost-efficient. You pay only for what you consume. For example, in the case of the data export, you pay only for the duration of the AWS Glue crawler executions and ETL jobs. These only need to run when you iterate with a new dataset.

The solution is also flexible in terms of the connected data sources. The same architecture is recommended for use cases with a single data source: you can start with a single data store and enrich the datasets on demand with additional data sources in future iterations.

Testing the quality of the solution

A common approach to validate the quality of the solution is the A/B testing technique, which is widely used to measure the efficacy of generated recommendations. Based on the testing results, you can iterate on the recommendation engine by optimizing the underlying datasets and models. The high degree of automation increases the speed of iterations and the resiliency of the end-to-end process.
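One common way to evaluate an A/B test is a two-proportion z-test on a conversion metric such as click-through rate. The post does not prescribe a particular statistic, so the sketch below is an assumption about how the comparison might be done; the function name and the example counts are made up.

```python
import math


def two_proportion_z_test(clicks_a: int, users_a: int, clicks_b: int, users_b: int):
    """Return (z statistic, two-sided p-value) for the difference in click-through rate."""
    rate_a = clicks_a / users_a
    rate_b = clicks_b / users_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Made-up counts: variant A is the baseline, variant B shows the new recommendations.
z, p = two_proportion_z_test(clicks_a=100, users_a=1000, clicks_b=150, users_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests that the difference between the variants is unlikely to be due to chance alone, which can inform whether to promote a new solution version.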

Conclusion

In this post, I presented a typical serverless architecture for a fully automated, end-to-end ML-based recommendation engine leveraging available historical datasets. As you begin to experiment with ML-based personalization, you will unlock value currently hidden in the data. This helps mitigate potential concerns, such as a lack of trust in machine learning, so that you can put the resulting engine into production.

Start your personalization journey today with the Amazon Personalize code samples and bring the engine to production with the architecture outlined in this blog. As a next step, you can record real-time events to update the generated recommendations automatically based on the event data.

Supporting content decision makers with machine learning

2020-12-10 Netflix Technology Blog

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/supporting-content-decision-makers-with-machine-learning-995b7b76006f

by Melody Dye*, Chaitanya Ekanadham*, Avneesh Saluja*, Ashish Rastogi
* contributed equally

Netflix is pioneering content creation at an unprecedented scale. Our catalog of thousands of films and series caters to 195M+ members in over 190 countries who span a broad and diverse range of tastes. Content, marketing, and studio production executives make the key decisions that aspire to maximize each series’ or film’s potential to bring joy to our subscribers as it progresses from pitch to play on our service. Our job is to support them.

The commissioning of a series or film, which we refer to as a title, is a creative decision. Executives consider many factors including narrative quality, relation to the current societal context or zeitgeist, creative talent relationships, and audience composition and size, to name a few. The stakes are high (content is expensive!) as is the uncertainty of the outcome (it is difficult to predict which shows or films will become hits). To mitigate this uncertainty, executives throughout the entertainment industry have always consulted historical data to help characterize the potential audience of a title using comparable titles, if they exist. Two key questions in this endeavor are:

Which existing titles are comparable and in what ways?
What audience size can we expect and in which regions?

The increasing vastness and diversity of what our members are watching make answering these questions particularly challenging using conventional methods, which draw on a limited set of comparable titles and their respective performance metrics (e.g., box office, Nielsen ratings). This challenge is also an opportunity. In this post we explore how machine learning and statistical modeling can aid creative decision makers in tackling these questions at a global scale. The key advantage of these techniques is twofold. First, they draw on a much wider range of historical titles (spanning global as well as niche audiences). Second, they leverage each historical title more effectively by isolating the components (e.g., thematic elements) that are relevant for the title in question.

Our approach is rooted in transfer learning, whereby performance on a target task is improved by leveraging model parameters learned on a separate but related source task. We define a set of source tasks that are loosely related to the target tasks represented by the two questions above. For each source task, we learn a model on a large set of historical titles, leveraging information such as title metadata (e.g., genre, runtime, series or film) as well as tags or text summaries curated by domain experts describing thematic/plot elements. Once we learn this model, we extract model parameters constituting a numerical representation or embedding of the title. These embeddings are then used as inputs to downstream models specialized on the target tasks for a smaller set of titles directly relevant for content decisions (Figure 1). All models were developed and deployed using Metaflow, Netflix’s open source framework for bringing models into production.

To assess the usefulness of these embeddings, we look at two indicators: 1) Do they improve the performance on the target task via downstream models? And just as importantly, 2) Are they useful to our creative partners, i.e. do they lend insight or facilitate apt comparisons (e.g., revealing that a pair of titles attracts similar audiences, or that a pair of countries have similar viewing behavior)? These considerations are key in informing subsequent lines of research and innovation.

Figure 1: Similar title identification and audience sizing can be supported by a common learned title embedding.

Similar titles

In entertainment, it is common to contextualize a new project in terms of existing titles. For example, a creative executive developing a title might wonder: Does this teen movie have more of the wholesome, romantic vibe of To All the Boys I’ve Loved Before or more of the dark comedic bent of The End of the F***ing World? Similarly, a marketing executive refining her “elevator pitch” might summarize a title with: “The existential angst of Eternal Sunshine of the Spotless Mind meets the surrealist flourishes of The One I Love.”

To make these types of comparisons even richer we “embed” titles in a high-dimensional space or “similarity map,” wherein more similar titles appear closer together with respect to a spatial distance metric such as Euclidean distance. We can then use this similarity map to identify clusters of titles that share common elements (Figure 2), as well as surface candidate similar titles for an unlaunched title.
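To make the idea of surfacing candidate similar titles concrete, the sketch below ranks catalog titles by Euclidean distance to the embedding of an unlaunched title. The three-dimensional vectors and title names are made up for illustration; real embeddings would come from the trained models and have many more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def most_similar(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k catalog titles closest to the query embedding."""
    distances = [(title, math.dist(query, vec)) for title, vec in catalog.items()]
    return sorted(distances, key=lambda pair: pair[1])[:k]


# Made-up 3-dimensional embeddings for illustration only.
catalog = {
    "Title A": [0.9, 0.1, 0.0],
    "Title B": [0.8, 0.2, 0.1],
    "Title C": [0.0, 0.9, 0.7],
}
unlaunched_title = [1.0, 0.0, 0.0]

for title, distance in most_similar(unlaunched_title, catalog):
    print(f"{title}: {distance:.3f}")
```

A brute-force scan like this is fine for small catalogs; for larger ones, an approximate nearest-neighbor index would be a natural replacement.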

Notably, there is no “ground truth” about what is similar: embeddings optimized on different source tasks will yield different similarity maps. For example, if we derive our embeddings from a model that classifies genre, the resulting map will minimize the distance between titles that are thematically similar (Figure 2). By contrast, embeddings derived from a model that predicts audience size will align titles with similar performance characteristics. By offering multiple views into how a given title is situated within the broader content universe, these similarity maps offer a valuable tool for ideation and exploration for our creative decision makers.

Figure 2: T-SNE visualization of embeddings learned from content categorization task.

Transfer learning for audience sizing

Another crucial input for content decision makers is an estimate of how large the potential audience will be (and ideally, how that audience breaks down geographically). For example, knowing that a title will likely drive a primary audience in Spain along with sizable audiences in Mexico, Brazil, and Argentina would aid in deciding how best to promote it and what localized assets (subtitles, dubbings) to create ahead of time.

Predicting the potential audience size of a title is a complex problem in its own right, and we leave a more detailed treatment for the future. Here, we simply highlight how embeddings can be leveraged to help tackle this problem. We can include any combination of the following as features in a supervised modeling framework that predicts audience size in a given country:

Embedding of a title
Embedding of a country we’d like to predict audience size in
Audience sizes of past titles with similar embeddings (or some aggregation of them)

Figure 3: How we can use transfer-learned embeddings to help with demand prediction.

As an example, if we are trying to predict the audience size of a dark comedic title in Brazil, we can leverage the aforementioned similarity maps to identify similar dark comedies with an observed audience size in Brazil. We can then include these observed audience sizes (or some weighted average based on similarity) as features. These features are interpretable (they are associated with known titles and one can reason/debate about whether those titles’ performances should factor into the prediction) and significantly improve prediction accuracy.
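The weighted-average feature described above could be computed along the following lines. The weighting scheme, 1 / (1 + Euclidean distance), is an assumption chosen for simplicity since the post does not specify one, and the function name and inputs are hypothetical.

```python
import math
from typing import Dict, Sequence


def weighted_audience_feature(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    audience_sizes: Dict[str, float],
) -> float:
    """Similarity-weighted average of observed audience sizes in one country.

    Assumption: similarity is 1 / (1 + Euclidean distance).
    """
    weights = {
        title: 1.0 / (1.0 + math.dist(query, vec))
        for title, vec in embeddings.items()
        if title in audience_sizes  # only titles with an observed audience size
    }
    if not weights:
        raise ValueError("no comparable titles with an observed audience size")
    total = sum(weights.values())
    return sum(weights[t] * audience_sizes[t] for t in weights) / total


# Made-up example: the new title coincides with comedy_1 and is far from comedy_2.
feature = weighted_audience_feature(
    query=[0.0, 0.0],
    embeddings={"comedy_1": [0.0, 0.0], "comedy_2": [3.0, 0.0]},
    audience_sizes={"comedy_1": 100.0, "comedy_2": 200.0},
)
print(feature)
```

Because each weight is tied to a named title, the contribution of every comparable title to the feature can be inspected and debated.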

Learning embeddings

How do we produce these embeddings? The first step is to identify source tasks that will produce useful embeddings for downstream model consumption. Here we discuss two types of tasks: supervised and self-supervised.

Supervised

A major motivation for transfer learning is to “pre-train” model parameters by first learning them on a related source task for which we have more training data. Inspecting the data we have on hand, we find that for any title on our service with sufficient viewing data, we can (1) categorize the title based on who watched it (a.k.a. “content category”) and (2) observe how many subscribers watched it in each country (“audience size”). From this title-level information, we devise the following supervised learning tasks:

{metadata, tags, summaries} → content category
{metadata, tags, summaries, country} → audience size in country

When implementing specific solutions to these tasks, two important modeling decisions we need to make are selecting a) a suitable method (“encoder”) for converting title-level features (metadata, tags, summaries) into an amenable representation for a predictive model and b) a model (“predictor”) that predicts labels (content category, audience size) given an encoded title. Since our goal is to learn somewhat general-purpose embeddings that can plug into multiple use cases, we generally prefer parameter-rich models for the encoder and simpler models for the predictor.

Our choice of encoder (Figure 4) depends on the type of input. For text-based summaries, we leverage pre-trained models like BERT to provide context-dependent word embeddings that are then run through a recurrent neural network style architecture, such as a bidirectional LSTM or GRU. For tags, we directly learn tag representations by considering each title as a tag collection, or a “bag-of-tags”. For audience size models where predictions are country-specific, we also directly learn country embeddings and concatenate the resulting embedding to the tag or summary-based representation. Essentially, conversion of each tag and country to its resulting embedding is done via a lookup table.
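A minimal sketch of the bag-of-tags encoder with a country embedding might look as follows. The lookup tables here hold fixed, made-up values; in the approach described above they would be learned during training. The function and table names are hypothetical.

```python
from typing import Dict, List, Sequence


def encode_title(
    tags: Sequence[str],
    country: str,
    tag_table: Dict[str, List[float]],
    country_table: Dict[str, List[float]],
) -> List[float]:
    """Average the tag vectors, then append the country vector."""
    vectors = [tag_table[tag] for tag in tags]  # one table lookup per tag
    dim = len(vectors[0])
    mean_tags = [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]
    return mean_tags + country_table[country]  # list concatenation


# Made-up lookup tables; real ones would be trained parameters.
tag_table = {"dark": [1.0, 0.0], "comedy": [0.0, 1.0]}
country_table = {"BR": [0.5]}

print(encode_title(["dark", "comedy"], "BR", tag_table, country_table))
```

Averaging makes the representation independent of the number and order of tags attached to a title.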

Likewise, the predictor depends on the task. For category prediction, we train a linear model on top of the encoder representation, apply a softmax operation, and minimize the negative log likelihood. For audience size prediction, we use a single hidden-layer feedforward neural network to minimize the mean squared error for a given title-country pair. Both the encoder and predictor models are optimized via backpropagation, and the representation produced by the optimized encoder is used in downstream models.
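The two losses mentioned above can be written out for a single training example. This is a plain-Python illustration of the formulas, not the training code: in practice a deep learning framework would compute these over batches and backpropagate through the encoder.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(logits: Sequence[float], true_class: int) -> float:
    """Category-prediction loss for a single title."""
    return -math.log(softmax(logits)[true_class])


def squared_error(predicted: float, observed: float) -> float:
    """Audience-size loss for a single title-country pair."""
    return (predicted - observed) ** 2


# Made-up logits from the linear layer for three content categories.
print(negative_log_likelihood([2.0, 0.5, -1.0], true_class=0))
print(squared_error(predicted=3.0, observed=1.0))
```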

Figure 4: encoder architectures to handle various kinds of title-related inputs. For text summaries, we first convert each word to its context-dependent representation via BERT or a related model, followed by a biGRU to convert the sequence of embeddings to a single (final-state) representation. For tags, we compute the average tag representation (since each title is associated with multiple tags).

Self-supervised

Knowledge graphs are abstract graph-based data structures which encode relations (edges) between entities (nodes). Each edge in the graph, i.e. head-relation-tail triple, is known as a fact, and in this way a set of facts (i.e. “knowledge”) results in a graph. However, the real power of the graph is the information contained in the relational structure.

At Netflix, we apply this concept to the knowledge contained in the content universe. Consider a simplified graph whose nodes consist of three entity types: {titles, books, metadata tags} and whose edges encode relationships between them (e.g., “Apocalypse Now is based on Heart of Darkness” ; “21 Grams has a storyline around moral dilemmas”) as illustrated in Figure 5. These facts can be represented as triples (h, r, t), e.g. (Apocalypse Now, based_on, Heart of Darkness), (21 Grams, storyline, moral dilemmas). Next, we can craft a self-supervised learning task where we randomly select edges in the graph to form a test set, and condition on the rest of the graph to predict these missing edges. This task, also known as link prediction, allows us to learn embeddings for all entities in the graph. There are a number of approaches to extract embeddings and our current approach is based on the TransE algorithm. TransE learns an embedding F that minimizes the average Euclidean distance between (F(h) + F(r)) and F(t).

Figure 5: Left: Illustration of a graph relating titles, books, and thematic elements to each other. Right: Illustration of translational embeddings in which the sum of the head and relation embeddings approximates the tail embedding.
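The translational idea can be sketched in a few lines: a triple is plausible when the head embedding plus the relation embedding lands close to the tail embedding. The two-dimensional vectors below are made up, and the sketch covers only scoring and ranking; learning the embeddings is omitted.

```python
import math
from typing import Dict, List, Sequence


def transe_distance(
    head: Sequence[float], relation: Sequence[float], tail: Sequence[float]
) -> float:
    """Euclidean distance between (head + relation) and tail."""
    translated = [h + r for h, r in zip(head, relation)]
    return math.dist(translated, tail)


def rank_tails(
    head: Sequence[float],
    relation: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[str]:
    """Order candidate tail entities from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_distance(head, relation, candidates[name]),
    )


# Made-up 2-dimensional embeddings for illustration only.
apocalypse_now = [1.0, 1.0]
based_on = [2.0, 0.0]
candidates = {
    "Heart of Darkness": [3.1, 0.9],
    "moral dilemmas": [0.0, 4.0],
}

print(rank_tails(apocalypse_now, based_on, candidates))
```

Ranking candidate tails in this way is how a held-out edge would be predicted in the link prediction task.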

The self-supervision is crucial since it allows us to train on titles both on and off our service, expanding the training set considerably and unlocking more gains from transfer learning. The resulting embeddings can then be used in the aforementioned similarity models and audience sizing models.

Epilogue

Making great content is hard. It involves many different factors and requires considerable investment, all for an outcome that is very difficult to predict. The success of our titles is ultimately determined by our members, and we must do our best to serve their needs given the tools and data we have. We identified two ways to support content decision makers: surfacing similar titles and predicting audience size, drawing from various areas such as transfer learning, embedding representations, natural language processing, and supervised learning. Surfacing these types of insights in a scalable manner is becoming ever more crucial as both our subscriber base and catalog grow and become increasingly diverse. If you’d like to be a part of this effort, please contact us!

Supporting content decision makers with machine learning was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

re:Invent 2020 Liveblog: Andy Jassy Keynote

2020-11-27 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/reinvent-2020-liveblog-andy-jassy-keynote/

I’m always ready to try something new! This year, I am going to liveblog Andy Jassy‘s AWS re:Invent keynote address, which takes place from 8 a.m. to 11 a.m. on Tuesday, December 1 (PST). I’ll be updating this post every couple of minutes as I watch Andy’s address from the comfort of my home office. Stay tuned!

— Jeff;

Split-Second Phantom Images Fool Autopilots

2020-10-19 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/10/split-second-phantom-images-fool-autopilots.html

Researchers are tricking autopilots by inserting split-second images into roadside billboards.

Researchers at Israel’s Ben Gurion University of the Negev … previously revealed that they could use split-second light projections on roads to successfully trick Tesla’s driver-assistance systems into automatically stopping without warning when its camera sees spoofed images of road signs or pedestrians. In new research, they’ve found they can pull off the same trick with just a few frames of a road sign injected on a billboard’s video. And they warn that if hackers hijacked an internet-connected billboard to carry out the trick, it could be used to cause traffic jams or even road accidents while leaving little evidence behind.

[…]

In this latest set of experiments, the researchers injected frames of a phantom stop sign on digital billboards, simulating what they describe as a scenario in which someone hacked into a roadside billboard to alter its video. They also upgraded to Tesla’s most recent version of Autopilot known as HW3. They found that they could again trick a Tesla or cause the same Mobileye device to give the driver mistaken alerts with just a few frames of altered video.

The researchers found that an image that appeared for 0.42 seconds would reliably trick the Tesla, while one that appeared for just an eighth of a second would fool the Mobileye device. They also experimented with finding spots in a video frame that would attract the least notice from a human eye, going so far as to develop their own algorithm for identifying key blocks of pixels in an image so that a half-second phantom road sign could be slipped into the “uninteresting” portions.

The paper:

Abstract: In this paper, we investigate “split-second phantom attacks,” a scientific gap that causes two commercial advanced driver-assistance systems (ADASs), Tesla Model X (HW 2.5 and HW 3) and Mobileye 630, to treat a depthless object that appears for a few milliseconds as a real obstacle/object. We discuss the challenge that split-second phantom attacks create for ADASs. We demonstrate how attackers can apply split-second phantom attacks remotely by embedding phantom road signs into an advertisement presented on a digital billboard which causes Tesla’s autopilot to suddenly stop the car in the middle of a road and Mobileye 630 to issue false notifications. We also demonstrate how attackers can use a projector in order to cause Tesla’s autopilot to apply the brakes in response to a phantom of a pedestrian that was projected on the road and Mobileye 630 to issue false notifications in response to a projected road sign. To counter this threat, we propose a countermeasure which can determine whether a detected object is a phantom or real using just the camera sensor. The countermeasure (GhostBusters) uses a “committee of experts” approach and combines the results obtained from four lightweight deep convolutional neural networks that assess the authenticity of an object based on the object’s light, context, surface, and depth. We demonstrate our countermeasure’s effectiveness (it obtains a TPR of 0.994 with an FPR of zero) and test its robustness to adversarial machine learning attacks.

Pay as you go machine learning inference with AWS Lambda

2020-10-01 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/pay-as-you-go-machine-learning-inference-with-aws-lambda/

This post is courtesy of Eitan Sela, Senior Startup Solutions Architect.

Many customers want to deploy machine learning models for real-time inference, and pay only for what they use. Using Amazon EC2 instances for real-time inference may not be cost effective to support sporadic inference requests throughout the day.

AWS Lambda is a serverless compute service with pay-per-use billing. However, ML frameworks like XGBoost are too large to fit into the 250 MB application artifact size limit, or the 512 MB /tmp space limit. While you can store the packages in Amazon S3 and download to Lambda (up to 3 GB), this can increase the cost.

To address this, Lambda functions can now mount an Amazon Elastic File System (EFS). This is a scalable and elastic NFS file system storing data within and across multiple Availability Zones (AZ) for high availability and durability.

With this new capability, it’s now easier to use Python packages in Lambda that require storage space to load models and other dependencies.

In this blog post, I walk through how to:

Create an EFS file system and an Access Point as an application-specific entry point.
Provision an EC2 instance, mount EFS using the Access Point, and train a breast cancer XGBoost ML model. XGBoost, Python packages, and the model are saved on the EFS file system.
Create a Lambda function that loads the Python packages and model from EFS, and performs the prediction based on a test event.

Create an Amazon EFS file system with an Access Point

Configuring EFS for Lambda is straightforward. I show how to do this with AWS CloudFormation, but you can also use the AWS CLI, AWS SDK, or AWS Serverless Application Model (AWS SAM).

EFS file systems are created within a customer VPC, so Lambda functions using the EFS file system must have access to the same VPC.

You can deploy the AWS CloudFormation stack located on this GitHub repository.

The stack creates the following resources:

A VPC with a public subnet
An EFS file system
An EFS Access Point
An EC2 instance in the VPC

It can take up to 10 minutes for the CloudFormation stack to create the resources. After the resource creation is complete, navigate to the EFS console to see the new file system.

Navigate to the Access Points panel to see a new Access Point with the File system ID from the previous page.

Note the Access Point ID and File System ID for the following sections.

Launch an Amazon EC2 instance to train a breast cancer model

In this section, you install Python packages on the EFS file system, after mounting it to EC2. You then train the breast cancer model, and save the model in the EFS file system used by the Lambda function.

The machine learning framework you use for this function is XGBoost. This is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. XGBoost is one of the most popular machine learning algorithms.

Navigate to the EC2 console to see the new EC2 instance created from the CloudFormation stack. This is an Amazon Linux 2 c5.large EC2 instance named ‘xgboost-for-serverless-inference-cfn-ec2’. In the instance details, you see that the security group is configured to allow inbound SSH access (for connecting to the instance).

Mount the EFS file system on the EC2

Connect to the instance using SSH and mount the EFS file system previously created by using the Access Point:

Install amazon-efs-utils tools:
sudo yum -y install amazon-efs-utils
Create a directory to mount EFS into:
mkdir efs
Mount the EFS file system using the Access Point:
sudo mount -t efs -o tls,accesspoint=<Access point ID> <File system ID>:/ efs

Install Python, pip and required packages

Install Python and pip:
sudo yum -y install python37
curl -O https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py --user
Verify the installation:
python3 --version
pip3 --version
Create a requirements.txt file containing the dependencies:
xgboost==1.1.1
pandas
scikit-learn
joblib
Install the Python packages using the requirements file:
pip3 install -t efs/lib/ -r requirements.txt
Note: using bursting throughput mode with EFS File system, this action can take up to 10 minutes.
Set the Python path to refer to the installed packages directory of EFS file system:
export PYTHONPATH=/home/ec2-user/efs/lib/

Train the breast cancer model

The breast cancer model predicts whether the breast mass is a malignant tumor or benign by looking at features computed from a digitized image of a fine needle aspirate of a breast mass.

The data used to train the model consists of the diagnosis in addition to the 10 real-valued features that are computed for each cell nucleus. Such features include radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. The prediction returned by the model is either “B” for benign or “M” for malignant. This sample project uses the public Breast Cancer Wisconsin (Diagnostic) dataset.

After installing the required Python packages, train a XGBoost model on the breast cancer dataset:

Create a bc_xgboost_train.py file containing the Python code needed to train a breast cancer XGBoost model. Download the code here.
Start the training of the model:
python3 bc_xgboost_train.py
The model file bc-xgboost-model is created in the root directory.
Create a new directory on the EFS file system and copy the XGBoost breast cancer model:
mkdir efs/model
cp bc-xgboost-model efs/model/
Check you have the required Python packages and the model on the EFS file system:
ls efs/model/ efs/lib/
You see all the Python packages installed previously in the lib directory, and the model file in the model directory.
Review the total size of lib Python packages directory:
du -sh efs/lib/

You can see that the total size of lib directory is 534 MB. This is a larger package size than was allowed before EFS for Lambda.
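The training script itself is only linked from the steps above, not reproduced. As a rough idea of what such a script might contain, the sketch below trains a binary XGBoost classifier and saves it under the file name used in the steps. It is not the author's script: loading the data through scikit-learn's bundled copy of the Breast Cancer Wisconsin (Diagnostic) dataset, the train/test split, and all hyperparameters are assumptions.

```python
def build_params(max_depth: int = 3) -> dict:
    """Hypothetical hyperparameters for a binary classifier."""
    return {
        "objective": "binary:logistic",
        "eval_metric": "logloss",
        "max_depth": max_depth,
    }


def train(model_path: str = "bc-xgboost-model") -> None:
    # Imported here so build_params can be used without the ML packages installed.
    import xgboost as xgb
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    data = load_breast_cancer()
    # scikit-learn encodes malignant as 0 and benign as 1; flip so that 1 means malignant.
    labels = 1 - data.target
    x_train, x_test, y_train, y_test = train_test_split(
        data.data, labels, test_size=0.2, random_state=42
    )
    dtrain = xgb.DMatrix(x_train, label=y_train)
    dtest = xgb.DMatrix(x_test, label=y_test)
    # Report the loss on the held-out split while boosting.
    booster = xgb.train(
        build_params(), dtrain, num_boost_round=50, evals=[(dtest, "test")]
    )
    booster.save_model(model_path)
```

Calling train() from the directory where the script lives would write the model file next to it, ready to be copied to the EFS file system.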

Building a serverless machine learning inference using Lambda

In this section, you use the EFS file system previously configured for the Lambda function to import the required libraries and load the model.

Using EFS with Lambda

The AWS SAM template creates the Lambda function, mounts the EFS Access Point created earlier, and creates both required IAM roles.

It takes several minutes for the AWS SAM CLI to create the Lambda function. Afterward, navigate to the Lambda console to see the created Lambda function.

In the Lambda function configuration, you see the environment variables, and basic settings, such as runtime, memory, and timeout.

Further down, you see that the Lambda function has the VPC access configured, and the file system is mounted.
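The function code is not shown in this post, so the following is only a sketch of how a handler could import packages and load the model from the mounted file system. The mount path /mnt/inference, the EFS_MOUNT_PATH environment variable, the event shape, and the 0.5 decision threshold are assumptions; the lib and model directory names follow the layout created on the EC2 instance earlier.

```python
import os
import sys

# Assumed mount path; Lambda requires local mount paths to start with /mnt/.
EFS_MOUNT = os.environ.get("EFS_MOUNT_PATH", "/mnt/inference")


def add_efs_packages_to_path(mount_path: str) -> str:
    """Make packages installed under <mount>/lib importable, like PYTHONPATH on EC2."""
    lib_dir = os.path.join(mount_path, "lib")
    if lib_dir not in sys.path:
        sys.path.append(lib_dir)
    return lib_dir


def to_diagnosis(probability_malignant: float, threshold: float = 0.5) -> str:
    """Turn a predicted probability into the "M"/"B" label used by the dataset."""
    return "M" if probability_malignant >= threshold else "B"


_booster = None  # cached so that warm invocations skip the model load


def handler(event, context):
    global _booster
    add_efs_packages_to_path(EFS_MOUNT)
    # These imports resolve from EFS, so they must come after the path update.
    import numpy as np
    import xgboost as xgb

    if _booster is None:
        _booster = xgb.Booster()
        _booster.load_model(os.path.join(EFS_MOUNT, "model", "bc-xgboost-model"))
    # Assumed event shape: {"features": [[...one row of feature values...]]}
    rows = np.asarray(event["features"], dtype=float)
    probabilities = _booster.predict(xgb.DMatrix(rows))
    return {"predictions": [to_diagnosis(float(p)) for p in probabilities]}
```

Caching the loaded model in a module-level variable means only cold starts pay the cost of reading the packages and the model from EFS.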

Test your Lambda function

In the Lambda console, select Configure test events from the Test events dropdown.
For Event Name, enter InferenceTestEvent.
Copy the event JSON from here and paste in the dialog box.
Choose Create. After saving, you see InferenceTestEvent in the Test list. Now choose Test.

You see the Lambda function inference result, log output, and duration:

Conclusion

In this blog post, you train an XGBoost breast cancer model using Python packages installed on an Amazon EFS file system. You create an AWS Lambda function that loads the Python packages and the model from the EFS file system, and performs the predictions.

Now you know how to call a machine learning model inference using a Lambda function. To learn more about other real-world examples, see:

AWS Architecture Monthly Magazine: Robotics

2020-09-14 Annik Stahl

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/architecture-monthly-magazine-robotics/

September’s issue of AWS Architecture Monthly is all about robotics. Discover why iRobot, the creator of your favorite (though maybe not your pet’s favorite) little robot vacuum, decided to move its mission-critical platform to the serverless architecture of AWS. Learn how and why you sometimes need to test in a virtual environment instead of a physical one. You’ll also have the opportunity to hear from technical experts from across the robotics industry who came together for the AWS Cloud Robotics Summit in August.

Our expert this month, Matt Hansen (who has dreamed of building robots since he was a teen), gives us his outlook for the industry and explains why cloud will be an essential part of that.

In September’s Robotics issue

Ask an Expert: Matt Hansen, Principal Solutions Architect
Blog: Testing a PR2 Robot in a Simulated Hospital
Case Study: iRobot
Blog: Introduction to Automatic Testing of Robotics Applications
Case Study: Multiply Labs Uses AWS RoboMaker to Manufacture Individualized Medicines
Demos & Videos: AWS Cloud Robotics Summit (August 18-19, 2020)
Related Videos: iRobot and ZS Associates

Survey opportunity

This month, we’re also asking you to take a 10-question survey about your experiences with this magazine. The survey is hosted by an external company (Qualtrics), so the below survey button doesn’t lead to our website. Please note that AWS will own the data gathered from this survey, and we will not share the results we collect with survey respondents. Your responses to this survey will be subject to Amazon’s Privacy Notice. Please take a few moments to give us your opinions.

How to access the magazine

Readers in the US, UK, Germany, and France can subscribe to the Kindle version of the magazine at Kindle Newsstand.
View and download past issues as PDFs on the AWS Architecture Monthly webpage.
Visit Flipboard, a personalized mobile magazine app that you can also read on your computer.

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you. Leave us a star rating and comment on the Amazon Kindle Newsstand page or contact us anytime at [email protected].
