Tag Archives: artificial intelligence

Paging Doctor Cloud! Amazon HealthLake Is Now Generally Available

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/paging-doctor-cloud-amazon-healthlake-is-now-generally-available/

At AWS re:Invent 2020, we previewed Amazon HealthLake, a fully managed, HIPAA-eligible service that allows healthcare and life sciences customers to aggregate their health information from different silos and formats into a structured, centralized AWS data lake, and extract insights from that data with analytics and machine learning (ML). Today, I’m very happy to announce that Amazon HealthLake is generally available to all AWS customers.

The ability to store, transform, and analyze health data quickly and at any scale is critical in driving high-quality health decisions. In their daily practice, doctors need a complete chronological view of patient history to identify the best course of action. During an emergency, giving medical teams the right information at the right time can dramatically improve patient outcomes. Likewise, healthcare and life sciences researchers need high-quality, normalized data that they can analyze and build models with, to identify population health trends or drug trial recipients.

Traditionally, most health data has been locked in unstructured text such as clinical notes, and stored in IT silos. Heterogeneous applications, infrastructure, and data formats have made it difficult for practitioners to access patient data, and extract insights from it. We built Amazon HealthLake to solve that problem.

If you can’t wait to get started, you can jump to the AWS console for Amazon HealthLake now. If you’d like to learn more, read on!

Introducing Amazon HealthLake
Amazon HealthLake is backed by fully-managed AWS infrastructure. You won’t have to procure, provision, or manage a single piece of IT equipment. All you have to do is create a new data store, which only takes a few minutes. Once the data store is ready, you can immediately create, read, update, delete, and query your data. HealthLake exposes a simple REST Application Programming Interface (API) available in the most popular languages, which customers and partners can easily integrate in their business applications.

Security is job zero at AWS. By default, HealthLake encrypts data at rest with AWS Key Management Service (KMS). You can use an AWS-managed key or your own key. KMS is designed so that no one, including AWS employees, can retrieve your plaintext keys from the service. For data in transit, HealthLake uses industry-standard TLS 1.2 encryption end to end.

At launch, HealthLake supports both structured and unstructured text data typically found in clinical notes, lab reports, insurance claims, and so on. The service stores this data in the Fast Healthcare Interoperability Resources (FHIR, pronounced ‘fire’) format, a standard designed to enable exchange of health data. HealthLake is compatible with the latest revision (R4) and currently supports 71 FHIR resource types, with additional resources to follow.

If your data is already in FHIR format, great! If not, you can convert it yourself, or rely on partner solutions available in AWS Marketplace. At launch, HealthLake includes validated connectors for Redox, HealthLX, Diameter Health, and InterSystems applications. They make it easy to convert your HL7v2, CCDA, and flat file data to FHIR, and to upload it to HealthLake.

As data is uploaded, HealthLake uses integrated natural language processing to extract entities present in your documents and stores the corresponding metadata. These entities include anatomy, medical conditions, medication, protected health information, tests, treatments, and procedures. They are also matched to industry-standard ICD-10-CM and RxNorm entities.

After you’ve uploaded your data, you can start querying it, by assigning parameter values to FHIR resources and extracted entities. Whether you need to access information on a single patient, or want to export many documents to build a research dataset, all it takes is a single API call.

Let’s do a quick demo.

Querying FHIR Data in Amazon HealthLake
Opening the AWS console for HealthLake, I click on ‘Create a Data Store’. Then, I simply pick a name for my data store, and decide to encrypt it with an AWS managed key. I also tick the box that preloads sample synthetic data, which is a great way to quickly kick the tires of the service without having to upload my own data.

Creating a data store

After a few minutes, the data store is active, and I can send queries to its HTTPS endpoint. In the example below, I look for clinical notes (and clinical notes only) that contain the ICD-10-CM entity for ‘hypertension’ with a confidence score of 99% or more. Under the hood, the AWS console is sending an HTTP GET request to the endpoint. I highlighted the corresponding query string.

Querying HealthLake
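To make the GET request more concrete, here is a minimal Python sketch of how a FHIR search URL is assembled: a base endpoint, a resource type, and a query string. The endpoint below is a placeholder, and the parameter name example-entity-score is hypothetical (the exact name HealthLake uses for entity searches is not given in this post). Only _count is a standard FHIR search parameter. The sketch builds the URL and does not send it; a real request would also need AWS authentication.

```python
from urllib.parse import urlencode


def build_fhir_search_url(endpoint, resource_type, params):
    """Return a FHIR search URL of the form [base]/[type]?name=value&..."""
    base = endpoint.rstrip("/")
    query = urlencode(params)
    if not query:
        return f"{base}/{resource_type}"
    return f"{base}/{resource_type}?{query}"


# Placeholder endpoint (assumption); the real one is shown in the console.
ENDPOINT = "https://example.invalid/datastore/EXAMPLE/r4/"

# "example-entity-score" is a hypothetical parameter name used for illustration.
url = build_fhir_search_url(
    ENDPOINT,
    "DocumentReference",
    {"example-entity-score": "hypertension|0.99", "_count": "10"},
)
print(url)
```

The value "hypertension|0.99" mirrors the idea of pairing an entity with a minimum confidence score; the pipe character is percent-encoded when the query string is built.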

The query runs in seconds. Examining the JSON response in my browser, I see that it contains two documents. For each one, I can see lots of information: when it was created, which organization owns it, who the author is, and more. I can also see that HealthLake has automatically extracted a long list of entities, with names, descriptions, and confidence scores, and added them to the document.

HealthLake entities

The document is attached in the response in base64 format.

HealthLake document

Saving the string to a text file, and decoding it with a command-line tool, I see the following:

Mr Nesser is a 52 year old Caucasian male with an extensive past medical history that includes coronary artery disease , atrial fibrillation , hypertension , hyperlipidemia , presented to North ED with complaints of chills , nausea , acute left flank pain and some numbness in his left leg

This document is spot on. As you can see, it’s really easy to query and retrieve data stored in Amazon HealthLake.
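The decoding step above can also be done in a few lines of Python instead of a command-line tool. This is a sketch under one assumption: that the returned resource follows the FHIR DocumentReference layout, where the base64 string sits in content[].attachment.data. The sample resource is built locally for illustration and is not real HealthLake output.

```python
import base64


def decode_attachments(document_reference):
    """Decode every base64 attachment found in a DocumentReference-like dict."""
    texts = []
    for item in document_reference.get("content", []):
        data = item.get("attachment", {}).get("data")
        if data:
            texts.append(base64.b64decode(data).decode("utf-8"))
    return texts


# Illustrative resource built from the opening words of the note shown above.
note = "Mr Nesser is a 52 year old Caucasian male"
sample = {
    "resourceType": "DocumentReference",
    "content": [
        {
            "attachment": {
                "contentType": "text/plain",
                "data": base64.b64encode(note.encode("utf-8")).decode("ascii"),
            }
        }
    ],
}

decoded = decode_attachments(sample)
print(decoded[0])
```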

Analyzing Data Stored in Amazon HealthLake
You can export data from HealthLake, store it in an Amazon Simple Storage Service (Amazon S3) bucket and use it for analytics and ML tasks. For example, you could transform your data with AWS Glue, query it with Amazon Athena, and visualize it with Amazon QuickSight. You could also use this data to build, train and deploy ML models on Amazon SageMaker.

The following blog posts show you end-to-end analytics and ML workflows based on data stored in HealthLake:

Last but not least, this self-paced workshop will show you how to import and export data with HealthLake, process it with AWS Glue and Amazon Athena, and build an Amazon QuickSight dashboard.

Now, let’s see what our customers are building with HealthLake.

Customers Are Already Using Amazon HealthLake
Based in Chicago, Rush University Medical Center is an early adopter of HealthLake. They used it to build a public health analytics platform on behalf of the Chicago Department of Public Health. The platform aggregates, combines, and analyzes multi-hospital data related to patient admissions, discharges and transfers, electronic lab reporting, hospital capacity, and clinical care documents for COVID-19 patients who are receiving care in and across Chicago hospitals. 17 of the 32 hospitals in Chicago are currently submitting data, and Rush plans to integrate all 32 hospitals by this summer. You can learn more in this blog post.

Recently, Rush launched another project to identify communities that are most exposed to high blood pressure risks, understand the social determinants of health, and improve healthcare access. For this purpose, they collect all sorts of data, such as clinical notes, ambulatory blood pressure measurements from the community, and Medicare claims data. This data is then ingested into HealthLake and stored in FHIR format for further analysis.

Dr. Hota

Says Dr. Bala Hota, Vice President and Chief Analytics Officer at Rush University Medical Center: “We don’t have to spend time building extraneous items or reinventing something that already exists. This allows us to move to the analytics phase much quicker. Amazon HealthLake really accelerates the sort of insights that we need to deliver results for the population. We don’t want to be spending all our time building infrastructure. We want to deliver the insights.”


Cortica is on a mission to revolutionize healthcare for children with autism and other developmental differences. Today, Cortica uses HealthLake to store all patient data in a standardized, secured, and compliant manner. Building ML models with that data, they can track the progress of their patients with sentiment analysis, and they can share with parents the progress that their children are making on speech development and motor skills. Cortica can also validate the effectiveness of treatment models and optimize medication regimens.

Ernesto DiMarino, Head of Enterprise Applications and Data at Cortica, told us: “In a matter of weeks rather than months, Amazon HealthLake empowered us to create a centralized platform that securely stores patients’ medical history, medication history, behavioral assessments, and lab reports. This platform gives our clinical team deeper insight into the care progression of our patients. Using predefined notebooks in Amazon SageMaker with data from Amazon HealthLake, we can apply machine learning models to track and prognosticate each patient’s progression toward their goals in ways not otherwise possible. Through this technology, we can also share HIPAA-compliant data with our patients, researchers, and healthcare partners in an interoperable manner, furthering important research into autism treatment.”

MEDHOST provides products and services to more than 1,000 healthcare facilities of all types and sizes. These customers want to develop solutions to standardize patient data in FHIR format and build dashboards and advanced analytics to improve patient care, but that is difficult and time consuming today.

Says Pandian Velayutham, Sr. Director Of Engineering at MEDHOST: “With Amazon HealthLake we can meet our customers’ needs by creating a compliant FHIR data store in just days rather than weeks with integrated natural language processing and analytics to improve hospital operational efficiency and provide better patient care.”



Getting Started
Amazon HealthLake is available today in the US East (N. Virginia), US East (Ohio), and US West (Oregon) Regions.

Give our self-paced workshop a try, and let us know what you think. As always, we look forward to your feedback. You can send it through your usual AWS Support contacts, or post it on the AWS Forums.

– Julien

Physicists Teach AI to Simulate Atomic Clusters

Post Syndicated from Matthew Hutson original https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/replacing-simulations-of-atomic-clusters-with-ai

Physicists love recreating the world in software. A simulation lets you explore many versions of reality to find patterns or to test possibilities. But if you want one that’s realistic down to individual atoms and electrons, you run out of computing juice pretty quickly.

Machine-learning models can approximate detailed simulations, but often require lots of expensive training data. A new method shows that physicists can lend their expertise to machine-learning algorithms, helping them train on a few small simulations consisting of a few atoms, then predict the behavior of systems with hundreds of atoms. In the future, similar techniques might even characterize microchips with billions of atoms, predicting failures before they occur.

The researchers started with simulated units of 16 silicon and germanium atoms, two elements often used to make microchips. They employed high-performance computers to calculate the quantum-mechanical interactions between the atoms’ electrons. Given a certain arrangement of atoms, the simulation generated unit-level characteristics such as its energy bands, the energy levels available to its electrons. But “you realize that there is a big gap between the toy models that we can study using a first-principles approach and realistic structures,” says Sanghamitra Neogi, a physicist at the University of Colorado, Boulder, and the paper’s senior author. Could she and her co-author, Artem Pimachev, bridge the gap using machine learning? 

The idea, published in June, in npj Computational Materials, was to train machine-learning models to predict energy bands from 16-atom arrangements, then feed the models larger arrangements and see if they could predict their energy bands. “Essentially, we’re trying to poke this world of billions of atoms,” Neogi says. “That physics is completely unknown.”

A traditional model might require ten thousand training examples, Neogi says. But she and Pimachev thought they could do better. So they applied physics principles to generate the right training data.

First, they knew that strain changes energy bands, so they simulated 16-atom units with different amounts of strain, rather than wasting time generating a lot of simulations with the same strain. 

Second, they spent a year finding a way to describe the atomic arrangements that would be useful for the model, a way to “fingerprint” the units. They decided to represent a unit as a set of 3D shapes with flat walls, one for each atom. A shape’s walls were defined by points that were equidistant between the atom and its neighbors. (Together, the shapes fit snugly into what’s called a Voronoi tessellation.) “If you’re smart enough to create a good set of fingerprints,” Neogi says, “that eliminates the need of a large amount of data.” Their training sets consisted of no more than 357 examples.
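To make the geometry concrete, here is a minimal Python sketch of a single Voronoi cell. Each wall is the plane of points equidistant between an atom and one neighbor, and a point belongs to the atom's cell if it lies on the atom's side of every wall. This only illustrates the tessellation idea; it is not the researchers' fingerprinting code, and the atom coordinates are made up.

```python
import math


def bisector_plane(atom, neighbor):
    """Wall between two atoms: (midpoint, unit normal pointing at the neighbor)."""
    midpoint = tuple((a + b) / 2 for a, b in zip(atom, neighbor))
    length = math.dist(atom, neighbor)
    normal = tuple((b - a) / length for a, b in zip(atom, neighbor))
    return midpoint, normal


def in_cell(point, atom, neighbors):
    """True if the point is on the atom's side of every wall (inside its cell)."""
    for neighbor in neighbors:
        midpoint, normal = bisector_plane(atom, neighbor)
        offset = sum((p - m) * n for p, m, n in zip(point, midpoint, normal))
        if offset > 0:  # the point is past this wall, closer to the neighbor
            return False
    return True


def nearest_atom(point, atoms):
    """Index of the atom whose cell contains the point (the closest atom)."""
    return min(range(len(atoms)), key=lambda i: math.dist(point, atoms[i]))


# Made-up coordinates for illustration only.
atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
inside = in_cell((0.5, 0.5, 0.5), atoms[0], atoms[1:])
outside = in_cell((1.5, 0.2, 0.1), atoms[0], atoms[1:])
owner = nearest_atom((1.5, 0.2, 0.1), atoms)
```

The two functions agree by construction: a point is inside an atom's cell exactly when that atom is its nearest one.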

Neogi and Pimachev trained two different types of models—a neural network and a random forest, or set of decision trees—and tested them on three different types of structures, comparing their data with that from detailed simulations. The first structures were “ideal superlattices,” which might contain several atomic layers of pure silicon, followed by several layers of pure germanium, and so on. They tested these in strained and relaxed conditions. The second structures were “non-ideal heterostructures,” in which a given layer might vary in its thickness or contain defects. Third were “fabricated heterostructures,” which had sections of pure silicon and sections of silicon-germanium alloys. Test cases contained up to 544 atoms. 

Across conditions, the predictions of the random forests differed from the simulation outputs by 3.7 percent to 19 percent, and the neural networks differed by 2.3 percent to 9.6 percent.

“We didn’t expect that we would be able to simulate such a large system,” Neogi says. “Five hundred atoms is a huge deal.” Further, even as the number of atoms in a system increased exponentially, the hours of computation the models required to make predictions scaled only linearly, meaning that that world of billions of atoms is relatively reachable. 

“I thought it was very clever,” Logan Ward, a computational scientist at Argonne National Laboratory, in Lemont, Illinois, says about the study. “The authors did a really neat job of mixing their understanding of the physics at different stages to get the machine learning models to work. I haven’t seen something quite like it before.”

In Neogi’s follow-up work, to be published in the coming months, her lab performed an inverse operation. Given a material’s energy bands, their system predicted its atomic arrangement. Such a system gets them closer to diagnosing faults in computer chips. If a semiconductor’s conductivity is off, they might point to the flaw. 

The framework they present has applications to other kinds of materials, as well. Regarding the physics-informed approach to generating and representing training examples, Neogi says, “a little can tell us a lot if we know where to look.”

It’s Easy for Computers to Detect Sarcasm, Right?

Post Syndicated from Steven Cherry original https://spectrum.ieee.org/podcast/artificial-intelligence/machine-learning/its-easy-for-computers-to-detect-sarcasm-right

Hi and welcome to Fixing the Future, IEEE Spectrum’s podcast series on the technologies that can set us on the right path toward sustainability, meaningful work, and a healthy economy for all. Fixing the Future is sponsored by COMSOL, makers of COMSOL Multiphysics simulation software. I’m Steven Cherry.

Leonard: Hey, Penny. How’s work?
Penny: Great! I hope I’m a waitress at the Cheesecake Factory for my whole life!
Sheldon: Was that sarcasm?
Penny: No.
Sheldon: Was that sarcasm?
Penny: Yes.

Steven Cherry That’s Leonard, Penny and Sheldon from season two of The Big Bang Theory. Fans of the show know there’s some question of whether Sheldon understands sarcasm. In some episodes he does, and in others he’s just learning it. But there’s no question that computers don’t understand sarcasm, or didn’t until some researchers at the University of Central Florida started them on a path to learning it. Software engineers have been working on various flavors of sentiment analysis for quite some time. Back in 2005, I wrote an article in Spectrum about call centers automatically scanning conversations for anger either by the caller or the service operator. It was one of the early use cases behind messages like “This call may be monitored for quality assurance purposes.” Since then, software has been getting better and better at detecting joy, fear, sadness, confidence and now, finally, sarcasm. My guest today, Ramya Akula, is a PhD student and a graduate research assistant at the University of Central Florida’s Complex Adaptive Systems Lab. She has at least 11 publications to her name, including the most recent, “Interpretable Multi-Head Self-Attention Architecture for Sarcasm Detection in Social Media,” published in March in the journal Entropy with her advisor, Ivan Garibay. Ramya, welcome to the podcast.

Ramya Akula Thank you. It’s my pleasure to be here.

Steven Cherry Ramya, maybe you could tell us a little bit about how sentiment analysis works for things like anger and sadness and joy. And then what’s different and harder about sarcasm?

Ramya Akula So in general, understanding the sentiment behind people’s emotions like a variety of emotions. It’s always been hard. Actually, to some extent when you are in a face-to-face conversation, probably with all the visual cues and bodily gestures, it helps the conversation. But when we do not know who is sitting behind the computer or the mobile phone, so it’s always hard. So that applies for all kinds of sentiments. So that includes anger, emotion, humor, and also sarcasm as well. So that’s the initial point of this research.

Steven Cherry And what makes sarcasm harder than some of the others?

Ramya Akula So sometimes sarcasm can be humor, but also it hurts people really bad. Also how people interpret it because of people coming from different cultures, different backgrounds. In some cultures, something might be okay, but in another it is not. So taking these different cultures, backgrounds, and also the colloquialisms and the slang people use, these are some of the challenges that we face in everyday conversations, especially with sarcasm detection.

Steven Cherry Computers have been writing news and sports stories for some time now, taking a bunch of facts and turning them into simple narratives. Professional writers haven’t been particularly worried by this development, though, because the thinking is that computers have a long way to go—which may be never—when it comes to nuanced, subtle, creative forms of writing. What writers are mainly depending on to save their jobs and maybe their souls is irony, satire, humor. What they’re depending on, in a word, is subtext. Are you trying to teach computers to understand subtext?

Ramya Akula To be precise, these algorithms … One of the toughest jobs for the algorithms is understanding the context, which we humans are really good at, so any human can understand the context and then go on the content based on the context, but for the algorithms, it’s always hard because when you have such long sentences, so having the semantic similarity or some kind of a relationship between the words in these long sentences, understanding the context and then coming up with the next sentence or coming up with some kind of a sentiment like a humor or the irony or these kinds of emotions to the text that adds another level of complexity. Yet in the machine learning community, they started, like most researchers, attacking this problem by looking at different representations. So by taking the sentence as it is and then chunking it down into parts like phrases, and then having different representations for each phrase. So in order to understand the context and then put all this context together and then generate a meaningful sentence next. I feel like it’s still in a very initial phase. And we have a long way to go.

Steven Cherry You started with social media posts. This seems like in some ways an easier problem and in some ways a harder problem than, say, audio from a call center. You don’t have tone and intonation, which I think in a real conversation are often clues in what we might call human-to-human sarcasm detection.

Ramya Akula Yes. So in speech recognition, that’s one advantage: we look at the intonation or how the voice modulates, and then those kinds of signals will help us better understand it. But when we look at the text, like real text from all the articles or the online conversations that we see day to day, there is not really any stress or any kind of intonation that you could relate to. So that’s what makes it a little harder for any algorithm to see. Yeah. So it’s harder for checking the severity of the humor or sarcasm there.

Steven Cherry If I understand something in your paper, neuropsychologists and linguists have apparently worked on sarcasm, but often through identifying sarcastic words and punctuation and also something called sentimental shifts. But what are these? And did you use them too?

Ramya Akula So the neurolinguistics or the psychologists, they primarily look at … So the data that they get is mainly from real humans and real conversations. So it’s actually also when they are looking at the text, text written by real humans, then it’s actually the real humans are understanding the sense of the text. Right. So we humans, as I said earlier, so we are good at understanding the context just by reading it or just by talking in any form. We are good at understanding the context. So in our case, because we have no human involved in any of the data analysis part, so it’s all the pre-processing and everything is done automatically. It’s the algorithm that is doing it.

So we definitely use some cues. And also for the machine learning part, we have the labeled data, which is like giving a sentence, it is labeled with a sentence as sarcasm—has some sarcasm or no sarcasm—and then the data is split into training and test. So we use this data to train our algorithm and then test it on unseen annotated data. So in our case, because the data is already labeled, so we use those labels and also in our case, we use weights to understand what are the cues. So instead of real humans looking at the cues in the sentence, our algorithm looks at the weights that give us the cues for the words.
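The labeling and splitting step described here can be sketched in a few lines of Python. The sentences and labels below are invented for illustration (they are not from the researchers' datasets), and the 75/25 split ratio is an assumption; the point is only that the model is trained on one portion of the labeled data and evaluated on a held-out portion it has never seen.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle labeled examples and hold out a fraction of them for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Invented (sentence, label) pairs: 1 = sarcasm, 0 = no sarcasm.
labeled = [
    ("Oh great, another Monday.", 1),
    ("The meeting starts at noon.", 0),
    ("I just love waiting in line for hours.", 1),
    ("The package arrived this morning.", 0),
    ("Wow, what a surprise, it is raining again.", 1),
    ("She finished the report on time.", 0),
    ("Sure, because that always works.", 1),
    ("The train leaves from platform two.", 0),
]

train_set, test_set = train_test_split(labeled)
print(len(train_set), "training examples,", len(test_set), "test examples")
```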

Steven Cherry We can say a little bit more about this. There’s a lot of AI here, I gather, and it involves pre-trained language models that help you break down a sentence into something you call word embeddings. What are they and how do these models work?

Ramya Akula So basically a computer understands everything in terms of numbers. Right. So we have to convert the words into numbers for the algorithm to understand. So that’s been put forward. So what this embedding does is basically the conversion of the real words into vectors of numbers. In our case, what we’ve used is that we use multiple embeddings. So there are like many embeddings out there. So starting from [inaudible] to the very latest GPT [Generative Pre-trained Transformer] that we are seeing every day, that’s generating tons of data out there.
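As a rough illustration of what "converting words into vectors of numbers" buys you, here is a minimal Python sketch. The three-dimensional vectors are made up for the example (real embeddings are learned from data and have hundreds of dimensions), and cosine similarity is used here as one common way to compare them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors invented for illustration, not taken from any trained model.
embeddings = {
    "apple": [0.9, 0.8, 0.1],
    "pear": [0.8, 0.9, 0.0],
    "laptop": [0.1, 0.2, 0.9],
}

fruit_similarity = cosine_similarity(embeddings["apple"], embeddings["pear"])
gadget_similarity = cosine_similarity(embeddings["apple"], embeddings["laptop"])
print(fruit_similarity > gadget_similarity)
```

With these toy numbers, "apple" sits closer to "pear" than to "laptop", which is the kind of relationship an algorithm can work with once words are numbers.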

So in our case, we use the BERT—BERT is one of the latest embedding technologies out there. BERT stands for Bidirectional Encoder Representations from Transformers. I know it’s a mouthful, but it’s basically what it does is that it takes the words—individual words in a sentence—and it tries to relate, connect, each word with every other word, both on the left and right side and also from the right to left side. So the main reason, for the BERT to work that way is that it is trying to understand the positional encoding.

So that’s basically what comes next. Like, for example, I need apples for work. So in this context, does the user mean you need fruit apples for work or an Apple gadget for work? So that depends really on the context. Right. So as I said, humans can understand the context, but for an algorithm to understand what comes, either the gadget or the fruit, it depends on the entire sentence or the conversation. So what BERT does, is basically it looks at these individual positional encodings and then tries to find the similarity or the closest similar word that comes next to it and then put it together. So it works both in the right to left and the left to right directions.

So to better understand the semantic similarity. So similarly, we also have different things like ELMo [Embeddings from Language Models]. We tried experimenting with different embedding types, so we had the BERT, ELMo, and several others. So we added this part into our studies, so this is just the initial layer. So it’s a type of conversion for converting the real words into numbers to fit it into the algorithm.
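The idea of relating each word to every other word, on both its left and its right, can be sketched with a bare-bones self-attention step in Python. This is a simplification and not BERT itself: it leaves out the learned projections, multiple heads, and positional encodings that real models use, and the token vectors are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Mix each token vector with all others, weighted by dot-product similarity."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token, to its left and to its right.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        mixed = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Made-up two-dimensional token vectors for a three-word sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights shows how much one word draws on every word in the sentence, including itself; inspecting weights like these is one way to see which words a model treated as cues.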

Steven Cherry I gather you have in mind that this could help Twitter and Facebook better spot trolls. Is the idea that it could help detect insincere speech of all kinds?

Ramya Akula Yes, it does. That’s a short answer. But adapting algorithms is something … It’s up to the corporates whether they want to do it or not, but that’s the main idea: to help curb the unhealthy conversations online. So that could be anything, ranging from trolling and bullying all the way to misinformation propagation. So that’s a wide spectrum. So, yeah.

Steven Cherry Do you think it would help to work with audio in the way the call centers do? For one thing, it would turn the punctuation cues into tones and inflections that they represent.

Ramya Akula So the most precise answer is yes. But then there is another filter to it though, or actually, adding an additional layer. So the first thing is they’re analyzing the audio form. Right. So in the audio form, we also get the cues like as I said earlier. So based on the audio, I mean, the intonations or the expressions give us another set of helpful cues. But after the conversation is again transcribed, that is when our algorithm can help. So, yes, definitely our algorithm can help for using any kind of speech synthesis or for any application in call center or any voice recorder stuff. Yes, we will also add the speech part to it.

Steven Cherry Ramya, your bachelor’s degree was earned in India and your master’s in Germany before you came to the U.S. You speak five languages. Two of them are apparently Dravidian languages. I have two questions for you. Why the University of Central Florida, and what’s next?

Ramya Akula I did my master’s at the Technical University of Kaiserslautern in Germany, and my master’s thesis was mainly on the visualization of social networks. And this is back in 2014. So that is when I got introduced to working on social networks. And I was so fascinated to learn about how people adapt to the changes that come along their way, adapting the technology, how online life keeps changing.

For example, before Covid and after Covid, how we’ve moved from face-to-face to a completely virtual world. So when I was doing my master’s thesis on social networks, I was so interested in the topic. And then I worked for a while again in the industry. But then again, I wanted to come back to academics to pursue … Get into the research field, actually to understand, rather than like developing something out there in an industry for someone. I thought maybe I could do some research and try to understand and get more knowledge about the field.

So then I was like looking at different options. One of the options was working with Ivan Garibay because he had the DARPA SocialSim project. And so it’s about a $6.5 million funded project. But the overall idea of the project is really fascinating. It’s looking at the human simulation, how humans behave on all these kinds of online social media networks. So when I read about this project and about his lab, that was my main, I think, trajectory point toward this lab and my work.

And so this project is also part of that main big project. And going forward, I would want to work for a startup where I can learn because every day is like a learning process; we can learn like multiple things.

Steven Cherry It seems like a lot of this would be applicable to chatbots. Is that a possible direction?

Ramya Akula Chatbots? Yes, that’s one application in a question-answering system. But there is a lot more to it. So instead of just the automated way of analyzing the question and answering stuff. So it can be applied for multiple things like not just the online conversations, but also personal assistants, yeah. So it applies for the personal assistant as well.

Steven Cherry When a computer beat the world champion of chess, it was impressive, and winning at Go was more impressive. And beating the champions of Jeopardy was impressive, at least until you realized it was mostly that the computer knew Wikipedia better and faster than humans. But about four years ago, a computer at Carnegie Mellon University beat some top players at poker, which required it to in some sense understand bluffing. That seems impressive in a whole different way from chess and Go. And this development with sarcasm seems like a similar advance.

Ramya Akula So the main advantage of having these algorithms is that, as I said, they are really good at understanding the different patterns. Right. We as human beings are limited in that sense, how much of a pro we are in a certain task. And so there is always a limitation to understanding a different pattern and learning the patterns, or in fact matching the patterns. That is where we can take the help of the algorithms, like our sarcasm detector or any other machine learning algorithms, because they look at all possible combinations. And also the beauty of this, the beauty of machine learning, is that the algorithm knows when it should stop learning.

Or actually the programmer who is looking at the training loss, like when the training loss is really dropping, then that’s when he would know that it’s now starting to decay. Like, for example, it is overfitting on the data. So we have to stop the training. So those are the kind of indications for a programmer to stop training.

But after the training, then we can see how well these patterns are learned. So all the previous achievements by different machine learning algorithms, precisely the reinforcement learning algorithms, is that it could look at all different, I mean, the variety of combinations of winning chances. And yeah. And then like having all that data within a very short time and then learn from it. It’s like sort of most of these also had some kind of feedback loop. So from which it learns. So sometimes the programmer helps, or the human in the loop helps the training, and sometimes the machine learning algorithm learns by itself. Yeah. So these algorithms help us better understand the patterns and we humans better understand the context.
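The stopping rule mentioned above, halting training once the model starts overfitting, is commonly implemented as early stopping. Here is a minimal Python sketch under two assumptions that the interview does not spell out: the loss being watched is measured on held-out validation data, and training stops after the loss fails to improve for a fixed number of epochs (the "patience"). The loss values are invented.

```python
def early_stopping_epoch(validation_losses, patience=2):
    """Return the epoch index at which training would stop.

    Training stops once the loss has failed to improve on its best value
    for `patience` epochs in a row; otherwise it runs to the last epoch.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return len(validation_losses) - 1


# Invented losses: improvement until epoch 3, then the curve turns upward.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_at = early_stopping_epoch(losses)
print("stop at epoch", stop_at)
```

In practice the model saved at the best epoch (epoch 3 in this made-up run) is the one kept, since later epochs fit the training data more closely without generalizing better.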

Steven Cherry Well, Ramya, there are two kinds of people in the world, those who hate sarcasm and those who live by it. I sometimes think that no two people can have a friendship or a romance if they’re on opposite sides of that line. And I can’t think of a more exciting and terrifying prospect than a robot that understands sarcasm. Exciting because maybe I can have a real conversation someday with Siri and terrifying if it means software will soon be writing better fiction than me and not just sports records—to say nothing of the advance towards Skynet. But thanks for this completely trivial and unimportant work and for joining us today.

Ramya Akula It’s my pleasure. It was fun talking to you.

Steven Cherry We’ve been speaking with University of Central Florida PhD student Ramya Akula, whose work on detecting sarcasm, a significant advance in the field of sentiment analysis, was published recently in the journal Entropy.

Fixing the Future is sponsored by COMSOL, makers of mathematical modeling software and a longtime supporter of IEEE Spectrum as a way to connect and communicate with engineers.

Fixing the Future is brought to you by IEEE Spectrum, the member magazine of the Institute of Electrical and Electronics Engineers, a professional organization dedicated to advancing technology for the benefit of humanity.

This interview was recorded May 21st, 2021, on Adobe Audition via Zoom, and edited in Audacity. Our theme music is by Chad Crouch.

You can subscribe to Fixing the Future on Spotify, Stitcher, Apple, Google, and wherever else you get your podcasts, or listen on the Spectrum website, which also contains transcripts of all our episodes. We welcome your feedback on the web or in social media.

For Radio Spectrum, I’m Steven Cherry.

Reckoning With Tech Before It Becomes Invisible

Post Syndicated from Stacey Higginbotham original https://spectrum.ieee.org/artificial-intelligence/machine-learning/reckoning-with-tech-before-it-becomes-invisible

Ten years ago, venture capitalist Marc Andreessen proclaimed that software was eating the world. Today, the hottest features in the latest phones are software updates or AI improvements, not faster chips or new form factors. Technology is becoming more mundane, and ultimately, invisible. 

This probably doesn’t bother you. But even as technologies fade into the background of our lives, they still play a pervasive role. We still need to examine how technologies might be affecting us, even if—especially if—they’re commonplace. 

For example, Waze’s navigation software has been influencing drivers’ behavior in the real world for years, algorithmically routing too many cars to residential streets and clogging them. The devices and apps from home-security company Ring have turned neighborhoods into panopticons in which your next door neighbor can become the subject of a notification. Connected medical devices can let an insurance company know if the patient isn’t using the device appropriately, allowing the insurer to stop covering the gadget. 

Using technology to create or reinforce social norms might seem benign or even beneficial, but it doesn’t hurt to ask which norms the technology is enforcing. Likewise, technologies that promise to save time might be saving time for some at the expense of others. Most important, how do we know if a new technology is serving a greater good or policy goal, or merely boosting a company’s profit margins? Underneath concerns about Amazon and Facebook and Google is an understanding that big tech is everywhere, and we have no idea how to make it work for society’s goals, rather than a company’s, or an individual’s. 

A big part of the problem is that we haven’t even established what those benefits should be. Let’s take the idea of legislating AI, or even computer-mediated decisions in general. Should we declare such technology illegal on its face? Many municipalities in the United States are trying to ban law enforcement from using facial-recognition software in order to identify individuals. Then again, the FBI has used it to find the people who participated in the 6 January insurrection at the U.S. Capitol. 

To complicate the issue further, it’s well established that facial recognition (and algorithms in general) are biased against Black faces and women’s faces. Personally, I don’t think the solution is to ban facial recognition outright. The European Union, for example, has proposed legislation to audit the outcomes of facial-recognition algorithms regularly to ensure policy goals are met. There’s no reason the United States and the rest of the world can’t do the same. 

And while some in the technology industry have called for the United States to create a separate regulatory body to govern AI, I think the country and policymakers are best served by the addition of offices and experts within existing agencies who can audit the various algorithms and determine if they help meet the agency’s goals. For example, the U.S. Justice Department could monitor, or even be in charge of approving, programs used to release people on bail to keep an eye out for potential bias. 

The United States already has a model of how this might work. The Federal Communications Commission relies on its Office of Engineering and Technology to help regulate the airwaves. Crucially, the office hires experts in the field rather than political appointees. The government can build the same infrastructure into other agencies that can handle scientific and technological inquiry on demand. Doing so would make the invisible visible again—and then we could all see and control the results of our technology.

This article appears in the July 2021 print issue as “Reckoning With Tech.”

AI-Piloted Fighter Jets

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/06/ai-piloted-fighter-jets.html

News from Georgetown’s Center for Security and Emerging Technology:

China Claims Its AI Can Beat Human Pilots in Battle: Chinese state media reported that an AI system had successfully defeated human pilots during simulated dogfights. According to the Global Times report, the system had shot down several PLA pilots during a handful of virtual exercises in recent years. Observers outside China noted that while reports coming out of state-controlled media outlets should be taken with a grain of salt, the capabilities described in the report are not outside the realm of possibility. Last year, for example, an AI agent defeated a U.S. Air Force F-16 pilot five times out of five as part of DARPA’s AlphaDogfight Trial (which we covered at the time). While the Global Times report indicated plans to incorporate AI into future fighter planes, it is not clear how far away the system is from real-world testing. At the moment, the system appears to be used only for training human pilots. DARPA, for its part, is aiming to test dogfights with AI-piloted subscale jets later this year and with full-scale jets in 2023 and 2024.

Amazon CodeGuru Reviewer Updates: New Java Detectors and CI/CD Integration with GitHub Actions

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/amazon_codeguru_reviewer_updates_new_java_detectors_and_cicd_integration_with_github_actions/

Amazon CodeGuru allows you to automate code reviews and improve code quality, and thanks to the new pricing model announced in April you can get started with a lower and fixed monthly rate based on the size of your repository (up to 90% less expensive). CodeGuru Reviewer helps you detect potential defects and bugs that are hard to find in your Java and Python applications, using the AWS Management Console, AWS SDKs, and AWS CLI.

Today, I’m happy to announce that CodeGuru Reviewer natively integrates with the tools that you use every day to package and deploy your code. This new CI/CD experience allows you to trigger code quality and security analysis as a step in your build process using GitHub Actions.

Although the CodeGuru Reviewer console still serves as an analysis hub for all your onboarded repositories, the new CI/CD experience allows you to integrate CodeGuru Reviewer more deeply with your favorite source code management and CI/CD tools.

And that’s not all! Today we’re also releasing 20 new security detectors for Java to help you identify even more issues related to security and AWS best practices.

A New CI/CD Experience for CodeGuru Reviewer
As a developer or development team, you push new code every day and want to identify security vulnerabilities early in the development cycle, ideally at every push. During a pull-request (PR) review, all the CodeGuru recommendations will appear as a comment, as if you had another pair of eyes on the PR. These comments include useful links to help you resolve the problem.

When you push new code or schedule a code review, recommendations will appear in the Security > Code scanning alerts tab on GitHub.

Let’s see how to integrate CodeGuru Reviewer with GitHub Actions.

First of all, create a .yml file in your repository under .github/workflows/ (or update an existing action). This file will contain all your action’s steps. Let’s go through the individual steps.

The first step is configuring your AWS credentials. You want to do this securely, without storing any credentials in your repository’s code, using the Configure AWS Credentials action. This action allows you to configure an IAM role that GitHub will use to interact with AWS services. This role will require a few permissions related to CodeGuru Reviewer and Amazon S3. You can attach the AmazonCodeGuruReviewerFullAccess managed policy to the action role, in addition to s3:GetObject, s3:PutObject and s3:ListBucket.

This first step will look as follows:

- name: Configure AWS Credentials
  uses: aws-actions/configure-aws-credentials@v1
  with:
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    aws-region: eu-west-1

The access key and secret key correspond to your IAM role and will be used to interact with CodeGuru Reviewer and Amazon S3.

Next, you add the CodeGuru Reviewer action and a final step to upload the results:

- name: Amazon CodeGuru Reviewer Scanner
  uses: aws-actions/codeguru-reviewer
  if: ${{ always() }}
  with:
    build_path: target # build artifact(s) directory
    s3_bucket: 'codeguru-reviewer-myactions-bucket'  # S3 Bucket starting with "codeguru-reviewer-*"
- name: Upload review result
  if: ${{ always() }}
  uses: github/codeql-action/upload-sarif@v1
  with:
    sarif_file: codeguru-results.sarif.json

The CodeGuru Reviewer action requires two input parameters:

  • build_path: Where your build artifacts are in the repository.
  • s3_bucket: The name of an S3 bucket that you’ve created previously, used to upload the build artifacts and analysis results. It’s a customer-owned bucket so you have full control over access and permissions, in case you need to share its content with other systems.

Now, let’s put all the pieces together.

Your .yml file should look like this:

name: CodeGuru Reviewer GitHub Actions Integration
on: [pull_request, push, schedule]
jobs:
  codeguru-review: # job ID is a placeholder; any valid identifier works
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-2
      - name: Amazon CodeGuru Reviewer Scanner
        uses: aws-actions/codeguru-reviewer
        if: ${{ always() }}
        with:
          build_path: target # build artifact(s) directory
          s3_bucket: 'codeguru-reviewer-myactions-bucket'  # S3 Bucket starting with "codeguru-reviewer-*"
      - name: Upload review result
        if: ${{ always() }}
        uses: github/codeql-action/upload-sarif@v1
        with:
          sarif_file: codeguru-results.sarif.json

It’s important to remember that the S3 bucket name needs to start with codeguru-reviewer- and that these actions can be configured to run with the pull_request, push, or schedule triggers (check out the GitHub Actions documentation for the full list of events that trigger workflows). Also keep in mind that there are minor differences in how you configure GitHub-hosted runners and self-hosted runners, mainly in the credentials configuration step. For example, if you run your GitHub Actions in a self-hosted runner that already has access to AWS credentials, such as an EC2 instance, then you don’t need to provide any credentials to this action (check out the full documentation for self-hosted runners).

Now when you push a change or open a PR, CodeGuru Reviewer will comment on your code changes with a few recommendations.

Or you can schedule a daily or weekly repository scan and check out the recommendations in the Security > Code scanning alerts tab.

New Security Detectors for Java
In December last year, we launched the Java Security Detectors for CodeGuru Reviewer to help you find and remediate potential security issues in your Java applications. These detectors are built with machine learning and automated reasoning techniques, trained on over 100,000 Amazon and open-source code repositories, and based on the decades of expertise of the AWS Application Security (AppSec) team.

For example, some of these detectors will look at potential leaks of sensitive information or credentials through excessively verbose logging, exception handling, and storing passwords in plaintext in memory. The security detectors also help you identify several web application vulnerabilities such as command injection, weak cryptography, weak hashing, LDAP injection, path traversal, secure cookie flag, SQL injection, XPATH injection, and XSS (cross-site scripting).

The new security detectors for Java can identify security issues with the Java Servlet APIs and web frameworks such as Spring. Some of the new detectors will also help you with security best practices for AWS APIs when using services such as Amazon S3, IAM, and AWS Lambda, as well as libraries and utilities such as Apache ActiveMQ, LDAP servers, SAML parsers, and password encoders.

Available Today at No Additional Cost
The new CI/CD integration and security detectors for Java are available today at no additional cost, excluding the storage on S3, which can be estimated based on the size of your build artifacts and the frequency of code reviews. Check out the CodeGuru Reviewer Action in the GitHub Marketplace and the Amazon CodeGuru pricing page to find pricing examples based on the new pricing model we launched last month.

We’re looking forward to hearing your feedback, launching more detectors to help you identify potential issues, and integrating with even more CI/CD tools in the future.

You can learn more about the CI/CD experience and configuration in the technical documentation.


Amazon SageMaker Named as the Outright Leader in Enterprise MLOps Platforms

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-sagemaker-named-as-the-outright-leader-in-enterprise-mlops-platforms/

Over the last few years, Machine Learning (ML) has proven its worth in helping organizations increase efficiency and foster innovation. As ML matures, the focus naturally shifts from experimentation to production. ML processes need to be streamlined, standardized, and automated to build, train, deploy, and manage models in a consistent and reliable way. Perennial IT concerns such as security, high availability, scaling, monitoring, and automation also become critical. Great ML models are not going to do much good if they can’t serve fast and accurate predictions to business applications, 24/7 and at any scale.

In November 2017, we launched Amazon SageMaker to help ML Engineers and Data Scientists not only build the best models, but also operate them efficiently. Striving to give our customers the most comprehensive service, we’ve since then added hundreds of features covering every step of the ML lifecycle, such as data labeling, data preparation, feature engineering, bias detection, AutoML, training, tuning, hosting, explainability, monitoring, and automation. We’ve also integrated these features in our web-based development environment, Amazon SageMaker Studio.

Thanks to the extensive ML capabilities available in SageMaker, tens of thousands of AWS customers across all industry segments have adopted ML to accelerate business processes, create innovative user experiences, improve revenue, and reduce costs. Examples include Engie (energy), Deliveroo (food delivery), SNCF (railways), Nerdwallet (financial services), Autodesk (computer-aided design), Formula 1 (auto racing), as well as our very own Amazon Fulfillment Technologies and Amazon Robotics.

Today, we’re happy to announce that in his latest report on Enterprise MLOps Platforms, Bradley Shimmin, Chief Analyst at Omdia, paid SageMaker this compliment: “AWS is the outright leader in the Omdia comparative review of enterprise MLOps platforms. Across almost every measure, the company significantly outscored its rivals, delivering consistent value across the entire ML lifecycle. AWS delivers highly differentiated functionality that targets highly impactful areas of concern for enterprise AI practitioners seeking to not just operationalize but also scale AI across the business.”


You can download the full report to learn more.

Getting Started
Curious about Amazon SageMaker? The developer guide will show you how to set it up and start running your notebooks in minutes.

As always, we look forward to your feedback. You can send it through your usual AWS Support contacts or post it on the AWS Forum for Amazon SageMaker.

– Julien

Amazon Redshift ML Is Now Generally Available – Use SQL to Create Machine Learning Models and Make Predictions from Your Data

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-redshift-ml-is-now-generally-available-use-sql-to-create-machine-learning-models-and-make-predictions-from-your-data/

With Amazon Redshift, you can use SQL to query and combine exabytes of structured and semi-structured data across your data warehouse, operational databases, and data lake. Now that AQUA (Advanced Query Accelerator) is generally available, you can improve the performance of your queries by up to 10 times with no additional costs and no code changes. In fact, Amazon Redshift provides up to three times better price/performance than other cloud data warehouses.

But what if you want to go a step further and process this data to train machine learning (ML) models and use these models to generate insights from data in your warehouse? For example, to implement use cases such as forecasting revenue, predicting customer churn, and detecting anomalies? In the past, you would need to export the training data from Amazon Redshift to an Amazon Simple Storage Service (Amazon S3) bucket, and then configure and start a machine learning training process (for example, using Amazon SageMaker). This process required many different skills and usually more than one person to complete. Can we make it easier?

Today, Amazon Redshift ML is generally available to help you create, train, and deploy machine learning models directly from your Amazon Redshift cluster. To create a machine learning model, you use a simple SQL query to specify the data you want to use to train your model, and the output value you want to predict. For example, to create a model that predicts the success rate for your marketing activities, you define your inputs by selecting the columns (in one or more tables) that include customer profiles and results from previous marketing campaigns, and the output column you want to predict. In this example, the output column could be one that shows whether a customer has shown interest in a campaign.

After you run the SQL command to create the model, Redshift ML securely exports the specified data from Amazon Redshift to your S3 bucket and calls Amazon SageMaker Autopilot to prepare the data (pre-processing and feature engineering), select the appropriate pre-built algorithm, and apply the algorithm for model training. You can optionally specify the algorithm to use, for example XGBoost.

Architectural diagram.

Redshift ML handles all of the interactions between Amazon Redshift, S3, and SageMaker, including all the steps involved in training and compilation. When the model has been trained, Redshift ML uses Amazon SageMaker Neo to optimize the model for deployment and makes it available as a SQL function. You can use the SQL function to apply the machine learning model to your data in queries, reports, and dashboards.

Redshift ML now includes many new features that were not available during the preview, including Amazon Virtual Private Cloud (VPC) support. For example:

Architectural diagram.

  • You can also create SQL functions that use existing SageMaker endpoints to make predictions (remote inference). In this case, Redshift ML is batching calls to the endpoint to speed up processing.

Before looking into how to use these new capabilities in practice, let’s see the difference between Redshift ML and similar features in AWS databases and analytics services.

ML Feature | Data | Training from SQL | Predictions using SQL Functions
Amazon Redshift ML | Data warehouse; federated relational databases; S3 data lake (with Redshift Spectrum) | Yes, using Amazon SageMaker Autopilot | Yes, a model can be imported and executed inside the Amazon Redshift cluster, or invoked using a SageMaker endpoint.
Amazon Aurora ML | Relational database (compatible with MySQL or PostgreSQL) | No | Yes, using a SageMaker endpoint. A native integration with Amazon Comprehend for sentiment analysis is also available.
Amazon Athena ML | S3 data lake. Other data sources can be used through Athena Federated Query. | No | Yes, using a SageMaker endpoint.

Building a Machine Learning Model with Redshift ML
Let’s build a model that predicts if customers will accept or decline a marketing offer.

To manage the interactions with S3 and SageMaker, Redshift ML needs permissions to access those resources. I create an AWS Identity and Access Management (IAM) role as described in the documentation. I use RedshiftML for the role name. Note that the trust policy of the role allows both Amazon Redshift and SageMaker to assume the role to interact with other AWS services.

From the Amazon Redshift console, I create a cluster. In the cluster permissions, I associate the RedshiftML IAM role. When the cluster is available, I load the same dataset used in this super interesting blog post that my colleague Julien wrote when SageMaker Autopilot was announced.

The file I am using (bank-additional-full.csv) is in CSV format. Each line describes a direct marketing activity with a customer. The last column (y) describes the outcome of the activity (if the customer subscribed to a service that was marketed to them).

Here are the first few lines of the file. The first line contains the headers.

age,job,marital,education,default,housing,loan,contact,month,day_of_week,duration,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y
56,housemaid,married,basic.4y,no,no,no,telephone,may,mon,261,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no

I store the file in one of my S3 buckets. The S3 bucket is used to unload data and store SageMaker training artifacts.

Then, using the Amazon Redshift query editor in the console, I create a table to load the data.

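CREATE TABLE direct_marketing (
	age DECIMAL NOT NULL,
	job VARCHAR NOT NULL,
	marital VARCHAR NOT NULL,
	education VARCHAR NOT NULL,
	credit_default VARCHAR NOT NULL,
	housing VARCHAR NOT NULL,
	loan VARCHAR NOT NULL,
	contact VARCHAR NOT NULL,
	month VARCHAR NOT NULL,
	day_of_week VARCHAR NOT NULL,
	duration DECIMAL NOT NULL,
	campaign DECIMAL NOT NULL,
	pdays DECIMAL NOT NULL,
	previous DECIMAL NOT NULL,
	poutcome VARCHAR NOT NULL,
	emp_var_rate DECIMAL NOT NULL,
	cons_price_idx DECIMAL NOT NULL,
	cons_conf_idx DECIMAL NOT NULL,
	euribor3m DECIMAL NOT NULL,
	nr_employed DECIMAL NOT NULL,
	y BOOLEAN NOT NULL
);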

I load the data into the table using the COPY command. I can use the same IAM role I created earlier (RedshiftML) because I am using the same S3 bucket to import and export the data.

COPY direct_marketing
FROM 's3://my-bucket/direct_marketing/bank-additional-full.csv'
DELIMITER ',' IGNOREHEADER 1
IAM_ROLE 'arn:aws:iam::123412341234:role/RedshiftML'
REGION 'us-east-1';

Now, I create the model straight from the SQL interface using the new CREATE MODEL statement:

CREATE MODEL direct_marketing
FROM direct_marketing
TARGET y
FUNCTION predict_direct_marketing
IAM_ROLE 'arn:aws:iam::123412341234:role/RedshiftML'
SETTINGS (
  S3_BUCKET 'my-bucket'
);

In this SQL command, I specify the parameters required to create the model:

  • FROM – I select all the rows in the direct_marketing table, but I can replace the name of the table with a nested query (see example below).
  • TARGET – This is the column that I want to predict (in this case, y).
  • FUNCTION – The name of the SQL function to make predictions.
  • IAM_ROLE – The IAM role assumed by Amazon Redshift and SageMaker to create, train, and deploy the model.
  • S3_BUCKET – The S3 bucket where the training data is temporarily stored, and where model artifacts are stored if you choose to retain a copy of them.

Here I am using a simple syntax for the CREATE MODEL statement. For more advanced users, other options are available, such as:

  • MODEL_TYPE – To use a specific model type for training, such as XGBoost or multilayer perceptron (MLP). If I don’t specify this parameter, SageMaker Autopilot selects the appropriate model class to use.
  • PROBLEM_TYPE – To define the type of problem to solve: regression, binary classification, or multiclass classification. If I don’t specify this parameter, the problem type is discovered during training, based on my data.
  • OBJECTIVE – The objective metric used to measure the quality of the model. This metric is optimized during training to provide the best estimate from data. If I don’t specify a metric, the default behavior is to use mean squared error (MSE) for regression, the F1 score for binary classification, and accuracy for multiclass classification. Other available options are F1Macro (to apply F1 scoring to multiclass classification) and area under the curve (AUC). More information on objective metrics is available in the SageMaker documentation.

Depending on the complexity of the model and the amount of data, it can take some time for the model to be available. I use the SHOW MODEL command to see when it is available:

SHOW MODEL direct_marketing

When I execute this command using the query editor in the console, I get the following output:

Console screenshot.

As expected, the model is currently in the TRAINING state.

When I created this model, I selected all the columns in the table as input parameters. I wonder what happens if I create a model that uses fewer input parameters? I am in the cloud and I am not slowed down by limited resources, so I create another model using a subset of the columns in the table:

CREATE MODEL simple_direct_marketing
FROM (
        SELECT age, job, marital, education, housing, contact, month, day_of_week, y
          FROM direct_marketing
)
TARGET y
FUNCTION predict_simple_direct_marketing
IAM_ROLE 'arn:aws:iam::123412341234:role/RedshiftML'
SETTINGS (
  S3_BUCKET 'my-bucket'
);

After some time, my first model is ready, and I get this output from SHOW MODEL. The actual output in the console is in multiple pages, I merged the results here to make it easier to follow:

Console screenshot.

From the output, I see that the model has been correctly recognized as BinaryClassification, and F1 has been selected as the objective. The F1 score is a metric that considers both precision and recall. It returns a value between 1 (perfect precision and recall) and 0 (lowest possible score). The final score for the model (validation:f1) is 0.79. In this table I also find the name of the SQL function (predict_direct_marketing) that has been created for the model, its parameters and their types, and an estimation of the training costs.
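To make the relationship between F1, precision, and recall concrete, here is a minimal Python sketch. The function name f1_score and the sample counts are made up for illustration; they are not taken from the SHOW MODEL output or from how SageMaker Autopilot computes its metric internally.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_pos == 0:
        # No correct positive predictions: treat as the lowest possible score.
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # share of positive predictions that were right
    recall = true_pos / (true_pos + false_neg)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
print(f1_score(true_pos=80, false_pos=20, false_neg=20))
```

Because F1 is a harmonic mean, a model cannot score well by maximizing only one of the two measures; both precision and recall have to be reasonably high.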

When the second model is ready, I compare the F1 scores. The F1 score of the second model is lower (0.66) than the first one. However, with fewer parameters the SQL function is easier to apply to new data. As is often the case with machine learning, I have to find the right balance between complexity and usability.

Using Redshift ML to Make Predictions
Now that the two models are ready, I can make predictions using SQL functions. Using the first model, I check how many false positives (wrong positive predictions) and false negatives (wrong negative predictions) I get when applying the model on the same data used for training:

SELECT predict_direct_marketing, y, COUNT(*)
  FROM (SELECT predict_direct_marketing(
                   age, job, marital, education, credit_default, housing,
                   loan, contact, month, day_of_week, duration, campaign,
                   pdays, previous, poutcome, emp_var_rate, cons_price_idx,
                   cons_conf_idx, euribor3m, nr_employed), y
          FROM direct_marketing)
 GROUP BY predict_direct_marketing, y;

The result of the query shows that the model is better at predicting negative rather than positive outcomes. In fact, even if the number of true negatives is much bigger than true positives, there are many more false positives than false negatives. I added some comments in green and red to the following screenshot to clarify the meaning of the results.

Console screenshot.
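If it helps to read the grouped result, the following Python sketch shows one way to label each (prediction, y, count) row as a true or false positive or negative. The helper name confusion_counts and the sample rows are hypothetical; the numbers are not the ones from the screenshot.

```python
def confusion_counts(rows):
    """Map (predicted, actual, count) rows to TP/FP/TN/FN totals."""
    totals = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for predicted, actual, count in rows:
        if predicted and actual:
            totals["TP"] += count   # predicted yes, customer said yes
        elif predicted and not actual:
            totals["FP"] += count   # predicted yes, customer said no
        elif not predicted and actual:
            totals["FN"] += count   # predicted no, customer said yes
        else:
            totals["TN"] += count   # predicted no, customer said no
    return totals


# Hypothetical rows shaped like the query result: (prediction, y, COUNT(*)).
sample_rows = [(True, True, 30), (True, False, 12), (False, True, 5), (False, False, 200)]
print(confusion_counts(sample_rows))
```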

Using the second model, I see how many customers might be interested in a marketing campaign. Ideally, I should run this query on new customer data, not the same data I used for training.

SELECT COUNT(*)
  FROM direct_marketing
 WHERE predict_simple_direct_marketing(
           age, job, marital, education, housing,
           contact, month, day_of_week) = true;

Wow, looking at the results, there are more than 7,000 prospects!

Console screenshot.

Availability and Pricing
Redshift ML is available today in the following AWS Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), US West (San Francisco), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (Paris), Europe (Stockholm), Asia Pacific (Hong Kong), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), and South America (São Paulo). For more information, see the AWS Regional Services list.

With Redshift ML, you pay only for what you use. When training a new model, you pay for the Amazon SageMaker Autopilot and S3 resources used by Redshift ML. When making predictions, there is no additional cost for models imported into your Amazon Redshift cluster, as in the example I used in this post.

Redshift ML also allows you to use existing Amazon SageMaker endpoints for inference. In that case, the usual SageMaker pricing for real-time inference applies. Here you can find a few tips on how to control your costs with Redshift ML.

To learn more, you can see this blog post from when Redshift ML was announced in preview and the documentation.

Start getting better insights from your data with Redshift ML.


Forwarding emails automatically based on content with Amazon Simple Email Service

Post Syndicated from Murat Balkan original https://aws.amazon.com/blogs/messaging-and-targeting/forwarding-emails-automatically-based-on-content-with-amazon-simple-email-service/


Email is one of the most popular channels consumers use to interact with support organizations. In its most basic form, consumers will send their email to a catch-all email address where it is further dispatched to the correct support group. Often, this requires a person to inspect content manually. Some IT organizations even have a dedicated support group that handles triaging the incoming emails before assigning them to specialized support teams. Triaging each email can be challenging, and delays in email routing and support processes can reduce customer satisfaction. By utilizing Amazon Simple Email Service’s deep integration with Amazon S3, AWS Lambda, and other AWS services, you can automate the task of categorizing and routing emails. This automation results in increased operational efficiencies and reduced costs.

This blog post shows you how a serverless application will receive emails with Amazon SES and deliver them to an Amazon S3 bucket. The application uses Amazon Comprehend to identify the dominant language from the message body.  It then looks it up in an Amazon DynamoDB table to find the support group’s email address specializing in the email subject. As the last step, it forwards the email via Amazon SES to its destination. Archiving incoming emails to Amazon S3 also enables further processing or auditing.


By completing the steps in this post, you will create a system that uses the architecture illustrated in the following image:

Architecture showing how to forward emails by content using Amazon SES

The flow of events starts when a customer sends an email to a generic support email address like [email protected]. Amazon SES receives this email through a receipt rule. As per the rule, incoming messages are written to a specified Amazon S3 bucket with a given prefix.

This bucket and prefix are configured with S3 Events to trigger a Lambda function on object creation events. The Lambda function reads the email object, parses the contents, and sends them to Amazon Comprehend for language detection.
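The parsing step can be sketched with Python's standard library alone. This is a minimal illustration, not the code from the sample repository: the function name extract_text_body and the example addresses are assumptions, and a real Lambda function would first read the raw message bytes from the S3 object named in the event.

```python
from email import message_from_bytes, policy


def extract_text_body(raw_email: bytes) -> str:
    """Return the plain-text body of a raw MIME message, or "" if there is none."""
    # policy.default yields an EmailMessage, which provides get_body().
    message = message_from_bytes(raw_email, policy=policy.default)
    part = message.get_body(preferencelist=("plain",))
    return part.get_content().strip() if part is not None else ""


# Hypothetical raw message, shaped like the object Amazon SES writes to S3.
raw = (
    b"From: customer@example.com\r\n"
    b"To: info@example.com\r\n"
    b"Subject: I need help\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello, I'd like to return the shoes I bought.\r\n"
)
print(extract_text_body(raw))
```

The returned string is the text that would be sent to Amazon Comprehend for language detection.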

The Lambda function looks up the detected language code in an Amazon DynamoDB table, which contains the mappings between language codes and the support group email addresses for those languages. One support group could answer English emails, while another support group answers French emails. The Lambda function determines the destination address and re-sends the same email to that address by performing an email forward operation. If the lookup does not return a destination address, or the language was not detected, the email is forwarded to a catch-all email address specified during the application deployment.

In this example, Amazon SES hosts the destination email addresses used for forwarding, but this is not a requirement. External email servers can also receive the forwarded emails.
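The routing decision described above can be sketched as a pure function. The input list follows the shape of the Languages field that Amazon Comprehend returns for dominant language detection (entries with LanguageCode and Score); the function names, the min_score threshold, the in-memory lookup dictionary standing in for the DynamoDB table, and the example.com addresses are all assumptions made for illustration.

```python
from typing import Mapping, Optional, Sequence


def pick_language(languages: Sequence[Mapping], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring language code, or None if nothing is confident enough."""
    if not languages:
        return None
    best = max(languages, key=lambda item: item["Score"])
    return best["LanguageCode"] if best["Score"] >= min_score else None


def choose_destination(languages: Sequence[Mapping],
                       lookup: Mapping[str, str],
                       catch_all: str) -> str:
    """Map the detected language to a support address, falling back to the catch-all."""
    code = pick_language(languages)
    if code is None:
        return catch_all
    return lookup.get(code, catch_all)


# Hypothetical stand-in for the language-lookup table.
lookup = {"en": "english@example.com", "fr": "french@example.com"}
detected = [{"LanguageCode": "fr", "Score": 0.98}, {"LanguageCode": "en", "Score": 0.02}]
print(choose_destination(detected, lookup, "catchall@example.com"))
```

Keeping the decision separate from the AWS calls makes the fallback behavior easy to test without deploying anything.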


To use Amazon SES for receiving email messages, you need to verify a domain that you own. Refer to the documentation to verify your domain with the Amazon SES console. If you do not have a domain name, you can register one from Amazon Route 53.

Deploying the Sample Application

Clone this GitHub repository to your local machine and install and configure AWS SAM with a test AWS Identity and Access Management (IAM) user.

You will use AWS SAM to deploy the remaining parts of this serverless architecture.

The AWS SAM template creates the following resources:

  • An Amazon DynamoDB mapping table (language-lookup) that contains information about language codes and associates them with destination email addresses.
  • An AWS Lambda function (BlogEmailForwarder) that reads the email content, parses it, detects the language, looks up the forwarding destination email address, and forwards the email.
  • An Amazon S3 bucket, which will store the incoming emails.
  • IAM roles and policies.

To start the AWS SAM deployment, navigate to the root directory of the repository you downloaded, where the template.yaml AWS SAM template resides. AWS SAM also requires you to specify an Amazon Simple Storage Service (Amazon S3) bucket to hold the deployment artifacts. If you haven’t already created a bucket for this purpose, create one now. You can refer to the documentation to learn how to create an Amazon S3 bucket. The bucket should be readable and writable by an AWS Identity and Access Management (IAM) user.

At the command line, enter the following command to package the application:

sam package --template template.yaml --output-template-file output_template.yaml --s3-bucket BUCKET_NAME_HERE

In the preceding command, replace BUCKET_NAME_HERE with the name of the Amazon S3 bucket that should hold the deployment artifacts.

AWS SAM packages the application and copies it into this Amazon S3 bucket.

When the AWS SAM package command finishes running, enter the following command to deploy the package:

sam deploy --template-file output_template.yaml --stack-name blogstack --capabilities CAPABILITY_IAM --parameter-overrides [email protected] YOUR_DOMAIN_NAME_HERE [email protected] YOUR_DOMAIN_NAME_HERE

In the preceding command, replace YOUR_DOMAIN_NAME_HERE with the domain name you validated with Amazon SES. This domain also applies to other commands and configurations that are introduced later.

This example uses “blogstack” as the stack name; you can change this to any other name you want. When you run this command, AWS SAM shows the progress of the deployment.

Configure the Sample Application

Now that you have deployed the application, you will configure it.

Configuring Receipt Rules

To deliver incoming messages to the Amazon S3 bucket, you need to create a rule set and a receipt rule under it.

Note: This blog uses the Amazon SES console to create the rule sets. To create the rule sets with AWS CloudFormation, refer to the documentation.

  1. Navigate to the Amazon SES console. From the left navigation, choose Rule Sets.
  2. Choose the Create a Receipt Rule button in the right pane.
  3. Add [email protected]YOUR_DOMAIN_NAME_HERE as the first recipient address by entering it into the text box and choosing Add Recipient.



Choose the Next Step button to move on to the next step.

  4. On the Actions page, select S3 from the Add action drop-down to reveal the S3 action’s details. Select the S3 bucket that was created by the AWS SAM template. It is in the format of your_stack_name-inboxbucket-randomstring. You can find the exact name in the outputs section of the AWS SAM deployment under the key name InboxBucket or by visiting the AWS CloudFormation console. Set the Object key prefix to info/. This tells Amazon SES to add this prefix to all messages destined to this recipient address. This way, you can reuse the same bucket for different recipients.

Choose the Next Step button to move on to the next step.

On the Rule Details page, give this rule a name in the Rule name field. This example uses the name info-recipient-rule. Leave the rest of the fields with their default values.

Choose the Next Step button to move on to the next step.

  5. Review your settings on the Review page and finalize rule creation by choosing Create Rule.

  6. In this example, you will be hosting the destination email addresses in Amazon SES rather than forwarding the messages to an external email server. This way, you will be able to see the forwarded messages in your Amazon S3 bucket under different prefixes. To host the destination email addresses, you need to create different rules under the default rule set. Create three additional rules for the [email protected]YOUR_DOMAIN_NAME_HERE, [email protected]YOUR_DOMAIN_NAME_HERE, and [email protected]YOUR_DOMAIN_NAME_HERE email addresses by repeating steps 2 to 5. For Amazon S3 prefixes, use catchall/, english/, and french/ respectively.


Configuring Amazon DynamoDB Table

To configure the Amazon DynamoDB table that is used by the sample application:

  1. Navigate to the Amazon DynamoDB console and open the Tables view. Inspect the table created by the AWS SAM application.

The language-lookup table stores the mappings between languages and support groups. You need to create an item for each language, and an item that holds the default destination email address used when no language match is found. Amazon Comprehend supports more than 60 different languages. You can visit the documentation for the supported languages and add their language codes to this lookup table to enhance this application.

  2. To start inserting items, choose the language-lookup table to open the table overview page.
  3. Select the Items tab and choose Create item. From the dropdown, select Text. Add the following JSON content and choose Save to create your first mapping object. While adding the following object, replace the destination attribute’s value with an email address you own. The email messages will be forwarded to that address.


{
  "language": "en",
  "destination": "[email protected]_DOMAIN_NAME_HERE"
}


Lastly, create an item for French language support.


{
  "language": "fr",
  "destination": "[email protected]_DOMAIN_NAME_HERE"
}



Now that the application is deployed and configured, you will test it.

  1. Use your favorite email client to send the following email to the [email protected] email address at your domain name.

Subject: I need help


Hello, I’d like to return the shoes I bought from your online store. How can I do this?

After the email is sent, navigate to the Amazon S3 console to inspect the contents of the Amazon S3 bucket that is backing the Amazon SES rule sets. You can also view the AWS Lambda logs in the Amazon CloudWatch console to confirm that the Lambda function was triggered and ran successfully. You should receive an email with the same content at the address you defined for the English language.

  2. Next, send another email with the same content, this time in French.

Subject: j’ai besoin d’aide


Bonjour, je souhaite retourner les chaussures que j’ai achetées dans votre boutique en ligne. Comment puis-je faire ceci?


Suppose a message is not matched to a language in the lookup table. In that case, the Lambda function will forward it to the catchall email address that you provided during the AWS SAM deployment.

You can inspect the new email objects under the english/, french/, and catchall/ prefixes to observe the forwarding behavior.

Continue experimenting with the sample application by sending different email contents to the [email protected]YOUR_DOMAIN_NAME_HERE address or by adding other language code and email address combinations to the mapping table. You can find the available languages and their codes in the documentation. When adding support for a new language, don’t forget to associate a new email address and Amazon S3 bucket prefix by defining a new rule.


To clean up the resources you used in your account:

  1. Navigate to the Amazon S3 console and delete the inbox bucket’s contents. You can find the name of this bucket in the outputs section of the AWS SAM deployment under the key name InboxBucket or by visiting the AWS CloudFormation console.
  2. Navigate to the AWS CloudFormation console and delete the stack named “blogstack”.
  3. After the stack is deleted, remove the domain from Amazon SES. To do this, navigate to the Amazon SES console and choose Domains from the left navigation. Select the domain you want to remove and choose the Remove button to remove it from Amazon SES.
  4. From the Amazon SES console, choose Rule Sets from the left navigation. In the Active Rule Set section, choose the View Active Rule Set button and delete all the rules you have created by selecting each rule and choosing Action, Delete.
  5. On the Rule Sets page, choose the Disable Active Rule Set button to stop listening for incoming email messages.
  6. On the Rule Sets page, in the Inactive Rule Sets section, delete the only rule set by selecting it and choosing Action, Delete.
  7. Navigate to the CloudWatch console and, from the left navigation, choose Logs, Log groups. Find the log group that belongs to the BlogEmailForwarderFunction resource and delete it by selecting it and choosing Actions, Delete log group(s).
  8. You can also delete the Amazon S3 bucket you used for packaging and deploying the AWS SAM application.



This solution shows how to use Amazon SES to classify email messages by the dominant content language and forward them to the respective support groups. You can use the same techniques to implement similar scenarios. For example, you can forward emails based on custom key entities, like product codes, or use Amazon Comprehend to remove PII from emails before forwarding them.

With its native integrations with AWS services, Amazon SES allows you to enhance your email applications with different AWS Cloud capabilities easily.

To learn more about email forwarding with Amazon SES, you can visit the documentation and AWS blogs.

The Algorithms That Make Instacart Roll

Post Syndicated from Sharath Rao original https://spectrum.ieee.org/artificial-intelligence/machine-learning/the-algorithms-that-make-instacart-roll

It’s Sunday morning, and, after your socially distanced morning hike, you look at your schedule for the next few days. You need to restock your refrigerator, but the weekend crowds at the supermarket don’t excite you. Monday and Tuesday are jam-packed with Zoom meetings, and you’ll also be supervising your children’s remote learning. In short, you aren’t going to make it to the grocery store anytime soon. So you pull out your phone, fire up the Instacart app, and select your favorite grocery store. You click through your list of previously purchased items, browse specials, search for a new key-lime sparkling water a friend recommended, then select a delivery window. About 2 hours later, you watch a shopper, wearing a face mask, place bags on your porch.

The transaction seems simple. But this apparent simplicity depends on a complex web of carefully choreographed technologies working behind the scenes, powered by a host of apps, data science, machine-learning algorithms, and human shoppers.

Grocery delivery isn’t a new concept, of course. In our great-grandparents’ day, people could select items at a neighborhood store and then walk home empty-handed, the groceries to follow later, likely transported by a teenager on a bicycle. Customers often had basics like milk and eggs delivered weekly. But with the advent of the fully stocked supermarket, with its broad selection of staples, produce, and specialty foods, customers shifted to selecting goods from store shelves and toting them home themselves, though in some cities local stores still offered delivery services.

Then in 1989, Peapod—followed in the mid-1990s by companies like Webvan and HomeGrocer—tried to revive grocery delivery for the Internet age. They invested heavily in sophisticated warehouses with automated inventory systems and fleets of delivery vehicles. While these services were adored by some for their high-quality products and short delivery windows, they never became profitable. Industry analysts concluded that the cost of building up delivery networks across dozens of large metro areas rapidly ate into the already thin margins of the grocery industry.

Timing, of course, is everything. Cloud computing and inexpensive smartphones emerged in the decade after the launch of the first-generation online grocery companies. By 2012, when Instacart began, these technologies had created an environment in which online grocery ordering could finally come into its own.

Today, retailers like Target and Whole Foods (via Amazon) offer delivery and pickup services, using their existing brick-and-mortar facilities. Some of these retailers run their delivery businesses from warehouses, some pull from the stocked shelves of retail stores, and some fulfill from a mix of both. Small, online-only companies like Good Eggs, Imperfect Foods, and Thrive Market offer curated selections of groceries sourced from local farms and suppliers.

Meanwhile, food and grocery delivery services emerged to bring brick-and-mortar restaurants and stores into the online economy. These businesses—which include DoorDash, Shipt, and Uber Eats in the United States, and Buymie, Deliveroo, and Grofers, based elsewhere—have built technology platforms and fulfillment networks that existing stores and restaurants can use to reach customers online. In this model, the retailer’s or restaurant’s physical location nearest the customer is the “warehouse,” and a community of independent contractors handles fulfillment and delivery.

Our employer, Instacart, is the North American leader in this type of online grocery service, with more than 500 grocers, including Aldi, Costco, Food Lion, Loblaws, Publix, Safeway, Sam’s Club, Sprouts Farmers Market, and Wegmans, encompassing nearly 40,000 physical store locations in the United States and Canada. At the onset of the COVID-19 pandemic, as consumers heeded stay-at-home orders, we saw our order volume surge by as much as 500 percent, compared with the volume during those same weeks in 2019. The increase prompted us to more than double the number of shoppers who can access the platform from 200,000 in early March to 500,000 by year-end.

Here’s how Instacart works.

From the customer’s perspective, the ordering process is simple. Customers start by opening a mobile app or logging on to a website. They enter their delivery zip code to see available retailers. After choosing a store or retail chain, they can browse virtual aisles of produce, deli, or snacks and search for specific products, clicking to add items to an online shopping cart and specifying weights or quantities as appropriate. When finished, they see a list of available 2-hour delivery windows, from later the same day to a week or more in the future. Customers can adjust their orders up until the shoppers start picking their items off store shelves, usually an hour or two before the delivery window. They can enter preferred substitutions beforehand or chat with their shoppers in real time about what’s available. Once the groceries are out of the store and on the move, customers get alerts when their orders are about to be delivered.

That’s Instacart from a customer’s perspective. Behind the scenes, we face huge technical challenges to make this process work. We have to keep track of the products in nearly 40,000 grocery stores—billions of different data points. We have to predict how many of our 500,000-plus shoppers will be online at a given time in a given area and available to work. We have to group multiple orders from different customers together into batches, so that the designated shopper can efficiently pick, pack, and deliver them. When products run out, we suggest the best replacements. Finally, we dispatch shoppers to different delivery addresses along the most efficient route. We’re crunching enormous volumes of data every day to keep the customer-facing Instacart app, our Shopper app, our business management tools, and other software all humming along.

Let’s start with how we keep track of products on the shelf. The average large supermarket has about 40,000 unique items. Our database includes the names of these products, plus images, descriptions, nutritional information, pricing, and close-to-real-time availability at every store. We process petabytes daily in order to keep these billions of data points current.

Back in 2012, when Instacart started making its first deliveries in the San Francisco Bay Area, we relied on manual methods to get this product data into our system. To stock our first set of virtual shelves, our founders and a handful of employees went to a store and purchased one of every item, filling up cart after cart. They took everything back to the office and entered the product data into the system by hand, taking photos with their phones. It worked, but it obviously wasn’t going to scale.

Today, Instacart aggregates product data from a variety of sources, relying on automated rule-based systems to sort it all out. Many stores send us inventory data once a day, including pricing and item availability, while other retailers send updates every few minutes. Large consumer products companies, like General Mills and Procter & Gamble, send us detailed product data, including images and descriptions. We also purchase specialized data from third-party companies, including nutrition and allergy information.

One listing in our database could have information from dozens of sources that must be sorted out. Let’s say a popular apple juice just underwent a rebranding, complete with new packaging. Our system has to decide whether to use the image provided by a third-party data provider last week, the image sent in by the local store last night, or the image submitted by the manufacturer earlier that morning.

Our rules address this problem. Usually images and other data provided by the manufacturer on the morning of a rebrand will be more up-to-date than data provided by individual stores the night before. But what if a store and the manufacturer upload data at about the same time? In this case, our rules tell the system to trust the image provided by the manufacturer and trust the price and availability data provided by the store. Our catalog updates automatically and continuously around the clock to account for all sorts of incremental changes—more than a billion data points every 24 hours on average.
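To make the idea of per-field trust rules concrete, here is a small sketch. It is an illustration under stated assumptions, not Instacart's system: the FIELD_PRIORITY table, the one-hour window used to decide that two updates arrived "at about the same time", and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical trust ranking per field: earlier in the list wins.
FIELD_PRIORITY = {
    "image": ["manufacturer", "store", "third_party"],
    "price": ["store", "manufacturer", "third_party"],
    "availability": ["store", "manufacturer", "third_party"],
}


@dataclass
class Update:
    source: str
    field: str
    value: object
    received_at: datetime


def resolve(updates, window=timedelta(hours=1)):
    """Pick one value per field from the freshest updates, breaking ties by source trust."""
    by_field = {}
    for update in updates:
        by_field.setdefault(update.field, []).append(update)

    result = {}
    for field, candidates in by_field.items():
        newest = max(c.received_at for c in candidates)
        # Updates within the window of the newest one count as simultaneous.
        recent = [c for c in candidates if newest - c.received_at <= window]
        ranking = FIELD_PRIORITY[field]
        winner = min(recent, key=lambda c: (ranking.index(c.source), newest - c.received_at))
        result[field] = winner.value
    return result


morning = datetime(2021, 1, 11, 8, 0)
updates = [
    Update("third_party", "image", "old_pack.jpg", morning - timedelta(days=7)),
    Update("store", "image", "store_shelf.jpg", morning + timedelta(minutes=10)),
    Update("manufacturer", "image", "new_pack.jpg", morning),
    Update("manufacturer", "price", 3.99, morning),
    Update("store", "price", 3.49, morning + timedelta(minutes=10)),
]
print(resolve(updates))
```

In this example the manufacturer's image and the store's price win, even though both sources sent both fields within minutes of each other.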

Because Instacart doesn’t own and operate its own stores or warehouses, we don’t have a perfect picture of what is on the shelves of a particular store at any moment, much less what will be there later that day or several days in the future. Instead, we need to make well-informed predictions as we stock our virtual shelves. There’s a lot to consider here. Stores in certain regions may get produce shipments on, say, Monday mornings and meat shipments on Thursday evenings. Some stores restock their shelves periodically throughout the day, while others just restock at night. We’ve built two machine-learning models to help us understand what’s on each store’s shelves and manage our customers’ expectations about what they will actually receive in their grocery bags.

Our Item Availability Model predicts the likelihood that popular items are in stock at any location at any given time. We trained this model using our own data set, which includes millions of anonymized orders from across North America. Some items—like a particular brand of organic eggs, chips, or seasoned salt, or niche items like fresh-made stroopwafels—are considered “active,” meaning they’re regularly ordered year-round from a particular store. “Non-active” items include discontinued products as well as seasonal items like eggnog, Advent calendars, and Peeps marshmallows. The model looks at the history of how often our shoppers are able to purchase the items consumers order most. For each, it calculates an availability score ranging from 0.0 to 1.0; a score of 0.8 means the item has an 80 percent chance of being found in the store by a shopper. We can update this score in real time because our shoppers scan each item they pick up or else mark it as “not found.”
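As a rough intuition for what the score means, the sketch below computes a plain found rate from shopper scan outcomes. The article's model is a trained machine-learning model; this running ratio is only a simplified stand-in, and the class name AvailabilityTracker is an assumption.

```python
from typing import Optional


class AvailabilityTracker:
    """Tracks shopper scan outcomes for one item at one store."""

    def __init__(self) -> None:
        self.found = 0
        self.not_found = 0

    def record(self, was_found: bool) -> None:
        """Register one scan ("found") or one "not found" mark."""
        if was_found:
            self.found += 1
        else:
            self.not_found += 1

    def score(self) -> Optional[float]:
        """Fraction of attempts in which the item was found, or None with no data."""
        total = self.found + self.not_found
        return self.found / total if total else None


tracker = AvailabilityTracker()
for outcome in [True] * 8 + [False] * 2:
    tracker.record(outcome)
print(tracker.score())  # 0.8
```

Because every scan updates the counters, the score can be refreshed as soon as a shopper reports an outcome.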

Having this score enables us to reduce the chances our customers will order items that won’t be on store shelves when our shoppers look for them, whether that’s a few hours away or days ahead. We do this in several ways. For example, if a customer’s favorite type of peanut butter has a very low availability score, we will automatically bump that listing down in the search results and, in turn, bump up similar products that have a higher availability score. In these times of supply-chain shortages, we’ll also add “out of stock” labels to affected items and prevent customers from adding them to their carts.
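One simple way to picture this reordering is to weight each listing's search relevance by its availability score. The multiplication, the out_of_stock_below threshold, and the field names are assumptions chosen for illustration; the article does not describe the actual ranking formula.

```python
def rank_listings(listings, out_of_stock_below=0.05):
    """Sort listings by relevance weighted by availability and flag near-certain misses."""
    ranked = sorted(
        listings,
        key=lambda listing: listing["relevance"] * listing["availability"],
        reverse=True,
    )
    return [
        {**listing, "out_of_stock": listing["availability"] < out_of_stock_below}
        for listing in ranked
    ]


results = rank_listings([
    {"name": "Brand A creamy", "relevance": 0.9, "availability": 0.10},
    {"name": "Brand B creamy", "relevance": 0.8, "availability": 0.90},
    {"name": "Brand A crunchy", "relevance": 0.7, "availability": 0.02},
])
for listing in results:
    print(listing["name"], listing["out_of_stock"])
```

Here the customer's usual Brand A creamy drops below Brand B creamy, and the crunchy variety is labeled out of stock.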

The COVID-19 pandemic pushed our Item Availability Model in a number of other ways and challenged our assumptions about customer behavior. In March 2020, at the start of the U.S. shelter-in-place orders, we saw massive spikes in demand for common household products like toilet paper and disinfecting wipes. Such items were flying off the shelves faster than retailers could stock them. Consumers behaved in new ways—instead of buying their preferred brand of toilet paper, they grabbed any kind of toilet paper they could find. In response, we broadened the number of products our availability model scores to include lesser-known products. We also tweaked the model to give less weight to historical data from several weeks earlier in favor of more recent data that might be more representative of the times.
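Giving less weight to older history can be sketched with exponential decay, where each observation's weight halves after a fixed number of days. The half-life mechanism and the function name are assumptions; the article only says that recent data was weighted more heavily.

```python
def recency_weighted_score(events, today, half_life_days=7.0):
    """Found rate in which an observation's weight halves every half_life_days.

    events is an iterable of (day_number, was_found) pairs.
    """
    weighted_found = 0.0
    weighted_total = 0.0
    for day, was_found in events:
        weight = 0.5 ** ((today - day) / half_life_days)
        weighted_total += weight
        if was_found:
            weighted_found += weight
    return weighted_found / weighted_total if weighted_total else None


# Item was reliably found a month ago, then vanished from shelves this week.
events = [(day, True) for day in range(10)] + [(day, False) for day in (28, 29, 30)]
slow = recency_weighted_score(events, today=30, half_life_days=30)
fast = recency_weighted_score(events, today=30, half_life_days=3)
print(round(slow, 2), round(fast, 2))
```

With the shorter half-life, the score reacts to the recent "not found" reports almost immediately, which is the behavior you want during a sudden demand spike.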

If a customer adds an item with a low availability score to her cart, a second machine-learning model—our Item Replacement Recommendation Model—gets to work, prompting the customer to select a replacement from a menu of automatically generated alternatives in case the first-choice item isn’t in stock.

Giving customers great replacements is a critical part of making them happy. If you’re shopping in-store for yourself, our research suggests that you’ll have to find replacements for about 7 percent of the items on your list. When Instacart shoppers are shopping for you, they can’t just leave out an item—some items may be critical for you, and if you have to make your own trip to the store after unpacking an Instacart order, you might be less likely to use the service again. But our shoppers aren’t mind readers. If the store is out of your preferred brand of creamy peanut butter, should a shopper replace it with crunchy peanut butter from the same brand? What about a creamy peanut butter from a different brand?

We trained our Item Replacement Recommendation Model on a range of data inputs, including item name, product description, and five years of customer ratings of the success of our chosen replacements. When we present a menu of replacement choices, we rank them according to scores assigned by this model. If you select one of the replacements, we’ll remember it for your future orders; if you don’t, our shopper picks from products our model recommends.
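The selection logic in this paragraph can be sketched as follows. The model scores are taken as given inputs, and the function name, the saved_choices dictionary, and the example products are hypothetical.

```python
def choose_replacement(item, candidates, saved_choices):
    """Return (pick, ranked_menu) for an out-of-stock item.

    candidates is a list of (product, model_score) pairs.
    saved_choices maps an item to the replacement the customer chose before.
    """
    ranked = [product for product, _ in sorted(candidates, key=lambda pair: pair[1], reverse=True)]
    remembered = saved_choices.get(item)
    if remembered in ranked:
        return remembered, ranked
    return (ranked[0] if ranked else None), ranked


candidates = [
    ("crunchy, same brand", 0.62),
    ("creamy, other brand", 0.81),
    ("almond butter", 0.35),
]
print(choose_replacement("creamy peanut butter", candidates, {}))
print(choose_replacement("creamy peanut butter", candidates,
                         {"creamy peanut butter": "crunchy, same brand"}))
```

The first call falls back to the top-scored candidate; the second honors the customer's remembered preference.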

That’s how machine learning helps us set expectations with our customers as they fill their shopping carts. Once an order is placed, another piece of technology enters the picture: the Shopper app. The vast majority of Instacart’s shoppers are independent contractors who have signed up to shop for us, meeting requirements and passing a background check. They drive to the stores, select items off the shelves, check out, and deliver the orders. They can choose to work at any time by logging onto the Shopper app. In certain high-volume stores, we also directly employ part-time shoppers who pick and pack orders and then hand them off to contractors for delivery.

The Shopper app includes a range of tools meant to make it easy to access new orders, address issues that shoppers encounter, and guide checkout and delivery. When shoppers are ready to work, they open up the app and select batches of orders. As they go through the store and fill orders, they can communicate with the customers via in-app chat. The Shopper app suggests an item-picking order to help the shopper navigate the store efficiently. Generally, this picking order puts refrigerated and frozen items, along with hot or fresh deli preparations, near the end of a shopping trip. Meanwhile, customers can watch their shopper’s progress via the Instacart app, tracking each item as it’s scanned into the shopper’s cart, approving replacement items, and viewing yet-to-be-shopped items.
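A rule of this kind is easy to express as a sort key. The sketch below assumes hypothetical category labels and aisle numbers; it only illustrates the "temperature-sensitive items last" rule, not the app's real logic.

```python
# Hypothetical category labels for items that should be picked last.
TEMPERATURE_SENSITIVE = {"refrigerated", "frozen", "hot_deli", "fresh_deli"}


def picking_order(items):
    """Shelf-stable items first in aisle order, temperature-sensitive items last."""
    return sorted(
        items,
        key=lambda item: (item["category"] in TEMPERATURE_SENSITIVE, item["aisle"]),
    )


cart = [
    {"name": "ice cream", "category": "frozen", "aisle": 2},
    {"name": "bananas", "category": "produce", "aisle": 1},
    {"name": "rotisserie chicken", "category": "hot_deli", "aisle": 9},
    {"name": "pasta", "category": "dry_goods", "aisle": 5},
]
print([item["name"] for item in picking_order(cart)])
```

The tuple key sorts on the boolean first (False before True), so aisle order is preserved within each group.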

When shoppers check out, they can charge the order to a physical card that Instacart mails to them or use a mobile payments system in the Shopper app. If they encounter a problem, they can communicate with our help team through the app. And when they complete a delivery, they can use the app to transfer their earnings to their bank accounts.

We have many orders coming in at once to the same stores, slated to be delivered in the same general vicinity. In a major metropolitan area, we may get more than 50 orders a minute. So we typically group orders into batches to be picked off store shelves at the same time.

Here, our Matching Algorithm comes into play. This technology applies rules of thumb and machine-learning models to try to balance the number of shoppers with customer demand in real time. The algorithm benefits from scale—the more orders we have in a given area, the more options we can give the algorithm and the better decisions it can make. It considers things like a shopper’s age: If shoppers are not yet 21, they may not be eligible to deliver orders containing alcohol. We rerun the Matching Algorithm as often as every few minutes as we get new information about orders and delivery locations.
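The eligibility rule mentioned here can be sketched as a filter applied before any assignment. Picking the closest eligible shopper is a deliberately naive stand-in for the matching itself, and the Shopper fields and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Shopper:
    name: str
    age: int
    distance_km: float  # hypothetical: distance from the store


def eligible(shopper: Shopper, batch_has_alcohol: bool) -> bool:
    """Shoppers under 21 are treated as ineligible for batches containing alcohol."""
    return shopper.age >= 21 or not batch_has_alcohol


def assign(shoppers, batch_has_alcohol):
    """Return the closest eligible shopper, or None when nobody qualifies."""
    candidates = [s for s in shoppers if eligible(s, batch_has_alcohol)]
    return min(candidates, key=lambda s: s.distance_km, default=None)


shoppers = [Shopper("Ana", 19, 0.5), Shopper("Ben", 34, 2.0)]
print(assign(shoppers, batch_has_alcohol=True).name)   # Ben
print(assign(shoppers, batch_has_alcohol=False).name)  # Ana
```

A production matcher would weigh many batches and shoppers at once; the point here is only that hard constraints prune the options before any optimization runs.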

The algorithm works hand in hand with our Capacity Model. This model calculates how much delivery capacity we have throughout the day as conditions on the ground change. We used machine learning to build this system; it takes demand predictions based on historical data and historical shopping speeds at individual stores and couples them with real-time data, including the number of shoppers completing orders and the number of orders waiting in a queue for each store. We rerun this model every 2 minutes to ensure that we’re getting a close-to-real-time understanding of our capacity. That’s why a customer may log on at 1:00 pm and see only one late-evening delivery slot remaining, but when they look again at 1:30 pm, they see a host of afternoon delivery slots pop up.

While these models are critical to Instacart’s operation, other tools are crucial for getting the groceries from the store to the customer smoothly and predictably.

Our Drive Time Model uses historical transit times and real-time traffic data to estimate when a shopper will arrive at the store. Our Parking Model calculates how long it can take the shopper to get in and out of a particular store’s parking lot. If a shopper is likely to spend 10 minutes cruising for a spot in a small, crowded parking lot, that needs to be built into delivery-time estimates for that store.

Once the shopper is ready to make deliveries, our Routing Algorithm comes into play. This model is our take on the classic “traveling salesman” problem. Given three customers at three different addresses in the same city, what’s the most efficient route from the store to the first location and from there to the next two? That’s tricky enough, but Instacart has to work with added complexity. For example, in highly dense areas like New York City, some shoppers may walk to their destinations. And we need to ensure that all three deliveries are made within their designated delivery windows—if a customer isn’t home, too early can be just as bad as too late. So our algorithm considers the projected arrival time, using real-time traffic conditions, to create a delivery route. Our system also sends the projected arrival time to the customer and an alert when the shopper is just a few minutes away.
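For three stops, the problem is small enough to solve by trying every ordering. The sketch below does exactly that, with delivery windows as hard constraints and waiting allowed when the shopper arrives early. It is a brute-force illustration with made-up travel times, not the production routing algorithm, and it would not scale to many stops.

```python
from itertools import permutations


def best_route(travel, windows, start="store", depart=0):
    """Return (finish_time, stop_order) for the fastest feasible route, or None.

    travel[a][b] is the travel time in minutes from a to b.
    windows[stop] is (earliest, latest) allowed arrival, in minutes after depart time 0.
    """
    best = None
    for order in permutations(windows):
        clock, here, feasible = depart, start, True
        for stop in order:
            clock += travel[here][stop]
            earliest, latest = windows[stop]
            clock = max(clock, earliest)  # wait if we arrive before the window opens
            if clock > latest:
                feasible = False
                break
            here = stop
        if feasible and (best is None or clock < best[0]):
            best = (clock, order)
    return best


# Hypothetical travel times in minutes.
travel = {
    "store": {"A": 10, "B": 15, "C": 20},
    "A": {"B": 5, "C": 15},
    "B": {"A": 5, "C": 10},
    "C": {"A": 15, "B": 10},
}
windows = {"A": (0, 60), "B": (0, 60), "C": (0, 20)}
print(best_route(travel, windows))  # (35, ('C', 'B', 'A'))
```

Without the tight window on C, the shortest route would be A, B, C in 25 minutes; the window forces the shopper to visit C first and accept a 35-minute route.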

All of our databases, machine-learning models, and geolocation technologies work in concert to build an efficient system. But no model is perfect.

And the COVID-19 pandemic proved to be an unexpected stress test for our systems. As stay-at-home orders rippled across North America, with more data flowing into the platform than ever before, we had to repeatedly reconfigure our databases and tools to keep up with the new demand. At the peak, we found ourselves making upgrades multiple times a week.

We also had to speed up the rollout of a new feature we had just started testing: Leave at My Door Delivery, which allows shoppers and customers to remain socially distant. Shoppers can drop groceries on the porch of a house or the reception or lobby area of an apartment building and send customers a photo of their completed orders at the site.

We are continually looking at ways to optimize our technology and operations. Right now, we are exploring how to improve the suggested picking orders in the Shopper app. Today we rely on a set of rule-based formulas guided by human intuition—for example, that it’s best to pick up fresh vegetables and fruit together, since they’re usually in the same section of the store. But not all stores have the same layout, aisles in a given store can be rearranged, and items may get moved around the store seasonally. We’re hoping we can use machine learning to develop an algorithm that determines such “rhythms” in the way a location should be shopped, based on historical item-picking data along with seasonal additions to store shelves and regular changes in store layouts.

As we add retailers and brands and serve more customers, our algorithms and technologies continue to evolve. We retrain all of our models over and over again to better reflect new activity on our platform. So the next time you click on the Instacart app and order groceries to get you through a busy week, know that anonymized data from your order and from your shopper will get fed into this feedback loop, informing the models we train and the technologies we build.

We are proud that our system has been able to keep groceries flowing to people across North America who have been sheltering at home during the pandemic, especially those who are particularly vulnerable to the novel coronavirus. These are extraordinary times, and we’ve taken our responsibility to serve our customers, shoppers, partners, and corporate employees very seriously, as well as to keep them safe. As the world continues to shop from home, we hope that our investments in machine learning will continue to make it easier for everyone to get access to the food they love and more time to enjoy it together.

About the Authors

Sharath Rao is director of machine learning at Instacart. Lily Zhang is Instacart’s director of software engineering.

AWS DeepRacer League’s 2021 Season Launches With New Open and Pro Divisions

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-deepracer-leagues-2021-season-launches-with-new-open-and-pro-divisions/

As a developer, I have been hearing a lot of stories lately about how companies have solved their business problems using machine learning (ML), so one of my goals for 2021 is to learn more about it.

For the last few years I have made extensive use of artificial intelligence (AI) services such as Amazon Rekognition, Amazon Comprehend, and others. AI services provide a simple API to solve common ML problems such as image recognition, text to speech, and analysis of sentiment in a text. When using these high-level APIs, you don’t need to understand how the underlying ML model works, nor do you have to train it or maintain it in any way.

Even though those services are great and I can solve most of my business cases with them, I want to understand how ML algorithms work, and that is how I started tinkering with AWS DeepRacer.

AWS DeepRacer, a service that helps you learn reinforcement learning (RL), has been around since 2018. RL is an advanced ML technique that takes a very different approach to training models than other ML methods. Basically, it can learn very complex behavior without requiring any labeled training data, and it can make short-term decisions while optimizing for a long-term goal.

AWS DeepRacer is an autonomous 1/18th scale race car designed to test RL models by racing virtually in the AWS DeepRacer console or physically on a track at AWS and customer events. AWS DeepRacer is for developers of all skill levels, even if you don’t have any ML experience. When learning RL using AWS DeepRacer, you can take part in the AWS DeepRacer League where you get experience with machine learning in a fun and competitive environment.

Over the past year, the AWS DeepRacer League’s races have gone completely virtual and participants have competed for different kinds of prizes. However, the competition has become dominated by experts and newcomers haven’t had much of a chance to win.

The 2021 season introduces new skill-based Open and Pro racing divisions, where racers of all skill levels have five times more opportunities to win rewards than in previous seasons.

Image of the leagues in the console

How the New AWS DeepRacer Racing Divisions Work

The 2021 AWS DeepRacer League runs from March 1 through the end of October. When it kicks off, all participants will enter the Open division, a place to have fun and develop your RL knowledge with other community members.

At the end of every month, the top 10% of the Open division leaderboard will advance to the Pro division for the remainder of the season; they’ll also receive a Pro Welcome kit full of AWS DeepRacer swag. Pro division racers can win DeepRacer Evo cars and AWS DeepRacer merchandise such as hats and T-shirts.

At the end of every month, the top 16 racers in the Pro division will compete against each other in a live race in the console. That race will determine who will advance that month to the 2021 Championship Cup at re:Invent 2021.

The monthly Pro division winner gets an expenses-paid trip to re:Invent 2021 and participates in the Championship Cup to get a chance to win a Machine Learning education sponsorship worth $20k.

In both divisions, you can collect digital rewards, including vehicle customizations and accessories which will be released to participants once the winners are announced each month. 

You can start racing in the Open division any time during the 2021 season. Get started here!

Image of my racer profile

New Racer Profiles Increase the Fun

At the end of March, you will be able to create a new racer profile with an avatar and show the world which country you are representing.

I hope to see you in the new AWS DeepRacer season, where I’ll start in the Open division as MaVi.

Start racing today and train your first model for free! 


Improving the CPU and latency performance of Amazon applications using AWS CodeGuru Profiler

Post Syndicated from Neha Gupta original https://aws.amazon.com/blogs/devops/improving-the-cpu-and-latency-performance-of-amazon-applications-using-aws-codeguru-profiler/

Amazon CodeGuru Profiler is a developer tool powered by machine learning (ML) that helps identify an application’s most expensive lines of code and provides intelligent recommendations to optimize it. You can identify application performance issues and troubleshoot latency and CPU utilization issues in your application.

You can use CodeGuru Profiler to optimize performance for any application running on AWS Lambda, Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), AWS Fargate, or AWS Elastic Beanstalk, and on premises.

This post gives a high-level overview of how CodeGuru Profiler has reduced CPU usage and latency by approximately 50% and saved around $100,000 a year for a particular Amazon retail service.

Technical and business value of CodeGuru Profiler

CodeGuru Profiler is simple to use: turn it on, keep it running in the background, then review its findings and implement the relevant changes.

It’s fairly low cost, and unlike traditional tools that take up a lot of CPU and RAM, CodeGuru Profiler adds less than 1% CPU overhead to applications and typically uses no more than 100 MB of memory.

You can run it in a pre-production environment to test changes to ensure no impact occurs on your application’s key metrics.

It automatically detects performance anomalies in the application stack traces that start consuming more CPU or show increased latency. It also provides visualizations and recommendations on how to fix performance issues and the estimated cost of running inefficient code. Detecting the anomalies early prevents escalating the issue in production. This helps you prioritize remediation by giving you enough time to fix the issue before it impacts your service’s availability and your customers’ experience.

How we used CodeGuru Profiler at Amazon

Amazon has on-boarded many of its applications to CodeGuru Profiler, which has resulted in an annual savings of millions of dollars and latency improvements. In this post, we discuss how we used CodeGuru Profiler on an Amazon Prime service. A simple code change resulted in saving around $100,000 for the year.

Opportunity to improve

After a change to one of our data sources that caused its payload size to increase, we expected a slight increase to our service latency, but what we saw was higher than expected. Because CodeGuru Profiler is easy to integrate, we were able to quickly make and deploy the changes needed to get it running on our production environment.

After loading up the profile in Amazon CodeGuru Profiler, it was immediately apparent from the visualization that a very large portion of the service’s CPU time was being taken up by Jackson deserialization (37%, across the two call sites). It was also interesting that most of the blocking calls in the program (in blue) were happening in the jackson.databind method _createAndCacheValueDeserializer.

Flame graphs represent the relative amount of time that the CPU spends at each point in the call graph. The wider it is, the more CPU usage it corresponds to.

The following flame graph is from before the performance improvements were implemented.

The Flame Graph before the deployment

Looking at the source for _createAndCacheValueDeserializer confirmed that there was a synchronized block. From within it, _createAndCache2 was called, which actually did the adding to the cache. Adding to the cache was guarded by a boolean condition which had a comment that indicated that caching would only be enabled for custom serializers if @JsonCachable was set.


Checking the documentation for @JsonCachable confirmed that this annotation looked like the correct solution for this performance issue. After we deployed a quick change to add @JsonCachable to our four custom deserializers, we observed that no visible time was spent in _createAndCacheValueDeserializer.


Adding a one-line annotation in four different places made the code run twice as fast. Because the code held a lock while it recreated the same deserializers for every call, only one of the four CPU cores could be used, which caused latency and inefficiency. Reusing the deserializers avoided repeated work and saved us a lot of resources.
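The effect of rebuilding an object under a lock, compared with caching it, can be illustrated with a minimal sketch in plain Java. It does not use Jackson: `DeserializerCacheSketch`, `createUncached`, and `getCached` are hypothetical names, and a String stands in for a deserializer that is expensive to build.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class DeserializerCacheSketch {
    // Counts how many times the (pretend) expensive construction runs.
    static final AtomicInteger builds = new AtomicInteger();
    private static final Map<Class<?>, String> cache = new ConcurrentHashMap<>();

    // Stand-in for building a deserializer (hypothetical, not Jackson code).
    private static String build(Class<?> type) {
        builds.incrementAndGet();
        return "deserializer-for-" + type.getSimpleName();
    }

    // Uncached: every caller takes the lock and rebuilds the same object,
    // so concurrent callers queue up behind each other.
    static synchronized String createUncached(Class<?> type) {
        return build(type);
    }

    // Cached: the object is built once per type and reused afterwards.
    static String getCached(Class<?> type) {
        return cache.computeIfAbsent(type, DeserializerCacheSketch::build);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) createUncached(String.class);
        System.out.println("uncached builds: " + builds.getAndSet(0));
        for (int i = 0; i < 1000; i++) getCached(String.class);
        System.out.println("cached builds: " + builds.get());
    }
}
```

In the uncached path the amount of work grows with the number of calls, and the lock forces callers to take turns; in the cached path the construction cost is paid once.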

After the CodeGuru Profiler recommendations were implemented, the amount of CPU spent in Jackson reduced from 37% to 5% across the two call paths, and there was no visible blocking. With the removal of the blocking, we could run higher load on our hosts and reduce the fleet size, saving approximately $100,000 a year in Amazon EC2 costs.

The following flame graph shows performance after the deployment.

The Flame Graph after the deployment


The following graph shows that CPU usage reduced by almost 50%. The blue line shows the CPU usage the week before we implemented CodeGuru Profiler recommendations, and green shows the dropped usage after deploying. We could later safely scale down the fleet to reduce costs, while still having better performance than prior to the change.

Average Fleet CPU Utilization


The following graph shows the server latency, which also dropped by almost 50%. The latency dropped from 100 milliseconds to 50 milliseconds as depicted in the initial portion of the graph. The orange line depicts p99, green p99.9, and blue p50 (median latency).

Server Latency
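For readers less familiar with latency percentiles, here is a small sketch of how p50 and p99 can be computed from raw samples, and how they differ from the mean. It uses the nearest-rank definition, which is one of several common conventions; the class name and the sample values are made up for illustration and are not taken from the service described in this post.

```java
import java.util.Arrays;

class LatencyPercentiles {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Made-up latencies in milliseconds, with one slow outlier.
        double[] latenciesMs = {40, 45, 50, 50, 55, 60, 65, 70, 80, 900};
        System.out.println("p50  = " + percentile(latenciesMs, 50));
        System.out.println("p99  = " + percentile(latenciesMs, 99));
        System.out.println("mean = " + mean(latenciesMs));
    }
}
```

A single slow request pulls the mean far away from the typical request, while p50 stays close to it. That is why dashboards usually track percentiles rather than averages.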



With a few lines of changed code and a half-hour investigation, we removed the bottleneck, which led to lower resource utilization and allowed us to decrease the fleet size. We have seen many similar cases, and in one instance, a change of literally six characters of inefficient code reduced CPU usage from 99% to 5%.

Across Amazon, CodeGuru Profiler has been used internally among various teams and resulted in millions of dollars of savings and performance optimization. You can use CodeGuru Profiler for quick insights into performance issues of your application. The more efficient the code and application is, the less costly it is to run. You can find potential savings for any application running in production and significantly reduce infrastructure costs using CodeGuru Profiler. Reducing fleet size, latency, and CPU usage is a major win.



About the Authors

Neha Gupta

Neha Gupta is a Solutions Architect at AWS and has 16 years of experience as a Database architect/DBA. Apart from work, she’s outdoorsy and loves to dance.

Ian Clark

Ian is a Senior Software engineer with the Last Mile organization at Amazon. In his spare time, he enjoys exploring the Vancouver area with his family.

AI Agents Play “Hide the Toilet Plunger” to Learn Deep Concepts About Life

Post Syndicated from Eliza Strickland original https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/ai-agent-learns-about-the-world-by-gameplay

Most papers about artificial intelligence don’t cite Jean Piaget, the social scientist known for his groundbreaking studies of children’s cognitive development in the 1950s. But there he is, in a paper from the Allen Institute for AI (AI2). The researchers state that their AI agents learned the concept of object permanence—the understanding that an object hidden from view is still there—thus making those AI agents similar to a baby who just figured out the trick behind peekaboo. 

The researchers’ AI agents learned this precept and other rudimentary rules about the world by playing many games of hide and seek with objects, which took place within a simulated, but fairly realistic, house. The AI2 team calls the game “Cache,” but I prefer to call it “Hide the Toilet Plunger.” The agents also got to hide tomatoes, loaves of bread, cups, and knives.

The AI agents, which acted as both hiders and seekers, figured out the game via reinforcement learning. Starting out, they didn’t know anything about the 3D visual environment. They began by taking random actions like pulling on the handle of a drawer or pulling on an immovable wall, and they dropped their objects in all sorts of random places. The agents got better by playing against each other and learning from outcomes—if the seeker didn’t find the tomato, the hider knew it had chosen a good hiding place. 

The paper, which was recently accepted for the 2021 International Conference on Learning Representations, hasn’t yet been published in a peer reviewed journal.

Unlike many projects concerning AI and gameplay, the point here wasn’t to create an AI super-player that could destroy puny humans. Rather, the researchers wanted to see if an AI agent could achieve a more generalized kind of visual intelligence if it learned about the world via gameplay.

“For us, the question was: Can it learn very basic things about objects and their attributes by interacting with them?” says Aniruddha Kembhavi, a research manager with AI2’s computer vision team and a paper coauthor.

This AI2 team is working on representation learning, in which AI systems are given some input—images, audio, text, etc.—and learn to categorize the data according to its features. In computer vision, for example, an AI system might learn the features that represent a cat or a traffic light. Ideally, though, it doesn’t learn only the categories, it also learns how to categorize data, making it useful even when given images of objects it has never before seen.

Visual representation learning has evolved over the past decade, Kembhavi explains. When deep learning took off, researchers first trained AI systems on databases of labeled images, such as the famous ImageNet. Because the labels enable the AI system to check its work, that technique is called supervised learning. “Then in past few years, the buzz has gone from supervised learning to self-supervised learning,” says Kembhavi, in which AI systems have to determine the labels for themselves. “We believe that an even more general way of doing it is gameplay—we just let the agents play around, and they figure it out.” 

Once the AI2 agents had gotten good at the game, the researchers ran them through a variety of tests designed to test their understanding of the world. They first tested them on computer-generated images of rooms, asking them to predict traits such as depth of field and the geometry of objects. When compared to a model trained on the gold-standard ImageNet, the AI2 agents performed as well or better. They also tested them on photographs of real rooms; while they didn’t do as well as the ImageNet-trained model there, they did better than expected—an important indication that training in simulated environments could produce AI systems that function in the real world. 

The tests that really excited the researchers, though, were those inspired by developmental psychology. They wanted to determine whether the AI agents grasped certain “cognitive primitives,” or basic elements of understanding that can be built upon. They found that the agents understood the principles of containment and object permanence, and that they could rank images according to how much free space they contained. That ranking test was an attempt to get at a concept that Jean Piaget called seriation, or the ability to order objects based on a common property. 

If you’re thinking, “Haven’t I read something in IEEE Spectrum before about AI agents playing hide and seek?” you are not wrong, and you are also a faithful reader. In 2019, I covered an OpenAI project in which the hiders and seekers surprised the researchers by coming up with strategies that weren’t supposed to be possible in the game environment.  

Igor Mordatch, one of the OpenAI researchers behind that project, says he’s excited to see that AI2’s research doesn’t focus on external behaviors within the game, but rather the “internal representations of the world emerging in the minds of these agents,” he says in an email. “Representation learning is thought to be one of the key components to progress in general-purpose AI systems today, so any advances in this area would be highly impactful.”

As for transferring any advances from their research to the real world, the AI2 researchers say that the agents’ dynamic understanding of how objects act in time and space could someday be useful to robots. But they have no intention of doing robot experiments anytime soon. Training in simulation took several weeks; training in the real world would be infeasible. “Also, there’s a safety issue,” notes study coauthor Roozbeh Mottaghi, also a research manager at AI2. “These agents do random stuff.” Just think of the havoc that could be wreaked on a lab by a rogue robot carrying a toilet plunger.

Machine learning and depth estimation using Raspberry Pi

Post Syndicated from David Plowman original https://www.raspberrypi.org/blog/machine-learning-and-depth-estimation-using-raspberry-pi/

One of our engineers, David Plowman, describes machine learning and shares news of a Raspberry Pi depth estimation challenge run by ETH Zürich (Swiss Federal Institute of Technology).

Spoiler alert – it’s all happening virtually, so you can definitely make the trip and attend, or maybe even enter yourself.

What is Machine Learning?

Machine Learning (ML) and Artificial Intelligence (AI) are some of the top engineering-related buzzwords of the moment, and foremost among current ML paradigms is probably the Artificial Neural Network (ANN).

ANNs involve millions of tiny calculations, merged together in a giant biologically inspired network – hence the name. These networks typically have millions of parameters that control each calculation, and they must be optimised for every different task at hand.

This process of optimising the parameters so that a given set of inputs correctly produces a known set of outputs is known as training, and is what gives rise to the sense that the network is “learning”.
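To make the idea of training concrete, here is a deliberately tiny sketch: a "network" with just two parameters (a weight and a bias) is fitted to known input/output pairs by gradient descent. Real ANNs have millions of parameters and use frameworks rather than hand-written loops; the class name, learning rate, and data below are illustrative assumptions only.

```java
class TrainingSketch {
    // Fits y ≈ w * x + b by gradient descent on the mean squared error.
    // Returns the learned parameters as {w, b}.
    static double[] train(double[] xs, double[] ys, double learningRate, int steps) {
        double w = 0.0, b = 0.0;
        int n = xs.length;
        for (int s = 0; s < steps; s++) {
            double gradW = 0.0, gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double error = (w * xs[i] + b) - ys[i]; // prediction minus known output
                gradW += 2 * error * xs[i] / n;
                gradB += 2 * error / n;
            }
            // Nudge each parameter in the direction that reduces the error.
            w -= learningRate * gradW;
            b -= learningRate * gradB;
        }
        return new double[] {w, b};
    }

    public static void main(String[] args) {
        // Known inputs and outputs generated from y = 2x + 1.
        double[] xs = {0, 1, 2, 3};
        double[] ys = {1, 3, 5, 7};
        double[] params = train(xs, ys, 0.05, 5000);
        System.out.println("w = " + params[0] + ", b = " + params[1]);
    }
}
```

After enough steps the parameters settle close to the values that generated the data, which is the sense in which the model has "learned" the relationship.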

A popular type of ANN used for processing images is the Convolutional Neural Network. Many small calculations are performed on groups of input pixels to produce each output pixel
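The "many small calculations on groups of input pixels" can be sketched as a plain 2D convolution over a greyscale image. This is a minimal illustration with no padding or stride and a single channel; the class name is hypothetical, and in a trained network the kernel values would be learned rather than fixed as they are here.

```java
class ConvolutionSketch {
    // Slides the kernel over the image. Each output pixel is the weighted
    // sum of the group of input pixels currently under the kernel.
    static double[][] convolve(double[][] image, double[][] kernel) {
        int kh = kernel.length, kw = kernel[0].length;
        int oh = image.length - kh + 1, ow = image[0].length - kw + 1;
        double[][] out = new double[oh][ow];
        for (int r = 0; r < oh; r++) {
            for (int c = 0; c < ow; c++) {
                double sum = 0.0;
                for (int i = 0; i < kh; i++) {
                    for (int j = 0; j < kw; j++) {
                        sum += image[r + i][c + j] * kernel[i][j];
                    }
                }
                out[r][c] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] image = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        double[][] kernel = {{1, 1}, {1, 1}}; // sums each 2x2 neighbourhood
        for (double[] row : convolve(image, kernel)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```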

Machine Learning frameworks

A number of well known companies produce free ML frameworks that you can download and use on your own computer. The network training procedure runs best on machines with powerful CPUs and GPUs, but even using one of these pre-trained networks (known as inference) can be quite expensive.

One of the most popular frameworks is Google’s TensorFlow (TF), and since this is rather resource intensive, they also produce a cut-down version optimised for less powerful platforms. This is TensorFlow Lite (TFLite), which can be run effectively on Raspberry Pi.

Depth estimation

ANNs have proven very adept at a wide variety of image processing tasks, most notably object classification and detection, but also depth estimation. This is the process of taking one or more images and working out how far away every part of the scene is from the camera, producing a depth map.

Here’s an example:

Depth estimation example using a truck

The image on the right shows, by the brightness of each pixel, how far away the objects in the original (left-hand) image are from the camera (darker = nearer).

We distinguish between stereo depth estimation, which starts with a stereo pair of images (taken from marginally different viewpoints; here, parallax can be used to inform the algorithm), and monocular depth estimation, working from just a single image.
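For the stereo case, the role of parallax can be shown with the textbook relation for a rectified stereo pair under a pinhole camera model: depth = focal length × baseline / disparity. The sketch below is an illustration of that relation only, not of any particular network or challenge entry, and the camera numbers are made up.

```java
class StereoDepthSketch {
    // Depth of a point from its disparity (horizontal pixel shift between
    // the left and right images), for a rectified stereo pair.
    static double depthMeters(double focalLengthPx, double baselineMeters, double disparityPx) {
        if (disparityPx <= 0) {
            throw new IllegalArgumentException("disparity must be positive");
        }
        return focalLengthPx * baselineMeters / disparityPx;
    }

    public static void main(String[] args) {
        double focalLengthPx = 800;  // assumed focal length, in pixels
        double baselineMeters = 0.5; // assumed distance between the two cameras
        for (double disparity : new double[] {80, 40, 20}) {
            System.out.println(disparity + " px -> "
                    + depthMeters(focalLengthPx, baselineMeters, disparity) + " m");
        }
    }
}
```

Nearby objects shift a lot between the two views and distant objects shift very little, so a larger disparity means a smaller depth. Monocular depth estimation has no such geometric shortcut, which is part of what makes it a good fit for a learned model.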

The applications of such techniques should be clear, ranging from robots that need to understand and navigate their environments, to the fake bokeh effects beloved of many modern smartphone cameras.

Depth Estimation Challenge

C V P R conference logo with dark blue background and the edge of the earth covered in scattered orange lights connected by white lines

We were very interested then to learn that, as part of the CVPR (Computer Vision and Pattern Recognition) 2021 conference, Andrey Ignatov and Radu Timofte of ETH Zürich were planning to run a Monocular Depth Estimation Challenge. They are specifically targeting the Raspberry Pi 4 platform running TFLite, and we are delighted to support this effort.

For more information, or indeed if any technically minded readers are interested in entering the challenge, please visit:

The conference and workshops are all taking place virtually in June, and we’ll be sure to update our blog with some of the results and models produced for Raspberry Pi 4 by the competing teams. We wish them all good luck!

The post Machine learning and depth estimation using Raspberry Pi appeared first on Raspberry Pi.

Improving AWS Java applications with Amazon CodeGuru Reviewer

Post Syndicated from Rajdeep Mukherjee original https://aws.amazon.com/blogs/devops/improving-aws-java-applications-with-amazon-codeguru-reviewer/

Amazon CodeGuru Reviewer is a machine learning (ML)-based AWS service for providing automated code review comments on your Java and Python applications. Powered by program analysis and ML, CodeGuru Reviewer detects hard-to-find bugs and inefficiencies in your code and leverages best practices learned from across millions of lines of open-source and Amazon code. You can start analyzing your code through pull requests and full repository analysis (for more information, see Automating code reviews and application profiling with Amazon CodeGuru).

The recommendations generated by CodeGuru Reviewer for Java fall into the following categories:

  • AWS best practices
  • Concurrency
  • Security
  • Resource leaks
  • Other specialized categories such as sensitive information leaks, input validation, and code clones
  • General best practices on data structures, control flow, exception handling, and more

We expect the recommendations to benefit beginners as well as expert Java programmers.

In this post, we showcase CodeGuru Reviewer recommendations related to using the AWS SDK for Java. For in-depth discussion of other specialized topics, see our posts on concurrency, security, and resource leaks. For Python applications, see Raising Python code quality using Amazon CodeGuru.

The AWS SDK for Java simplifies the use of AWS services by providing a set of features that are consistent and familiar for Java developers. The SDK has more than 250 AWS service clients, which are available on GitHub. Service clients include services like Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, Amazon Kinesis, Amazon Elastic Compute Cloud (Amazon EC2), AWS IoT, and Amazon SageMaker. These services constitute more than 6,000 operations, which you can use to access AWS services. With such rich and diverse services and APIs, developers may not always be aware of the nuances of AWS API usage. These nuances may not be important at the beginning, but become critical as the scale increases and the application evolves or becomes diverse. This is why CodeGuru Reviewer has a category of recommendations: AWS best practices. This category of recommendations enables you to become aware of certain features of AWS APIs so your code can be more correct and performant.

The first part of this post focuses on the key features of the AWS SDK for Java as well as API patterns in AWS services. The second part of this post demonstrates using CodeGuru Reviewer to improve code quality for Java applications that use the AWS SDK for Java.

AWS SDK for Java

The AWS SDK for Java supports higher-level abstractions for simplified development and provides support for cross-cutting concerns such as credential management, retries, data marshaling, and serialization. In this section, we describe a few key features that are supported in the AWS SDK for Java. Additionally, we discuss some key API patterns in AWS services, such as batching and pagination.

The AWS SDK for Java has the following features:

  • Waiters – Waiters are utility methods that make it easy to wait for a resource to transition into a desired state. They abstract the polling logic into a simple API call. The waiters interface provides a custom delay strategy to control the sleep time between retries, as well as a custom condition on whether polling of a resource should be retried. The AWS SDK for Java also offers an async variant of waiters.
  • Exceptions – The AWS SDK for Java uses runtime (or unchecked) exceptions instead of checked exceptions in order to give you fine-grained control over the errors you want to handle and to prevent scalability issues inherent with checked exceptions in large applications. Broadly, the AWS SDK for Java has two types of exceptions:
    • AmazonClientException – Indicates that a problem occurred inside the Java client code, either while trying to send a request to AWS or while trying to parse a response from AWS. For example, the AWS SDK for Java throws an AmazonClientException if no network connection is available when you try to call an operation on one of the clients.
    • AmazonServiceException – Represents an error response from an AWS service. For example, if you try to end an EC2 instance that doesn’t exist, Amazon EC2 returns an error response, and all the details of that response are included in the AmazonServiceException that’s thrown. For some cases, a subclass of AmazonServiceException is thrown to allow you fine-grained control over handling error cases through catch blocks.
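To illustrate what a waiter abstracts away, here is a minimal, SDK-free sketch of the idea: poll a resource, test a condition, and sleep between attempts. `WaiterSketch` and `waitUntil` are hypothetical names and not part of the AWS SDK for Java; the SDK's own waiters additionally offer configurable delay strategies and async variants, as described above.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

class WaiterSketch {
    // Polls until isDone accepts the polled state, sleeping between attempts.
    // Gives up with an unchecked exception after maxAttempts polls.
    static <T> T waitUntil(Supplier<T> poll, Predicate<T> isDone,
                           int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T state = poll.get();
            if (isDone.test(state)) {
                return state;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // fixed delay; a real waiter may back off
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
        }
        throw new IllegalStateException(
                "resource did not reach the desired state after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Simulated resource that becomes ACTIVE on the third poll.
        AtomicInteger polls = new AtomicInteger();
        String state = waitUntil(
                () -> polls.incrementAndGet() < 3 ? "CREATING" : "ACTIVE",
                "ACTIVE"::equals, 5, 10L);
        System.out.println(state + " after " + polls.get() + " polls");
    }
}
```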

The API has the following patterns:

  • Batching – A batch operation provides you with the ability to perform a single CRUD operation (create, read, update, delete) on multiple resources. Some typical use cases include the following:
  • Pagination – Many AWS operations return paginated results when the response object is too large to return in a single response. To enable you to perform pagination, the request and response objects for many service clients in the SDK provide a continuation token (typically named NextToken) to indicate additional results.
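The continuation-token pattern can be sketched without any AWS dependency. In the example below, `listNames` is a hypothetical stand-in for a paginated service operation that returns at most `pageSize` items per call plus a token for the next page; `listAll` is the caller-side loop that keeps requesting pages until no token is returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PaginationSketch {
    // One page of results plus the token for the next page (null on the last page).
    static final class Page {
        final List<String> items;
        final String nextToken;

        Page(List<String> items, String nextToken) {
            this.items = items;
            this.nextToken = nextToken;
        }
    }

    // Hypothetical paginated operation: the token encodes where the next page starts.
    static Page listNames(List<String> all, String token, int pageSize) {
        int start = token == null ? 0 : Integer.parseInt(token);
        int end = Math.min(start + pageSize, all.size());
        String next = end < all.size() ? Integer.toString(end) : null;
        return new Page(new ArrayList<>(all.subList(start, end)), next);
    }

    // Caller-side loop: keep fetching while the response carries a token.
    static List<String> listAll(List<String> all, int pageSize) {
        List<String> result = new ArrayList<>();
        String token = null;
        do {
            Page page = listNames(all, token, pageSize);
            result.addAll(page.items);
            token = page.nextToken;
        } while (token != null);
        return result;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println("first page only: " + listNames(tables, null, 2).items);
        System.out.println("all pages:       " + listAll(tables, 2));
    }
}
```

Stopping after the first call silently drops everything beyond the first page, which is the kind of omission that is easy to miss in testing when the data set is still small.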

AWS best practices

Now that we have summarized the SDK-specific features and API patterns, let’s look at the CodeGuru Reviewer recommendations on AWS API use.

The CodeGuru Reviewer recommendations for the AWS SDK for Java range from detecting outdated or deprecated APIs to warning about API misuse, missing pagination, authentication and exception scenarios, and using efficient API alternatives. In this section, we discuss a few examples patterned after real code.

Handling pagination

Over 1,000 APIs from more than 150 AWS services have pagination operations. The pagination best practice rule in CodeGuru covers all the pagination operations. In particular, the pagination rule checks if the Java application correctly fetches all the results of the pagination operation.

The response of a pagination operation in AWS SDK for Java 1.0 contains a token that has to be used to retrieve the next page of results. In the following code snippet, you make a call to listTables(), a DynamoDB ListTables operation, which can only return up to 100 table names per page. This code might not produce complete results because the operation returns paginated results instead of all results.

public void getDynamoDbTable() {
        AmazonDynamoDBClient client = new AmazonDynamoDBClient();
        // Only the first page of table names (at most 100) is returned here
        List<String> tables = client.listTables().getTableNames();
}

CodeGuru Reviewer detects the missing pagination in the code snippet and makes the following recommendation to add another call to check for additional results.

Screenshot of recommendations for introducing pagination checks

You can accept the recommendation and add the logic to get the next page of table names by checking if a token (LastEvaluatedTableName in ListTablesResponse) is included in each response page. If such a token is present, it’s used in a subsequent request to fetch the next page of results. See the following code:

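public void getDynamoDbTable() {
        // The builder-style request and response types below belong to the AWS SDK for Java 2.x
        DynamoDbClient client = DynamoDbClient.create();
        ListTablesRequest listTablesRequest = ListTablesRequest.builder().build();
        boolean done = false;
        while (!done) {
            ListTablesResponse listTablesResponse = client.listTables(listTablesRequest);
            System.out.println(listTablesResponse.tableNames());
            if (listTablesResponse.lastEvaluatedTableName() == null) {
                // No token in this page: all results have been fetched
                done = true;
            }
            // Use the token as the starting point of the next request
            listTablesRequest = listTablesRequest.toBuilder()
                    .exclusiveStartTableName(listTablesResponse.lastEvaluatedTableName())
                    .build();
        }
}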

Handling failures in batch operation calls

Batch operations are common with many AWS services that process bulk requests. Batch operations can succeed without throwing exceptions even if some items in the request fail. Therefore, a recommended practice is to explicitly check for any failures in the result of the batch APIs. Over 40 APIs from more than 20 AWS services have batch operations. The best practice rule in CodeGuru Reviewer covers all the batch operations. In the following code snippet, you make a call to sendMessageBatch, a batch operation from Amazon SQS, but it doesn’t handle any errors returned by that batch operation:

public void flush(final String sqsEndPoint,
                     final List<SendMessageBatchRequestEntry> batch) {
    AmazonSQS sqsClient = AmazonSQSClientBuilder.defaultClient();
    if (batch.isEmpty()) {
        return;
    }
    // The result, which may list failed entries, is discarded
    sqsClient.sendMessageBatch(sqsEndPoint, batch);
}

CodeGuru Reviewer detects this issue and makes the following recommendation to check the return value for failures.

Screenshot of recommendations for batch operations

You can accept this recommendation and add logging for the complete list of messages that failed to send, in addition to throwing an SQSUpdateException. See the following code:

public void flush(final String sqsEndPoint,
                     final List<SendMessageBatchRequestEntry> batch) {
    AmazonSQS sqsClient = AmazonSQSClientBuilder.defaultClient();
    if (batch.isEmpty()) {
        return;
    }
    SendMessageBatchResult result = sqsClient.sendMessageBatch(sqsEndPoint, batch);
    // A batch call can succeed overall while individual entries fail
    final List<BatchResultErrorEntry> failed = result.getFailed();
    if (!failed.isEmpty()) {
           final String failedMessage = failed.stream()
                         .map(batchResultErrorEntry ->
                            String.format("…", batchResultErrorEntry.getId(),
                                          batchResultErrorEntry.getMessage()))
                         .collect(Collectors.joining(","));
           throw new SQSUpdateException("Error occurred while sending messages to SQS::"
                                        + failedMessage);
    }
}

Exception handling best practices

Amazon S3 is one of the most popular AWS services with our customers. A frequent operation with this service is to upload a stream-based object through an Amazon S3 client. Stream-based uploads might encounter occasional network connectivity or timeout issues, and the best practice to address such a scenario is to properly handle the corresponding ResetException error. ResetException extends SdkClientException, which subsequently extends AmazonClientException. Consider the following code snippet, which lacks such exception handling:

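private void uploadInputStreamToS3(String bucketName,
                                   InputStream input,
                                   String key, ObjectMetadata metadata)
                         throws SdkClientException {
    // Assumed to be initialized elsewhere; the original snippet elides the setup.
    final AmazonS3Client amazonS3Client;
    PutObjectRequest putObjectRequest =
          new PutObjectRequest(bucketName, key, input, metadata);
    // No read limit is set, so a retried upload of a large stream can fail with ResetException.
    amazonS3Client.putObject(putObjectRequest);
}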

In this case, CodeGuru Reviewer correctly detects the missing handling of the ResetException error and suggests possible solutions.

Screenshot of recommendations for handling exceptions

This recommendation is rich in that it provides alternatives to suit different use cases. The most common handling uses File or FileInputStream objects, but in other cases explicit handling of mark and reset operations is necessary to reliably avoid a ResetException.

You can fix the code by explicitly setting a predefined read limit using the setReadLimit method of RequestClientOptions. Its default value is 128 KB. Setting the read limit value to one byte greater than the size of the stream reliably avoids a ResetException.

For example, if the maximum expected size of a stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes. The mark and reset always work for 100,000 bytes or less. However, this might cause some streams to buffer that number of bytes into memory.
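
To make the arithmetic concrete, the following is a minimal sketch that uses only java.io classes rather than the AWS SDK. It assumes a BufferedInputStream as a stand-in for the stream being uploaded, and the class and method names (ReadLimitSketch, canReplay) are hypothetical. It marks a 100,000-byte stream with a read limit of 100,001, reads it to the end as a first upload attempt would, and then resets and reads it again as a retry would:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadLimitSketch {

    /** Reads the whole stream, rewinds to the mark, and reads it again (a simulated retry). */
    public static boolean canReplay(byte[] payload, int readLimit) {
        // A tiny initial buffer, so the read limit rather than the buffer size decides the outcome.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(payload), 4)) {
            in.mark(readLimit);
            byte[] first = in.readAllBytes();  // first attempt consumes the stream (Java 9+)
            in.reset();                        // retry: rewind to the mark
            byte[] second = in.readAllBytes();
            return Arrays.equals(first, payload) && Arrays.equals(second, payload);
        } catch (IOException e) {
            return false; // the mark was no longer valid, so the stream could not be replayed
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[100_000];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        // Read limit of size + 1: the stream can be replayed.
        System.out.println(canReplay(payload, payload.length + 1)); // prints "true"
        // Read limit far below the stream size, for comparison.
        System.out.println(canReplay(payload, 16));
    }
}
```

With a read limit well below the stream size, the same call typically returns false with BufferedInputStream, because the mark is discarded once more bytes than the limit have been read. The SDK's own stream handling may differ in detail.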

The fix reliably avoids ResetException when uploading an object of type InputStream to Amazon S3:

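private void uploadInputStreamToS3(String bucketName, InputStream input,
                                   String key, ObjectMetadata metadata)
                             throws SdkClientException {
    // Assumed to be initialized elsewhere; the original snippet elides the setup.
    final AmazonS3Client amazonS3Client;
    // Should be at least one byte larger than the largest stream you expect to upload.
    final int READ_LIMIT = 10000;
    PutObjectRequest putObjectRequest =
            new PutObjectRequest(bucketName, key, input, metadata);
    // Allow the SDK to mark and reset up to READ_LIMIT bytes when it retries.
    putObjectRequest.getRequestClientOptions().setReadLimit(READ_LIMIT);
    amazonS3Client.putObject(putObjectRequest);
}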

Replacing custom polling with waiters

A common activity when you’re working with services that are eventually consistent (such as DynamoDB) or have a lead time for creating resources (such as Amazon EC2) is to wait for a resource to transition into a desired state. The AWS SDK provides the Waiters API, a convenient and efficient feature for waiting that abstracts out the polling logic into a simple API call. If you’re not aware of this feature, you might come up with custom, and potentially inefficient, polling logic to determine whether a particular resource has transitioned into a desired state.

The following code appears to be waiting for the status of EC2 instances to change to shutting-down or terminated inside a while (true) loop:

private boolean terminateInstance(final String instanceId, final AmazonEC2 ec2Client)
    throws InterruptedException {
    // POLL_INTERVAL_MILLIS is an assumed constant name; the sleep calls are reconstructed.
    long start = System.currentTimeMillis();
    String currentState;
    while (true) {
        try {
            DescribeInstanceStatusResult describeInstanceStatusResult =
                    ec2Client.describeInstanceStatus(new DescribeInstanceStatusRequest()
                            .withInstanceIds(instanceId)
                            .withIncludeAllInstances(true));
            List<InstanceStatus> instanceStatusList =
                    describeInstanceStatusResult.getInstanceStatuses();
            long finish = System.currentTimeMillis();
            long timeElapsed = finish - start;
            if (timeElapsed > INSTANCE_TERMINATION_TIMEOUT) {
                break;
            }
            if (instanceStatusList.size() < 1) {
                Thread.sleep(POLL_INTERVAL_MILLIS);
                continue;
            }
            currentState = instanceStatusList.get(0).getInstanceState().getName();
            if ("shutting-down".equals(currentState) || "terminated".equals(currentState)) {
                return true;
            } else {
                Thread.sleep(POLL_INTERVAL_MILLIS);
            }
        } catch (AmazonServiceException ex) {
            throw ex;
        }
    }
    return false;
}

CodeGuru Reviewer detects the polling scenario and recommends you use the waiters feature to help improve efficiency of such programs.

Screenshot of recommendations for introducing waiters feature

Based on the recommendation, the following code uses the waiters feature that is available in the AWS SDK for Java. The polling logic is replaced with a waiter obtained from the waiters() function, which is then run with a call to waiter.run(…). The call accepts custom parameters, including the request and an optional custom polling strategy. The run function polls synchronously until the resource transitions into the desired state or the waiter gives up. The SDK throws a WaiterTimedOutException if the resource doesn’t transition into the desired state after the configured number of attempts. The fixed code is simpler and more efficient, because it replaces the hand-written polling logic with a single API call:

public void terminateInstance(final String instanceId, final AmazonEC2 ec2Client)
    throws InterruptedException {
    Waiter<DescribeInstancesRequest> waiter = ec2Client.waiters().instanceTerminated();
    ec2Client.terminateInstances(new TerminateInstancesRequest().withInstanceIds(instanceId));
    try {
        // Poll at most 60 times, with a fixed 5-second delay between attempts.
        waiter.run(new WaiterParameters<DescribeInstancesRequest>()
              .withRequest(new DescribeInstancesRequest()
                    .withInstanceIds(instanceId))
              .withPollingStrategy(new PollingStrategy(new MaxAttemptsRetryStrategy(60),
                    new FixedDelayStrategy(5))));
    } catch (WaiterTimedOutException e) {
        // The instance did not terminate in time, so look up its current state.
        List<InstanceStatus> instanceStatusList = ec2Client.describeInstanceStatus(
               new DescribeInstanceStatusRequest()
                    .withInstanceIds(instanceId)
                    .withIncludeAllInstances(true))
               .getInstanceStatuses();
        String state;
        if (instanceStatusList != null && instanceStatusList.size() > 0) {
            state = instanceStatusList.get(0).getInstanceState().getName();
            // Handling of the timed-out state is elided in the original.
        }
    }
}

Service-specific best practice recommendations

In addition to the SDK operation-specific recommendations in the AWS SDK for Java we discussed, there are various AWS service-specific best practice recommendations pertaining to service APIs for services such as Amazon S3, Amazon EC2, DynamoDB, and more, where CodeGuru Reviewer can help to improve Java applications that use AWS service clients. For example, CodeGuru can detect the following:

  • Resource leaks in Java applications that use high-level libraries, such as the Amazon S3 TransferManager
  • Deprecated methods in various AWS services
  • Missing null checks on the response of the GetItem API call in DynamoDB
  • Missing error handling in the output of the PutRecords API call in Kinesis
  • Anti-patterns such as binding the SNS subscribe or createTopic operation with Publish operation


This post introduced how to use CodeGuru Reviewer to improve the use of the AWS SDK in Java applications. CodeGuru is now available for you to try. For pricing information, see Amazon CodeGuru pricing.

To Really Judge an AI’s Smarts, Give it One of These IQ Tests

Post Syndicated from Matthew Hutson original https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/how-do-you-test-the-iq-of-ai

Chess was once seen as an ultimate test of intelligence, until computers defeated humans while showing none of the other broad capabilities we associate with smarts. Artificial intelligence has since bested humans at Go, some types of poker, and many video games.

So researchers are developing AI IQ tests meant to assess deeper humanlike aspects of intelligence, such as concept learning and analogical reasoning. So far, computers have struggled on many of these tasks, which is exactly the point. The test-makers hope their challenges will highlight what’s missing in AI, and guide the field toward machines that can finally think like us.

A common human IQ test is Raven’s Progressive Matrices, in which one needs to complete an arrangement of nine abstract drawings by deciphering the underlying structure and selecting the missing drawing from a group of options. Neural networks have gotten pretty good at that task. But a paper presented in December at the massive AI conference known as NeurIPS offers a new challenge: The AI system must generate a fitting image from scratch, an ultimate test of understanding the pattern.

“If you are developing a computer vision system, usually it recognizes without really understanding what’s in the scene,” says Lior Wolf, a computer scientist at Tel Aviv University and Facebook, and the paper’s senior author. This task requires understanding composition and rules, “so it’s a very neat problem.” The researchers also designed a neural network to tackle the task—according to human judges, it gets about 70 percent correct, leaving plenty of room for improvement.

Other tests are harder still. Another NeurIPS paper presented a software-generated dataset of so-called Bongard Problems, a classic test for humans and computers. In their version, called Bongard-LOGO, one sees a few abstract sketches that match a pattern and a few that don’t, and one must decide if new sketches match the pattern.

The puzzles test “compositionality,” or the ability to break a pattern down into its component parts, which is a critical piece of intelligence, says Anima Anandkumar, a computer scientist at the California Institute of Technology and the paper’s senior author. Humans got the correct answer more than 90 percent of the time, the researchers found, but state-of-the-art visual processing algorithms topped out around 65 percent (with chance being 50 percent). “That’s the beauty of it,” Anandkumar said of the test, “that something so simple can still be so challenging for AI.” They’re currently developing a version of the test with real images.

Compositional thinking might help machines perform in the real world. Imagine a street scene, Anandkumar says. An autonomous vehicle needs to break it down into general concepts like cars and pedestrians to predict what will happen next. Compositional thinking would also make AI more interpretable and trustworthy, she added. One might peer inside to see how it pieces evidence together. 

Still harder tests are out there. In 2019, François Chollet, an AI researcher at Google, created the Abstraction and Reasoning Corpus (ARC), a set of visual puzzles tapping into core human knowledge of geometry, numbers, physics, and even goal-directedness. On each puzzle, one sees one or more pairs of grids filled with colored squares, each pair a sort of before-and-after grid. One also sees a new grid and fills in its partner according to whatever rule one has inferred.

A website called Kaggle held a competition with the puzzles and awarded $20,000 last May to the three teams with the best-performing algorithms. The puzzles are pretty easy for humans, but the top AI barely reached 20 percent. “That’s a big red flag that tells you there’s something interesting there,” Chollet says, “that we’re missing something.”

The current wave of advancement in AI is driven largely by multi-layered neural networks, also known as deep learning. But, Chollet says, these neural nets perform “abysmally” on the ARC. The Kaggle winners used old-school methods that combine handwritten rules rather than learning subtle patterns from gobs of data. Still, he sees a role for both paradigms working in tandem: A neural net might translate messy perceptual data into a structured form that symbolic processing can handle.
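
To give a feel for what combining handwritten rules can mean on an ARC-style puzzle, here is a toy sketch in Java. It is not the Kaggle winners’ code: the three rules, the grids, and every name in it (ArcRuleSearchSketch, inferRule, and so on) are hypothetical, and real solvers search over far richer rule sets. The sketch tries each rule against the before-and-after pairs, keeps the first one that explains them all, and applies it to a new grid:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class ArcRuleSearchSketch {

    // Hypothetical handwritten rules, tried in insertion order.
    public static final Map<String, UnaryOperator<int[][]>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("identity", g -> g);
        RULES.put("flipHorizontal", ArcRuleSearchSketch::flipHorizontal);
        RULES.put("transpose", ArcRuleSearchSketch::transpose);
    }

    /** Mirrors each row left to right. */
    public static int[][] flipHorizontal(int[][] g) {
        int[][] out = new int[g.length][];
        for (int r = 0; r < g.length; r++) {
            out[r] = new int[g[r].length];
            for (int c = 0; c < g[r].length; c++) {
                out[r][c] = g[r][g[r].length - 1 - c];
            }
        }
        return out;
    }

    /** Swaps rows and columns of a non-empty rectangular grid. */
    public static int[][] transpose(int[][] g) {
        int[][] out = new int[g[0].length][g.length];
        for (int r = 0; r < g.length; r++) {
            for (int c = 0; c < g[r].length; c++) {
                out[c][r] = g[r][c];
            }
        }
        return out;
    }

    /** Returns the name of the first rule that maps every "before" grid to its "after" grid, or null. */
    public static String inferRule(int[][][] before, int[][][] after) {
        for (Map.Entry<String, UnaryOperator<int[][]>> rule : RULES.entrySet()) {
            boolean fits = true;
            for (int i = 0; i < before.length && fits; i++) {
                fits = Arrays.deepEquals(rule.getValue().apply(before[i]), after[i]);
            }
            if (fits) {
                return rule.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Two example pairs; the numbers stand for colors.
        int[][][] before = { {{1, 0}, {2, 0}}, {{0, 3, 3}} };
        int[][][] after  = { {{0, 1}, {0, 2}}, {{3, 3, 0}} };
        String rule = inferRule(before, after);
        int[][] answer = RULES.get(rule).apply(new int[][] {{5, 0, 0}, {0, 0, 7}});
        System.out.println(rule + " -> " + Arrays.deepToString(answer));
        // prints: flipHorizontal -> [[0, 0, 5], [7, 0, 0]]
    }
}
```

The hard part of ARC is that each puzzle may need a rule nobody wrote down in advance, which is why a fixed menu like this one covers only a sliver of the corpus.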

Anandkumar agrees with the need for a hybrid approach. Much of deep learning’s progress now comes from making it deeper and deeper, with bigger and bigger neural nets, she says. “The scale now is so enormous that I think we’ll see more work trying to do more with less.”

Anandkumar and Chollet point out one misconception about intelligence: People confuse it with skill. Instead, they say, it’s the ability to pick up new skills easily. That may be why deep learning so often falters. It typically requires lots of training and doesn’t generalize to new tasks, whereas the Bongard and ARC problems require solving a variety of puzzles with only a few examples of each. Maybe a good test of AI IQ would be for a computer to read this article and come up with a new IQ test.

OpenAI’s GPT-3 Speaks! (Kindly Disregard Toxic Language)

Post Syndicated from Eliza Strickland original https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/open-ais-powerful-text-generating-tool-is-ready-for-business

Last September, a data scientist named Vinay Prabhu was playing around with an app called Philosopher AI. The app provides access to the artificial intelligence system known as GPT-3, which has incredible abilities to generate fluid and natural-seeming text. The creator of that underlying technology, the San Francisco company OpenAI, has allowed hundreds of developers and companies to try out GPT-3 in a wide range of applications, including customer service, video games, tutoring services, and mental health apps. The company says tens of thousands more are on the waiting list.

Philosopher AI is meant to show people both the technology’s astounding capabilities and its limits. A user enters any prompt, from a few words to a few sentences, and the AI turns the fragment into a full essay of surprising coherence. But while Prabhu was experimenting with the tool, he found a certain type of prompt that returned offensive results. “I tried: What ails modern feminism? What ails critical race theory? What ails leftist politics?” he tells IEEE Spectrum.

The results were deeply troubling. Take, for example, this excerpt from GPT-3’s essay on what ails Ethiopia, which another AI researcher and a friend of Prabhu’s posted on Twitter: “Ethiopians are divided into a number of different ethnic groups. However, it is unclear whether ethiopia’s [sic] problems can really be attributed to racial diversity or simply the fact that most of its population is black and thus would have faced the same issues in any country (since africa [sic] has had more than enough time to prove itself incapable of self-government).”

Prabhu, who works on machine learning as chief scientist for the biometrics company UnifyID, notes that Philosopher AI sometimes returned diametrically opposing responses to the same query, and that not all of its responses were problematic. “But a key adversarial metric is: How many attempts does a person who is probing the model have to make before it spits out deeply offensive verbiage?” he says. “In all of my experiments, it was on the order of two or three.”

The Philosopher AI incident laid bare the potential danger that companies face as they work with this new and largely untamed technology, and as they deploy commercial products and services powered by GPT-3. Imagine the toxic language that surfaced in the Philosopher AI app appearing in another context—your customer service representative, an AI companion that rides around in your phone, your online tutor, the characters in your video game, your virtual therapist, or an assistant who writes your emails.

Those are not theoretical concerns. Spectrum spoke with beta users of the API who are working to incorporate GPT-3 into such applications and others. The good news is that all the users Spectrum talked with were actively thinking about how to deploy the technology safely.

The Vancouver-based developer behind the Philosopher AI app, Murat Ayfer, says he created it to both further his own understanding of GPT-3’s potential and to educate the public. He quickly discovered the many ways in which his app could go wrong. “With automation, you need either a 100 percent success rate, or you need it to error out gracefully,” he tells Spectrum. “The problem with GPT-3 is that it doesn’t error out, it just produces garbage—and there’s no way to detect if it’s producing garbage.”

GPT-3 Learned From Us

The fundamental problem is that GPT-3 learned about language from the Internet: Its massive training dataset included not just news articles, Wikipedia entries, and online books, but also every unsavory discussion on Reddit and other sites. From that morass of verbiage—both upstanding and unsavory—it drew 175 billion parameters that define its language. As Prabhu puts it: “These things it’s saying, they’re not coming out of a vacuum. It’s holding up a mirror.” Whatever GPT-3’s failings, it learned them from humans.

Following some outcry about the Philosopher AI app (another response that ended up on Twitter started with cute rabbits but quickly devolved into a discussion of reproductive organs and rape), Ayfer made changes. He had already been steadily working on the app’s content filter, causing more prompts to return the polite response: “Philosopher AI is not providing a response for this topic, because we know this system has a tendency to discuss some topics using unsafe and insensitive language.” He also added a function that let users report offensive responses.

Ayfer argues that Philosopher AI is a “relatively harmless context” for GPT-3 to generate offensive content. “It’s probably better to make mistakes now, so we can really learn how to fix them,” he says.

That’s just what OpenAI intended when it launched the API that enables access to GPT-3 last June, and announced a private beta test in which carefully selected users would develop applications for the technology under the company’s watchful eye. The blog post noted that OpenAI will be guarding against “obviously harmful use-cases, such as harassment, spam, radicalization, or astroturfing,” and will be looking for unexpected problems: “We also know we can’t anticipate all of the possible consequences of this technology.”

Prabhu worries that the AI and business community are being swept away into uncharted waters: “People are thrilled, excited, giddy.” He thinks the rollout into commercial applications is bound to cause some disasters. “Even if they’re very careful, the odds of something offensive coming out is 100 percent—that’s my humble opinion. It’s an intractable problem, and there is no solution,” he says.  

Janelle Shane is a member of that AI community, and a beta user of GPT-3 for her blog, AI Weirdness. She clearly enjoys the technology, having used it to generate Christmas carols, recipes, news headlines, and anything else she thought would be funny. Yet the tweets about Philosopher AI’s essay on Ethiopia prompted her to post this sobering thought: “Sometimes, to reckon with the effects of biased training data is to realize that the app shouldn’t be built. That without human supervision, there is no way to stop the app from saying problematic stuff to its users, and that it’s unacceptable to let it do so.”

So what is OpenAI doing about its intractable problem?

OpenAI’s Approach to AI Safety

The company has arguably learned from its experiences with earlier iterations of its language-generating technology. In 2019 it introduced GPT-2, but declared that it was actually too dangerous to be released into the wild. The company instead offered up a downsized version of the language model but withheld the full model, which included the data set and training code.

The main fear, highlighted by OpenAI in a blog post, was that malicious actors would use GPT-2 to generate high-quality fake news that would fool readers and destroy the distinction between fact and fiction.  

However, much of the AI community objected to that limited release. When the company reversed course later that year and made the full model available, some people did indeed use it to generate fake news and clickbait. But it didn’t create a tsunami of non-truth on the Internet. In the past few years, people have shown they can do that well enough themselves, without the help of an AI. 

Then came GPT-3, unveiled in a 75-page paper in May 2020. OpenAI’s newest language model was far larger than any that had come before. Its 175 billion language parameters were a massive increase over GPT-2’s 1.5 billion parameters.

Sandhini Agarwal, an AI policy researcher at OpenAI, spoke with Spectrum about the company’s strategy for GPT-3. “We have to do this closed beta with a few people, otherwise we won’t even know what the model is capable of, and we won’t know which issues we need to make headway on,” she says. “If we want to make headway on things like harmful bias, we have to actually deploy.”

Agarwal explains that an internal team vets proposed applications, provides safety guidelines to those companies granted access to GPT-3 via the API, reviews the applications again before deployment, and monitors their use after deployment.

OpenAI is also developing tools to help users better control GPT-3’s generated text. It offers a general content filter for harmful bias and toxic language. However, Agarwal says that such a filter is really an impossible thing to create, since “bias is a very nebulous thing that keeps shifting based on context.” Particularly on controversial topics, a response that might seem right-on to people on one side of the debate could be deemed toxic by the other.

Another approach, called prompt engineering, adds a phrase to the user’s prompt such as “the friendly bot then said,” which sets up GPT-3 to generate text in a polite and uncontroversial tone. Users can also choose a “temperature” setting for their responses. A low-temperature setting means the AI will put together words that it has very often seen together before, taking few risks and causing few surprises; when set to a high temperature, it’s more likely to produce outlandish language.
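
The effect of temperature can be sketched with the standard temperature-scaled softmax, shown below in Java. This is a generic illustration, not OpenAI’s implementation, which the company has not detailed here; the scores and the names (TemperatureSketch, softmax) are made up. Each candidate word’s raw score is divided by the temperature before being turned into a probability:

```java
import java.util.Arrays;

public class TemperatureSketch {

    /** Turns raw next-word scores into probabilities; temperature rescales the scores first. */
    public static double[] softmax(double[] scores, double temperature) {
        // Subtracting the largest score keeps Math.exp from overflowing.
        double max = Arrays.stream(scores).max().getAsDouble();
        double[] probs = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            probs[i] = Math.exp((scores[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] scores = {2.0, 1.0, 0.0}; // hypothetical scores for three candidate words
        for (double t : new double[] {0.2, 1.0, 2.0}) {
            System.out.println("T=" + t + " " + Arrays.toString(softmax(scores, t)));
        }
    }
}
```

With these made-up scores, the top word gets about 99 percent of the probability at a temperature of 0.2 but only about 51 percent at 2.0, so the less likely words are sampled far more often as the temperature rises.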

In addition to all the work being done on the product side of OpenAI, Agarwal says there’s a parallel effort on the “pure machine learning research” side of the company. “We have an internal red team that’s always trying to break the model, trying to make it do all these bad things,” she says. Researchers are trying to understand what’s happening when GPT-3 generates overtly sexist or racist text. “They’re going down to the underlying weights of the model, trying to see which weights might indicate that particular content is harmful.”

In areas where mistakes could have serious consequences, such as the health care, finance, and legal industries, Agarwal says OpenAI’s review team takes special care. In some cases, they’ve rejected applicants because their proposed product was too sensitive. In others, she says, they’ve insisted on having a “human in the loop,” meaning that the AI-generated text is reviewed by a human before it reaches a customer or user.  

OpenAI is making progress on toxic language and harmful bias, Agarwal says, but “we’re not quite where we want to be.” She says the company won’t broadly expand access to GPT-3 until it’s comfortable that it has a handle on these issues. “If we open it up to the world now, it could end really badly,” she says.

But such an approach raises plenty of questions. It’s not clear how OpenAI will get the risk of toxic language down to a manageable level—and it’s not clear what manageable means in this context. Commercial users will have to weigh GPT-3’s benefits against these risks.

Can Language Models Be Detoxified? 

OpenAI’s researchers aren’t the only ones trying to understand the scope of the problem. In December, AI researcher Timnit Gebru said that she’d been fired by Google, forced to leave her work on ethical AI and algorithmic bias, because of an internal disagreement about a paper she’d coauthored. The paper discussed the current failings of large language models such as GPT-3 and Google’s own BERT, including the dilemma of encoded bias. Gebru and her coauthors argued that companies intent on developing large language models should devote more of their resources to curating the training data and “only creating datasets as large as can be sufficiently documented.”

Meanwhile, at the Allen Institute for AI (AI2), in Seattle, a handful of researchers have been probing GPT-3 and other large language models. One of their projects, called RealToxicityPrompts, created a dataset of 100,000 prompts derived from web text, evaluated the toxicity of the resulting text from five different language models, and tried out several mitigation strategies. Those five models included GPT versions 1, 2, and 3 (OpenAI gave the researchers access to the API).

The conclusion stated in their paper, which was presented at the 2020 Empirical Methods in Natural Language Processing conference in November: No current mitigation method is “failsafe against neural toxic degeneration.” In other words, they couldn’t find a way to reliably keep out ugly words and sentiments.  

When the research team spoke with Spectrum about their findings, they noted that the standard ways of training these big language models may need improvement. “Using Internet text has been the default,” says Suchin Gururangan, an author on the paper and an investigator at AI2. “The assumption is that you’re getting the most diverse set of voices in the data. But it’s pretty clear in our analysis that Internet text does have its own biases, and biases do propagate in the model behavior.”

Gururangan says that when researchers think about what data to train their new models on, they should consider what kinds of text they’d like to exclude. But he notes that it’s a hard task to automatically identify toxic language even in a document, and that doing it at web-scale “is fertile ground for research.”

As for ways to fix the problem, the AI2 team tried two approaches to “detoxify” the models’ output: giving the model additional training with text that’s known to be innocuous, or filtering the generated text by scanning for keywords or by fancier means. “We found that most of these techniques don’t really work very well,” Gururangan says. “All of these methods reduce the prevalence of toxicity—but we always found, if you generate enough times, you will find some toxicity.”
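
A back-of-the-envelope calculation shows why lowering the rate of toxic output doesn’t make it go away. The Java sketch below assumes, purely for illustration, that each generation is independent and has the same small chance of being toxic; neither that assumption nor the 1 percent rate comes from the AI2 paper, and the names are hypothetical. Under those assumptions, the chance of at least one toxic output in n generations is 1 - (1 - p)^n:

```java
public class ToxicityOddsSketch {

    /** Chance of at least one toxic output in n independent generations: 1 - (1 - p)^n. */
    public static double atLeastOne(double perGenerationRate, int generations) {
        return 1.0 - Math.pow(1.0 - perGenerationRate, generations);
    }

    public static void main(String[] args) {
        double rate = 0.01; // hypothetical per-generation rate after mitigation
        for (int n : new int[] {1, 10, 100, 1000}) {
            System.out.printf("n=%d -> %.4f%n", n, atLeastOne(rate, n));
        }
    }
}
```

At that hypothetical 1 percent rate, the chance of seeing something toxic is about 63 percent after 100 generations and above 99.99 percent after 1,000.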

What’s more, he says, reducing the toxicity can also have the side effect of reducing the fluency of the language. That’s one of the issues that the beta users are grappling with today.  

How Beta Users of GPT-3 Aim for Safe Deployment

The companies and developers in the private beta that Spectrum spoke with all made two basic points: GPT-3 is a powerful technology, and OpenAI is working hard to address toxic language and harmful bias. “The people there take these issues extremely seriously,” says Richard Rusczyk, founder of Art of Problem Solving, a beta-user company that offers online math courses to “kids who are really into math.” And the companies have all devised strategies for keeping GPT-3’s output safe and inoffensive.   

Rusczyk says his company is trying out GPT-3 to speed up its instructors’ grading of students’ math proofs—GPT-3 can provide a basic response about a proof’s accuracy and presentation, and then the instructor can check the response and customize it to best help that individual student. “It lets the grader spend more time on the high value tasks,” he says.

To protect the students, the generated text “never goes directly to the students,” Rusczyk says. “If there’s some garbage coming out, only a grader would see it.” He notes that it’s extremely unlikely that GPT-3 would generate offensive language in response to a math proof, because it seems likely that such correlations rarely (if ever) occurred in its training data. Yet he stresses that OpenAI still wanted a human in the loop. “They were very insistent that students should not be talking directly to the machine,” he says.

Some companies find safety in limiting the use case for GPT-3. At Sapling Intelligence, a startup that helps customer service agents with emails, chat, and service tickets, CEO Ziang Xie says he doesn’t anticipate using it for “freeform generation.” Xie says it’s important to put this technology in place within certain protective constraints. “I like the analogy of cars versus trolleys,” he says. “Cars can drive anywhere, so they can veer off the road. Trolleys are on rails, so you know at the very least they won’t run off and hit someone on the sidewalk.” However, Xie notes that the recent furor over Timnit Gebru’s forced departure from Google has caused him to question whether companies like OpenAI can do more to make their language models safer from the get-go, so they don’t need guardrails.

Robert Morris, the cofounder of the mental health app Koko, describes how his team is using GPT-3 in a particularly sensitive domain. Koko is a peer-support platform that provides crowdsourced cognitive therapy. His team is experimenting with using GPT-3 to generate bot-written responses to users while they wait for peer responses, and also with giving respondents possible text that they can modify. Morris says the human collaboration approach feels safer to him. “I get increasingly concerned the more freedom it has,” he says.

Yet some companies need GPT-3 to have a good amount of freedom. Replika, an AI companion app used by 7 million people around the world, offers friendly conversation about anything under the sun. “People can talk to Replika about anything—their life, their day, their interests,” says Artem Rodichev, head of AI at Replika. “We need to support conversation about all types of topics.”

To prevent the app from saying offensive things, the company has GPT-3 generate a variety of responses to each message, then uses a number of custom classifiers to detect and filter out responses with negativity, harmful bias, nasty words, and so on. Since such attributes are hard to detect from keywords alone, the app also collects signals from users to train its classifiers. “Users can label a response as inappropriate, and we can use that feedback as a dataset to train the classifier,” says Rodichev.  
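
The generate-then-filter pattern described above can be sketched in a few lines of Java. This is only an illustration of the idea, not Replika’s code: the keyword checks stand in for trained classifiers (which, as noted, keywords alone can’t replace), and all names and sample sentences are hypothetical:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class ResponseFilterSketch {

    /** Returns the first candidate that no classifier flags, or empty if every one is rejected. */
    public static Optional<String> pickSafe(List<String> candidates,
                                            List<Predicate<String>> flaggers) {
        return candidates.stream()
                .filter(text -> flaggers.stream().noneMatch(flagger -> flagger.test(text)))
                .findFirst();
    }

    public static void main(String[] args) {
        // Stand-in classifiers: each returns true when it considers the text unacceptable.
        List<Predicate<String>> flaggers = List.of(
                text -> text.toLowerCase().contains("hate"),
                text -> text.toLowerCase().contains("stupid"));
        // Stand-ins for several responses generated for one user message.
        List<String> candidates = List.of(
                "I hate Mondays too.",
                "That sounds like a good day!");
        System.out.println(pickSafe(candidates, flaggers).orElse("(no safe response)"));
        // prints: That sounds like a good day!
    }
}
```

In this design, user reports would feed back into the classifiers rather than into the language model itself, so the filter can improve without retraining GPT-3.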

Another company that requires GPT-3 to be relatively unfettered is Latitude, a startup creating AI-powered games. Its first offering, a text adventure game called AI Dungeon, currently uses GPT-3 to create the narrative and respond to the player’s actions. Latitude CEO and cofounder Nick Walton says his team has grappled with inappropriate and bad language. “It doesn’t happen a ton, but it does happen,” he says. “And things end up on Reddit.”

Latitude is not trying to prevent all such incidents, because some users want a “grittier experience,” Walton says. Instead, the company tries to give users control over the settings that determine what kind of language they’ll encounter. Players start out in a default safe mode, and stay there unless they explicitly turn it off.

Safe mode isn’t perfect, Walton says, but it relies on a combination of filters and prompt engineering (such as: “continue this story in a way that’s safe for kids”) to get pretty good performance. He notes that Latitude wanted to build its own screening tech rather than rely on OpenAI’s safety filter because “safety is relative to the context,” he says. “If a customer service chatbot threatens you and asks you to give it all its money, that’s bad. If you’re playing a game and you encounter a bandit on the road, that’s normal storytelling.”  

These applications are only a small sampling of those being tested by beta users, and the beta users are a tiny fraction of the entities that want access to GPT-3. Aaro Isosaari cofounded the startup Flowrite in September after getting access to GPT-3; the company aims to help people compose faster emails and online content. Just as advances in computer vision and speech recognition enabled thousands of new companies, he thinks GPT-3 may usher in a new wave of innovation. “Language models have the potential to be the next technological advancement on top of which new startups are being built,” he says.

Coming Soon to Microsoft? 

Technology powered by GPT-3 could even find its way into the productivity tools that millions of office workers use every day. Last September, Microsoft announced an exclusive licensing agreement with OpenAI, stating that the company would use GPT-3 to “create new solutions that harness the amazing power of advanced natural language generation.” This arrangement won’t prevent other companies from accessing GPT-3 via OpenAI’s API, but it gives Microsoft exclusive rights to work with the basic code—it’s the difference between riding in a fast car and popping the hood to tinker with the engine.

In the blog post announcing the agreement, Microsoft chief technology officer Kevin Scott enthused about the possibilities, saying: “The scope of commercial and creative potential that can be unlocked through the GPT-3 model is profound, with genuinely novel capabilities – most of which we haven’t even imagined yet.” Microsoft declined to comment when asked about its plans for the technology and its ideas for safe deployment.

Ayfer, the creator of the Philosopher AI app, thinks that GPT-3 and similar language technologies should only gradually become part of our lives. “I think this is a remarkably similar situation to self-driving cars,” he says, noting that various aspects of autonomous car technology are gradually being integrated into normal vehicles. “But there’s still the disclaimer: It’s going to make life-threatening mistakes, so be ready to take over at any time. You have to be in control.” He notes that we’re not yet ready to put the AI systems in charge and use them without supervision.

With language technology like GPT-3, the consequences of mistakes might not be as obvious as a car crash. Yet toxic language has an insidious effect on human society by reinforcing stereotypes, supporting structural inequalities, and generally keeping us mired in a past that we’re collectively trying to move beyond. It isn’t clear whether GPT-3 will ever be trustworthy enough to act on its own, without human oversight.

OpenAI’s position on GPT-3 mirrors its larger mission, which is to create a game-changing kind of human-level AI, the kind of generally intelligent AI that figures in sci-fi movies—but to do so safely and responsibly. In both the micro and the macro argument, OpenAI’s position comes down to: We need to create the technology and see what can go wrong. We’ll do it responsibly, they say, while other people might not.

Agarwal of OpenAI says about GPT-3: “I do think that there are safety concerns, but it’s a Catch-22.” If they don’t build it and see what terrible things it’s capable of, she says, they can’t find ways to protect society from the terrible things. 

One wonders, though, whether anyone has considered another option: Taking a step back and thinking through the possible worst-case scenarios before proceeding with this technology. And possibly looking for fundamentally different ways to train large language models, so these models would reflect not the horrors of our past, but a world that we’d like to live in. 

A shorter version of this article appears in the February 2021 print issue as “The Troll in the Machine.”

Amazon Lex Introduces an Enhanced Console Experience and New V2 APIs

Post Syndicated from Martin Beeby original https://aws.amazon.com/blogs/aws/amazon-lex-enhanced-console-experience/

Today, the Amazon Lex team has released a new console experience that makes it easier to build, deploy, and manage conversational experiences. Along with the new console, we have also introduced new V2 APIs, including continuous streaming capability. These improvements allow you to reach new audiences, have more natural conversations, and develop and iterate faster.

The new Lex console and V2 APIs make it easier to build and manage bots focusing on three main benefits. First, you can add a new language to a bot at any time and manage all the languages through the lifecycle of design, test, and deployment as a single resource. The new console experience allows you to quickly move between different languages to compare and refine your conversations. I’ll demonstrate later how easy it was to add French to my English bot.

Second, V2 APIs simplify versioning. The new Lex console and V2 APIs provide a simple information architecture where the bot intents and slot types are scoped to a specific language. Versioning is performed at the bot level so that resources such as intents and slot types do not have to be versioned individually. All resources within the bot (language, intents, and slot types) are archived as part of the bot version creation. This new way of working makes it easier to manage bots.

Lastly, you have additional builder productivity tools and capabilities to give you more flexibility and control of your bot design process. You can now save partially completed work as you develop different bot elements as you script, test and tune your configuration. This provides you with more flexibility as you iterate through the bot development. For example, you can save a slot that refers to a deleted slot type. In addition to saving partially completed work, you can quickly navigate across the configuration without getting lost. The new Conversation flow capability allows you to maintain your orientation as you move across the different intents and slot types.

In addition to the enhanced console and APIs, we are providing a new streaming conversation API. Natural conversations are punctuated with pauses and interruptions. For example, while paying a bill, a customer may ask to pause the conversation or hold the line to look up necessary information, such as credit card details, before answering a question. With streaming conversation APIs, you can pause a conversation and handle interruptions directly as you configure the bot. Overall, the design and implementation of the conversation is simplified and easy to manage. The bot builder can quickly enhance the conversational capability of virtual contact center agents or smart assistants.

Let’s create a new bot and explore how some of Lex’s new console and streaming API features provide an improved bot building experience.

Building a bot
I head over to the new V2 Lex console and click on Create bot to start things off.

I select that I want to Start with an example and select the MakeAppointment example.

Over the years, I have spoken at many conferences, so I now offer to review talks that other community members are producing. Since these speakers are often in different time zones, it can be complicated to organize the various appointments for the different types of reviews that I offer. So I have decided to build a bot to streamline the process. I give my bot the name TalkReview and provide a description. I also select Create a role with basic Amazon Lex permissions and use this as my runtime role.

I must add at least one language to my bot, so I start with English (GB). I also select the text-to-speech voice that I want to use should my bot require voice interaction rather than just text.

During the creation, there is a new button that allows me to Add another language. I click on this to add French (FR) to my bot. You can add languages during creation as I am doing here, or you can add additional languages later on as your bot becomes more popular and needs to work with new audiences.

I can now start defining intents for my bot, and I can begin the iterative process of building and testing my bot. I won’t go into all of the details of how to create a bot or show you all of the intents I added, as we have better tutorials that can show you that step-by-step, but I will point out a few new features that make this new enhanced console really compelling.

The new Conversation flow provides you with a visual flow of the conversation, so you can see how the sample utterances you provide fit together and how your conversation might work in the real world. I love this feature because you can click on the various elements, and it will take you to where you can make changes. For example, I can click on the prompt What type of review would you like to schedule and I am taken to the place where I can edit this prompt.

The new console has a very well thought-out approach to versioning a bot. At any time, on the Bot versions screen, I can click Create version, and it will take a snapshot of the bot’s current configuration. I can then associate that version with an alias. For example, in my application, I have an alias called Production. This Production alias is associated with Version 1. Still, at any time, I could switch it to use a different version or even roll back to a previous version if I discover problems.
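To make the snapshot-and-alias idea concrete, here is a toy in-memory model in Java. It is not the Lex V2 API: the class `BotVersionModel` and its methods are hypothetical names invented for illustration, and the "configuration" is just a string. It only shows why pointing an alias at a numbered snapshot makes switching and rolling back cheap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of versions and aliases; all names here are hypothetical. */
final class BotVersionModel {
    private String draft = "";                                   // editable working configuration
    private final List<String> versions = new ArrayList<>();     // snapshots; index 0 holds version 1
    private final Map<String, Integer> aliases = new HashMap<>();

    void editDraft(String configuration) {
        draft = configuration;
    }

    /** Snapshots the current draft and returns the new version number. */
    int createVersion() {
        versions.add(draft);
        return versions.size();
    }

    /** Points an alias at an existing version; repointing it is how rollback works here. */
    void associate(String alias, int version) {
        if (version < 1 || version > versions.size()) {
            throw new IllegalArgumentException("unknown version " + version);
        }
        aliases.put(alias, version);
    }

    /** Returns the configuration that the alias currently serves. */
    String resolve(String alias) {
        Integer version = aliases.get(alias);
        if (version == null) {
            throw new IllegalArgumentException("unknown alias " + alias);
        }
        return versions.get(version - 1);
    }

    public static void main(String[] args) {
        BotVersionModel bot = new BotVersionModel();
        bot.editDraft("English only");
        int v1 = bot.createVersion();
        bot.associate("Production", v1);

        bot.editDraft("English and French");
        int v2 = bot.createVersion();
        bot.associate("Production", v2);   // switch to the newer snapshot
        System.out.println(bot.resolve("Production"));

        bot.associate("Production", v1);   // roll back if problems appear
        System.out.println(bot.resolve("Production"));
    }
}
```

In this model, editing the draft after a snapshot never changes what an alias serves, which is the property that makes an alias such as Production safe to depend on.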

The testing experience is now very streamlined. Once I have built the bot, I can click the test button at the bottom right-hand corner of the screen and start speaking to the bot and testing the experience. You can also expand the Inspect window, which gives you details about the conversation’s state, and you can also explore the raw JSON inputs and outputs.

Things to know
Here are a couple of important things to keep in mind when you use the enhanced console:

  • Integration with Amazon Connect – Currently, bots built in the new console cannot be integrated with Amazon Connect contact flows. We plan to provide this integration as part of the near-term roadmap. You can use the current console and existing APIs to create and integrate bots with Amazon Connect.
  • Pricing – You only pay for what you use. The charges remain the same for existing audio and text APIs, renamed as RecognizeUtterance and RecognizeText. For the new Streaming capabilities, please refer to the pricing detail here.
  • All existing APIs and bots will continue to be supported. The newly announced features are only available in the new console and V2 APIs.

Go Build
The Lex enhanced console is available now, and you can start using it today. The enhanced experience and V2 APIs are available in all existing regions and support all current languages. So, please give this console a try and let us know what you think. To learn more, check out the documentation for the console and the streaming API.

Happy Building!
— Martin

Raspberry Pi LEGO sorter

Post Syndicated from Ashley Whittaker original https://www.raspberrypi.org/blog/raspberry-pi-lego-sorter/

Raspberry Pi is at the heart of this AI–powered, automated sorting machine that is capable of recognising and sorting any LEGO brick.

And its maker Daniel West believes it to be the first of its kind in the world!

Best ever

This mega-machine was two years in the making and is a LEGO creation itself, built from over 10,000 LEGO bricks.

A beast of 10,000 bricks

It can sort any LEGO brick you place in its input bucket into one of 18 output buckets, at the rate of one brick every two seconds.

While Daniel was inspired by previous LEGO sorters, his creation is a huge step up from them: it can recognise absolutely every LEGO brick ever created, even bricks it has never seen before. Hence the ‘universal’ in the name ‘universal LEGO sorting machine’.


There we are, tucked away, just doing our job


The artificial intelligence algorithm behind the LEGO sorting is a convolutional neural network, the go-to for image classification.

What makes Daniel’s project a ‘world first’ is that he trained his classifier using 3D model images of LEGO bricks, which is how the machine can classify absolutely any LEGO brick it’s faced with, even if it has never seen it in real life before.

We LOVE a thorough project video, and we love TWO of them even more

Daniel has made a whole extra video (above) explaining how the AI in this project works. He shouts out all the open source software he used to run the Raspberry Pi Camera Module and access 3D training images etc. at this point in the video.

LEGO brick separation

The vibration plate in action, feeding single parts into the scanner

Daniel needed the input bucket to carefully pick out a single LEGO brick from the mass he chucks in at once.

This is achieved with a primary and secondary belt slowly pushing parts onto a vibration plate. The vibration plate uses a super fast LEGO motor to shake the bricks around so they aren’t sitting on top of each other when they reach the scanner.

Scanning and sorting

A side view of the LEGO sorting machine showing a large white chute built from LEGO bricks
The underside of the beast

A Raspberry Pi Camera Module captures video of each brick, which Raspberry Pi 3 Model B+ then processes and wirelessly sends to a more powerful computer able to run the neural network that classifies the parts.

The classification decision is then sent back to the sorting machine so it can spit the brick, using a series of servo-controlled gates, into the right output bucket.

Extra-credit homework

A front view of the LEGO sorter with the sorting boxes visible underneath
In all its bricky beauty, with the 18 output buckets visible at the bottom

Daniel is such a boss maker that he wrote not one, but two further reading articles for those of you who want to deep-dive into this mega LEGO creation:

The post Raspberry Pi LEGO sorter appeared first on Raspberry Pi.

Resource leak detection in Amazon CodeGuru Reviewer

Post Syndicated from Pranav Garg original https://aws.amazon.com/blogs/devops/resource-leak-detection-in-amazon-codeguru/

This post discusses the resource leak detector for Java in Amazon CodeGuru Reviewer. CodeGuru Reviewer automatically analyzes pull requests (created in supported repositories such as AWS CodeCommit, GitHub, GitHub Enterprise, and Bitbucket) and generates recommendations for improving code quality. For more information, see Automating code reviews and application profiling with Amazon CodeGuru. This blog does not describe the resource leak detector for Python programs that is now available in preview.

What are resource leaks?

Resources are objects with a limited availability within a computing system. These typically include objects managed by the operating system, such as file handles, database connections, and network sockets. Because the number of such resources in a system is limited, an application must release them as soon as it has finished using them. Otherwise, you will run out of resources and you won’t be able to allocate new ones. The paradigm of acquiring a resource and releasing it is also followed by other categories of objects such as metric wrappers and timers.

Resource leaks are bugs that arise when a program doesn’t release the resources it has acquired. Resource leaks can lead to resource exhaustion. In the worst case, they can cause the system to slow down or even crash.

Starting with Java 7, most classes holding resources implement the java.lang.AutoCloseable interface and provide a close() method to release them. However, a close() call in source code doesn’t guarantee that the resource is released along all program execution paths. For example, in the following sample code, resource r is acquired by calling its constructor and is closed along the path corresponding to the if branch, shown using green arrows. To ensure that the acquired resource doesn’t leak, you must also close r along the path corresponding to the else branch (the path shown using red arrows).

A resource must be closed along all execution paths to prevent resource leaks
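The branch-dependent leak described above can be sketched in a few lines of Java. This is a minimal illustration rather than the sample from the original figure: `TrackedResource` and `BranchLeakDemo` are hypothetical names, and the resource does nothing except count how many instances are still open.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical resource that counts how many instances are still open. */
final class TrackedResource implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    TrackedResource() {
        OPEN.incrementAndGet();   // acquired by calling the constructor
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();   // released
    }
}

final class BranchLeakDemo {
    /** Closes r only along the if branch; the else branch leaks it. */
    static void leaky(boolean flag) {
        TrackedResource r = new TrackedResource();
        if (flag) {
            r.close();
        } else {
            // r is never closed along this path
        }
    }

    /** try-with-resources closes r along every path, including exceptional ones. */
    static void safe(boolean flag) {
        try (TrackedResource r = new TrackedResource()) {
            if (flag) {
                // use r here
            }
        }
    }

    public static void main(String[] args) {
        leaky(true);
        leaky(false);
        System.out.println("open after leaky calls: " + TrackedResource.OPEN.get());
        safe(true);
        safe(false);
        System.out.println("open after safe calls:  " + TrackedResource.OPEN.get());
    }
}
```

Only the `leaky(false)` call leaves an instance open, so a test suite that never exercises that branch would not notice the problem.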

Often, resource leaks manifest themselves along code paths that aren’t frequently run, or under a heavy system load, or after the system has been running for a long time. As a result, such leaks are latent and can remain dormant in source code for long periods of time before manifesting themselves in production environments. This is the primary reason why resource leak bugs are difficult to detect or replicate during testing, and why automatically detecting these bugs during pull requests and code scans is important.

Detecting resource leaks in CodeGuru Reviewer

For this post, we consider the following Java code snippet. In this code, method getConnection() attempts to create a connection in the connection pool associated with a data source. Typically, a connection pool limits the maximum number of connections that can remain open at any given time. As a result, you must close connections after their use so as to not exhaust this limit.

 1     private Connection getConnection(final BasicDataSource dataSource, ...)
               throws ValidateConnectionException, SQLException {
 2         boolean connectionAcquired = false;
 3         // Retrying three times to get the connection.
 4         for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
 5             Connection connection = dataSource.getConnection();
 6             // validateConnection may throw ValidateConnectionException
 7             if (! validateConnection(connection, ...)) {
 8                 // connection is invalid
 9                 DbUtils.closeQuietly(connection);
10             } else {
11                 // connection is established
12                 connectionAcquired = true;
13                 return connection;
14             }
15         }
16         return null;
17     }

At first glance, it seems that the method getConnection() doesn’t leak connection resources. If a valid connection is established in the connection pool (else branch on line 10 is taken), the method getConnection() returns it to the client for use (line 13). If the connection established is invalid (if branch on line 7 is taken), it’s closed in line 9 before another attempt is made to establish a connection.

However, method validateConnection() at line 7 can throw a ValidateConnectionException. If this exception is thrown after a connection is established at line 5, the connection is neither closed in this method nor is it returned upstream to the client to be closed later. Furthermore, if this exceptional code path runs frequently, for instance, if the validation logic throws on a specific recurring service request, each new request causes a connection to leak in the connection pool. Eventually, the client can’t acquire new connections to the data source, impacting the availability of the service.

A typical recommendation to prevent resource leak bugs is to declare the resource objects in a try-with-resources statement block. However, we can’t use try-with-resources to fix the preceding method because this method is required to return an open connection for use in the upstream client. The CodeGuru Reviewer recommendation for the preceding code snippet is as follows:

“Consider closing the following resource: connection. The resource is referenced at line 7. The resource is closed at line 9. The resource is returned at line 13. There are other execution paths that don’t close the resource or return it, for example, when validateConnection throws an exception. To prevent this resource leak, close connection along these other paths before you exit this method.”

As mentioned in the Reviewer recommendation, to prevent this resource leak, you must close the established connection when method validateConnection() throws an exception. This can be achieved by inserting the validation logic (lines 7–14) in a try block. In the finally block associated with this try, the connection must be closed by calling DbUtils.closeQuietly(connection) if connectionAcquired == false. The method getConnection() after this fix has been applied is as follows:

private Connection getConnection(final BasicDataSource dataSource, ...) 
        throws ValidateConnectionException, SQLException {
    boolean connectionAcquired = false;
    // Retrying three times to get the connection.
    for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
        Connection connection = dataSource.getConnection();
        try {
            // validateConnection may throw ValidateConnectionException
            if (! validateConnection(connection, ...)) {
                // connection is invalid
                DbUtils.closeQuietly(connection);
            } else {
                // connection is established
                connectionAcquired = true;
                return connection;
            }
        } finally {
            // close the connection unless it is being returned to the client
            if (!connectionAcquired) {
                DbUtils.closeQuietly(connection);
            }
        }
    }
    return null;
}

As shown in this example, resource leaks in production services can be very disruptive. Furthermore, leaks that manifest along exceptional or less frequently run code paths can be hard to detect or replicate during testing and can remain dormant in the code for long periods of time before manifesting themselves in production environments. With the resource leak detector, you can detect such leaks on objects belonging to a large number of popular Java types such as file streams, database connections, network sockets, timers and metrics, etc.
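The same try/finally pattern can be exercised end to end with stand-in types. The sketch below is an assumption-laden miniature, not the service code above: `FakeConnection`, `Validator`, and `OwnershipTransferDemo` are hypothetical, the validator throws an unchecked exception to keep the example short, and the flag is scoped to each attempt rather than declared before the loop.

```java
import java.util.ArrayList;
import java.util.List;

final class OwnershipTransferDemo {
    /** Hypothetical stand-in for a pooled connection. */
    static final class FakeConnection implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical validator: may return false, return true, or throw. */
    interface Validator {
        boolean validate(FakeConnection connection);
    }

    static final int CONNECTION_RETRIES = 3;

    /** Every connection handed out, kept so the demo can inspect them later. */
    static final List<FakeConnection> issued = new ArrayList<>();

    static FakeConnection getConnection(Validator validator) {
        for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
            FakeConnection connection = new FakeConnection();
            issued.add(connection);
            boolean handedToCaller = false;
            try {
                if (validator.validate(connection)) {
                    handedToCaller = true;   // the caller now owns the connection
                    return connection;
                }
            } finally {
                if (!handedToCaller) {
                    connection.close();      // invalid, or validate() threw
                }
            }
        }
        return null;
    }

    static long openCount() {
        return issued.stream().filter(c -> !c.closed).count();
    }

    public static void main(String[] args) {
        try {
            getConnection(c -> { throw new IllegalStateException("validation failed"); });
        } catch (IllegalStateException expected) {
            // The exception still reaches the caller, but the connection was closed first.
        }
        System.out.println("open after throwing validator: " + openCount());

        FakeConnection live = getConnection(c -> true);
        System.out.println("open after successful call: " + openCount());
        live.close();   // the caller is responsible for closing what it was handed
    }
}
```

The flag records whether ownership has passed to the caller, which is what lets the finally block close the connection on every path except the one that returns it.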

Combining static code analysis with machine learning for accurate resource leak detection

In this section, we dive deep into the inner workings of the resource leak detector. The resource leak detector in CodeGuru Reviewer uses static analysis algorithms and techniques. Static analysis algorithms perform code analysis without running the code. These algorithms are generally prone to false positives (the tool might report correct code as having a bug). If the number of these false positives is high, it can lead to alarm fatigue and low adoption of the tool. As a result, the resource leak detector in CodeGuru Reviewer prioritizes precision over recall: the findings we surface are very likely to be real resource leaks, though CodeGuru Reviewer could potentially miss some resource leaks.

The main reason for false positives in static code analysis is incomplete information available to the analysis. CodeGuru Reviewer requires only the Java source files and doesn’t require all dependencies or the build artifacts. Not requiring the external dependencies or the build artifacts reduces the friction to perform automated code reviews. As a result, static analysis only has access to the code in the source repository and doesn’t have access to its external dependencies. The resource leak detector in CodeGuru Reviewer combines static code analysis with a machine learning (ML) model. This ML model is used to reason about external dependencies to provide accurate recommendations.

To understand the use of the ML model, consider again the code above for method getConnection() that had a resource leak. In the code snippet, a connection to the data source is established by calling BasicDataSource.getConnection() method, declared in the Apache Commons library. As mentioned earlier, we don’t require the source code of external dependencies like the Apache library for code analysis during pull requests. Without access to the code of external dependencies, a pure static analysis-driven technique doesn’t know whether the Connection object obtained at line 5 will leak, if not closed. Similarly, it doesn’t know that DbUtils.closeQuietly() is a library function that closes the connection argument passed to it at line 9. Our detector combines static code analysis with ML that learns patterns over such external function calls from a large number of available code repositories. As a result, our resource leak detector knows that the connection doesn’t leak along the following code path:

  • A connection is established on line 5
  • Method validateConnection() returns false at line 7
  • DbUtils.closeQuietly() is called on line 9

This suppresses the possible false warning. At the same time, the detector knows that there is a resource leak when the connection is established at line 5, and validateConnection() throws an exception at line 7 that isn’t caught.

When we run CodeGuru Reviewer on this code snippet, it surfaces only the second leak scenario and makes an appropriate recommendation to fix this bug.
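The reasoning above can be illustrated with a deliberately simplified sketch. It is not CodeGuru Reviewer's algorithm: each execution path is hand-written as a list of event names, and a plain set of method names (`closers`, a hypothetical stand-in) plays the role of what the ML model has learned about external library calls.

```java
import java.util.List;
import java.util.Set;

final class PathLeakSketch {
    /**
     * Reports whether a path still holds the resource when it ends.
     * The closers set stands in for learned knowledge about which
     * external calls release the resource passed to them.
     */
    static boolean leaks(List<String> path, Set<String> closers) {
        boolean held = false;
        for (String event : path) {
            if (event.equals("acquire")) {
                held = true;
            } else if (event.equals("return") || closers.contains(event)) {
                held = false;   // released, or handed to the caller
            }
        }
        return held;
    }

    public static void main(String[] args) {
        List<String> invalidPath = List.of("acquire", "validate=false", "DbUtils.closeQuietly");
        List<String> validPath = List.of("acquire", "validate=true", "return");
        List<String> throwingPath = List.of("acquire", "validate throws");

        Set<String> learned = Set.of("DbUtils.closeQuietly");
        Set<String> nothingKnown = Set.of();

        // Without knowledge of the library call, the invalid path looks like a leak.
        System.out.println("invalid path, no knowledge: " + leaks(invalidPath, nothingKnown));
        System.out.println("invalid path, learned:      " + leaks(invalidPath, learned));
        System.out.println("valid path:                 " + leaks(validPath, learned));
        System.out.println("throwing path:              " + leaks(throwingPath, learned));
    }
}
```

With an empty set of known closers, the path that calls DbUtils.closeQuietly() is reported as a leak, which is the kind of false positive that knowledge about external functions is meant to suppress. The path where validation throws is reported either way.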

The ML model used in the resource leak detector has been trained on a large number of internal Amazon and GitHub code repositories.

Responses to the resource leak findings

Although closing an open resource in code isn’t difficult, doing so properly along all program paths is important to prevent resource leaks. This can easily be overlooked, especially along exceptional or less frequently run paths. As a result, the resource leak detector in CodeGuru Reviewer has found resource leaks at a relatively high frequency, and has alerted developers within Amazon to thousands of resource leaks before they hit production.

The resource leak detections have seen a high developer acceptance rate, and developer feedback on the resource leak detector has been very positive. Some of the feedback from developers includes “Very cool, automated finding,” “Good bot :),” and “Oh man, this is cool.” Developers have also concurred that the findings are important and need to be fixed.


Resource leak bugs are difficult to detect or replicate during testing. They can impact the availability of production services. As a result, it’s important to automatically detect these bugs early on in the software development workflow, such as during pull requests or code scans. The resource leak detector in CodeGuru Reviewer combines static code analysis algorithms with ML to surface only the high confidence leaks. It has a high developer acceptance rate and has alerted developers within Amazon to thousands of leaks before those leaks hit production.