
The Algorithms That Make Instacart Roll

Post Syndicated from Sharath Rao original https://spectrum.ieee.org/artificial-intelligence/machine-learning/the-algorithms-that-make-instacart-roll

It’s Sunday morning, and, after your socially distanced morning hike, you look at your schedule for the next few days. You need to restock your refrigerator, but the weekend crowds at the supermarket don’t excite you. Monday and Tuesday are jam-packed with Zoom meetings, and you’ll also be supervising your children’s remote learning. In short, you aren’t going to make it to the grocery store anytime soon. So you pull out your phone, fire up the Instacart app, and select your favorite grocery store. You click through your list of previously purchased items, browse specials, search for a new key-lime sparkling water a friend recommended, then select a delivery window. About 2 hours later, you watch a shopper, wearing a face mask, place bags on your porch.

The transaction seems simple. But this apparent simplicity depends on a complex web of carefully choreographed technologies working behind the scenes, powered by a host of apps, data science, machine-learning algorithms, and human shoppers.

Grocery delivery isn’t a new concept, of course. In our great-grandparents’ day, people could select items at a neighborhood store and then walk home empty-handed, the groceries to follow later, likely transported by a teenager on a bicycle. Customers often had basics like milk and eggs delivered weekly. But with the advent of the fully stocked supermarket, with its broad selection of staples, produce, and specialty foods, customers shifted to selecting goods from store shelves and toting them home themselves, though in some cities local stores still offered delivery services.

Then in 1989, Peapod—followed in the mid-1990s by companies like Webvan and HomeGrocer—tried to revive grocery delivery for the Internet age. They invested heavily in sophisticated warehouses with automated inventory systems and fleets of delivery vehicles. While these services were adored by some for their high-quality products and short delivery windows, they never became profitable. Industry analysts concluded that the cost of building up delivery networks across dozens of large metro areas rapidly ate into the already thin margins of the grocery industry.

Timing, of course, is everything. Cloud computing and inexpensive smartphones emerged in the decade after the launch of the first-generation online grocery companies. By 2012, when Instacart began, these technologies had created an environment in which online grocery ordering could finally come into its own.

Today, retailers like Target and Whole Foods (via Amazon) offer delivery and pickup services, using their existing brick-and-mortar facilities. Some of these retailers run their delivery businesses from warehouses, some pull from the stocked shelves of retail stores, and some fulfill from a mix of both. Small, online-only companies like Good Eggs, Imperfect Foods, and Thrive Market offer curated selections of groceries sourced from local farms and suppliers.

Meanwhile, food and grocery delivery services emerged to bring brick-and-mortar restaurants and stores into the online economy. These businesses—which include DoorDash, Shipt, and Uber Eats in the United States, and Buymie, Deliveroo, and Grofers, based elsewhere—have built technology platforms and fulfillment networks that existing stores and restaurants can use to reach customers online. In this model, the retailer’s or restaurant’s physical location nearest the customer is the “warehouse,” and a community of independent contractors handles fulfillment and delivery.

Our employer, Instacart, is the North American leader in this type of online grocery service, with more than 500 grocers, including Aldi, Costco, Food Lion, Loblaws, Publix, Safeway, Sam’s Club, Sprouts Farmers Market, and Wegmans, encompassing nearly 40,000 physical store locations in the United States and Canada. At the onset of the COVID-19 pandemic, as consumers heeded stay-at-home orders, we saw our order volume surge by as much as 500 percent, compared with the volume during those same weeks in 2019. The increase prompted us to more than double the number of shoppers who can access the platform from 200,000 in early March to 500,000 by year-end.

Here’s how Instacart works.

From the customer’s perspective, the ordering process is simple. Customers start by opening a mobile app or logging on to a website. They enter their delivery zip code to see available retailers. After choosing a store or retail chain, they can browse virtual aisles of produce, deli, or snacks and search for specific products, clicking to add items to an online shopping cart and specifying weights or quantities as appropriate. When finished, they see a list of available 2-hour delivery windows, from later the same day to a week or more in the future. Customers can adjust their orders up until the shoppers start picking their items off store shelves, usually an hour or two before the delivery window. They can enter preferred substitutions beforehand or chat with their shoppers in real time about what’s available. Once the groceries are out of the store and on the move, customers get alerts when their orders are about to be delivered.

That’s Instacart from a customer’s perspective. Behind the scenes, we face huge technical challenges to make this process work. We have to keep track of the products in nearly 40,000 grocery stores—billions of different data points. We have to predict how many of our 500,000-plus shoppers will be online at a given time in a given area and available to work. We have to group multiple orders from different customers together into batches, so that the designated shopper can efficiently pick, pack, and deliver them. When products run out, we suggest the best replacements. Finally, we dispatch shoppers to different delivery addresses along the most efficient route. We’re crunching enormous volumes of data every day to keep the customer-facing Instacart app, our Shopper app, our business management tools, and other software all humming along.

Let’s start with how we keep track of products on the shelf. The average large supermarket has about 40,000 unique items. Our database includes the names of these products, plus images, descriptions, nutritional information, pricing, and close-to-real-time availability at every store. We process petabytes daily in order to keep these billions of data points current.

Back in 2012, when Instacart started making its first deliveries in the San Francisco Bay Area, we relied on manual methods to get this product data into our system. To stock our first set of virtual shelves, our founders and a handful of employees went to a store and purchased one of every item, filling up cart after cart. They took everything back to the office and entered the product data into the system by hand, taking photos with their phones. It worked, but it obviously wasn’t going to scale.

Today, Instacart aggregates product data from a variety of sources, relying on automated rule-based systems to sort it all out. Many stores send us inventory data once a day, including pricing and item availability, while other retailers send updates every few minutes. Large consumer products companies, like General Mills and Procter & Gamble, send us detailed product data, including images and descriptions. We also purchase specialized data from third-party companies, including nutrition and allergy information.

One listing in our database could have information from dozens of sources that must be sorted out. Let’s say a popular apple juice just underwent a rebranding, complete with new packaging. Our system has to decide whether to use the image provided by a third-party data provider last week, the image sent in by the local store last night, or the image submitted by the manufacturer earlier that morning.

Our rules address this problem. Usually images and other data provided by the manufacturer on the morning of a rebrand will be more up-to-date than data provided by individual stores the night before. But what if a store and the manufacturer upload data at about the same time? In this case, our rules tell the system to trust the image provided by the manufacturer and trust the price and availability data provided by the store. Our catalog updates automatically and continuously around the clock to account for all sorts of incremental changes—more than a billion data points every 24 hours on average.
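
To make this concrete, here is a minimal sketch of what source-precedence rules of this kind can look like. The Source ranking, field names, and Update shape are illustrative assumptions, not our production catalog schema.

import java.util.Comparator;
import java.util.List;

public class CatalogMerger {

    enum Source { MANUFACTURER, STORE, THIRD_PARTY }

    record Update(Source source, String field, String value, long timestampMillis) {}

    // For images, trust the manufacturer over stores over third parties;
    // break ties within a source by recency.
    static Update pickImage(List<Update> candidates) {
        return candidates.stream()
                .filter(u -> u.field().equals("image"))
                .max(Comparator
                        .comparingInt((Update u) -> switch (u.source()) {
                            case MANUFACTURER -> 2;
                            case STORE -> 1;
                            case THIRD_PARTY -> 0;
                        })
                        .thenComparingLong(Update::timestampMillis))
                .orElseThrow();
    }

    // For price and availability, the store itself is the authority.
    static Update pickPrice(List<Update> candidates) {
        return candidates.stream()
                .filter(u -> u.field().equals("price"))
                .max(Comparator
                        .comparingInt((Update u) -> u.source() == Source.STORE ? 1 : 0)
                        .thenComparingLong(Update::timestampMillis))
                .orElseThrow();
    }
}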

Because Instacart doesn’t own and operate its own stores or warehouses, we don’t have a perfect picture of what is on the shelves of a particular store at any moment, much less what will be there later that day or several days in the future. Instead, we need to make well-informed predictions as we stock our virtual shelves. There’s a lot to consider here. Stores in certain regions may get produce shipments on, say, Monday mornings and meat shipments on Thursday evenings. Some stores restock their shelves periodically throughout the day, while others just restock at night. We’ve built two machine-learning models to help us understand what’s on each store’s shelves and manage our customers’ expectations about what they will actually receive in their grocery bags.

Our Item Availability Model predicts the likelihood that popular items are in stock at any location at any given time. We trained this model using our own data set, which includes millions of anonymized orders from across North America. Some items—like a particular brand of organic eggs, chips, or seasoned salt, or niche items like fresh-made stroopwafels—are considered “active,” meaning they’re regularly ordered year-round from a particular store. “Non-active” items include discontinued products as well as seasonal items like eggnog, Advent calendars, and Peeps marshmallows. The model looks at the history of how often our shoppers are able to purchase the items consumers order most. For each, it calculates an availability score ranging from 0.0 to 1.0; a score of 0.8 means the item has an 80 percent chance of being found in the store by a shopper. We can update this score in real time because our shoppers scan each item they pick up or else mark it as “not found.”
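
To give a feel for how such a score can stay current, here is a toy sketch that nudges an item's score toward each new "found" or "not found" scan with an exponentially weighted average. The update rule is an assumption for illustration, not the actual model.

public class AvailabilityScore {
    private double score = 0.5;   // neutral prior for an item with little history
    private final double alpha;   // weight given to the newest observation

    public AvailabilityScore(double alpha) {
        this.alpha = alpha;
    }

    // Called each time a shopper scans the item (found) or marks it "not found".
    public void record(boolean found) {
        score = alpha * (found ? 1.0 : 0.0) + (1 - alpha) * score;
    }

    public double score() {
        return score;   // e.g., 0.8 means roughly an 80 percent chance it's on the shelf
    }
}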

Having this score enables us to reduce the chances our customers will order items that won’t be on store shelves when our shoppers look for them, whether that’s a few hours away or days ahead. We do this in several ways. For example, if a customer’s favorite type of peanut butter has a very low availability score, we will automatically bump that listing down in the search results and, in turn, bump up similar products that have a higher availability score. In these times of supply-chain shortages, we’ll also add “out of stock” labels to affected items and prevent customers from adding them to their carts.
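
A minimal sketch of that re-ranking step might look like the following; the blend of relevance and availability, and the cutoff below which an item is treated as out of stock, are assumptions for illustration.

import java.util.Comparator;
import java.util.List;

public class SearchRanker {

    record Listing(String name, double relevance, double availability) {}

    static List<Listing> rank(List<Listing> matches) {
        return matches.stream()
                .filter(l -> l.availability() > 0.1)   // hide items almost surely out of stock
                .sorted(Comparator.comparingDouble(
                        (Listing l) -> l.relevance() * l.availability()).reversed())
                .toList();
    }
}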

The COVID-19 pandemic pushed our Item Availability Model in a number of other ways and challenged our assumptions about customer behavior. In March 2020, at the start of the U.S. shelter-in-place orders, we saw massive spikes in demand for common household products like toilet paper and disinfecting wipes. Such items were flying off the shelves faster than retailers could stock them. Consumers behaved in new ways—instead of buying their preferred brand of toilet paper, they grabbed any kind of toilet paper they could find. In response, we broadened the number of products our availability model scores to include lesser-known products. We also tweaked the model to give less weight to historical data from several weeks earlier in favor of more recent data that might be more representative of the times.

If a customer adds an item with a low availability score to her cart, a second machine-learning model—our Item Replacement Recommendation Model—gets to work, prompting the customer to select a replacement from a menu of automatically generated alternatives in case the first-choice item isn’t in stock.

Giving customers great replacements is a critical part of making them happy. If you’re shopping in-store for yourself, our research suggests that you’ll have to find replacements for about 7 percent of the items on your list. When Instacart shoppers are shopping for you, they can’t just leave out an item—some items may be critical for you, and if you have to make your own trip to the store after unpacking an Instacart order, you might be less likely to use the service again. But our shoppers aren’t mind readers. If the store is out of your preferred brand of creamy peanut butter, should a shopper replace it with crunchy peanut butter from the same brand? What about a creamy peanut butter from a different brand?

We trained our Item Replacement Recommendation Model on a range of data inputs, including item name, product description, and five years of customer ratings of the success of our chosen replacements. When we present a menu of replacement choices, we rank them according to scores assigned by this model. If you select one of the replacements, we’ll remember it for your future orders; if you don’t, our shopper picks from products our model recommends.
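
The sketch below illustrates the general idea, with a remembered customer choice taking precedence over the model's ranking on later orders. The names and data shapes are illustrative assumptions.

import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplacementPicker {

    record Candidate(String productId, double modelScore) {}

    // (customerId:itemId) -> replacement the customer explicitly approved earlier
    private final Map<String, String> rememberedChoices = new HashMap<>();

    public String pickReplacement(String customerId, String outOfStockItemId,
                                  List<Candidate> candidates) {
        String remembered = rememberedChoices.get(customerId + ":" + outOfStockItemId);
        if (remembered != null) {
            return remembered;   // honor the customer's earlier selection
        }
        // Otherwise fall back to the model's highest-scoring candidate.
        return candidates.stream()
                .max(Comparator.comparingDouble(Candidate::modelScore))
                .map(Candidate::productId)
                .orElseThrow();
    }

    public void rememberChoice(String customerId, String itemId, String replacementId) {
        rememberedChoices.put(customerId + ":" + itemId, replacementId);
    }
}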

That’s how machine learning helps us set expectations with our customers as they fill their shopping carts. Once an order is placed, another piece of technology enters the picture: the Shopper app. The vast majority of Instacart’s shoppers are independent contractors who have signed up to shop for us, meeting requirements and passing a background check. They drive to the stores, select items off the shelves, check out, and deliver the orders. They can choose to work at any time by logging onto the Shopper app. In certain high-volume stores, we also directly employ part-time shoppers who pick and pack orders and then hand them off to contractors for delivery.

The Shopper app includes a range of tools meant to make it easy to access new orders, address issues that shoppers encounter, and guide checkout and delivery. When shoppers are ready to work, they open up the app and select batches of orders. As they go through the store and fill orders, they can communicate with the customers via in-app chat. The Shopper app suggests an item-picking order to help the shopper navigate the store efficiently. Generally, this picking order puts refrigerated and frozen items, along with hot or fresh deli preparations, near the end of a shopping trip. Meanwhile, customers can watch their shopper’s progress via the Instacart app, tracking each item as it’s scanned into the shopper’s cart, approving replacement items, and viewing yet-to-be-shopped items.
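
A simplified sketch of such a rule-based picking order follows; the section names and their priorities are illustrative assumptions.

import java.util.Comparator;
import java.util.List;

public class PickingOrder {

    record Item(String name, String section) {}

    static int sectionPriority(String section) {
        return switch (section) {
            case "frozen", "refrigerated", "hot-deli" -> 2;   // pick these last
            case "produce" -> 1;
            default -> 0;                                     // shelf-stable aisles first
        };
    }

    static List<Item> suggestOrder(List<Item> items) {
        // The sort is stable, so items within the same priority keep their aisle grouping.
        return items.stream()
                .sorted(Comparator.comparingInt(i -> sectionPriority(i.section())))
                .toList();
    }
}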

When shoppers check out, they can charge the order to a physical card that Instacart mails to them or use a mobile payments system in the Shopper app. If they encounter a problem, they can communicate with our help team through the app. And when they complete a delivery, they can use the app to transfer their earnings to their bank accounts.

We have many orders coming in at once to the same stores, slated to be delivered in the same general vicinity. In a major metropolitan area, we may get more than 50 orders a minute. So we typically group orders into batches to be picked off store shelves at the same time.

Here, our Matching Algorithm comes into play. This technology applies rules of thumb and machine-learning models to try to balance the number of shoppers with customer demand in real time. The algorithm benefits from scale—the more orders we have in a given area, the more options we can give the algorithm and the better decisions it can make. It considers things like a shopper’s age: If shoppers are not yet 21, they may not be eligible to deliver orders containing alcohol. We rerun the Matching Algorithm as often as every few minutes as we get new information about orders and delivery locations.
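
As one concrete example of a hard constraint, the sketch below filters out shoppers who can't take a batch containing alcohol; the field names are assumptions for illustration.

import java.util.List;

public class Matching {

    record Shopper(String id, int age) {}
    record Batch(String id, boolean containsAlcohol) {}

    static List<Shopper> eligibleShoppers(Batch batch, List<Shopper> online) {
        return online.stream()
                .filter(s -> !batch.containsAlcohol() || s.age() >= 21)
                .toList();
    }
}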

The algorithm works hand in hand with our Capacity Model. This model calculates how much delivery capacity we have throughout the day as conditions on the ground change. We used machine learning to build this system; it takes demand predictions based on historical data and historical shopping speeds at individual stores and couples them with real-time data, including the number of shoppers completing orders and the number of orders waiting in a queue for each store. We rerun this model every 2 minutes to ensure that we’re getting a close-to-real-time understanding of our capacity. That’s why a customer may log on at 1:00 pm and see only one late-evening delivery slot remaining, but when they look again at 1:30 pm, they see a host of afternoon delivery slots pop up.
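
In simplified form, turning a capacity estimate into visible delivery windows can look like the sketch below, where a window is offered only while its predicted capacity exceeds the orders already queued against it. All names and numbers are illustrative.

import java.util.LinkedHashMap;
import java.util.Map;

public class DeliveryWindows {

    static Map<String, Boolean> openWindows(Map<String, Integer> predictedCapacity,
                                            Map<String, Integer> queuedOrders) {
        Map<String, Boolean> open = new LinkedHashMap<>();
        // Recomputed every couple of minutes as shoppers come online and orders land.
        predictedCapacity.forEach((window, capacity) ->
                open.put(window, queuedOrders.getOrDefault(window, 0) < capacity));
        return open;
    }
}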

While these models are critical to Instacart’s operation, other tools are crucial for getting the groceries from the store to the customer smoothly and predictably.

Our Drive Time Model uses historical transit times and real-time traffic data to estimate when a shopper will arrive at the store. Our Parking Model calculates how long it can take the shopper to get in and out of a particular store’s parking lot. If a shopper is likely to spend 10 minutes cruising for a spot in a small, crowded parking lot, that needs to be built into delivery-time estimates for that store.
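
Put together, these estimates compose into an arrival time, roughly as in the following sketch. The three component durations stand in for the outputs of the Drive Time Model, the Parking Model, and per-store shopping-speed data.

import java.time.Duration;
import java.time.Instant;

public class ArrivalEstimator {

    static Instant estimateReadyTime(Instant dispatchTime,
                                     Duration driveToStore,
                                     Duration parking,
                                     Duration shoppingAndCheckout) {
        return dispatchTime
                .plus(driveToStore)          // Drive Time Model: transit history + live traffic
                .plus(parking)               // Parking Model: e.g., 10 minutes for a crowded lot
                .plus(shoppingAndCheckout);  // per-store shopping speed
    }
}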

Once the shopper is ready to make deliveries, our Routing Algorithm comes into play. This model is our take on the classic “traveling salesman” problem. Given three customers at three different addresses in the same city, what’s the most efficient route from the store to the first location and from there to the next two? That’s tricky enough, but Instacart has to work with added complexity. For example, in highly dense areas like New York City, some shoppers may walk to their destinations. And we need to ensure that all three deliveries are made within their designated delivery windows—if a customer isn’t home, too early can be just as bad as too late. So our algorithm considers the projected arrival time, using real-time traffic conditions, to create a delivery route. Our system also sends the projected arrival time to the customer and an alert when the shopper is just a few minutes away.
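
For just three stops, the decision can be sketched as a brute-force search over all six visiting orders, keeping the fastest route that lands inside every delivery window. The plain travel-time matrix below stands in for real-time traffic estimates.

public class Router {

    // travel[i][j] = minutes from location i to location j; index 0 is the store,
    // indexes 1-3 are the customers. Window arrays are indexed by customer (0-2).
    static int[] bestRoute(int[][] travel, int[] windowOpen, int[] windowClose) {
        int[][] orders = {{1,2,3}, {1,3,2}, {2,1,3}, {2,3,1}, {3,1,2}, {3,2,1}};
        int[] best = null;
        int bestFinish = Integer.MAX_VALUE;
        for (int[] order : orders) {
            int t = 0, prev = 0;
            boolean feasible = true;
            for (int stop : order) {
                t += travel[prev][stop];
                if (t > windowClose[stop - 1]) { feasible = false; break; }
                t = Math.max(t, windowOpen[stop - 1]);   // too early is also bad: wait
                prev = stop;
            }
            if (feasible && t < bestFinish) {
                bestFinish = t;
                best = order;
            }
        }
        return best;   // null if no visiting order satisfies every window
    }
}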

All of our databases, machine-learning models, and geolocation technologies work in concert to build an efficient system. But no model is perfect.

And the COVID-19 pandemic proved to be an unexpected stress test for our systems. As stay-at-home orders rippled across North America, with more data flowing into the platform than ever before, we had to repeatedly reconfigure our databases and tools to keep up with the new demand. At the peak, we found ourselves making upgrades multiple times a week.

We also had to speed up the rollout of a new feature we had just started testing: Leave at My Door Delivery, which allows shoppers and customers to remain socially distant. Shoppers can drop groceries on the porch of a house or the reception or lobby area of an apartment building and send customers a photo of their completed orders at the site.

We are continually looking at ways to optimize our technology and operations. Right now, we are exploring how to improve the suggested picking orders in the Shopper app. Today we rely on a set of rule-based formulas guided by human intuition—for example, that it’s best to pick up fresh vegetables and fruit together, since they’re usually in the same section of the store. But not all stores have the same layout, aisles in a given store can be rearranged, and items may get moved around the store seasonally. We’re hoping we can use machine learning to develop an algorithm that determines such “rhythms” in the way a location should be shopped, based on historical item-picking data along with seasonal additions to store shelves and regular changes in store layouts.

As we add retailers and brands and serve more customers, our algorithms and technologies continue to evolve. We retrain all of our models over and over again to better reflect new activity on our platform. So the next time you click on the Instacart app and order groceries to get you through a busy week, know that anonymized data from your order and from your shopper will get fed into this feedback loop, informing the models we train and the technologies we build.

We are proud that our system has been able to keep groceries flowing to people across North America who have been sheltering at home during the pandemic, especially those who are particularly vulnerable to the novel coronavirus. These are extraordinary times, and we’ve taken our responsibility to serve our customers, shoppers, partners, and corporate employees very seriously, as well as to keep them safe. As the world continues to shop from home, we hope that our investments in machine learning will continue to make it easier for everyone to get access to the food they love and more time to enjoy it together.

About the Authors

Sharath Rao is director of machine learning at Instacart. Lily Zhang is Instacart’s director of software engineering.

AWS DeepRacer League’s 2021 Season Launches With New Open and Pro Divisions

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-deepracer-leagues-2021-season-launches-with-new-open-and-pro-divisions/

As a developer, I have been hearing a lot of stories lately about how companies have solved their business problems using machine learning (ML), so one of my goals for 2021 is to learn more about it.

For the last few years I have been using artificial intelligence (AI) services such as Amazon Rekognition, Amazon Comprehend, and others extensively. AI services provide a simple API to solve common ML problems such as image recognition, text to speech, and sentiment analysis of text. When using these high-level APIs, you don’t need to understand how the underlying ML model works, nor do you have to train or maintain it in any way.

Even though those services are great and I can solve most of my business cases with them, I want to understand how ML algorithms work, and that is how I started tinkering with AWS DeepRacer.

AWS DeepRacer, a service that helps you learn reinforcement learning (RL), has been around since 2018. RL is an advanced ML technique that takes a very different approach to training models than other ML methods. Basically, it can learn very complex behavior without requiring any labeled training data, and it can make short-term decisions while optimizing for a long-term goal.

AWS DeepRacer is an autonomous 1/18th scale race car designed to test RL models by racing virtually in the AWS DeepRacer console or physically on a track at AWS and customer events. AWS DeepRacer is for developers of all skill levels, even if you don’t have any ML experience. When learning RL using AWS DeepRacer, you can take part in the AWS DeepRacer League where you get experience with machine learning in a fun and competitive environment.

Over the past year, the AWS DeepRacer League’s races have gone completely virtual and participants have competed for different kinds of prizes. However, the competition has become dominated by experts and newcomers haven’t had much of a chance to win.

The 2021 season introduces new skill-based Open and Pro racing divisions, where racers of all skill levels have five times more opportunities to win rewards than in previous seasons.

Image of the leagues in the console

How the New AWS DeepRacer Racing Divisions Work

The 2021 AWS DeepRacer league runs from March 1 through the end of October. When it kicks off, all participants will enter the Open division, a place to have fun and develop your RL knowledge with other community members.

At the end of every month, the top 10% of the Open division leaderboard will advance to the Pro division for the remainder of the season; they’ll also receive a Pro Welcome kit full of AWS DeepRacer swag. Pro division racers can win DeepRacer Evo cars and AWS DeepRacer merchandise such as hats and T-shirts.

At the end of every month, the top 16 racers in the Pro division will compete against each other in a live race in the console. That race will determine who will advance that month to the 2021 Championship Cup at re:Invent 2021.

The monthly Pro division winner gets an expenses-paid trip to re:Invent 2021 and participates in the Championship Cup to get a chance to win a Machine Learning education sponsorship worth $20k.

In both divisions, you can collect digital rewards, including vehicle customizations and accessories, which will be released to participants once the winners are announced each month.

You can start racing in the Open division any time during the 2021 season. Get started here!

Image of my racer profile

New Racer Profiles Increase the Fun

At the end of March, you will be able to create a new racer profile with an avatar and show the world which country you are representing.

I hope to see you in the new AWS DeepRacer season, where I’ll start in the Open division as MaVi.

Start racing today and train your first model for free! 


Improving the CPU and latency performance of Amazon applications using AWS CodeGuru Profiler

Post Syndicated from Neha Gupta original https://aws.amazon.com/blogs/devops/improving-the-cpu-and-latency-performance-of-amazon-applications-using-aws-codeguru-profiler/

Amazon CodeGuru Profiler is a developer tool powered by machine learning (ML) that helps identify an application’s most expensive lines of code and provides intelligent recommendations to optimize it. You can identify application performance issues and troubleshoot latency and CPU utilization issues in your application.

You can use CodeGuru Profiler to optimize performance for any application running on AWS Lambda, Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), AWS Fargate, or AWS Elastic Beanstalk, and on premises.

This post gives a high-level overview of how CodeGuru Profiler has reduced CPU usage and latency by approximately 50% and saved around $100,000 a year for a particular Amazon retail service.

Technical and business value of CodeGuru Profiler

CodeGuru Profiler is simple to use: just turn it on and start using it. You can keep it running in the background, then look into the CodeGuru Profiler findings and implement the relevant changes.

It’s fairly low cost, and unlike traditional tools that take up a lot of CPU and RAM, CodeGuru Profiler adds less than 1 percent CPU overhead to applications and typically uses no more than 100 MB of memory.

You can run it in a pre-production environment to test changes to ensure no impact occurs on your application’s key metrics.

It automatically detects performance anomalies in the application stack traces that start consuming more CPU or show increased latency. It also provides visualizations and recommendations on how to fix performance issues and the estimated cost of running inefficient code. Detecting the anomalies early prevents escalating the issue in production. This helps you prioritize remediation by giving you enough time to fix the issue before it impacts your service’s availability and your customers’ experience.

How we used CodeGuru Profiler at Amazon

Amazon has onboarded many of its applications to CodeGuru Profiler, which has resulted in annual savings of millions of dollars and latency improvements. In this post, we discuss how we used CodeGuru Profiler on an Amazon Prime service, where a simple code change resulted in saving around $100,000 for the year.

Opportunity to improve

After a change to one of our data sources that caused its payload size to increase, we expected a slight increase to our service latency, but what we saw was higher than expected. Because CodeGuru Profiler is easy to integrate, we were able to quickly make and deploy the changes needed to get it running on our production environment.

After we loaded the profile in Amazon CodeGuru Profiler, it was immediately apparent from the visualization that a very large portion of the service’s CPU time was being taken up by Jackson deserialization (37%, across the two call sites). It was also interesting that most of the blocking calls in the program (in blue) were happening in the jackson.databind method _createAndCacheValueDeserializer.

Flame graphs represent the relative amount of time that the CPU spends at each point in the call graph. The wider a frame is, the more CPU usage it corresponds to.

The following flame graph is from before the performance improvements were implemented.

The Flame Graph before the deployment

Looking at the source for _createAndCacheValueDeserializer confirmed that there was a synchronized block. From within it, _createAndCache2 was called, which did the actual adding to the cache. Adding to the cache was guarded by a boolean condition, with a comment indicating that caching would only be enabled for custom serializers if @JsonCachable was set.


Checking the documentation for @JsonCachable confirmed that this annotation looked like the correct solution for this performance issue. After we deployed a quick change to add @JsonCachable to our four custom deserializers, we observed that no visible time was spent in _createAndCacheValueDeserializer.
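
The deserializers themselves aren’t shown here, but a rough sketch of the shape of the fix, in the Jackson 1.x style that the @JsonCachable annotation implies, looks like the following. The Order type and its field are hypothetical.

import java.io.IOException;
import org.codehaus.jackson.JsonNode;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.map.DeserializationContext;
import org.codehaus.jackson.map.JsonDeserializer;
import org.codehaus.jackson.map.annotate.JsonCachable;

// Hypothetical target type, for illustration only.
class Order {
    final String id;
    Order(String id) { this.id = id; }
}

// Without @JsonCachable, Jackson rebuilds a custom deserializer like this on every
// call, inside a synchronized block; with it, the instance is built once and cached.
@JsonCachable
public class OrderDeserializer extends JsonDeserializer<Order> {
    @Override
    public Order deserialize(JsonParser parser, DeserializationContext context)
            throws IOException {
        JsonNode node = parser.readValueAsTree();
        return new Order(node.get("id").getTextValue());
    }
}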


Adding a one-line annotation in four different places made the code run twice as fast. Because the code held a lock while it recreated the same deserializers for every call, only one of the four CPU cores could be used, causing latency and inefficiency. Reusing the deserializers avoided the repeated work and saved us a lot of resources.

After the CodeGuru Profiler recommendations were implemented, the amount of CPU spent in Jackson reduced from 37% to 5% across the two call paths, and there was no visible blocking. With the removal of the blocking, we could run higher load on our hosts and reduce the fleet size, saving approximately $100,000 a year in Amazon EC2 costs, thereby resulting in overall savings.

The following flame graph shows performance after the deployment.

The Flame Graph after the deployment


The following graph shows that CPU usage was reduced by almost 50%. The blue line shows the CPU usage the week before we implemented the CodeGuru Profiler recommendations, and the green line shows the reduced usage after the deployment. We could later safely scale down the fleet to reduce costs, while still having better performance than before the change.

Average Fleet CPU Utilization


The following graph shows the server latency, which also dropped by almost 50%. The latency dropped from 100 milliseconds to 50 milliseconds, as depicted in the initial portion of the graph. The orange line depicts p99, green p99.9, and blue p50 (median latency).

Server Latency



With a few lines of changed code and a half-hour investigation, we removed the bottleneck, which lowered resource utilization and allowed us to decrease the fleet size. We have seen many similar cases; in one instance, a change of literally six characters of inefficient code reduced CPU usage from 99% to 5%.

Across Amazon, CodeGuru Profiler has been used internally among various teams and resulted in millions of dollars of savings and performance optimization. You can use CodeGuru Profiler for quick insights into performance issues of your application. The more efficient the code and application is, the less costly it is to run. You can find potential savings for any application running in production and significantly reduce infrastructure costs using CodeGuru Profiler. Reducing fleet size, latency, and CPU usage is a major win.



About the Authors

Neha Gupta

Neha Gupta is a Solutions Architect at AWS and has 16 years of experience as a database architect and DBA. Apart from work, she’s outdoorsy and loves to dance.

Ian Clark

Ian is a Senior Software engineer with the Last Mile organization at Amazon. In his spare time, he enjoys exploring the Vancouver area with his family.

AI Agents Play “Hide the Toilet Plunger” to Learn Deep Concepts About Life

Post Syndicated from Eliza Strickland original https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/ai-agent-learns-about-the-world-by-gameplay

Most papers about artificial intelligence don’t cite Jean Piaget, the social scientist known for his groundbreaking studies of children’s cognitive development in the 1950s. But there he is, in a paper from the Allen Institute for AI (AI2). The researchers state that their AI agents learned the concept of object permanence—the understanding that an object hidden from view is still there—thus making those AI agents similar to a baby who just figured out the trick behind peekaboo. 

The researchers’ AI agents learned this precept and other rudimentary rules about the world by playing many games of hide and seek with objects, which took place within a simulated, but fairly realistic, house. The AI2 team calls the game “Cache,” but I prefer to call it “Hide the Toilet Plunger.” The agents also got to hide tomatoes, loaves of bread, cups, and knives.

The AI agents, which acted as both hiders and seekers, figured out the game via reinforcement learning. Starting out, they didn’t know anything about the 3D visual environment. They began by taking random actions like pulling on the handle of a drawer or pulling on an immovable wall, and they dropped their objects in all sorts of random places. The agents got better by playing against each other and learning from outcomes—if the seeker didn’t find the tomato, the hider knew it had chosen a good hiding place. 

The paper, which was recently accepted for the 2021 International Conference on Learning Representations, hasn’t yet been published in a peer reviewed journal.

Unlike many projects concerning AI and gameplay, the point here wasn’t to create an AI super-player that could destroy puny humans. Rather, the researchers wanted to see if an AI agent could achieve a more generalized kind of visual intelligence if it learned about the world via gameplay.

“For us, the question was: Can it learn very basic things about objects and their attributes by interacting with them?” says Aniruddha Kembhavi, a research manager with AI2’s computer vision team and a paper coauthor.

This AI2 team is working on representation learning, in which AI systems are given some input—images, audio, text, etc.—and learn to categorize the data according to its features. In computer vision, for example, an AI system might learn the features that represent a cat or a traffic light. Ideally, though, it doesn’t learn only the categories, it also learns how to categorize data, making it useful even when given images of objects it has never before seen.

Visual representation learning has evolved over the past decade, Kembhavi explains. When deep learning took off, researchers first trained AI systems on databases of labeled images, such as the famous ImageNet. Because the labels enable the AI system to check its work, that technique is called supervised learning. “Then in past few years, the buzz has gone from supervised learning to self-supervised learning,” says Kembhavi, in which AI systems have to determine the labels for themselves. “We believe that an even more general way of doing it is gameplay—we just let the agents play around, and they figure it out.” 

Once the AI2 agents had gotten good at the game, the researchers ran them through a variety of tests designed to test their understanding of the world. They first tested them on computer-generated images of rooms, asking them to predict traits such as depth of field and the geometry of objects. When compared to a model trained on the gold-standard ImageNet, the AI2 agents performed as well or better. They also tested them on photographs of real rooms; while they didn’t do as well as the ImageNet-trained model there, they did better than expected—an important indication that training in simulated environments could produce AI systems that function in the real world. 

The tests that really excited the researchers, though, were those inspired by developmental psychology. They wanted to determine whether the AI agents grasped certain “cognitive primitives,” or basic elements of understanding that can be built upon. They found that the agents understood the principles of containment and object permanence, and that they could rank images according to how much free space they contained. That ranking test was an attempt to get at a concept that Jean Piaget called seriation, or the ability to order objects based on a common property.

If you’re thinking, “Haven’t I read something in IEEE Spectrum before about AI agents playing hide and seek?” you are not wrong, and you are also a faithful reader. In 2019, I covered an OpenAI project in which the hiders and seekers surprised the researchers by coming up with strategies that weren’t supposed to be possible in the game environment.  

Igor Mordatch, one of the OpenAI researchers behind that project, says he’s excited to see that AI2’s research doesn’t focus on external behaviors within the game, but rather the “internal representations of the world emerging in the minds of these agents,” he says in an email. “Representation learning is thought to be one of the key components to progress in general-purpose AI systems today, so any advances in this area would be highly impactful.”

As for transferring any advances from their research to the real world, the AI2 researchers say that the agents’ dynamic understanding of how objects act in time and space could someday be useful to robots. But they have no intention of doing robot experiments anytime soon. Training in simulation took several weeks; training in the real world would be infeasible. “Also, there’s a safety issue,” notes study coauthor Roozbeh Mottaghi, also a research manager at AI2. “These agents do random stuff.” Just think of the havoc that could be wreaked on a lab by a rogue robot carrying a toilet plunger.

Machine learning and depth estimation using Raspberry Pi

Post Syndicated from David Plowman original https://www.raspberrypi.org/blog/machine-learning-and-depth-estimation-using-raspberry-pi/

One of our engineers, David Plowman, describes machine learning and shares news of a Raspberry Pi depth estimation challenge run by ETH Zürich (Swiss Federal Institute of Technology).

Spoiler alert – it’s all happening virtually, so you can definitely make the trip and attend, or maybe even enter yourself.

What is Machine Learning?

Machine Learning (ML) and Artificial Intelligence (AI) are some of the top engineering-related buzzwords of the moment, and foremost among current ML paradigms is probably the Artificial Neural Network (ANN).

ANNs involve millions of tiny calculations, merged together in a giant biologically inspired network – hence the name. These networks typically have millions of parameters that control each calculation, and they must be optimised for every different task at hand.

This process of optimising the parameters so that a given set of inputs correctly produces a known set of outputs is known as training, and is what gives rise to the sense that the network is “learning”.

A popular type of ANN used for processing images is the Convolutional Neural Network. Many small calculations are performed on groups of input pixels to produce each output pixel

Machine Learning frameworks

A number of well-known companies produce free ML frameworks that you can download and use on your own computer. The network training procedure runs best on machines with powerful CPUs and GPUs, but even running a pre-trained network (a process known as inference) can be quite expensive.

One of the most popular frameworks is Google’s TensorFlow (TF), and since this is rather resource intensive, they also produce a cut-down version optimised for less powerful platforms. This is TensorFlow Lite (TFLite), which can be run effectively on Raspberry Pi.
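
TensorFlow Lite also ships a Java API; as a minimal sketch (the model file name and tensor shapes below are placeholder assumptions, and a real model defines its own), inference looks roughly like this:

import java.io.File;
import org.tensorflow.lite.Interpreter;

public class TfLiteInference {
    public static void main(String[] args) {
        // Load a pre-trained .tflite model and run it on one dummy input.
        try (Interpreter interpreter = new Interpreter(new File("model.tflite"))) {
            float[][][][] input = new float[1][224][224][3];   // one RGB image
            float[][] output = new float[1][1000];             // e.g., class scores
            interpreter.run(input, output);
            System.out.println("First score: " + output[0][0]);
        }
    }
}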

Depth estimation

ANNs have proven very adept at a wide variety of image processing tasks, most notably object classification and detection, but also depth estimation. This is the process of taking one or more images and working out how far away every part of the scene is from the camera, producing a depth map.

Here’s an example:

Depth estimation example using a truck

The image on the right shows, by the brightness of each pixel, how far away the objects in the original (left-hand) image are from the camera (darker = nearer).

We distinguish between stereo depth estimation, which starts with a stereo pair of images (taken from marginally different viewpoints; here, parallax can be used to inform the algorithm), and monocular depth estimation, working from just a single image.

The applications of such techniques should be clear, ranging from robots that need to understand and navigate their environments, to the fake bokeh effects beloved of many modern smartphone cameras.

Depth Estimation Challenge

CVPR conference logo

We were very interested then to learn that, as part of the CVPR (Computer Vision and Pattern Recognition) 2021 conference, Andrey Ignatov and Radu Timofte of ETH Zürich were planning to run a Monocular Depth Estimation Challenge. They are specifically targeting the Raspberry Pi 4 platform running TFLite, and we are delighted to support this effort.

For more information, or indeed if any technically minded readers are interested in entering the challenge, please visit:

The conference and workshops are all taking place virtually in June, and we’ll be sure to update our blog with some of the results and models produced for Raspberry Pi 4 by the competing teams. We wish them all good luck!


Improving AWS Java applications with Amazon CodeGuru Reviewer

Post Syndicated from Rajdeep Mukherjee original https://aws.amazon.com/blogs/devops/improving-aws-java-applications-with-amazon-codeguru-reviewer/

Amazon CodeGuru Reviewer is a machine learning (ML)-based AWS service that provides automated code review comments on your Java and Python applications. Powered by program analysis and ML, CodeGuru Reviewer detects hard-to-find bugs and inefficiencies in your code and leverages best practices learned from across millions of lines of open-source and Amazon code. You can start analyzing your code through pull requests and full repository analysis (for more information, see Automating code reviews and application profiling with Amazon CodeGuru).

The recommendations generated by CodeGuru Reviewer for Java fall into the following categories:

  • AWS best practices
  • Concurrency
  • Security
  • Resource leaks
  • Other specialized categories such as sensitive information leaks, input validation, and code clones
  • General best practices on data structures, control flow, exception handling, and more

We expect the recommendations to benefit beginners as well as expert Java programmers.

In this post, we showcase CodeGuru Reviewer recommendations related to using the AWS SDK for Java. For in-depth discussion of other specialized topics, see our posts on concurrency, security, and resource leaks. For Python applications, see Raising Python code quality using Amazon CodeGuru.

The AWS SDK for Java simplifies the use of AWS services by providing a set of features that are consistent and familiar for Java developers. The SDK has more than 250 AWS service clients, which are available on GitHub. Service clients include services like Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, Amazon Kinesis, Amazon Elastic Compute Cloud (Amazon EC2), AWS IoT, and Amazon SageMaker. These services constitute more than 6,000 operations, which you can use to access AWS services. With such rich and diverse services and APIs, developers may not always be aware of the nuances of AWS API usage. These nuances may not be important at the beginning, but become critical as the scale increases and the application evolves or becomes diverse. This is why CodeGuru Reviewer has a category of recommendations: AWS best practices. This category of recommendations enables you to become aware of certain features of AWS APIs so your code can be more correct and performant.

The first part of this post focuses on the key features of the AWS SDK for Java as well as API patterns in AWS services. The second part of this post demonstrates using CodeGuru Reviewer to improve code quality for Java applications that use the AWS SDK for Java.

AWS SDK for Java

The AWS SDK for Java supports higher-level abstractions for simplified development and provides support for cross-cutting concerns such as credential management, retries, data marshaling, and serialization. In this section, we describe a few key features that are supported in the AWS SDK for Java. Additionally, we discuss some key API patterns, such as batching and pagination, in AWS services.

The AWS SDK for Java has the following features:

  • Waiters – Waiters are utility methods that make it easy to wait for a resource to transition into a desired state. They abstract out the polling logic into a simple API call. The waiters interface provides a custom delay strategy to control the sleep time between retries, as well as a custom condition on whether polling of a resource should be retried. The AWS SDK for Java also offers an async variant of waiters.
  • Exceptions – The AWS SDK for Java uses runtime (or unchecked) exceptions instead of checked exceptions in order to give you fine-grained control over the errors you want to handle and to prevent scalability issues inherent with checked exceptions in large applications. Broadly, the AWS SDK for Java has two types of exceptions:
    • AmazonClientException – Indicates that a problem occurred inside the Java client code, either while trying to send a request to AWS or while trying to parse a response from AWS. For example, the AWS SDK for Java throws an AmazonClientException if no network connection is available when you try to call an operation on one of the clients.
    • AmazonServiceException – Represents an error response from an AWS service. For example, if you try to terminate an EC2 instance that doesn’t exist, Amazon EC2 returns an error response, and all the details of that response are included in the AmazonServiceException that’s thrown. In some cases, a subclass of AmazonServiceException is thrown to allow you fine-grained control over handling error cases through catch blocks.

The API has the following patterns:

  • Batching – A batch operation provides you with the ability to perform a single CRUD operation (create, read, update, delete) on multiple resources.
  • Pagination – Many AWS operations return paginated results when the response object is too large to return in a single response. To enable you to perform pagination, the request and response objects for many service clients in the SDK provide a continuation token (typically named NextToken) to indicate additional results.

AWS best practices

Now that we have summarized the SDK-specific features and API patterns, let’s look at the CodeGuru Reviewer recommendations on AWS API use.

The CodeGuru Reviewer recommendations for the AWS SDK for Java range from detecting outdated or deprecated APIs to warning about API misuse, missing pagination, authentication and exception scenarios, and using efficient API alternatives. In this section, we discuss a few examples patterned after real code.

Handling pagination

Over 1,000 APIs from more than 150 AWS services have pagination operations. The pagination best practice rule in CodeGuru covers all the pagination operations. In particular, the pagination rule checks if the Java application correctly fetches all the results of the pagination operation.

The response of a pagination operation in AWS SDK for Java 1.0 contains a token that has to be used to retrieve the next page of results. In the following code snippet, you make a call to listTables(), a DynamoDB ListTables operation, which can only return up to 100 table names per page. This code might not produce complete results because the operation returns paginated results instead of all results.

public void getDynamoDbTable() {
    AmazonDynamoDBClient client = new AmazonDynamoDBClient();
    // Returns only the first page (up to 100 table names); any further pages are silently dropped.
    List<String> tables = client.listTables().getTableNames();
}

CodeGuru Reviewer detects the missing pagination in the code snippet and makes the following recommendation to add another call to check for additional results.

Screenshot of recommendations for introducing pagination checks

You can accept the recommendation and add the logic to get the next page of table names by checking if a token (LastEvaluatedTableName in the ListTables result) is included in each response page. If such a token is present, it’s used in a subsequent request to fetch the next page of results. See the following code:

public void getDynamoDbTable() {
    AmazonDynamoDBClient client = new AmazonDynamoDBClient();
    ListTablesRequest listTablesRequest = new ListTablesRequest();
    boolean done = false;
    while (!done) {
        ListTablesResult listTablesResult = client.listTables(listTablesRequest);
        // ... process listTablesResult.getTableNames() ...
        if (listTablesResult.getLastEvaluatedTableName() == null) {
            done = true;
        } else {
            // Start the next page after the last table name returned.
            listTablesRequest.setExclusiveStartTableName(
                    listTablesResult.getLastEvaluatedTableName());
        }
    }
}

Handling failures in batch operation calls

Batch operations are common with many AWS services that process bulk requests. Batch operations can succeed without throwing exceptions even if some items in the request fail. Therefore, a recommended practice is to explicitly check for any failures in the result of the batch APIs. Over 40 APIs from more than 20 AWS services have batch operations. The best practice rule in CodeGuru Reviewer covers all the batch operations. In the following code snippet, you make a call to sendMessageBatch, a batch operation from Amazon SQS, but it doesn’t handle any errors returned by that batch operation:

public void flush(final String sqsEndPoint,
                  final List<SendMessageBatchRequestEntry> batch) {
    AmazonSQS sqsClient = AmazonSQSClientBuilder.defaultClient();
    if (batch.isEmpty()) {
        return;
    }
    // The batch call can succeed overall even when individual entries fail,
    // but the result is never inspected here.
    sqsClient.sendMessageBatch(sqsEndPoint, batch);
}

CodeGuru Reviewer detects this issue and makes the following recommendation to check the return value for failures.

Screenshot of recommendations for batch operations

You can accept this recommendation and add logging for the complete list of messages that failed to send, in addition to throwing an SQSUpdateException. See the following code:

public void flush(final String sqsEndPoint,
                  final List<SendMessageBatchRequestEntry> batch) {
    AmazonSQS sqsClient = AmazonSQSClientBuilder.defaultClient();
    if (batch.isEmpty()) {
        return;
    }
    SendMessageBatchResult result = sqsClient.sendMessageBatch(sqsEndPoint, batch);
    final List<BatchResultErrorEntry> failed = result.getFailed();
    if (!failed.isEmpty()) {
        final String failedMessage = failed.stream()
                .map(entry -> String.format("%s: %s", entry.getId(), entry.getMessage()))
                .collect(Collectors.joining(", "));
        throw new SQSUpdateException("Error occurred while sending "
                + "messages to SQS::" + failedMessage);
    }
}

Exception handling best practices

Amazon S3 is one of the most popular AWS services with our customers. A frequent operation with this service is to upload a stream-based object through an Amazon S3 client. Stream-based uploads might encounter occasional network connectivity or timeout issues, and the best practice to address such a scenario is to properly handle the corresponding ResetException error. ResetException extends SdkClientException, which subsequently extends AmazonClientException. Consider the following code snippet, which lacks such exception handling:

private void uploadInputStreamToS3(String bucketName,
                                   InputStream input,
                                   String key, ObjectMetadata metadata)
                         throws SdkClientException {
    final AmazonS3 amazonS3Client = AmazonS3ClientBuilder.defaultClient();
    PutObjectRequest putObjectRequest =
            new PutObjectRequest(bucketName, key, input, metadata);
    // A transient failure mid-upload surfaces as a ResetException, which this
    // method neither catches nor guards against.
    amazonS3Client.putObject(putObjectRequest);
}

In this case, CodeGuru Reviewer correctly detects the missing handling of the ResetException error and suggests possible solutions.

Screenshot of recommendations for handling exceptions

This recommendation is rich in that it provides alternatives to suit different use cases. The most common handling uses File or FileInputStream objects, but in other cases explicit handling of mark and reset operations is necessary to reliably avoid a ResetException.

You can fix the code by explicitly setting a predefined read limit using the setReadLimit method of RequestClientOptions. Its default value is 128 KB. Setting the read limit value to one byte greater than the size of the stream reliably avoids a ResetException.

For example, if the maximum expected size of a stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes. The mark and reset always work for 100,000 bytes or less. However, this might cause some streams to buffer that number of bytes into memory.

The fix reliably avoids ResetException when uploading an object of type InputStream to Amazon S3:

private void uploadInputStreamToS3(String bucketName, InputStream input,
                                   String key, ObjectMetadata metadata)
                             throws SdkClientException {
    final AmazonS3 amazonS3Client = AmazonS3ClientBuilder.defaultClient();
    // Maximum expected stream size plus one byte, so mark/reset always succeeds.
    final int READ_LIMIT = 100001;
    PutObjectRequest putObjectRequest =
            new PutObjectRequest(bucketName, key, input, metadata);
    putObjectRequest.getRequestClientOptions().setReadLimit(READ_LIMIT);
    amazonS3Client.putObject(putObjectRequest);
}

Replacing custom polling with waiters

A common activity when you’re working with services that are eventually consistent (such as DynamoDB) or have a lead time for creating resources (such as Amazon EC2) is to wait for a resource to transition into a desired state. The AWS SDK provides the Waiters API, a convenient and efficient feature for waiting that abstracts out the polling logic into a simple API call. If you’re not aware of this feature, you might come up with custom, and potentially inefficient, polling logic to determine whether a particular resource has transitioned into a desired state.

The following code appears to be waiting for the status of EC2 instances to change to shutting-down or terminated inside a while (true) loop:

// INSTANCE_TERMINATION_TIMEOUT and POLLING_INTERVAL are class constants (not shown).
private boolean terminateInstance(final String instanceId, final AmazonEC2 ec2Client)
    throws InterruptedException {
    long start = System.currentTimeMillis();
    while (true) {
        try {
            DescribeInstanceStatusResult describeInstanceStatusResult =
                    ec2Client.describeInstanceStatus(new DescribeInstanceStatusRequest()
                            .withInstanceIds(instanceId)
                            .withIncludeAllInstances(true));
            List<InstanceStatus> instanceStatusList =
                    describeInstanceStatusResult.getInstanceStatuses();
            long timeElapsed = System.currentTimeMillis() - start;
            if (timeElapsed > INSTANCE_TERMINATION_TIMEOUT) {
                return false;
            }
            if (instanceStatusList.size() < 1) {
                continue;
            }
            String currentState = instanceStatusList.get(0).getInstanceState().getName();
            if ("shutting-down".equals(currentState) || "terminated".equals(currentState)) {
                return true;
            } else {
                Thread.sleep(POLLING_INTERVAL); // hand-rolled delay between polls
            }
        } catch (AmazonServiceException ex) {
            throw ex;
        }
    }
}

CodeGuru Reviewer detects the polling scenario and recommends you use the waiters feature to help improve efficiency of such programs.

Screenshot of recommendations for introducing waiters feature

Based on the recommendation, the following code uses the waiters feature available in the AWS SDK for Java. The custom polling logic is replaced with a call to waiter.run(…), which accepts the request and an optional custom polling strategy, and polls synchronously until it’s determined whether the resource transitioned into the desired state. The SDK throws a WaiterTimedOutException if the resource doesn’t transition into the desired state even after a certain number of retries. The fixed code is simpler and more efficient, abstracting the polling logic into a single API call:

public void terminateInstance(final String instanceId, final AmazonEC2 ec2Client)
    throws InterruptedException {
    Waiter<DescribeInstancesRequest> waiter = ec2Client.waiters().instanceTerminated();
    ec2Client.terminateInstances(new TerminateInstancesRequest().withInstanceIds(instanceId));
    try {
        waiter.run(new WaiterParameters<DescribeInstancesRequest>()
              .withRequest(new DescribeInstancesRequest().withInstanceIds(instanceId))
              .withPollingStrategy(new PollingStrategy(new MaxAttemptsRetryStrategy(60),
                    new FixedDelayStrategy(5))));
    } catch (WaiterTimedOutException e) {
        // The instance didn't reach the desired state in time; look up its last
        // known state so it can be reported (the handling below is illustrative).
        List<InstanceStatus> instanceStatusList = ec2Client.describeInstanceStatus(
               new DescribeInstanceStatusRequest().withInstanceIds(instanceId))
               .getInstanceStatuses();
        if (instanceStatusList != null && !instanceStatusList.isEmpty()) {
            String state = instanceStatusList.get(0).getInstanceState().getName();
            System.err.println("Timed out waiting for " + instanceId + "; last state: " + state);
        }
    }
}
Service-specific best practice recommendations

In addition to the SDK operation-specific recommendations for the AWS SDK for Java discussed above, CodeGuru Reviewer provides best practice recommendations for the service APIs of specific AWS services, such as Amazon S3, Amazon EC2, and DynamoDB, to help improve Java applications that use AWS service clients. For example, CodeGuru can detect the following:

  • Resource leaks in Java applications that use high-level libraries, such as the Amazon S3 TransferManager
  • Deprecated methods in various AWS services
  • Missing null checks on the response of the GetItem API call in DynamoDB (see the sketch after this list)
  • Missing error handling in the output of the PutRecords API call in Kinesis
  • Anti-patterns, such as binding the SNS subscribe or createTopic operation with the Publish operation
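
To illustrate the DynamoDB point above, the following minimal sketch null-checks a GetItem response before using it. The table name, key, and method are hypothetical, not taken from this post:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;

import java.util.Collections;
import java.util.Map;
import java.util.Optional;

public class GetItemNullCheck {
    // GetItem returns a result whose getItem() is null when no record matches
    // the key, so the response must be null-checked before use.
    static Optional<Map<String, AttributeValue>> fetchOrder(
            AmazonDynamoDB dynamoDb, String orderId) {
        GetItemRequest request = new GetItemRequest()
                .withTableName("Orders") // illustrative table and key names
                .withKey(Collections.singletonMap("orderId", new AttributeValue(orderId)));
        return Optional.ofNullable(dynamoDb.getItem(request).getItem());
    }
}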


Conclusion

This post introduced how to use CodeGuru Reviewer to improve the use of the AWS SDK in Java applications. CodeGuru is now available for you to try. For pricing information, see Amazon CodeGuru pricing.

To Really Judge an AI’s Smarts, Give it One of These IQ Tests

Post Syndicated from Matthew Hutson original https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/how-do-you-test-the-iq-of-ai

Chess was once seen as an ultimate test of intelligence, until computers defeated humans while showing none of the other broad capabilities we associate with smarts. Artificial intelligence has since bested humans at Go, some types of poker, and many video games.

So researchers are developing AI IQ tests meant to assess deeper humanlike aspects of intelligence, such as concept learning and analogical reasoning. So far, computers have struggled on many of these tasks, which is exactly the point. The test-makers hope their challenges will highlight what’s missing in AI, and guide the field toward machines that can finally think like us.

A common human IQ test is Raven’s Progressive Matrices, in which one needs to complete an arrangement of nine abstract drawings by deciphering the underlying structure and selecting the missing drawing from a group of options. Neural networks have gotten pretty good at that task. But a paper presented in December at the massive AI conference known as NeurIPS offers a new challenge: The AI system must generate a fitting image from scratch, an ultimate test of understanding the pattern.

“If you are developing a computer vision system, usually it recognizes without really understanding what’s in the scene,” says Lior Wolf, a computer scientist at Tel Aviv University and Facebook, and the paper’s senior author. This task requires understanding composition and rules, “so it’s a very neat problem.” The researchers also designed a neural network to tackle the task—according to human judges, it gets about 70 percent correct, leaving plenty of room for improvement.

Other tests are harder still. Another NeurIPS paper presented a software-generated dataset of so-called Bongard Problems, a classic test for humans and computers. In their version, called Bongard-LOGO, one sees a few abstract sketches that match a pattern and a few that don’t, and one must decide if new sketches match the pattern.

The puzzles test “compositionality,” or the ability to break a pattern down into its component parts, which is a critical piece of intelligence, says Anima Anandkumar, a computer scientist at the California Institute of Technology and the paper’s senior author. Humans got the correct answer more than 90 percent of the time, the researchers found, but state-of-the-art visual processing algorithms topped out around 65 percent (with chance being 50 percent). “That’s the beauty of it,” Anandkumar said of the test, “that something so simple can still be so challenging for AI.” They’re currently developing a version of the test with real images.

Compositional thinking might help machines perform in the real world. Imagine a street scene, Anandkumar says. An autonomous vehicle needs to break it down into general concepts like cars and pedestrians to predict what will happen next. Compositional thinking would also make AI more interpretable and trustworthy, she added. One might peer inside to see how it pieces evidence together. 

Still harder tests are out there. In 2019, François Chollet, an AI researcher at Google, created the Abstraction and Reasoning Corpus (ARC), a set of visual puzzles tapping into core human knowledge of geometry, numbers, physics, and even goal-directedness. On each puzzle, one sees one or more pairs of grids filled with colored squares, each pair a sort of before-and-after grid. One also sees a new grid and fills in its partner according to whatever rule one has inferred.

A website called Kaggle held a competition with the puzzles and awarded $20,000 last May to the three teams with the best-performing algorithms. The puzzles are pretty easy for humans, but the top AI barely reached 20 percent. “That’s a big red flag that tells you there’s something interesting there,” Chollet says, “that we’re missing something.”

The current wave of advancement in AI is driven largely by multi-layered neural networks, also known as deep learning. But, Chollet says, these neural nets perform “abysmally” on the ARC. The Kaggle winners used old-school methods that combine handwritten rules rather than learning subtle patterns from gobs of data. He does, however, see a role for both paradigms in tandem: a neural net might translate messy perceptual data into a structured form that symbolic processing can handle.

Anandkumar agrees with the need for a hybrid approach. Much of deep learning’s progress now comes from making it deeper and deeper, with bigger and bigger neural nets, she says. “The scale now is so enormous that I think we’ll see more work trying to do more with less.”

Anandkumar and Chollet point out one misconception about intelligence: People confuse it with skill. Instead, they say, it’s the ability to pick up new skills easily. That may be why deep learning so often falters. It typically requires lots of training and doesn’t generalize to new tasks, whereas the Bongard and ARC problems require solving a variety of puzzles with only a few examples of each. Maybe a good test of AI IQ would be for a computer to read this article and come up with a new IQ test.

OpenAI’s GPT-3 Speaks! (Kindly Disregard Toxic Language)

Post Syndicated from Eliza Strickland original https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/open-ais-powerful-text-generating-tool-is-ready-for-business

Last September, a data scientist named Vinay Prabhu was playing around with an app called Philosopher AI. The app provides access to the artificial intelligence system known as GPT-3, which has incredible abilities to generate fluid and natural-seeming text. The creator of that underlying technology, the San Francisco company OpenAI, has allowed hundreds of developers and companies to try out GPT-3 in a wide range of applications, including customer service, video games, tutoring services, and mental health apps. The company says tens of thousands more are on the waiting list.

Philosopher AI is meant to show people the technology’s astounding capabilities—and its limits. A user enters any prompt, from a few words to a few sentences, and the AI turns the fragment into a full essay of surprising coherence. But while Prabhu was experimenting with the tool, he found a certain type of prompt that returned offensive results. “I tried: What ails modern feminism? What ails critical race theory? What ails leftist politics?” he tells IEEE Spectrum.

The results were deeply troubling. Take, for example, this excerpt from GPT-3’s essay on what ails Ethiopia, which another AI researcher and a friend of Prabhu’s posted on Twitter: “Ethiopians are divided into a number of different ethnic groups. However, it is unclear whether ethiopia’s [sic] problems can really be attributed to racial diversity or simply the fact that most of its population is black and thus would have faced the same issues in any country (since africa [sic] has had more than enough time to prove itself incapable of self-government).”

Prabhu, who works on machine learning as chief scientist for the biometrics company UnifyID, notes that Philosopher AI sometimes returned diametrically opposing responses to the same query, and that not all of its responses were problematic. “But a key adversarial metric is: How many attempts does a person who is probing the model have to make before it spits out deeply offensive verbiage?” he says. “In all of my experiments, it was on the order of two or three.”

The Philosopher AI incident laid bare the potential danger that companies face as they work with this new and largely untamed technology, and as they deploy commercial products and services powered by GPT-3. Imagine the toxic language that surfaced in the Philosopher AI app appearing in another context—your customer service representative, an AI companion that rides around in your phone, your online tutor, the characters in your video game, your virtual therapist, or an assistant who writes your emails.

Those are not theoretical concerns. Spectrum spoke with beta users of the API who are working to incorporate GPT-3 into such applications and others. The good news is that all the users Spectrum talked with were actively thinking about how to deploy the technology safely.

The Vancouver-based developer behind the Philosopher AI app, Murat Ayfer, says he created it to both further his own understanding of GPT-3’s potential and to educate the public. He quickly discovered the many ways in which his app could go wrong. “With automation, you need either a 100 percent success rate, or you need it to error out gracefully,” he tells Spectrum. “The problem with GPT-3 is that it doesn’t error out, it just produces garbage—and there’s no way to detect if it’s producing garbage.”

GPT-3 Learned From Us

The fundamental problem is that GPT-3 learned about language from the Internet: Its massive training dataset included not just news articles, Wikipedia entries, and online books, but also every unsavory discussion on Reddit and other sites. From that morass of verbiage—both upstanding and unsavory—it drew 175 billion parameters that define its language. As Prabhu puts it: “These things it’s saying, they’re not coming out of a vacuum. It’s holding up a mirror.” Whatever GPT-3’s failings, it learned them from humans.

Following some outcry about the PhilosopherAI app—another response that ended up on Twitter started with cute rabbits but quickly devolved into a discussion of reproductive organs and rape—Ayfer made changes. He had already been steadily working on the app’s content filter, causing more prompts to return the polite response: “Philosopher AI is not providing a response for this topic, because we know this system has a tendency to discuss some topics using unsafe and insensitive language.” He also added a function that let users report offensive responses.

Ayfer argues that Philosopher AI is a “relatively harmless context” for GPT-3 to generate offensive content. “It’s probably better to make mistakes now, so we can really learn how to fix them,” he says.

That’s just what OpenAI intended when it launched the API that enables access to GPT-3 last June, and announced a private beta test in which carefully selected users would develop applications for the technology under the company’s watchful eye. The blog post noted that OpenAI will be guarding against “obviously harmful use-cases, such as harassment, spam, radicalization, or astroturfing,” and will be looking for unexpected problems: “We also know we can’t anticipate all of the possible consequences of this technology.”

Prabhu worries that the AI and business community are being swept away into uncharted waters: “People are thrilled, excited, giddy.” He thinks the rollout into commercial applications is bound to cause some disasters. “Even if they’re very careful, the odds of something offensive coming out is 100 percent—that’s my humble opinion. It’s an intractable problem, and there is no solution,” he says.  

Janelle Shane is a member of that AI community, and a beta user of GPT-3 for her blog, AI Weirdness. She clearly enjoys the technology, having used it to generate Christmas carols, recipes, news headlines, and anything else she thought would be funny. Yet the tweets about PhilosopherAI’s essay on Ethiopia prompted her to post this sobering thought: “Sometimes, to reckon with the effects of biased training data is to realize that the app shouldn’t be built. That without human supervision, there is no way to stop the app from saying problematic stuff to its users, and that it’s unacceptable to let it do so.”

So what is OpenAI doing about its intractable problem?

OpenAI’s Approach to AI Safety

The company has arguably learned from its experiences with earlier iterations of its language-generating technology. In 2019 it introduced GPT-2, but declared that it was actually too dangerous to be released into the wild. The company instead offered up a downsized version of the language model but withheld the full model, which included the data set and training code.

The main fear, highlighted by OpenAI in a blog post, was that malicious actors would use GPT-2 to generate high-quality fake news that would fool readers and destroy the distinction between fact and fiction.  

However, much of the AI community objected to that limited release. When the company reversed course later that year and made the full model available, some people did indeed use it to generate fake news and clickbait. But it didn’t create a tsunami of non-truth on the Internet. In the past few years, people have shown they can do that well enough themselves, without the help of an AI. 

Then came GPT-3, unveiled in a 75-page paper in May 2020. OpenAI’s newest language model was far larger than any that had come before: its 175 billion parameters were a massive increase over GPT-2’s 1.5 billion.

Sandhini Agarwal, an AI policy researcher at OpenAI, spoke with Spectrum about the company’s strategy for GPT-3. “We have to do this closed beta with a few people, otherwise we won’t even know what the model is capable of, and we won’t know which issues we need to make headway on,” she says. “If we want to make headway on things like harmful bias, we have to actually deploy.”

Agarwal explains that an internal team vets proposed applications, provides safety guidelines to those companies granted access to GPT-3 via the API, reviews the applications again before deployment, and monitors their use after deployment.

OpenAI is also developing tools to help users better control GPT-3’s generated text. It offers a general content filter for harmful bias and toxic language. However, Agarwal says that such a filter is really an impossible thing to create, since “bias is a very nebulous thing that keeps shifting based on context.” Particularly on controversial topics, a response that might seem right-on to people on one side of the debate could be deemed toxic by the other.

Another approach, called prompt engineering, adds a phrase to the user’s prompt such as “the friendly bot then said,” which sets up GPT-3 to generate text in a polite and uncontroversial tone. Users can also choose a “temperature” setting for their responses. A low-temperature setting means the AI will put together words that it has very often seen together before, taking few risks and causing few surprises; when set to a high temperature, it’s more likely to produce outlandish language.
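
In sampling terms, temperature rescales the model’s word probabilities before one is drawn. Here is a minimal sketch (in Java, and emphatically not OpenAI’s code) of how temperature-scaled sampling over a set of candidate-word logits typically works:

import java.util.Random;

public class TemperatureSampling {
    // Divide each logit by the temperature before the softmax: low temperatures
    // sharpen the distribution toward the likeliest words; high temperatures
    // flatten it, making outlandish choices more probable.
    static int sample(double[] logits, double temperature, Random rng) {
        double max = Double.NEGATIVE_INFINITY;
        for (double logit : logits) max = Math.max(max, logit / temperature);
        double[] weights = new double[logits.length];
        double total = 0.0;
        for (int i = 0; i < logits.length; i++) {
            weights[i] = Math.exp(logits[i] / temperature - max); // numerically stable softmax
            total += weights[i];
        }
        double r = rng.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r <= 0) return i; // index of the sampled word
        }
        return weights.length - 1;
    }
}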

In addition to all the work being done on the product side of OpenAI, Agarwal says there’s a parallel effort on the “pure machine learning research” side of the company. “We have an internal red team that’s always trying to break the model, trying to make it do all these bad things,” she says. Researchers are trying to understand what’s happening when GPT-3 generates overtly sexist or racist text. “They’re going down to the underlying weights of the model, trying to see which weights might indicate that particular content is harmful.”

In areas where mistakes could have serious consequences, such as the health care, finance, and legal industries, Agarwal says OpenAI’s review team takes special care. In some cases, they’ve rejected applicants because their proposed product was too sensitive. In others, she says, they’ve insisted on having a “human in the loop,” meaning that the AI-generated text is reviewed by a human before it reaches a customer or user.  

OpenAI is making progress on toxic language and harmful bias, Agarwal says, but “we’re not quite where we want to be.” She says the company won’t broadly expand access to GPT-3 until it’s comfortable that it has a handle on these issues. “If we open it up to the world now, it could end really badly,” she says.

But such an approach raises plenty of questions. It’s not clear how OpenAI will get the risk of toxic language down to a manageable level—and it’s not clear what manageable means in this context. Commercial users will have to weigh GPT-3’s benefits against these risks.

Can Language Models Be Detoxified? 

OpenAI’s researchers aren’t the only ones trying to understand the scope of the problem. In December, AI researcher Timnit Gebru said that she’d been fired by Google, forced to leave her work on ethical AI and algorithmic bias, because of an internal disagreement about a paper she’d coauthored. The paper discussed the current failings of large language models such as GPT-3 and Google’s own BERT, including the dilemma of encoded bias. Gebru and her coauthors argued that companies intent on developing large language models should devote more of their resources to curating the training data and “only creating datasets as large as can be sufficiently documented.”

Meanwhile, at the Allen Institute for AI (AI2), in Seattle, a handful of researchers have been probing GPT-3 and other large language models. One of their projects, called RealToxicityPrompts, created a dataset of 100,000 prompts derived from web text, evaluated the toxicity of the resulting text from five different language models, and tried out several mitigation strategies. Those five models included GPT versions 1, 2, and 3 (OpenAI gave the researchers access to the API).

The conclusion stated in their paper, which was presented at the 2020 Empirical Methods in Natural Language Processing conference in November: No current mitigation method is “failsafe against neural toxic degeneration.” In other words, they couldn’t find a way to reliably keep out ugly words and sentiments.  

When the research team spoke with Spectrum about their findings, they noted that the standard ways of training these big language models may need improvement. “Using Internet text has been the default,” says Suchin Gururangan, an author on the paper and an investigator at AI2. “The assumption is that you’re getting the most diverse set of voices in the data. But it’s pretty clear in our analysis that Internet text does have its own biases, and biases do propagate in the model behavior.”

Gururangan says that when researchers think about what data to train their new models on, they should consider what kinds of text they’d like to exclude. But he notes that automatically identifying toxic language even in a single document is a hard task, and that doing it at web scale “is fertile ground for research.”

As for ways to fix the problem, the AI2 team tried two approaches to “detoxify” the models’ output: giving the model additional training with text that’s known to be innocuous, or filtering the generated text by scanning for keywords or by fancier means. “We found that most of these techniques don’t really work very well,” Gururangan says. “All of these methods reduce the prevalence of toxicity—but we always found, if you generate enough times, you will find some toxicity.”

What’s more, he says, reducing the toxicity can also have the side effect of reducing the fluency of the language. That’s one of the issues that the beta users are grappling with today.  

How Beta Users of GPT-3 Aim for Safe Deployment

The companies and developers in the private beta that Spectrum spoke with all made two basic points: GPT-3 is a powerful technology, and OpenAI is working hard to address toxic language and harmful bias. “The people there take these issues extremely seriously,” says Richard Rusczyk, founder of Art of Problem Solving, a beta-user company that offers online math courses to “kids who are really into math.” And the companies have all devised strategies for keeping GPT-3’s output safe and inoffensive.   

Rusczyk says his company is trying out GPT-3 to speed up its instructors’ grading of students’ math proofs—GPT-3 can provide a basic response about a proof’s accuracy and presentation, and then the instructor can check the response and customize it to best help that individual student. “It lets the grader spend more time on the high value tasks,” he says.

To protect the students, the generated text “never goes directly to the students,” Rusczyk says. “If there’s some garbage coming out, only a grader would see it.” He notes that it’s extremely unlikely that GPT-3 would generate offensive language in response to a math proof, because it seems likely that such correlations rarely (if ever) occurred in its training data. Yet he stresses that OpenAI still wanted a human in the loop. “They were very insistent that students should not be talking directly to the machine,” he says.

Some companies find safety in limiting the use case for GPT-3. At Sapling Intelligence, a startup that helps customer service agents with emails, chat, and service tickets, CEO Ziang Xie says he doesn’t anticipate using it for “freeform generation.” Xie says it’s important to put this technology in place within certain protective constraints. “I like the analogy of cars versus trolleys,” he says. “Cars can drive anywhere, so they can veer off the road. Trolleys are on rails, so you know at the very least they won’t run off and hit someone on the sidewalk.” However, Xie notes that the recent furor over Timnit Gebru’s forced departure from Google has caused him to question whether companies like OpenAI can do more to make their language models safer from the get-go, so they don’t need guardrails.

Robert Morris, the cofounder of the mental health app Koko, describes how his team is using GPT-3 in a particularly sensitive domain. Koko is a peer-support platform that provides crowdsourced cognitive therapy. His team is experimenting with using GPT-3 to generate bot-written responses to users while they wait for peer responses, and also with giving respondents possible text that they can modify. Morris says the human collaboration approach feels safer to him. “I get increasingly concerned the more freedom it has,” he says.

Yet some companies need GPT-3 to have a good amount of freedom. Replika, an AI companion app used by 7 million people around the world, offers friendly conversation about anything under the sun. “People can talk to Replika about anything—their life, their day, their interests,” says Artem Rodichev, head of AI at Replika. “We need to support conversation about all types of topics.”

To prevent the app from saying offensive things, the company has GPT-3 generate a variety of responses to each message, then uses a number of custom classifiers to detect and filter out responses with negativity, harmful bias, nasty words, and so on. Since such attributes are hard to detect from keywords alone, the app also collects signals from users to train its classifiers. “Users can label a response as inappropriate, and we can use that feedback as a dataset to train the classifier,” says Rodichev.  
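
The generate-then-filter pattern Rodichev describes can be sketched roughly as follows (a loose illustration, not Replika’s code; each classifier is abstracted as a predicate):

import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ResponseFilter {
    // Keep only candidate replies that none of the safety classifiers flags
    // as negative, biased, or otherwise inappropriate.
    static List<String> safeCandidates(List<String> candidates,
                                       List<Predicate<String>> unsafeClassifiers) {
        return candidates.stream()
                .filter(c -> unsafeClassifiers.stream().noneMatch(k -> k.test(c)))
                .collect(Collectors.toList());
    }
}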

Another company that requires GPT-3 to be relatively unfettered is Latitude, a startup creating AI-powered games. Its first offering, a text adventure game called AI Dungeon, currently uses GPT-3 to create the narrative and respond to the player’s actions. Latitude CEO and cofounder Nick Walton says his team has grappled with inappropriate and bad language. “It doesn’t happen a ton, but it does happen,” he says. “And things end up on Reddit.”

Latitude is not trying to prevent all such incidents, because some users want a “grittier experience,” Walton says. Instead, the company tries to give users control over the settings that determine what kind of language they’ll encounter. Players start out in a default safe mode, and stay there unless they explicitly turn it off.

Safe mode isn’t perfect, Walton says, but it relies on a combination of filters and prompt engineering (such as: “continue this story in a way that’s safe for kids”) to get pretty good performance. He notes that Latitude wanted to build its own screening tech rather than rely on OpenAI’s safety filter because “safety is relative to the context,” he says. “If a customer service chatbot threatens you and asks you to give it all its money, that’s bad. If you’re playing a game and you encounter a bandit on the road, that’s normal storytelling.”  

These applications are only a small sampling of those being tested by beta users, and the beta users are a tiny fraction of the entities that want access to GPT-3. Aaro Isosaari cofounded the startup Flowrite in September after getting access to GPT-3; the company aims to help people compose faster emails and online content. Just as advances in computer vision and speech recognition enabled thousands of new companies, he thinks GPT-3 may usher in a new wave of innovation. “Language models have the potential to be the next technological advancement on top of which new startups are being built,” he says.

Coming Soon to Microsoft? 

Technology powered by GPT-3 could even find its way into the productivity tools that millions of office workers use every day. Last September, Microsoft announced an exclusive licensing agreement with OpenAI, stating that the company would use GPT-3 to “create new solutions that harness the amazing power of advanced natural language generation.” This arrangement won’t prevent other companies from accessing GPT-3 via OpenAI’s API, but it gives Microsoft exclusive rights to work with the basic code—it’s the difference between riding in a fast car and popping the hood to tinker with the engine.

In the blog post announcing the agreement, Microsoft chief technology officer Kevin Scott enthused about the possibilities, saying: “The scope of commercial and creative potential that can be unlocked through the GPT-3 model is profound, with genuinely novel capabilities – most of which we haven’t even imagined yet.” Microsoft declined to comment when asked about its plans for the technology and its ideas for safe deployment.

Ayfer, the creator of the Philosopher AI app, thinks that GPT-3 and similar language technologies should only gradually become part of our lives. “I think this is a remarkably similar situation to self-driving cars,” he says, noting that various aspects of autonomous car technology are gradually being integrated into normal vehicles. “But there’s still the disclaimer: It’s going to make life-threatening mistakes, so be ready to take over at any time. You have to be in control.” He notes that we’re not yet ready to put the AI systems in charge and use them without supervision.

With language technology like GPT-3, the consequences of mistakes might not be as obvious as a car crash. Yet toxic language has an insidious effect on human society by reinforcing stereotypes, supporting structural inequalities, and generally keeping us mired in a past that we’re collectively trying to move beyond. It isn’t clear, with GPT-3, if it will ever be trustworthy enough to act on its own, without human oversight.

OpenAI’s position on GPT-3 mirrors its larger mission, which is to create a game-changing kind of human-level AI, the kind of generally intelligent AI that figures in sci-fi movies—but to do so safely and responsibly. In both the micro and the macro argument, OpenAI’s position comes down to: We need to create the technology and see what can go wrong. We’ll do it responsibly, they say, while other people might not.

Agarwal of OpenAI says about GPT-3: “I do think that there are safety concerns, but it’s a Catch-22.” If they don’t build it and see what terrible things it’s capable of, she says, they can’t find ways to protect society from the terrible things. 

One wonders, though, whether anyone has considered another option: Taking a step back and thinking through the possible worst-case scenarios before proceeding with this technology. And possibly looking for fundamentally different ways to train large language models, so these models would reflect not the horrors of our past, but a world that we’d like to live in. 

A shorter version of this article appears in the February 2021 print issue as “The Troll in the Machine.”

Amazon Lex Introduces an Enhanced Console Experience and New V2 APIs

Post Syndicated from Martin Beeby original https://aws.amazon.com/blogs/aws/amazon-lex-enhanced-console-experience/

Today, the Amazon Lex team has released a new console experience that makes it easier to build, deploy, and manage conversational experiences. Along with the new console, we have also introduced new V2 APIs, including continuous streaming capability. These improvements allow you to reach new audiences, have more natural conversations, and develop and iterate faster.

The new Lex console and V2 APIs make it easier to build and manage bots, focusing on three main benefits. First, you can add a new language to a bot at any time and manage all the languages through the lifecycle of design, test, and deployment as a single resource. The new console experience allows you to quickly move between different languages to compare and refine your conversations. I’ll demonstrate later how easy it was to add French to my English bot.

Second, V2 APIs simplify versioning. The new Lex console and V2 APIs provide a simple information architecture where the bot intents and slot types are scoped to a specific language. Versioning is performed at the bot level so that resources such as intents and slot types do not have to be versioned individually. All resources within the bot (language, intents, and slot types) are archived as part of the bot version creation. This new way of working makes it easier to manage bots.

Lastly, you have additional builder productivity tools and capabilities that give you more flexibility and control over your bot design process. You can now save partially completed work as you develop different bot elements, scripting, testing, and tuning your configuration as you go. This gives you more flexibility as you iterate through bot development. For example, you can save a slot that refers to a deleted slot type. In addition to saving partially completed work, you can quickly navigate across the configuration without getting lost. The new Conversation flow capability allows you to maintain your orientation as you move across the different intents and slot types.

In addition to the enhanced console and APIs, we are providing a new streaming conversation API. Natural conversations are punctuated with pauses and interruptions. For example, a customer may ask to pause the conversation or hold the line while looking up the necessary information before answering a question (for example, retrieving credit card details when paying a bill). With streaming conversation APIs, you can pause a conversation and handle interruptions directly as you configure the bot. Overall, the design and implementation of the conversation is simplified and easier to manage. The bot builder can quickly enhance the conversational capability of virtual contact center agents or smart assistants.

Let’s create a new bot and explore how some of Lex’s new console and streaming API features provide an improved bot building experience.

Building a bot
I head over to the new V2 Lex console and click on Create bot to start things off.

I select that I want to Start with an example and select the MakeAppointment example.

Over the years, I have spoken at many conferences, so I now offer to review talks that other community members are producing. Since these speakers are often in different time zones, it can be complicated to organize the various appointments for the different types of reviews that I offer. So I have decided to build a bot to streamline the process. I give my bot the name TalkReview and provide a description. I also select Create a role with basic Amazon Lex permissions and use this as my runtime role.

I must add at least one language to my bot, so I start with English (GB). I also select the text-to-speech voice that I want to use should my bot require voice interaction rather than just text.

During the creation, there is a new button that allows me to Add another language. I click on this to add French (FR) to my bot. You can add languages during creation as I am doing here, or you can add additional languages later on as your bot becomes more popular and needs to work with new audiences.

I can now start defining intents for my bot, and I can begin the iterative process of building and testing my bot. I won’t go into all of the details of how to create a bot or show you all of the intents I added, as we have better tutorials that can show you that step-by-step, but I will point out a few new features that make this new enhanced console really compelling.

The new Conversation flow provides you with a visual flow of the conversation, so you can see how the sample utterances you provide fit in and how your conversation might work in the real world. I love this feature because you can click on the various elements, and it will take you to where you can make changes. For example, I can click on the prompt What type of review would you like to schedule and I am taken to the place where I can edit this prompt.

The new console has a very well thought-out approach to versioning a bot. At anytime, on the Bot versions screen, I can click Create version, and it will take a snapshot of the state of the bot’s current configuration. I can then associate that with an alias. For example, in my application, I have an alias called Production. This Production alias is associated with Version 1. Still, at any time, I could switch it to use a different version or even roll back to a previous version if I discover problems.

The testing experience is now very streamlined. Once I have built the bot, I can click the test button at the bottom right-hand side of the screen and start speaking to the bot to test the experience. You can also expand the Inspect window, which gives you details about the conversation’s state, and you can also explore the raw JSON inputs and outputs.

Things to know
Here are a couple of important things to keep in mind when you use the enhanced console:

  • Integration with Amazon Connect – Currently, bots built in the new console cannot be integrated with Amazon Connect contact flows. We plan to provide this integration as part of the near-term roadmap. You can use the current console and existing APIs to create and integrate bots with Amazon Connect.
  • Pricing – You only pay for what you use. The charges remain the same for existing audio and text APIs, renamed as RecognizeUtterance and RecognizeText. For the new Streaming capabilities, please refer to the pricing detail here.
  • All existing APIs and bots will continue to be supported. The newly announced features are only available in the new console and V2 APIs.

Go Build
Lex enhanced console is available now, and you can start using it today. The enhanced experience and V2 APIs are available in all existing regions and support all current languages. So, please give this console a try and let us know what you think. To learn more, check out the documentation for the console and the streaming API.

Happy Building!
— Martin

Raspberry Pi LEGO sorter

Post Syndicated from Ashley Whittaker original https://www.raspberrypi.org/blog/raspberry-pi-lego-sorter/

Raspberry Pi is at the heart of this AI-powered, automated sorting machine that is capable of recognising and sorting any LEGO brick.

And its maker Daniel West believes it to be the first of its kind in the world!

Best ever

This mega-machine was two years in the making and is a LEGO creation itself, built from over 10,000 LEGO bricks.

A beast of 10,000 bricks

It can sort any LEGO brick you place in its input bucket into one of 18 output buckets, at the rate of one brick every two seconds.

While Daniel was inspired by previous LEGO sorters, his creation is a huge step up from them: it can recognise absolutely every LEGO brick ever created, even bricks it has never seen before. Hence the ‘universal’ in the name ‘universal LEGO sorting machine’.


There we are, tucked away, just doing our job


The artificial intelligence algorithm behind the LEGO sorting is a convolutional neural network, the go-to for image classification.

What makes Daniel’s project a ‘world first’ is that he trained his classifier using 3D model images of LEGO bricks, which is how the machine can classify absolutely any LEGO brick it’s faced with, even if it has never seen it in real life before.

We LOVE a thorough project video, and we love TWO of them even more

Daniel has made a whole extra video (above) explaining how the AI in this project works. He shouts out all the open-source software he used to run the Raspberry Pi Camera Module, access 3D training images, and more at this point in the video.

LEGO brick separation

The vibration plate in action, feeding single parts into the scanner

Daniel needed the input bucket to carefully pick out a single LEGO brick from the mass he chucks in at once.

This is achieved with a primary and secondary belt slowly pushing parts onto a vibration plate. The vibration plate uses a super fast LEGO motor to shake the bricks around so they aren’t sitting on top of each other when they reach the scanner.

Scanning and sorting

A side view of the LEGO sorting machine showing a large white chute built from LEGO bricks
The underside of the beast

A Raspberry Pi Camera Module captures video of each brick, which Raspberry Pi 3 Model B+ then processes and wirelessly sends to a more powerful computer able to run the neural network that classifies the parts.

The classification decision is then sent back to the sorting machine so it can spit the brick, using a series of servo-controlled gates, into the right output bucket.

Extra-credit homework

A front view of the LEGO sorter with the sorting boxes visible underneath
In all its bricky beauty, with the 18 output buckets visible at the bottom

Daniel is such a boss maker that he wrote not one but two further-reading articles for those of you who want to deep-dive into this mega LEGO creation.

The post Raspberry Pi LEGO sorter appeared first on Raspberry Pi.

Resource leak detection in Amazon CodeGuru Reviewer

Post Syndicated from Pranav Garg original https://aws.amazon.com/blogs/devops/resource-leak-detection-in-amazon-codeguru/

This post discusses the resource leak detector for Java in Amazon CodeGuru Reviewer. CodeGuru Reviewer automatically analyzes pull requests (created in supported repositories such as AWS CodeCommit, GitHub, GitHub Enterprise, and Bitbucket) and generates recommendations for improving code quality. For more information, see Automating code reviews and application profiling with Amazon CodeGuru. This blog does not describe the resource leak detector for Python programs that is now available in preview.

What are resource leaks?

Resources are objects with a limited availability within a computing system. These typically include objects managed by the operating system, such as file handles, database connections, and network sockets. Because the number of such resources in a system is limited, they must be released by an application as soon as they are used. Otherwise, you will run out of resources and you won’t be able to allocate new ones. The paradigm of acquiring a resource and releasing it is also followed by other categories of objects such as metric wrappers and timers.

Resource leaks are bugs that arise when a program doesn’t release the resources it has acquired. Resource leaks can lead to resource exhaustion. In the worst case, they can cause the system to slow down or even crash.

Starting with Java 7, most classes holding resources implement the java.lang.AutoCloseable interface and provide a close() method to release them. However, a close() call in source code doesn’t guarantee that the resource is released along all program execution paths. For example, in the following sample code, resource r is acquired by calling its constructor and is closed along the path corresponding to the if branch, shown using green arrows. To ensure that the acquired resource doesn’t leak, you must also close r along the path corresponding to the else branch (the path shown using red arrows).

A resource must be closed along all execution paths to prevent resource leaks
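
In code form, the pattern in that example looks roughly like the following sketch, where a file stream and a boolean condition stand in for resource r and the branch being taken:

import java.io.FileInputStream;
import java.io.IOException;

public class CloseOnAllPaths {
    static void readHeader(String path, boolean fullParse) throws IOException {
        FileInputStream r = new FileInputStream(path); // resource acquired
        if (fullParse) {
            r.read();  // use the resource
            r.close(); // released along the if branch
        } else {
            r.close(); // without this call, r leaks along the else branch
        }
    }
}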

Often, resource leaks manifest themselves along code paths that aren’t frequently run, or under a heavy system load, or after the system has been running for a long time. As a result, such leaks are latent and can remain dormant in source code for long periods of time before manifesting themselves in production environments. This is the primary reason why resource leak bugs are difficult to detect or replicate during testing, and why automatically detecting these bugs during pull requests and code scans is important.

Detecting resource leaks in CodeGuru Reviewer

For this post, we consider the following Java code snippet. In this code, method getConnection() attempts to create a connection in the connection pool associated with a data source. Typically, a connection pool limits the maximum number of connections that can remain open at any given time. As a result, you must close connections after their use so as to not exhaust this limit.

 1     private Connection getConnection(final BasicDataSource dataSource, ...)
               throws ValidateConnectionException, SQLException {
 2         boolean connectionAcquired = false;
 3         // Retrying three times to get the connection.
 4         for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
 5             Connection connection = dataSource.getConnection();
 6             // validateConnection may throw ValidateConnectionException
 7             if (! validateConnection(connection, ...)) {
 8                 // connection is invalid
 9                 DbUtils.closeQuietly(connection);
10             } else {
11                 // connection is established
12                 connectionAcquired = true;
13                 return connection;
14             }
15         }
16         return null;
17     }

At first glance, it seems that the method getConnection() doesn’t leak connection resources. If a valid connection is established in the connection pool (else branch on line 10 is taken), the method getConnection() returns it to the client for use (line 13). If the connection established is invalid (if branch on line 7 is taken), it’s closed in line 9 before another attempt is made to establish a connection.

However, method validateConnection() at line 7 can throw a ValidateConnectionException. If this exception is thrown after a connection is established at line 5, the connection is neither closed in this method nor is it returned upstream to the client to be closed later. Furthermore, if this exceptional code path runs frequently, for instance, if the validation logic throws on a specific recurring service request, each new request causes a connection to leak in the connection pool. Eventually, the client can’t acquire new connections to the data source, impacting the availability of the service.

A typical recommendation to prevent resource leak bugs is to declare the resource objects in a try-with-resources statement block. However, we can’t use try-with-resources to fix the preceding method because this method is required to return an open connection for use in the upstream client. The CodeGuru Reviewer recommendation for the preceding code snippet is as follows:

“Consider closing the following resource: connection. The resource is referenced at line 7. The resource is closed at line 9. The resource is returned at line 13. There are other execution paths that don’t close the resource or return it, for example, when validateConnection throws an exception. To prevent this resource leak, close connection along these other paths before you exit this method.”

As mentioned in the Reviewer recommendation, to prevent this resource leak, you must close the established connection when method validateConnection() throws an exception. This can be achieved by inserting the validation logic (lines 7–14) in a try block. In the finally block associated with this try, the connection must be closed by calling DbUtils.closeQuietly(connection) if connectionAcquired == false. The method getConnection() after this fix has been applied is as follows:

private Connection getConnection(final BasicDataSource dataSource, ...) 
        throws ValidateConnectionException, SQLException {
    boolean connectionAcquired = false;
    // Retrying three times to get the connection.
    for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
        Connection connection = dataSource.getConnection();
        try {
            // validateConnection may throw ValidateConnectionException
            if (! validateConnection(connection, ...)) {
                // connection is invalid; the finally block closes it
            } else {
                // connection is established
                connectionAcquired = true;
                return connection;
            }
        } finally {
            if (!connectionAcquired) {
                DbUtils.closeQuietly(connection);
            }
        }
    }
    return null;
}
As shown in this example, resource leaks in production services can be very disruptive. Furthermore, leaks that manifest along exceptional or less frequently run code paths can be hard to detect or replicate during testing and can remain dormant in the code for long periods of time before manifesting themselves in production environments. With the resource leak detector, you can detect such leaks on objects belonging to a large number of popular Java types such as file streams, database connections, network sockets, timers and metrics, etc.

Combining static code analysis with machine learning for accurate resource leak detection

In this section, we dive deep into the inner workings of the resource leak detector. The resource leak detector in CodeGuru Reviewer uses static analysis algorithms and techniques. Static analysis algorithms perform code analysis without running the code. These algorithms are generally prone to high false-positive rates (the tool might report correct code as having a bug). If the number of these false positives is high, it can lead to alarm fatigue and low adoption of the tool. As a result, the resource leak detector in CodeGuru Reviewer prioritizes precision over recall: the findings we surface are resource leaks with high accuracy, though CodeGuru Reviewer could potentially miss some resource leak findings.

The main reason for false positives in static code analysis is incomplete information available to the analysis. CodeGuru Reviewer requires only the Java source files and doesn’t require all dependencies or the build artifacts. Not requiring the external dependencies or the build artifacts reduces the friction to perform automated code reviews. As a result, static analysis only has access to the code in the source repository and doesn’t have access to its external dependencies. The resource leak detector in CodeGuru Reviewer combines static code analysis with a machine learning (ML) model. This ML model is used to reason about external dependencies to provide accurate recommendations.

To understand the use of the ML model, consider again the code above for method getConnection() that had a resource leak. In the code snippet, a connection to the data source is established by calling BasicDataSource.getConnection() method, declared in the Apache Commons library. As mentioned earlier, we don’t require the source code of external dependencies like the Apache library for code analysis during pull requests. Without access to the code of external dependencies, a pure static analysis-driven technique doesn’t know whether the Connection object obtained at line 5 will leak, if not closed. Similarly, it doesn’t know that DbUtils.closeQuietly() is a library function that closes the connection argument passed to it at line 9. Our detector combines static code analysis with ML that learns patterns over such external function calls from a large number of available code repositories. As a result, our resource leak detector knows that the connection doesn’t leak along the following code path:

  • A connection is established on line 5
  • Method validateConnection() returns false at line 7
  • DbUtils.closeQuietly() is called on line 9

This suppresses the possible false warning. At the same time, the detector knows that there is a resource leak when the connection is established at line 5, and validateConnection() throws an exception at line 7 that isn’t caught.

When we run CodeGuru Reviewer on this code snippet, it surfaces only the second leak scenario and makes an appropriate recommendation to fix this bug.

The ML model used in the resource leak detector has been trained on a large number of internal Amazon and GitHub code repositories.

Responses to the resource leak findings

Although closing an open resource in code isn’t difficult, doing so properly along all program paths is important to prevent resource leaks. This can easily be overlooked, especially along exceptional or less frequently run paths. As a result, the resource leak detector in CodeGuru Reviewer has surfaced findings at a relatively high frequency, and has alerted developers within Amazon to thousands of resource leaks before they hit production.

The resource leak detections have witnessed a high developer acceptance rate, and developer feedback towards the resource leak detector has been very positive. Some of the feedback from developers includes “Very cool, automated finding,” “Good bot :),” and “Oh man, this is cool.” Developers have also concurred that the findings are important and need to be fixed.


Conclusion

Resource leak bugs are difficult to detect or replicate during testing. They can impact the availability of production services. As a result, it’s important to automatically detect these bugs early on in the software development workflow, such as during pull requests or code scans. The resource leak detector in CodeGuru Reviewer combines static code analysis algorithms with ML to surface only the high confidence leaks. It has a high developer acceptance rate and has alerted developers within Amazon to thousands of leaks before those leaks hit production.

How Facial Recognition Technology Is Helping Identify the U.S. Capitol Attackers

Post Syndicated from Mark Harris original https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/facial-recognition-and-the-us-capitol-insurrection

The FBI is still trying to identify some of the hundreds of people who launched a deadly attack on the U.S. Congress last week. “We have deployed our full investigative resources and are working closely with our federal, state, and local partners to aggressively pursue those involved in criminal activity during the events of January 6,” reads a page that contains images of dozens of unknown individuals, including one suspected of planting several bombs around Washington, D.C.

But while the public is being urged to put names to faces, America’s law enforcement agencies already have access to technologies that could do much of the heavy lifting. “We have over three billion photos that we indexed from the public internet, like Google for faces,” Hoan Ton-That, CEO of facial recognition start-up Clearview AI, told Spectrum.

Ton-That said that Clearview’s customers, including the FBI, were using it to help identify the perpetrators: “Use our system, and in about a second it might point to someone’s Instagram page.” 

Clearview has attracted criticism because it relies on images scraped from social media sites without their—or their users’—permission. 

“The Capitol images are very good quality for automatic face recognition,” agreed a senior face recognition expert at one of America’s largest law enforcement agencies, who asked not to be named because they were talking to Spectrum without the permission of their superiors. 

Face recognition technology is commonplace in 2021. But the smartphone that recognizes your face in lieu of a passcode is solving a much simpler problem than trying to ID a masked (or, often in the Capitol attacks, surprisingly unmasked) intruder from a snatched webcam frame. The two scenarios are quite different.

The first is comparing a live, high resolution image to a single, detailed record stored in the phone. “Modern algorithms can basically see past issues such as how your head is oriented and variations in illumination or expression,” says Arun Vemury, director of the Department of Homeland Security (DHS) Science and Technology Directorate Biometric and Identity Technology Center. In a recent DHS test of such screening systems at airports, the best algorithm identified the correct person at least 96 percent of the time. 

The second scenario, however, is attempting to connect a fleeting, unposed image against one of hundreds of millions of people in the country or around the world. “Most law enforcement agencies can only search against mugshots of people who have been arrested in their jurisdictions, not even DMV records,” says the law enforcement officer.

And as the size of the database grows, so does the likelihood of the system generating incorrect identifications. “Very low false positive rates are still fairly elusive,” says Vemury. “Because there are lots of people out there who might look like you, from siblings and children to complete strangers. Honestly, faces are not all that different from one another.”

Nevertheless, advances in machine learning techniques and algorithms mean that facial recognition technologies are improving. After the COVID-19 pandemic hit last year, the National Institute of Standards and Technology tested industry-leading algorithms [PDF] with images of people wearing facemasks. While some of the algorithms saw error rates soar, others had only a modest decrease in effectiveness compared to maskless facial recognition efforts. Incredibly, the best algorithm’s performance with masks on was comparable to the state-of-the-art on unmasked images from just three years earlier.

In fact, claims Vemury, AI-powered facial recognition systems are now better at matching unfamiliar faces than even the best trained human. “There’s almost always a human adjudicating the result or figuring out whether or not to follow up,” he says, “But if a human is more likely to make an error than the algorithm, are we really thinking about this process correctly? It’s almost like asking a third grader to check a high school student’s calculus homework.”

Yet such technological optimism worries Elizabeth Rowe, a law professor at the University of Florida Levin College of Law. “Just because we have access to all of this information doesn’t mean that we should necessarily use it,” she says. “Part of the problem is that there’s no reporting accountability of who’s using what and why, especially among private companies.”

Last week, The Washington Times as well as Republican Congressman Matt Gaetz incorrectly claimed that face recognition software from New York startup XRVision had revealed two of the Capitol attackers as incognito left-wing instigators. In fact, XRVision’s algorithms had identified the agitators as being the very right-wing extremists they appeared to be.  

There are also ongoing concerns that the way some face recognition technologies work (or fail to work) with different demographic groups [PDF] can exacerbate institutional racial biases. “We’re doing some additional research in this area,” says Vemury. “But even if you made the technologies totally fair, you could still deploy them in ways that could have a discriminatory outcome.”

But if facial recognition technologies are linked to apprehending high profile suspects such as the Capitol attackers, enthusiasm for their use is likely only going to grow, says Rowe. 

“Just as consumers have gotten attached to the convenience of using biometrics to access our toys, I think we’ll find law enforcement agencies doing exactly the same thing,” she says. “It’s easy, and it gives them the potential to conduct investigations in a way that they couldn’t before.”

Can Computer Models Select the Best Public Health Interventions for COVID-19?

Post Syndicated from Matthew Hutson original https://spectrum.ieee.org/the-human-os/artificial-intelligence/medical-ai/can-computer-models-select-the-best-public-health-interventions-for-covid19

Many associate XPrize with a $10-million award offered in 1996 to motivate a breakthrough in private space flight. But the organization has since held other competitions related to exploration, ecology, and education. And in November, they launched the Pandemic Response Challenge, which will culminate in a $500,000 award to be split between two teams that not only best predict the continuing global spread of COVID-19, but also prescribe policies to curtail it.

“The whole point was to create a platform to create pandemic mitigation strategies based on evidence and science,” says Amir Banifatemi, XPrize’s chief innovation and growth officer. “But also to make the resulting insights available freely to everyone, in an open-source manner—especially for all those communities that may not have access to data and epidemiology divisions, statisticians, or data scientists.”

Pandemic predictions are hard enough, as we’ve seen with forecasting’s spotty track record over the past year. Prescriptions are harder still. Any non-pharmaceutical intervention (NPI), like closing schools and businesses, limiting travel, or establishing contact tracing, will be implemented differently in different areas; these interventions can also interact in surprising ways.

The XPrize Pandemic Response Challenge emerged from a paper published in May 2020 by a team led by Risto Miikkulainen, a computer scientist at the University of Texas at Austin and associate vice president for evolutionary intelligence at Cognizant Technology Solutions, an IT and consulting company.

The paper, by Miikkulainen and colleagues at UT and Cognizant, lays out a way to go from prediction to prescription for COVID-19. As a first step, the team trained a neural network to predict new infections, using past data on infections and NPIs implemented. Then they created another neural net to serve as the prescriptor, taking in past infections and NPIs and outputting a new set of NPIs. To optimize the prescriptor, they created a whole population of prescriptors and used artificial evolution. They evaluated the prescriptors using the predictor as a surrogate for reality; in other words, based on the interventions prescribed, what would be the predicted effect on case numbers? The best performing prescriptors were kept, copied, and mutated.
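To make the loop concrete, here is a toy sketch (with made-up dynamics and evolution settings; the paper evolves neural-network prescriptors, simplified here to raw NPI vectors) of scoring candidate prescriptions against a predictor that stands in for reality:

import numpy as np

rng = np.random.default_rng(0)
N_NPIS = 8  # interventions, each prescribed at a strength between 0 and 1

def predicted_cases(npis):
    # Stand-in for the trained predictor network: more intervention, fewer cases.
    return float(np.exp(-2.0 * npis.sum() / N_NPIS))

def fitness(npis, alpha):
    # Trade off predicted infections against intervention stringency.
    return -(predicted_cases(npis) + alpha * npis.mean())

population = [rng.random(N_NPIS) for _ in range(50)]
alpha = 0.5  # one point on the infection-vs-intervention tradeoff

for generation in range(100):
    population.sort(key=lambda p: fitness(p, alpha), reverse=True)
    survivors = population[:10]  # keep the best prescriptors
    children = [np.clip(s + rng.normal(0, 0.1, N_NPIS), 0, 1)
                for s in survivors for _ in range(4)]  # copy and mutate
    population = survivors + children

print("best NPI strengths:", np.round(population[0], 2))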

Notably, evolution produced not a single good prescriptor but a set of them, each good in its own way. They were selected for their ability to minimize not just infections, but also interventions themselves—otherwise, they’d just prescribe total lockdowns, which have serious impacts on the economy and quality of life. Policymakers could theoretically look at the set of prescriptors and pick one, depending on how much they wanted to emphasize physical health or social and economic health.

Miikkulainen’s team placed an interactive demo online. “Amir [Banifatemi] saw that and figured that this would make a great XPrize,” Miikkulainen says. Suddenly, artificial intelligence and big data seemed capable of authoring useful policy recommendations. Cognizant is partnering with XPrize to run the challenge, and their code is offered to contestants as an optional starting point.

Some XPrizes span years. This one has a compressed schedule, for obvious reasons. There are two phases. For Phase 1, teams had to submit prediction models by 22 December. They were given data on infections and NPIs around the world (the NPI data came from the comprehensive Oxford COVID-19 Government Response Tracker), and the models are now being judged over a three-week period on how closely their predictions of new cases each day match reality across more than 200 regions (countries, U.S. states, and provinces of Canada and Brazil). Teams will also be judged qualitatively on factors such as innovation, model speed, prediction consistency, explanation, and collaboration with other teams.

Up to 50 teams will make it to Phase 2, where they must submit a prescription model. The best predictors from Phase 1 will be combined to evaluate the prescriptions in Phase 2. Prescriptors can offer up to 10 prescriptions per region per day, covering different infection-intervention tradeoffs. (The economic cost of each intervention will be given to the models. Of course, figuring out the real costs is a problem in itself.) Again, these will be evaluated both quantitatively and qualitatively. The top two teams will split half a million dollars.

The competition may not end there. XPrize’s Banifatemi says a third phase might test models on vaccine deployment prescriptions. And beyond the contest, some cities or countries might put some of the Phase 2 or 3 models into practice, if Banifatemi can find adventurous takers.

The organizers expect a wide variety of solutions. Banifatemi says the field includes teams from AI strongholds such as Stanford, Microsoft, MIT, Oxford, and Quebec’s Mila, but one team consists of three women in Tunisia. In all, 104 teams from 28 countries have registered.

“We’re hoping that this competition can be a springboard for developing solutions for other really big problems as well,” Miikkulainen says. Those problems include pandemics, global warming, and challenges in business, education, and healthcare. In this scenario, “humans are still in charge,” he emphasizes. “They still decide what they want, and AI gives them the best alternatives from which the decision-makers choose.”

But Miikkulainen hopes that data science can help humanity find its way. “Maybe in the future, it’s considered irresponsible not to use AI for making these policies,” he says.

New for Amazon CodeGuru – Python Support, Security Detectors, and Memory Profiling

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-amazon-codeguru-python-support-security-detectors-and-memory-profiling/

Amazon CodeGuru is a developer tool that helps you improve your code quality and has two main components:

  • CodeGuru Reviewer uses program analysis and machine learning to detect potential defects that are difficult to find in your code and offers suggestions for improvement.
  • CodeGuru Profiler collects runtime performance data from your live applications, and provides visualizations and recommendations to help you fine-tune your application performance.

Today, I am happy to announce three new features:

  • Python Support for CodeGuru Reviewer and Profiler (Preview) – You can now use CodeGuru to improve applications written in Python. Before this release, CodeGuru Reviewer could analyze Java code, and CodeGuru Profiler supported applications running on a Java virtual machine (JVM).
  • Security Detectors for CodeGuru Reviewer – A new set of detectors for CodeGuru Reviewer to identify security vulnerabilities and check for security best practices in your Java code.
  • Memory Profiling for CodeGuru Profiler – A new visualization of memory retention per object type over time. This makes it easier to find memory leaks and optimize how your application is using memory.

Let’s see these functionalities in more detail.

Python Support for CodeGuru Reviewer and Profiler (Preview)
Python Support for CodeGuru Reviewer is available in Preview and offers recommendations on how to improve the Python code of your applications in multiple categories such as concurrency, data structures and control flow, scientific/math operations, error handling, using the standard library, and of course AWS best practices.

You can now also use CodeGuru Profiler to collect runtime performance data from your Python applications and get visualizations to help you identify how code is running on the CPU and where time is consumed. In this way, you can detect the most expensive lines of code in your application. Focusing your tuning activities on those parts helps you reduce infrastructure cost and improve application performance.
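Getting profiles flowing from a Python application mostly amounts to attaching the agent at startup. Here’s a minimal sketch (the profiling group name and region are assumed placeholders, and the profiling group must already exist):

# Requires the codeguru-profiler-agent package.
from codeguru_profiler_agent import Profiler

Profiler(profiling_group_name="MyPythonApp", region_name="us-east-1").start()

# The rest of the application runs as usual; the agent samples it in the
# background and submits profiles to the profiling group.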

Let’s see the CodeGuru Reviewer in action with some Python code. When I joined AWS eight years ago, one of the first projects I created was a Filesystem in Userspace (FUSE) interface to Amazon Simple Storage Service (S3) called yas3fs (Yet Another S3-backed File System). It was inspired by the more popular s3fs-fuse project but rewritten from scratch to implement a distributed cache synchronized by Amazon Simple Notification Service (SNS) notifications (now, thanks to the many contributors, it’s using S3 event notifications). It was also a good excuse for me to learn more about Python programming and S3. It’s a personal project that I made available as open source at the time. Today, if you need a shared file system, you can use Amazon Elastic File System (EFS).

In the CodeGuru console, I associate the yas3fs repository. You can associate repositories from GitHub, including GitHub Enterprise Cloud and GitHub Enterprise Server, Bitbucket, or AWS CodeCommit.

After that, I can get a code review from CodeGuru in two ways:

  • Automatically, when I create a pull request. This is a great way to use it as you and your team are working on a code base.
  • Manually, creating a repository analysis to get a code review for all the code in one branch. This is useful to start using CodeGuru with an existing code base.

Since I just associated the whole repository, I go for a full analysis and enter the branch name to review (apologies, I was still using master at the time; now I use main for new projects).
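The same full analysis can also be requested programmatically. Here’s a sketch using the boto3 CodeGuru Reviewer client, with a made-up association ARN; the review takes a few minutes to complete before recommendations become available:

import boto3

reviewer = boto3.client("codeguru-reviewer")

# Kick off a full repository analysis on the chosen branch.
review = reviewer.create_code_review(
    Name="yas3fs-full-analysis",
    RepositoryAssociationArn="arn:aws:codeguru-reviewer:us-east-1:123456789012:association/example",
    Type={"RepositoryAnalysis": {"RepositoryHead": {"BranchName": "master"}}},
)

# Once the review completes, list its recommendations.
summaries = reviewer.list_recommendations(
    CodeReviewArn=review["CodeReview"]["CodeReviewArn"]
)["RecommendationSummaries"]
for rec in summaries:
    print(rec["FilePath"], rec["Description"][:80])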

After a few minutes, the code review is completed, and there are 14 recommendations. Not bad, but I can definitely improve the code. Here are a few of the recommendations I get. I was using exceptions and global variables too much at the time.

Security Detectors for CodeGuru Reviewer
The new CodeGuru Reviewer Security Detector uses automated reasoning to analyze all code paths and find potential security issues deep in your Java code, even ones that span multiple methods and files and that may involve multiple sequences of operations. To build this detector, we drew on lessons learned and best practices from Amazon’s 20+ years of experience.

The Security Detector also identifies security vulnerabilities in the top 10 Open Web Application Security Project (OWASP) categories, such as the use of weak hashing and encryption algorithms.

If the security detector discovers an issue, it offers a suggested remediation along with an explanation. In this way, it’s much easier to follow security best practices for AWS APIs, such as those for AWS Key Management Service (KMS) and Amazon Elastic Compute Cloud (EC2), and for common Java cryptography and TLS/SSL libraries.

With help from the security detector, security engineers can focus on architectural and application-specific security best-practices, and code reviewers can focus their attention on other improvements.

Memory Profiling for CodeGuru Profiler
For applications running on a JVM, CodeGuru Profiler can now show the Heap Summary, a consolidated view of memory retention during a time frame, tracking both overall sizes and number of objects per object type (such as String, int, char[], and custom types). These metrics are presented in a timeline graph, so that you can easily spot trends and peaks of memory utilization per object type.

Here are a couple of scenarios where this can help:

Memory Leaks – A constantly growing memory utilization curve for one or more object types may indicate a leak (intended here as unnecessary retention of memory objects by the application), possibly leading to out-of-memory errors and application crashes.

Memory Optimizations – Having a breakdown of memory utilization per object type is a step beyond traditional memory utilization monitoring, based solely on JVM-level metrics like total heap usage. By knowing that an unexpectedly high amount of memory has been associated with a specific object type, you can focus your analysis and optimization efforts on the parts of your application that are responsible for allocating and referencing objects of that type.
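CodeGuru’s memory profiling applies to the JVM today, but the underlying idea (attributing retained memory to specific code rather than watching a single aggregate number) carries over to other runtimes. As a rough analogy, here’s a minimal Python sketch using the standard library’s tracemalloc to rank allocation sites by retained size:

import tracemalloc

tracemalloc.start()

cache = []
for i in range(100_000):
    cache.append(str(i) * 10)  # unbounded retention: a typical leak pattern

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)  # top allocation sites, by memory still held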

For example, here is a graph showing how memory is used by a Java application over an interval of time. Apart from the total capacity available and the used space, I can see how memory is being used by some specific object types, such as byte[], java.lang.UUID, and the entries of a java.util.LinkedHashMap. The continuous growth over time of the memory retained by these object types is suspicious. There is probably a memory leak I have to investigate.

In the table just below, I have a longer list of object types allocating memory on the heap. The first three are selected and for that reason are shown in the graph above. Here, I can inspect other object types and select them to see their memory usage over time. It looks like the three I already selected are the ones most at risk of being affected by a memory leak.

Available Now
These new features are available today in all regions where Amazon CodeGuru is offered. For more information, please see the AWS Regional Services table.

There are no pricing changes for Python support, security detectors, and memory profiling. You pay for what you use without upfront fees or commitments.

Learn more about Amazon CodeGuru and start using these new features today to improve the code quality of your applications.  


Amazon SageMaker JumpStart Simplifies Access to Pre-built Models and Machine Learning Solutions

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-sagemaker-jumpstart-simplifies-access-to-prebuilt-models-and-machine-learning-models/

Today, I’m extremely happy to announce the availability of Amazon SageMaker JumpStart, a capability of Amazon SageMaker that accelerates your machine learning workflows with one-click access to popular model collections (also known as “model zoos”), and to end-to-end solutions that solve common use cases.

In recent years, machine learning (ML) has proven to be a valuable technique in improving and automating business processes. Indeed, models trained on historical data can accurately predict outcomes across a wide range of industry segments: financial services, retail, manufacturing, telecom, life sciences, and so on. Yet, working with these models requires skills and experience that only a subset of scientists and developers have: preparing a dataset, selecting an algorithm, training a model, optimizing its accuracy, deploying it in production, and monitoring its performance over time.

In order to simplify the model building process, the ML community has created model zoos, that is to say, collections of models built with popular open source libraries, and often pretrained on reference datasets. For example, the TensorFlow Hub and the PyTorch Hub provide developers with a long list of models ready to be downloaded, and integrated in applications for computer vision, natural language processing, and more.

Still, downloading a model is just part of the answer. Developers then need to deploy it for evaluation and testing, using a variety of tools, such as the TensorFlow Serving and TorchServe model servers, or their own bespoke code. Once the model is running, developers need to figure out the correct format that incoming data should have, a long-lasting pain point. I’m sure I’m not the only one regularly pulling my hair out here!

Of course, a full ML application usually has a lot of moving parts. Data needs to be preprocessed, enriched with additional data fetched from a backend, and funneled into the model. Predictions are often postprocessed, and stored for further analysis and visualization. As useful as they are, model zoos only help with the modeling part. Developers still have lots of extra work to deliver a complete ML solution.

Because of all this, ML experts are flooded with a long backlog of projects waiting to start. Meanwhile, less experienced practitioners struggle to get started. These barriers are incredibly frustrating, and our customers asked us to remove them.

Introducing Amazon SageMaker JumpStart
Amazon SageMaker JumpStart is integrated in Amazon SageMaker Studio, our fully integrated development environment (IDE) for ML, making it intuitive to discover models, solutions, and more. At launch, SageMaker JumpStart includes:

  • 15+ end-to-end solutions for common ML use cases such as fraud detection, predictive maintenance, and so on.
  • 150+ models from the TensorFlow Hub and the PyTorch Hub, for computer vision (image classification, object detection), and natural language processing (sentence classification, question answering).
  • Sample notebooks for the built-in algorithms available in Amazon SageMaker.

SageMaker JumpStart also provides notebooks, blogs, and video tutorials designed to help you learn and remove roadblocks. Content is easily accessible within Amazon SageMaker Studio, enabling you to get started with ML faster.

It only takes a single click to deploy solutions and models. All infrastructure is fully managed, so all you have to do is enjoy a nice cup of tea or coffee while deployment takes place. After a few minutes, you can start testing, thanks to notebooks and sample prediction code that are readily available in Amazon SageMaker Studio. Of course, you can easily modify them to use your own data.

SageMaker JumpStart makes it extremely easy for experienced practitioners and beginners alike to quickly deploy and evaluate models and solutions, saving days or even weeks of work. By drastically shortening the path from experimentation to production, SageMaker JumpStart accelerates ML-powered innovation, particularly for organizations and teams that are early on their ML journey, and haven’t yet accumulated a lot of skills and experience.

Now, let me show you how SageMaker JumpStart works.

Deploying a Solution with Amazon SageMaker JumpStart
Opening SageMaker Studio, I select the “JumpStart” icon on the left. This opens a new tab showing me all available content (solutions, models, and so on).

Let’s say that I’m interested in using computer vision to detect defects in manufactured products. Could ML be the answer?

Browsing the list of available solutions, I see one for product defect detection.

Opening it, I can learn more about the type of problems that it solves, the sample dataset used in the demo, the AWS services involved, and more.

SageMaker screenshot

A single click is all it takes to deploy this solution. Under the hood, AWS CloudFormation uses a built-in template to provision all appropriate AWS resources.

A few minutes later, the solution is deployed, and I can open its notebook.

SageMaker screenshot

The notebook opens immediately in SageMaker Studio. I run the demo, and understand how ML can help me detect product defects. This is also a nice starting point for my own project, making it easy to experiment with my own dataset.

SageMaker screenshot

Once I’m done with this solution, I can delete all its resources in one click, letting AWS CloudFormation clean up without having to worry about leaving idle AWS resources behind.

SageMaker screenshot

Now, let’s look at models.

Deploying a Model with Amazon SageMaker JumpStart
SageMaker JumpStart includes a large collection of models available in the TensorFlow Hub and the PyTorch Hub. These models are pre-trained on reference datasets, and you can use them directly to handle a wide range of computer vision and natural language processing tasks. You can also fine-tune them on your own datasets for greater accuracy, a technique called transfer learning.

SageMaker screenshot
Here, I pick a version of the BERT model trained on question answering. I can either deploy it as is, or fine-tune it. For the sake of brevity, I go with the former here, and I just click on the “Deploy” button.

SageMaker screenshot

A few minutes later, the model has been deployed to a real-time endpoint powered by fully managed infrastructure.

SageMaker screenshot

Time to test it! Clicking on “Open Notebook” launches a sample notebook that I run right away to test the model, without having to change a line of code. Here, I’m asking two questions (“What is Southern California often abbreviated as?” and “Who directed Spectre?”), passing some context containing the answer. In both cases, the BERT model gives the correct answer, respectively “socal” and “Sam Mendes”.

SageMaker screenshot
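Outside the notebook, the deployed endpoint can be invoked like any other SageMaker real-time endpoint. Here’s a sketch with an assumed endpoint name and an assumed JSON payload shape (the sample notebook shows the exact input format your particular model expects):

import json

import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {"question": "Who directed Spectre?",
           "context": "Spectre is a 2015 spy film directed by Sam Mendes."}

response = runtime.invoke_endpoint(
    EndpointName="jumpstart-bert-qa",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(response["Body"].read().decode())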

When I’m done testing, I can delete the endpoint in one click, and stop paying for it.

Getting Started
As you can see, it’s extremely easy to deploy models and solutions with SageMaker JumpStart in minutes, even if you have little or no ML skills.

You can start using this capability today in all regions where SageMaker Studio is available, at no additional cost.

Give it a try and let us know what you think.

As always, we’re looking forward to your feedback, either through your usual AWS support contacts, or on the AWS Forum for SageMaker.

– Julien

Special thanks to my colleague Jared Heywood for his precious help during early testing.

New – Amazon SageMaker Pipelines Brings DevOps Capabilities to your Machine Learning Projects

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-sagemaker-pipelines-brings-devops-to-machine-learning-projects/

Today, I’m extremely happy to announce Amazon SageMaker Pipelines, a new capability of Amazon SageMaker that makes it easy for data scientists and engineers to build, automate, and scale end-to-end machine learning pipelines.

Machine learning (ML) is intrinsically experimental and unpredictable in nature. You spend days or weeks exploring and processing data in many different ways, trying to crack the geode open to reveal its precious gemstones. Then, you experiment with different algorithms and parameters, training and optimizing lots of models in search of highest accuracy. This process typically involves lots of different steps with dependencies between them, and managing it manually can become quite complex. In particular, tracking model lineage can be difficult, hampering auditability and governance. Finally, you deploy your top models, and you evaluate them against your reference test sets. Finally? Not quite, as you’ll certainly iterate again and again, either to try out new ideas, or simply to periodically retrain your models on new data.

No matter how exciting ML is, it does unfortunately involve a lot of repetitive work. Even small projects will require hundreds of steps before they get the green light for production. Over time, not only does this work detract from the fun and excitement of your projects, it also creates ample room for oversight and human error.

To alleviate manual work and improve traceability, many ML teams have adopted the DevOps philosophy and implemented tools and processes for Continuous Integration and Continuous Delivery (CI/CD). Although this is certainly a step in the right direction, writing your own tools often leads to complex projects that require more software engineering and infrastructure work than you initially anticipated. Valuable time and resources are diverted from the actual ML project, and innovation slows down. Sadly, some teams decide to revert to manual work, for model management, approval, and deployment.

Introducing Amazon SageMaker Pipelines
Simply put, Amazon SageMaker Pipelines brings in best-in-class DevOps practices to your ML projects. This new capability makes it easy for data scientists and ML developers to create automated and reliable end-to-end ML pipelines. As usual with SageMaker, all infrastructure is fully managed, and doesn’t require any work on your side.

Care.com is the world’s leading platform for finding and managing high-quality family care. Here’s what Clemens Tummeltshammer, Data Science Manager, Care.com, told us: “A strong care industry where supply matches demand is essential for economic growth from the individual family up to the nation’s GDP. We’re excited about Amazon SageMaker Feature Store and Amazon SageMaker Pipelines, as we believe they will help us scale better across our data science and development teams, by using a consistent set of curated data that we can use to build scalable end-to-end machine learning (ML) model pipelines from data preparation to deployment. With the newly announced capabilities of Amazon SageMaker, we can accelerate development and deployment of our ML models for different applications, helping our customers make better informed decisions through faster real-time recommendations.”

Let me tell you more about the main components in Amazon SageMaker Pipelines: pipelines, model registry, and MLOps templates.

Pipelines – Model building pipelines are defined with a simple Python SDK. They can include any operation available in Amazon SageMaker, such as data preparation with Amazon SageMaker Processing or Amazon SageMaker Data Wrangler, model training, model deployment to a real-time endpoint, or batch transform. You can also add Amazon SageMaker Clarify to your pipelines, in order to detect bias prior to training, or once the model has been deployed. Likewise, you can add Amazon SageMaker Model Monitor to detect data and prediction quality issues.

Once launched, model building pipelines are executed as CI/CD pipelines. Every step is recorded, and detailed logging information is available for traceability and debugging purposes. Of course, you can also visualize pipelines in Amazon SageMaker Studio, and track their different executions in real time.
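To make that concrete, here’s a minimal sketch of a one-step pipeline written with the SageMaker Python SDK (the bucket, role ARN, and hyperparameters are assumptions for illustration, not values from this post):

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical

# Parameterize the input so the same pipeline can be rerun on new data.
input_data = ParameterString(name="InputData",
                             default_value="s3://my-bucket/abalone/train.csv")

xgb = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", session.boto_region_name,
                                            version="1.2-1"),
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role=role,
    sagemaker_session=session,
)
xgb.set_hyperparameters(objective="reg:squarederror", num_round=50)

step_train = TrainingStep(
    name="TrainAbaloneModel",
    estimator=xgb,
    inputs={"train": TrainingInput(s3_data=input_data, content_type="text/csv")},
)

pipeline = Pipeline(name="AbalonePipeline",
                    parameters=[input_data], steps=[step_train])
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # launch an execution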

Model Registry – The model registry lets you track and catalog your models. In SageMaker Studio, you can easily view model history, list and compare versions, and track metadata such as model evaluation metrics. You can also define which versions may or may not be deployed in production. In fact, you can even build pipelines that automatically trigger model deployment once approval has been given. You’ll find that the model registry is very useful in tracing model lineage, improving model governance, and strengthening your compliance posture.

MLOps Templates – SageMaker Pipelines includes a collection of built-in CI/CD templates for popular pipelines (build/train/deploy, deploy only, and so on). You can also add and publish your own templates, so that your teams can easily discover and deploy them. Not only do templates save lots of time, they also make it easy for ML teams to collaborate from experimentation to deployment, using standard processes and without having to manage any infrastructure. Templates also let Ops teams customize steps as needed, and give them full visibility for troubleshooting.

Now, let’s do a quick demo!

Building an End-to-end Pipeline with Amazon SageMaker Pipelines
Opening SageMaker Studio, I select the “Components” tab and the “Projects” view. This displays a list of built-in project templates. I pick one to build, train, and deploy a model.

SageMaker screenshot

Then, I simply give my project a name, and create it.

A few seconds later, the project is ready. I can see that it includes two Git repositories hosted in AWS CodeCommit, one for model training, and one for model deployment.

SageMaker screenshot

The first repository provides scaffolding code to create a multi-step model building pipeline: data processing, model training, model evaluation, and conditional model registration based on accuracy. As you’ll see in the pipeline.py file, this pipeline trains a regression model using the XGBoost algorithm on the well-known Abalone dataset. This repository also includes a build specification file, used by AWS CodePipeline and AWS CodeBuild to execute the pipeline automatically.

Likewise, the second repository contains code and configuration files for model deployment, as well as test scripts required to pass the quality gate. This operation is also based on AWS CodePipeline and AWS CodeBuild, which run an AWS CloudFormation template to create model endpoints for staging and production.

Clicking on the two blue links, I clone the repositories locally. This triggers the first execution of the pipeline.

SageMaker screenshot

A few minutes later, the pipeline has run successfully. Switching to the “Pipelines” view, I can visualize its steps.

SageMaker screenshot

Clicking on the training step, I can see the Root Mean Square Error (RMSE) metrics for my model.

SageMaker screenshot

As the RMSE is lower than the threshold defined in the conditional step, my model is added to the model registry, as visible below.

SageMaker screenshot

For simplicity, the registration step sets the model status to “Approved”, which automatically triggers its deployment to a real-time endpoint in the same account. Within seconds, I see that the model is being deployed.

SageMaker screenshot

Alternatively, you could register your model with a “Pending manual approval” status. This will block deployment until the model has been reviewed and approved manually. As the model registry supports cross-account deployment, you could also easily deploy in a different account, without having to copy anything across accounts.

A few minutes later, the endpoint is up, and I could use it to test my model.

SageMaker screenshot

Once I’ve made sure that this model works as expected, I could ping the MLOps team, and ask them to deploy the model in production.

Putting my MLOps hat on, I open the AWS CodePipeline console, and I see that my deployment is indeed waiting for approval.

SageMaker screenshot

I then approve the model for deployment, which triggers the final stage of the pipeline.

SageMaker screenshot

Reverting to my Data Scientist hat, I see in SageMaker Studio that my model is being deployed. Job done!

SageMaker screenshot

Getting Started
As you can see, Amazon SageMaker Pipelines makes it really easy for Data Science and MLOps teams to collaborate using familiar tools. They can create and execute robust, automated ML pipelines that deliver high quality models in production quicker than before.

You can start using SageMaker Pipelines in all commercial regions where SageMaker is available. The MLOps capabilities are available in the regions where CodePipeline is also available.

Sample notebooks are available to get you started. Give them a try, and let us know what you think. We’re always looking forward to your feedback, either through your usual AWS support contacts, or on the AWS Forum for SageMaker.

– Julien

Special thanks to my colleague Urvashi Chowdhary for her precious assistance during early testing.

Introducing Amazon SageMaker Data Wrangler, a Visual Interface to Prepare Data for Machine Learning

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/introducing-amazon-sagemaker-data-wrangler-a-visual-interface-to-prepare-data-for-machine-learning/

Today, I’m extremely happy to announce Amazon SageMaker Data Wrangler, a new capability of Amazon SageMaker that makes it faster for data scientists and engineers to prepare data for machine learning (ML) applications by using a visual interface.

Whenever I ask a group of data scientists and ML engineers how much time they actually spend studying ML problems, I often hear a collective sigh, followed by something along the lines of, “20%, if we’re lucky.” When I ask them why, the answer is invariably the same, “data preparation consistently takes up to 80% of our time!”

Indeed, preparing data for training is a crucial step of the ML process, and no one would think about botching it up. Typical tasks include:

  • Locating data: finding where raw data is stored, and getting access to it
  • Data visualization: examining statistical properties for each column in the dataset, building histograms, studying outliers
  • Data cleaning: removing duplicates, dropping or filling entries with missing values, removing outliers
  • Data enrichment and feature engineering: processing columns to build more expressive features, selecting a subset of features for training

In the early stage of a new ML project, this is a highly manual process, where intuition and experience play a large part. Using a mix of bespoke tools and open source tools such as pandas or PySpark, data scientists often experiment with different combinations of data transformations, and use them to process datasets before training models. Then, they analyze prediction results and iterate. As important as this is, looping through this process again and again can be time-consuming, tedious, and error-prone.

At some point, you will hit the right level of accuracy (or whatever other metric you’ve picked), and you’ll then want to train on the full dataset in your production environment. However, you’ll first have to reproduce and automate the exact data preparation steps that you experimented with in your sandbox. Unfortunately, there’s always room for error given the interactive nature of this work, even if you carefully document it.

Last but not least, you’ll have to manage and scale your data processing infrastructure before you get to the finish line. Now that I think of it, 80% of your time may not be enough to do all of this!

Introducing Amazon SageMaker Data Wrangler
Amazon SageMaker Data Wrangler is integrated in Amazon SageMaker Studio, our fully managed integrated development environment (IDE) for ML. With just a few clicks, you can connect to data sources, explore and visualize data, apply built-in transformations as well as your own, export the resulting code to an auto-generated script, and run it on managed infrastructure. Let’s look at each step in more detail.

Obviously, data preparation starts with locating and accessing data. Out of the box, SageMaker Data Wrangler lets you easily and quickly connect to Amazon Simple Storage Service (S3), Amazon Athena, Amazon Redshift, and AWS Lake Formation. You can also import data from Amazon SageMaker Feature Store. As with all things AWS, access management is governed by AWS Identity and Access Management (IAM), based on the permissions attached to your SageMaker Studio instance.

Once you’ve connected to your data sources, you’ll probably want to visualize your data. Using the SageMaker Data Wrangler user interface, you can view table summaries, histograms, and scatter plots in seconds. You can also build your own custom graphs by simply copying and running code written with the popular Altair open source library.

Once you’ve got a good grasp on what your data looks like, it’s time to start preparing it. SageMaker Data Wrangler includes 300+ built-in transformations, such as finding and replacing data, splitting/renaming/dropping columns, scaling numerical values, encoding categorical values, and so on. All you have to do is select the transformation in a drop-down list, and fill in the parameters it may require. You can then preview the change, and decide whether or not to add it to the list of preparation steps for this dataset. If you’d like, you can also add your own code to implement custom transformations, using either pandas, PySpark, or PySpark SQL.
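For example, a custom pandas transform might look like the sketch below; the toy DataFrame stands in for the dataset that Data Wrangler hands to your snippet (by convention, as a variable named df):

import pandas as pd

# Toy stand-in for the dataset; inside Data Wrangler your code receives `df`.
df = pd.DataFrame({"Name": ["Allen", "Bowen"],
                   "Pclass": [3, 1],
                   "Sex": ["male", "female"]})

# One-hot encode the categorical passenger class, then drop unneeded columns.
df = pd.concat([df, pd.get_dummies(df["Pclass"], prefix="Pclass")], axis=1)
df = df.drop(columns=["Pclass", "Name"])
print(df)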

As you add transformation steps to your processing pipeline, you can view its graphical summary in SageMaker Studio. You can also add new stages to the pipeline, for example a new data source, or another group of transformation steps (say, a data cleaning group, followed by a feature engineering group). Thanks to the intuitive user interface, your data preparation pipeline will take shape in front of your eyes, and you’ll instantly be able to check that processed data looks the way that it should.

Early on, you’d certainly love to check your data preparation steps, and also get a sense of their predictive power, wouldn’t you? Good news, then! For regression and classification problem types, the “Quick model” capability lets you select a subset of your data, train a model, and determine which features are contributing most to the predicted outcome. Looking at the model, you can easily diagnose and fix data preparation issues as early as possible, and determine if additional feature engineering is needed to improve your model performance.

Once you’re happy with your pipeline, you can export it in one click to a Python script that faithfully reproduces your manual steps. You won’t waste any time chasing discrepancies, and you can directly add this code to your ML project.

In addition, you can also export your processing code to an Amazon SageMaker Processing job, an Amazon SageMaker Pipelines workflow, or Amazon SageMaker Feature Store.

Now, let’s do a quick demo, and show you how easy it is to work with SageMaker Data Wrangler.

Using Amazon SageMaker Data Wrangler
Opening SageMaker Studio, I create a new data flow in order to process the Titanic dataset, which contains information on passengers, and labels showing whether they survived the wreck or not.

SageMaker screenshot

My dataset is stored as a CSV file in Amazon Simple Storage Service (S3), and I select the appropriate data source.

SageMaker screenshot

Using the built-in tool, I quickly navigate my S3 buckets, and I locate the CSV file containing my data. For larger datasets, SageMaker Data Wrangler also supports the Parquet format.

As I select my file, SageMaker Data Wrangler shows me the first few rows.

SageMaker screenshot

I import the dataset, and I’m presented with an initial view of the data flow. Right-clicking on the dataset, I select “Edit data types” to make sure that SageMaker Data Wrangler has correctly detected the type of each column in the dataset.

SageMaker screenshot

Checking each column, it looks like all types are correct.

SageMaker screenshot

Moving back to the data flow view, I select “Add analysis” this time. This opens a new view where I can visualize data using histograms, scatterplots, and more. For example, I build a histogram showing the age distribution of passengers according to their survival status, coloring the bins by gender. Of course, I can save it for future use.

SageMaker screenshot

Moving back to the data flow view once again, I select “Add transform” in order to start processing the dataset. This opens a new view, showing me the first lines of the dataset, as well as a list of 300+ built-in transforms.

SageMaker screenshot

Pclass, the passenger class, is a categorical variable, and I decide to encode it using one-hot encoding. This creates three new columns, one per class, and I can preview them. As this is exactly what I wanted, I apply this transform for good. Likewise, I apply the same transform to the Sex column.

SageMaker screenshot

Then, I drop the original Pclass column. Using the same transform, I also drop the Name column.

SageMaker screenshot

In order to get a quick idea of whether these transformations increase or decrease the accuracy of the model, I can create an analysis that trains a model on the spot. As my problem is a binary classification problem, SageMaker Data Wrangler uses a metric called the F1 score. 0.749 is a good start, and additional processing would certainly improve it. I can also see which features contribute most to the predicted outcome: sex, age, and being a third class passenger.

SageMaker screenshot

Then, moving to the “Export” view, I select all the transforms I’ve created so far, in order to add them to my ML project.

SageMaker screenshot

Here, I select “Python Code” to generate a Python script. Other options are available for Amazon SageMaker Processing, Amazon SageMaker Pipelines, and Amazon SageMaker Feature Store.

SageMaker screenshot

A few seconds later, the script is available. I could add it as is to my ML project, and rest assured that my data preparation steps would be consistent with the interactive transforms that I’ve created above.

SageMaker screenshot

Getting Started
As you can see, Amazon SageMaker Data Wrangler makes it really easy to work interactively on data preparation steps, before transforming them into code that can be used immediately for experimentation and production.

You can start using this capability today in all regions where SageMaker Studio is available.

Give it a try, and let us know what you think. We’re always looking forward to your feedback, either through your usual AWS support contacts, or on the AWS Forum for SageMaker.

– Julien

Special thanks to my colleague Peter Liu for his precious help during early testing.

New – Store, Discover, and Share Machine Learning Features with Amazon SageMaker Feature Store

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/new-store-discover-and-share-machine-learning-features-with-amazon-sagemaker-feature-store/

Today, I’m extremely happy to announce Amazon SageMaker Feature Store, a new capability of Amazon SageMaker that makes it easy for data scientists and machine learning engineers to securely store, discover and share curated data used in training and prediction workflows.

For all the importance of selecting the right algorithm to train machine learning (ML) models, experienced practitioners know how crucial it is to feed it with high-quality data. Cleaning data is a good first step, and ML workflows routinely include steps to fill missing values, remove outliers, and so on. Then, they often move on to transforming data, using a mix of common and arcane techniques known as “feature engineering.”

Simply put, the purpose of feature engineering is to transform your data and to increase its expressiveness so that the algorithm may learn better. For instance, many columnar datasets include strings, such as street addresses. To most ML algorithms, strings are meaningless, and they need to be encoded in a numerical representation. Thus, you could replace street addresses with GPS coordinates, a much more expressive way to learn the concept of location. In other words, if data is the new oil, then feature engineering is the refining process that turns it into high-octane jet fuel that helps models get to stratospheric accuracy.
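As a toy illustration of that example (using the third-party geopy library, which the post doesn’t prescribe, and a lookup that needs network access):

from geopy.geocoders import Nominatim

geocoder = Nominatim(user_agent="feature-engineering-demo")
location = geocoder.geocode("350 Fifth Avenue, New York, NY")

# Two numeric features a model can learn from, instead of an opaque string.
print(location.latitude, location.longitude)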

Indeed, ML practitioners spend a lot of time crafting feature engineering code, applying it to their initial datasets, training models on the engineered datasets, and evaluating model accuracy. Given the experimental nature of this work, even the smallest project will lead to multiple iterations. The same feature engineering code is often run again and again, wasting time and compute resources on repeating the same operations. In large organizations, this can cause an even greater loss of productivity, as different teams often run identical jobs, or even write duplicate feature engineering code because they have no knowledge of prior work.

There’s another hard problem that ML teams have to solve. As models are trained on engineered datasets, it’s imperative to apply the same transformations to data sent for prediction. This often means rewriting feature engineering code, sometimes in a different language, integrating it in your prediction workflow, and running it at prediction time. This whole process is not only time-consuming, it can also introduce inconsistencies, as even the tiniest variation in a data transform can have a large impact on predictions.

In order to solve these problems, ML teams sometimes build a feature store, a central repository where they can keep and retrieve engineered data used in their training and predictions jobs. As useful as feature stores are, building and managing your own involves a lot of engineering, infrastructure, and operational effort that takes valuable time away from actual ML work. Customers asked us for a better solution, and we got to work.

Introducing Amazon SageMaker Feature Store
Amazon SageMaker Feature Store is a fully managed centralized repository for your ML features, making it easy to securely store and retrieve features without having to manage any infrastructure. It’s part of Amazon SageMaker, our fully managed service for ML, and supports all algorithms. It’s also integrated with Amazon SageMaker Studio, our web-based development environment for ML.

Features stored in SageMaker Feature Store are organized in groups, and tagged with metadata. Thanks to this, you can quickly discover which features are available, and whether they’re suitable for your models. Multiple teams can also easily share and re-use features, reducing the cost of development and accelerating innovation.

Once stored, features can be retrieved and used in your SageMaker workflows: model training, batch transform, and real-time prediction with low latency. Not only do you avoid duplicating work, you also build consistent workflows that use the same features stored in the offline and online stores.

The Climate Corporation (Climate) is a subsidiary of Bayer, and the industry leader in bringing digital innovation to farmers. Says Daniel McCaffrey, Vice President, Data and Analytics, Climate: “At Climate, we believe in providing the world’s farmers with accurate information to make data driven decisions and maximize their return on every acre. To achieve this, we have invested in technologies such as machine learning tools to build models using measurable entities known as features, such as yield for a grower’s field. With Amazon SageMaker Feature Store, we can accelerate the development of ML models with a central feature store to access and reuse features across multiple teams easily. SageMaker Feature Store makes it easy to access features in real-time using the online store, or run features on a schedule using the offline store for different use cases, and we can develop ML models faster.”

Care.com, the world’s leading platform for finding and managing high-quality family care, is also using Amazon SageMaker Feature Store. This is what Clemens Tummeltshammer, Data Science Manager, Care.com, told us: “A strong care industry where supply matches demand is essential for economic growth from the individual family up to the nation’s GDP. We’re excited about Amazon SageMaker Feature Store and Amazon SageMaker Pipelines, as we believe they will help us scale better across our data science and development teams, by using a consistent set of curated data that we can use to build scalable end-to-end machine learning model pipelines from data preparation to deployment. With the newly announced capabilities of Amazon SageMaker, we can accelerate development and deployment of our ML models for different applications, helping our customers make better informed decisions through faster real-time recommendations.”

Now, let’s see how you can get started.

Storing and Retrieving Features with Amazon SageMaker Feature Store
Once you’ve run your feature engineering code on your data, you can organize and store your engineered features in SageMaker Feature Store, by grouping them in feature groups. A feature group is a collection of records, similar to rows in a table. Each record has a unique identifier, and holds the engineered feature values for one of the data instances in your original data source. Optionally, you can choose to encrypt the data at rest using your own AWS Key Management Service (KMS) key that is unique for each feature group.

How you define feature groups is up to you. For example, you could create one per data source (CSV files, database tables, and so on), and use a convenient unique column as the record identifier (primary key, customer id, transaction id, and so on).

Once you’ve got your groups figured out, you should repeat the following steps for each group:

  1. Create feature definitions, with the name and the type of each feature in a record (Fractional, Integral, or String).
  2. Create each feature group with the create_feature_group() API (sketched here with an assumed boto3 SageMaker client, sm_client):
         sm_client.create_feature_group(
             # The name of the feature group
             FeatureGroupName = feature_group_name,
             # The name of the column acting as the record identifier
             RecordIdentifierFeatureName = record_identifier_feature_name,
             # The name of the column acting as the feature timestamp
             EventTimeFeatureName = event_time_feature_name,
             # A list of feature names and types
             FeatureDefinitions = feature_definitions,
             # The S3 location for the offline feature store
             OfflineStoreConfig = {"S3StorageConfig": {"S3Uri": offline_store_s3_uri}},
             # Optionally, enable the online feature store
             OnlineStoreConfig = {"EnableOnlineStore": True},
             # An IAM role
             RoleArn = role_arn
         )
  3. In each feature group, store records containing a collection of feature name/feature value pairs, using the put_record() API:
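         # A sketch with assumed names; records are written through the
         # feature store runtime client:
         featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")
         featurestore_runtime.put_record(
             FeatureGroupName = feature_group_name,
             Record = [
                 {"FeatureName": "customer_id", "ValueAsString": "5962"},
                 {"FeatureName": "event_time", "ValueAsString": "2020-12-09T12:00:00Z"}
             ]
         )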

    For faster ingestion, you could create multiple threads and parallelize this operation.

At this point, features will be available in Amazon SageMaker Feature Store. Thanks to the offline store, you can use services such as Amazon Athena, AWS Glue, or Amazon EMR to build datasets for training: fetch the corresponding JSON objects in S3, select the features that you need, and save them in S3 in the format expected by your ML algorithm. From then on, it’s SageMaker business as usual!

In addition, you can use the get_record() API to access individual records stored in the online store, passing the group name and the unique identifier of the record you want to access, like so:

record = sm_feature_store.get_record(
    FeatureGroupName = feature_group_name,
    RecordIdentifierValueAsString = "5962"
)

Amazon SageMaker Feature Store is designed for fast and efficient access for real time inference, with a P95 latency lower than 10ms for a 15-kilobyte payload. This makes it possible to query for engineered features at prediction time, and to replace raw features sent by the upstream application with the exact same features used to train the model. Feature inconsistencies are eliminated by design, letting you focus on building the best models instead of chasing bugs.

Finally, as SageMaker Feature Store includes feature creation timestamps, you can retrieve the state of your features at a particular point in time.

As Amazon SageMaker Feature Store is integrated with SageMaker Studio, I can see my two feature groups there.

SageMaker screenshot

Right-clicking on one of them and selecting “Open feature group detail”, I open the identity feature group.

SageMaker screenshot

I can see feature definitions.

SageMaker screenshot

Finally, I can generate queries for the offline store, which I could add to an Amazon SageMaker Data Wrangler workflow to load features prior to training.

SageMaker screenshot

How to Get Started with Amazon SageMaker Feature Store
As you can see, SageMaker Feature Store makes it easy to store, retrieve, and share features required by your training and prediction workflows.

SageMaker Feature Store is available in all regions where SageMaker is available. Pricing is based on feature reads and writes, and on the total amount of data stored.

Here are sample notebooks that will help you get started right away. Give them a try, and let us know what you think. We’re always looking forward to your feedback, either through your usual AWS support contacts, or on the AWS Forum for SageMaker.

– Julien

Amazon HealthLake Stores, Transforms, and Analyzes Health Data in the Cloud

Post Syndicated from Harunobu Kameda original https://aws.amazon.com/blogs/aws/new-amazon-healthlake-to-store-transform-and-analyze-petabytes-of-health-and-life-sciences-data-in-the-cloud/

Healthcare organizations collect vast amounts of patient information every day, from family history and clinical observations to diagnoses and medications. They use all this data to try to compile a complete picture of a patient’s health information in order to provide better healthcare services. Currently, this data is distributed across various systems (electronic medical records, laboratory systems, medical image repositories, etc.) and exists in dozens of incompatible formats.

Emerging standards, such as Fast Healthcare Interoperability Resources (FHIR), aim to address this challenge by providing a consistent format for describing and exchanging structured data across these systems. However, much of this data is unstructured information contained in medical records (e.g., clinical records), documents (e.g., PDF lab reports), forms (e.g., insurance claims), images (e.g., X-rays, MRIs), audio (e.g., recorded conversations), and time series data (e.g., heart electrocardiogram) and it is challenging to extract this information.

It can take weeks or months for a healthcare organization to collect all this data and prepare it for transformation (tagging and indexing), structuring, and analysis. Furthermore, the cost and operational complexity of doing all this work is prohibitive for most healthcare organizations.

Many types of data to analyze

Today, we are happy to announce Amazon HealthLake, a fully managed, HIPAA-eligible service, now in preview, that allows healthcare and life sciences customers to aggregate their health information from different silos and formats into a centralized AWS data lake. HealthLake uses machine learning (ML) models to normalize health data and automatically understand and extract meaningful medical information from the data so all this information can be easily searched. Then, customers can query and analyze the data to understand relationships, identify trends, and make predictions.

How It Works
Amazon HealthLake supports copying your data from on premises to the AWS Cloud, where you can store your structured data (like lab results) as well as unstructured data (like clinical notes), which HealthLake will tag and structure in FHIR. All the data is fully indexed using standard medical terms so you can quickly and easily query, search, analyze, and update all of your customers’ health information.

Overview of HealthLake

With HealthLake, healthcare organizations can collect and transform patient health information in minutes and have a complete view of a patient’s medical history, structured in the FHIR industry standard format with powerful search and query capabilities.

From the AWS Management Console, healthcare organizations can use the HealthLake API to copy their on-premises healthcare data to a secure data lake in AWS with just a few clicks. If your source system is not configured to send data in FHIR format, you can choose from a list of AWS partners to connect and convert your legacy healthcare data formats to FHIR.

HealthLake is Powered by Machine Learning
HealthLake uses specialized ML models such as natural language processing (NLP) to automatically transform raw data. These models are trained to understand and extract meaningful information from unstructured health data.

For example, HealthLake can accurately identify patient information from medical histories, physician notes, and medical imaging reports. It then provides the ability to tag, index, and structure the transformed data to make it searchable by standard terms such as medical condition, diagnosis, medication, and treatment.

Queries on tens of thousands of patient records are very simple. For example, a healthcare organization can create a list of diabetic patients based on similarity of medications by selecting “diabetes” from the standard list of medical conditions, selecting “oral medications” from the treatment menu, refining by gender, and running the search.

Healthcare organizations can use Jupyter Notebook templates in Amazon SageMaker to quickly and easily run analysis on the normalized data for common tasks like diagnosis predictions, hospital re-admittance probability, and operating room utilization forecasts. These models can, for example, help healthcare organizations predict the onset of disease. With just a few clicks in a pre-built notebook, healthcare organizations can apply ML to their historical data and predict when a diabetic patient will develop hypertension in the next five years. Operators can also build, train, and deploy their own ML models on data using Amazon SageMaker directly from the AWS Management Console.

Let’s Create Your Own Data Store and Start to Test
Starting to use HealthLake is simple. You access the AWS Management Console and choose Create a data store.

If you choose Preload data, HealthLake loads test data so you can start exploring its features right away. If you already have FHIR R4-compliant data, you can instead upload it to an S3 bucket and import it by specifying the bucket name.
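The same steps can also be scripted with the AWS SDK for Python (boto3). Here is a minimal sketch; the data store name, bucket path, and IAM role ARN are placeholders, and the import requires that the role grant HealthLake read access to the bucket.

```python
# A minimal sketch of creating a data store and importing FHIR R4 data
# with boto3; names, paths, and ARNs are placeholders.
import boto3

healthlake = boto3.client("healthlake", region_name="us-east-1")

# Create a FHIR R4 data store preloaded with the Synthea test data set
# (omit PreloadDataConfig to start with an empty data store).
datastore = healthlake.create_fhir_datastore(
    DatastoreName="my-test-datastore",
    DatastoreTypeVersion="R4",
    PreloadDataConfig={"PreloadDataType": "SYNTHEA"},
)
datastore_id = datastore["DatastoreId"]

# Poll describe_fhir_datastore() until the status is ACTIVE before importing.
# To load your own FHIR R4 data, point an import job at your bucket.
# (Later versions of the API also take a JobOutputDataConfig parameter.)
healthlake.start_fhir_import_job(
    JobName="initial-import",
    DatastoreId=datastore_id,
    InputDataConfig={"S3Uri": "s3://my-fhir-bucket/input/"},
    DataAccessRoleArn="arn:aws:iam::123456789012:role/HealthLakeImportRole",
)
```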

Once your data store is created, you can perform Search, Create, Read, Update, and Delete FHIR query operations. For example, if you need a list of every patient located in New York, your query settings look like the screenshots below. As per the FHIR specification, deleted data is hidden from searches and results, but it is not removed from the service; it is only versioned.

Creating a query


You can choose Add search parameter to add more conditions to the query, as shown below.
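The same search can also be issued programmatically against the data store’s FHIR REST endpoint. Here is a minimal sketch that signs the request with SigV4; the data store ID is a placeholder.

```python
# A minimal sketch of the "patients in New York" search as a direct call
# to the data store's FHIR REST endpoint; the data store ID is a placeholder.
import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

region = "us-east-1"
datastore_id = "0123456789abcdef0123456789abcdef"  # placeholder

# Standard FHIR search: filter Patient resources on the address-city parameter.
url = (f"https://healthlake.{region}.amazonaws.com/datastore/"
       f"{datastore_id}/r4/Patient?address-city=New%20York")

# Sign the request with SigV4 using the current AWS credentials.
request = AWSRequest(method="GET", url=url)
SigV4Auth(boto3.Session().get_credentials(), "healthlake", region).add_auth(request)

response = requests.get(url, headers=dict(request.headers))
print(response.json())  # a FHIR Bundle of matching Patient resources
```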

Amazon HealthLake is Now in Preview
Amazon HealthLake is in preview starting today in US East (N. Virginia). Please check our website and technical documentation for more information.

– Kame

Bringing machine learning to more builders through databases and analytics services

Post Syndicated from Swami Sivasubramanian original https://aws.amazon.com/blogs/big-data/bringing-machine-learning-to-more-builders-through-databases-and-analytics-services/

Machine learning (ML) is becoming more mainstream, but even with the increasing adoption, it’s still in its infancy. For ML to have the broad impact that we think it can have, it has to get easier to do and easier to apply. We launched Amazon SageMaker in 2017 to remove the challenges from each stage of the ML process, making it radically easier and faster for everyday developers and data scientists to build, train, and deploy ML models. SageMaker has made ML model building and scaling more accessible to more people, but there’s a large group of database developers, data analysts, and business analysts who work with databases and data lakes where much of the data used for ML resides. These users still find it too difficult and involved to extract meaningful insights from that data using ML.

This group is typically proficient in SQL but not Python, and must rely on data scientists to build the models needed to add intelligence to applications or derive predictive insights from data. And even when you have the model in hand, there’s a long and involved process to prepare and move data to use the model. The result is that ML isn’t being used as much as it can be.

To meet the needs of this large and growing group of builders, we’re integrating ML into AWS databases, analytics, and business intelligence (BI) services.

AWS customers generate, process, and collect more data than ever to better understand their business landscape, market, and customers. And you don’t just use one type of data store for all your needs. You typically use several types of databases, data warehouses, and data lakes, to fit your use case. Because all these use cases could benefit from ML, we’re adding ML capabilities to our purpose-built databases and analytics services so that database developers, data analysts, and business analysts can train models on their data or add inference results right from their database, without having to export and process their data or write large amounts of ETL code.

Machine Learning for database developers

At re:Invent last year, we announced ML integrated inside Amazon Aurora for developers working with relational databases. Previously, adding ML to an application using data from Aurora was a complicated process. First, a data scientist had to build and train a model, and then you wrote the code to read data from the database. Next, you had to prepare the data so it could be used by the ML model. Then, you called an ML service to run the model, reformatted the output for your application, and finally loaded it into the application.

Now, with a simple SQL query in Aurora, you can add ML to an enterprise application. When you run an ML query in Aurora using SQL, it can directly access a wide variety of ML models from Amazon SageMaker and Amazon Comprehend. The integration between Aurora and each AWS ML service is optimized, delivering up to 100 times better throughput when compared to moving data between Aurora and SageMaker or Amazon Comprehend without this integration. Because the ML model is deployed separately from the database and the application, each can scale up or scale out independently of the other.
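As a minimal sketch, a query like the following runs sentiment analysis through Amazon Comprehend from inside Aurora MySQL, with no export or ETL step; the connection details, table, and column are hypothetical.

```python
# A minimal sketch of an Aurora ML query, assuming an Aurora MySQL cluster
# with the Amazon Comprehend integration enabled; the host, credentials,
# table, and column names are hypothetical.
import pymysql

conn = pymysql.connect(
    host="my-aurora-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",
    user="admin", password="...", database="shop",
)

# aws_comprehend_detect_sentiment() calls Comprehend from inside the query,
# so the data never leaves the SQL workflow.
sql = """
    SELECT review_text,
           aws_comprehend_detect_sentiment(review_text, 'en') AS sentiment
    FROM product_reviews
    LIMIT 10
"""
with conn.cursor() as cur:
    cur.execute(sql)
    for review_text, sentiment in cur.fetchall():
        print(sentiment, review_text[:60])
```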

In addition to making ML available in relational databases, combining ML with certain types of non-relational database models can also lead to better predictions. For example, database developers use Amazon Neptune, a purpose-built, high-performance graph database, to store complex relationships between data in a graph data model. You can query these graphs for insights and patterns and apply the results to implement capabilities such as product recommendations or fraud detection.

However, human intuition and analyzing individual queries is not enough to discover the full breadth of insights available from large graphs. ML can help, but as was the case with relational databases it requires you to do a significant amount of heavy lifting upfront to prepare the graph data and then select the best ML model to run against that data. The entire process can take weeks.

To help with this, today we announced the general availability of Amazon Neptune ML to provide database developers access to ML purpose-built for graph data. This integration is powered by SageMaker and uses the Deep Graph Library (DGL), a framework for applying deep learning to graph data. It does the hard work of selecting the graph data needed for ML training, automatically choosing the best model for the selected data, exposing ML capabilities via simple graph queries, and providing templates to allow you to customize ML models for advanced scenarios. The following diagram illustrates this workflow.

And because the DGL is purpose-built to run deep learning on graph data, you can improve the accuracy of most predictions by over 50% compared with traditional ML techniques.
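As a rough sketch, an inference query from gremlinpython might look like the following, assuming a trained node-classification model is already deployed behind a SageMaker endpoint; the cluster endpoint, endpoint name, and graph schema are hypothetical.

```python
# A rough sketch of a Neptune ML inference query via gremlinpython; the
# Neptune endpoint, SageMaker endpoint name, and graph schema (movie
# vertices with a "genre" property) are hypothetical.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

conn = DriverRemoteConnection(
    "wss://my-neptune-cluster.cluster-xxxx.us-east-1.neptune.amazonaws.com:8182/gremlin",
    "g",
)
g = traversal().withRemote(conn)

# Ask the trained node-classification model to predict the "genre"
# property for movie vertices, directly from a Gremlin query.
predicted = (
    g.with_("Neptune#ml.endpoint", "movie-genre-endpoint")
     .V().hasLabel("movie")
     .properties("genre").with_("Neptune#ml.classification")
     .value()
     .limit(5)
     .toList()
)
print(predicted)
conn.close()
```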

Machine Learning for data analysts

At re:Invent last year, we announced ML integrated inside Amazon Athena for data analysts. With this integration, you can access more than a dozen built-in ML models or use your own models in SageMaker directly from ad-hoc queries in Athena. As a result, you can easily run ad-hoc queries in Athena that use ML to forecast sales, detect suspicious logins, or sort users into customer cohorts.
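As a minimal sketch, an ad-hoc Athena query can declare a SageMaker endpoint as an external function and invoke it per row; the endpoint, database, table, and bucket names here are hypothetical.

```python
# A minimal sketch of an Athena ML query submitted with boto3; the
# SageMaker endpoint, database, table, and output bucket are hypothetical.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# USING EXTERNAL FUNCTION binds a SQL function to a SageMaker endpoint,
# so the model is invoked row by row from the query itself.
sql = """
USING EXTERNAL FUNCTION predict_churn(features VARCHAR)
    RETURNS DOUBLE
    SAGEMAKER 'my-churn-endpoint'
SELECT customer_id,
       predict_churn(feature_vector) AS churn_score
FROM customer_features
LIMIT 10
"""

athena.start_query_execution(
    QueryString=sql,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```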

Similarly, data analysts also want to apply ML to the data in their Amazon Redshift data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of data per day. These Amazon Redshift users want to run ML on their data in Amazon Redshift without having to write a single line of Python. Today we announced the preview of Amazon Redshift ML to do just that.

Amazon Redshift now enables you to run ML algorithms on Amazon Redshift data without manually selecting, building, or training an ML model. Amazon Redshift ML works with Amazon SageMaker Autopilot, a service that automatically trains and tunes the best ML models for classification or regression based on your data while allowing full control and visibility.

When you run an ML query in Amazon Redshift, the selected data is securely exported from Amazon Redshift to Amazon Simple Storage Service (Amazon S3). SageMaker Autopilot then performs data cleaning and preprocessing of the training data, automatically creates a model, and applies the best model. All the interactions between Amazon Redshift, Amazon S3, and SageMaker are abstracted away and automatically occur. When the model is trained, it becomes available as a SQL function for you to use. The following diagram illustrates this workflow.
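As a minimal sketch of this flow driven through the Amazon Redshift Data API, CREATE MODEL hands training off to SageMaker Autopilot, and the resulting function is then used in plain SQL; the cluster, table, IAM role, and bucket names are hypothetical.

```python
# A minimal sketch of Redshift ML via the Redshift Data API; the cluster,
# database, table, IAM role, and S3 bucket names are hypothetical.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

# CREATE MODEL exports the SELECT result to S3 and lets SageMaker Autopilot
# train and tune candidate models behind the scenes.
create_model = """
CREATE MODEL customer_churn
FROM (SELECT age, tenure_months, monthly_spend, churned FROM customers)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
SETTINGS (S3_BUCKET 'my-redshift-ml-staging')
"""
rsd.execute_statement(
    ClusterIdentifier="my-cluster", Database="dev", DbUser="analyst",
    Sql=create_model,
)

# Once training completes, the model is just another SQL function.
rsd.execute_statement(
    ClusterIdentifier="my-cluster", Database="dev", DbUser="analyst",
    Sql="SELECT customer_id, predict_churn(age, tenure_months, monthly_spend) FROM customers",
)
```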

Rackspace Technology, a leading end-to-end multicloud technology services company, and Slalom, a modern consulting firm focused on strategy, technology, and business transformation, are both using Amazon Redshift ML in preview.

Nihar Gupta, General Manager for Data Solutions at Rackspace Technology, says, “At Rackspace Technology, we help companies elevate their AI/ML operations. The seamless integration with Amazon SageMaker will empower data analysts to use data in new ways, and provide even more insight back to the wider organization.”

And Marcus Bearden, Practice Director at Slalom, shared, “We hear from our customers that they want to have the skills and tools to get more insight from their data, and Amazon Redshift is a popular cloud data warehouse that many of our customers depend on to power their analytics. The new Amazon Redshift ML feature will make it easier for SQL users to get new types of insight from their data with machine learning, without learning new skills.”

Machine Learning for business analysts

To bring ML to business analysts, we launched new ML capabilities in Amazon QuickSight earlier this year called ML Insights. ML Insights uses SageMaker Autopilot to enable business analysts to perform ML inference on their data and visualize it in BI dashboards with just a few clicks. You can get results for different use cases that require ML, such as anomaly detection that continuously analyzes billions of data points to uncover hidden insights, forecasting, and predicting growth and other business trends. In addition, QuickSight can also give you an automatically generated summary in plain language (a capability we call auto-narratives), which interprets and describes what the data in your dashboard means. See the following screenshot for an example.

Customers like Expedia Group, Tata Consultancy Services, and Ricoh Company are already benefiting from ML out of the box with QuickSight. These human-readable narratives enable you to quickly interpret the data in a shared dashboard and focus on the insights that matter most.

In addition, customers have also been interested in asking questions of their business data in plain language and receiving answers in near-real time. Although some BI tools and vendors have attempted to solve this challenge with Natural Language Query (NLQ), the existing approaches require that you first spend months in advance preparing and building a model on a pre-defined set of data, and even then, you still have no way of asking ad hoc questions when those questions require a new calculation that wasn’t pre-defined in the data model. For example, the question “What is our year-over-year growth rate?” requires that “growth rate” be pre-defined as a calculation in the model. With today’s BI tools, you need to work with your BI teams to create and update the model to account for any new calculation or data, which can take days or weeks of effort.

Last week, we announced Amazon QuickSight Q. ‘Q’ gives business analysts the ability to ask any question of all their data and receive an accurate answer in seconds. To ask a question, you simply type it into the QuickSight Q search bar using natural language and business terminology that you’re familiar with. Q uses ML (natural language processing, schema understanding, and semantic parsing for SQL code generation) to automatically generate a data model that understands the meaning of and relationships between business data, so you can get answers to your business questions without waiting weeks for a data model to be built. Because Q eliminates the need to build a data model, you’re also not limited to asking only a specific set of questions. See the following screenshot for an example.

Best Western Hotels & Resorts is a privately-held hotel brand with a global network of approximately 4,700 hotels in over 100 countries and territories worldwide. “With Amazon QuickSight Q, we look forward to enabling our business partners to self-serve their ad hoc questions while reducing the operational overhead on our team for ad hoc requests,” said Joseph Landucci, Senior Manager of Database and Enterprise Analytics at Best Western Hotels & Resorts. “This will allow our partners to get answers to their critical business questions quickly by simply typing and searching their questions in plain language.”


For ML to have a broad impact, we believe it has to get easier to do and easier to apply. Database developers, data analysts, and business analysts who work with databases and data lakes have found it too difficult and involved to extract meaningful insights from their data using ML. To meet the needs of this large and growing group of builders, we’ve added ML capabilities to our purpose-built databases and analytics services so that database developers, data analysts, and business analysts can all use ML more easily without the need to be an ML expert. These capabilities put ML in the hands of every data professional so that they can get the most value from their data.

About the Authors

Swami Sivasubramanian is Vice President at AWS in charge of all Amazon AI and Machine Learning services. His team’s mission is “to put machine learning capabilities in the hands of every developer and data scientist.” Swami and the AWS AI and ML organization work on all aspects of machine learning, from ML frameworks (TensorFlow, Apache MXNet, and PyTorch) and infrastructure, to Amazon SageMaker (an end-to-end service for building, training, and deploying ML models in the cloud and at the edge), and finally AI services (Transcribe, Translate, Personalize, Forecast, Rekognition, Textract, Lex, Comprehend, Kendra, etc.) that make it easier for app developers to incorporate ML into their apps with no ML experience required.

Previously, Swami managed AWS’s NoSQL and big data services. He managed the engineering, product management, and operations for AWS database services that are foundational building blocks for AWS: DynamoDB, Amazon ElastiCache (in-memory engines), Amazon QuickSight, and a few other big data services in the works. Swami has been awarded more than 250 patents, has authored 40 refereed scientific papers and journal articles, and participates in several academic circles and conferences.


Herain Oberoi leads Product Marketing for AWS’s Databases, Analytics, BI, and Blockchain services. His team is responsible for helping customers learn about, adopt, and successfully use AWS services. Prior to AWS, he held various product management and marketing leadership roles at Microsoft and a successful startup that was later acquired by BEA Systems. When he’s not working, he enjoys spending time with his family, gardening, and exercising.