Tag Archives: artificial intelligence

Can AI and Automation Deliver a COVID-19 Antiviral While It Still Matters?

Post Syndicated from Megan Scudellari original https://spectrum.ieee.org/artificial-intelligence/medical-ai/can-ai-and-automation-deliver-a-covid19-antiviral-while-it-still-matters

Within moments of meeting each other at a conference last year, Nathan Collins and Yann Gaston-Mathé began devising a plan to work together. Gaston-Mathé runs a startup that applies automated software to the design of new drug candidates. Collins leads a team that uses an automated chemistry platform to synthesize new drug candidates.

“There was an obvious synergy between their technology and ours,” recalls Gaston-Mathé, CEO and cofounder of Paris-based Iktos.

In late 2019, the pair launched a project to create a brand-new antiviral drug that would block a specific protein exploited by influenza viruses. Then the COVID-19 pandemic erupted across the world stage, and Gaston-Mathé and Collins learned that the viral culprit, SARS-CoV-2, relied on a protein that was 97 percent similar to their influenza protein. The partners pivoted.

Their companies are just two of hundreds of biotech firms eager to overhaul the drug-discovery process, often with the aid of artificial intelligence (AI) tools. The first set of antiviral drugs to treat COVID-19 will likely come from sifting through existing drugs. Remdesivir, for example, was originally developed to treat Ebola, and it has been shown to speed the recovery of hospitalized COVID-19 patients. But a drug made for one condition often has side effects and limited potency when applied to another. If researchers can produce an antiviral that specifically targets SARS-CoV-2, the drug would likely be safer and more effective than a repurposed drug.

There’s one big problem: Traditional drug discovery is far too slow to react to a pandemic. Designing a drug from scratch typically takes three to five years—and that’s before human clinical trials. “Our goal, with the combination of AI and automation, is to reduce that down to six months or less,” says Collins, who is chief strategy officer at SRI Biosciences, a division of the Silicon Valley research nonprofit SRI International. “We want to get this to be very, very fast.”

That sentiment is shared by small biotech firms and big pharmaceutical companies alike, many of which are now ramping up automated technologies backed by supercomputing power to predict, design, and test new antivirals—for this pandemic as well as the next—with unprecedented speed and scope.

“The entire industry is embracing these tools,” says Kara Carter, president of the International Society for Antiviral Research and executive vice president of infectious disease at Evotec, a drug-discovery company in Hamburg. “Not only do we need [new antivirals] to treat the SARS-CoV-2 infection in the population, which is probably here to stay, but we’ll also need them to treat future agents that arrive.”

There are currently about 200 known viruses that infect humans. Although viruses represent less than 14 percent of all known human pathogens, they make up two-thirds of all new human pathogens discovered since 1980.

Antiviral drugs are fundamentally different from vaccines, which teach a person’s immune system to mount a defense against a viral invader, and antibody treatments, which enhance the body’s immune response. By contrast, antivirals are chemical compounds that directly block a virus after a person has become infected. They do this by binding to specific proteins and preventing them from functioning, so that the virus cannot copy itself or enter or exit a cell.

The SARS-CoV-2 virus has an estimated 25 to 29 proteins, but not all of them are suitable drug targets. Researchers are investigating, among other targets, the virus’s exterior spike protein, which binds to a receptor on a human cell; two scissorlike enzymes, called proteases, that cut up long strings of viral proteins into functional pieces inside the cell; and a polymerase complex that makes the cell churn out copies of the virus’s genetic material, in the form of single-stranded RNA.

But it’s not enough for a drug candidate to simply attach to a target protein. Chemists also consider how tightly the compound binds to its target, whether it binds to other things as well, how quickly it metabolizes in the body, and so on. A drug candidate may have 10 to 20 such objectives. “Very often those objectives can appear to be anticorrelated or contradictory with each other,” says Gaston-Mathé.

Compared with antibiotics, antiviral drug discovery has proceeded at a snail’s pace. Scientists advanced from isolating the first antibacterial molecules in 1910 to developing an arsenal of powerful antibiotics by 1944. By contrast, it took until 1951 for researchers to be able to routinely grow large amounts of virus particles in cells in a dish, a breakthrough that earned the inventors a Nobel Prize in Medicine in 1954.

And the lag between the discovery of a virus and the creation of a treatment can be heartbreaking. According to the World Health Organization, 71 million people worldwide have chronic hepatitis C, a major cause of liver cancer. The virus that causes the infection was discovered in 1989, but effective antiviral drugs didn’t hit the market until 2014.

While many antibiotics work on a range of microbes, most antivirals are highly specific to a single virus—what those in the business call “one bug, one drug.” It takes a detailed understanding of a virus to develop an antiviral against it, says Che Colpitts, a virologist at Queen’s University, in Canada, who works on antivirals against RNA viruses. “When a new virus emerges, like SARS-CoV-2, we’re at a big disadvantage.”

Making drugs to stop viruses is hard for three main reasons. First, viruses are the Spartans of the pathogen world: They’re frugal, brutal, and expert at evading the human immune system. About 20 to 250 nanometers in diameter, viruses rely on just a few parts to operate, hijacking host cells to reproduce and often destroying those cells upon departure. They employ tricks to camouflage their presence from the host’s immune system, including preventing infected cells from sending out molecular distress beacons. “Viruses are really small, so they only have a few components, so there’s not that many drug targets available to start with,” says Colpitts.

Second, viruses replicate quickly, typically doubling in number in hours or days. This constant copying of their genetic material enables viruses to evolve quickly, producing mutations able to sidestep drug effects. The virus that causes AIDS soon develops resistance when exposed to a single drug. That’s why a cocktail of antiviral drugs is used to treat HIV infection.

Finally, unlike bacteria, which can exist independently outside human cells, viruses invade human cells to propagate, so any drug designed to eliminate a virus needs to spare the host cell. A drug that fails to distinguish between a virus and a cell can cause serious side effects. “Discriminating between the two is really quite difficult,” says Evotec’s Carter, who has worked in antiviral drug discovery for over three decades.

And then there’s the money barrier. Developing antivirals is rarely profitable. Health-policy researchers at the London School of Economics recently estimated that the average cost of developing a new drug is US $1 billion, and up to $2.8 billion for cancer and other specialty drugs. Because antivirals are usually taken for only short periods of time or during short outbreaks of disease, companies rarely recoup what they spent developing the drug, much less turn a profit, says Carter.

To change the status quo, drug discovery needs fresh approaches that leverage new technologies, rather than incremental improvements, says Christian Tidona, managing director of BioMed X, an independent research institute in Heidelberg, Germany. “We need breakthroughs.”

Iktos’s AI platform was created by a medicinal chemist and an AI expert. To tackle SARS-CoV-2, the company used generative models—deep-learning algorithms that generate new data—to “imagine” molecular structures with a good chance of disabling a key coronavirus protein.

For a new drug target, the software proposes and evaluates roughly 1 million compounds, says Gaston-Mathé. It’s an iterative process: At each step, the system generates 100 virtual compounds, which are tested in silico with predictive models to see how closely they meet the objectives. The test results are then used to design the next batch of compounds. “It’s like we have a very, very fast chemist who is designing compounds, testing compounds, getting back the data, then designing another batch of compounds,” he says.
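The loop Gaston-Mathé describes can be sketched as a simple generate-score-select cycle. The sketch below is purely illustrative: the generator, the mutation step, and the scoring function are placeholders standing in for Iktos’s proprietary generative and predictive models.

import random

random.seed(0)

def mutate(molecule):
    # Placeholder for a structural tweak to a candidate molecule.
    return molecule + "C"

def generate_candidates(seeds, n=100):
    # Placeholder generator; a real system would use a trained generative model.
    return [mutate(random.choice(seeds)) for _ in range(n)]

def score(molecule):
    # Placeholder for predictive models scoring binding, selectivity, stability, and so on.
    return random.random()

population = ["C1=CC=CC=C1"]            # arbitrary starting structure (SMILES for benzene)
for cycle in range(10):                 # design, test in silico, learn, repeat
    batch = generate_candidates(population, n=100)
    population = sorted(batch, key=score, reverse=True)[:10]   # best candidates seed the next cycle
print("Top candidate after 10 cycles:", population[0])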

The computer isn’t as smart as a human chemist, Gaston-Mathé notes, but it’s much faster, so it can explore far more of what people in the field call “chemical space”—the set of all possible organic compounds. Unexplored chemical space is huge: Biochemists estimate that there are at least 10^63 possible druglike molecules, and that 99.9 percent of all possible small molecules or compounds have never been synthesized.

Still, designing a chemical compound isn’t the hardest part of creating a new drug. After a drug candidate is designed, it must be synthesized, and the highly manual process for synthesizing a new chemical hasn’t changed much in 200 years. It can take days to plan a synthesis process and then months to years to optimize it for manufacture.

That’s why Gaston-Mathé was eager to send Iktos’s AI-generated designs to Collins’s team at SRI Biosciences. With $13.8 million from the Defense Advanced Research Projects Agency, SRI Biosciences spent the last four years automating the synthesis process. The company’s automated suite of three technologies, called SynFini, can produce new chemical compounds in just hours or days, says Collins.

First, machine-learning software devises possible routes for making a desired molecule. Next, an inkjet printer platform tests the routes by printing out and mixing tiny quantities of chemical ingredients to see how they react with one another; if the right compound is produced, the platform runs tests on it. Finally, a tabletop chemical plant synthesizes milligrams to grams of the desired compound.

Less than four months after Iktos and SRI Biosciences announced their collaboration, they had designed and synthesized a first round of antiviral candidates for SARS-CoV-2. Now they’re testing how well the compounds work on actual samples of the virus.

Theirs isn’t the only collaboration applying new tools to drug discovery. In late March, Alex Zhavoronkov, CEO of Hong Kong–based Insilico Medicine, came across a YouTube video showing three virtual-reality avatars positioning colorful, sticklike fragments in the side of a bulbous blue protein. The three researchers were using VR to explore how compounds might bind to a SARS-CoV-2 enzyme. Zhavoronkov contacted the startup that created the simulation—Nanome, in San Diego—and invited it to examine Insilico’s AI-generated molecules in virtual reality.

Insilico runs an AI platform that uses biological data to train deep-learning algorithms, then uses those algorithms to identify molecules with druglike features that will likely bind to a protein target. A four-day training sprint in late January yielded 100 molecules that appear to bind to an important SARS-CoV-2 protease. The company recently began synthesizing some of those molecules for laboratory testing.

Nanome’s VR software, meanwhile, allows researchers to import a molecular structure, then view and manipulate it on the scale of individual atoms. Like human chess players who use computer programs to explore potential moves, chemists can use VR to predict how to make molecules more druglike, says Nanome CEO Steve McCloskey. “The tighter the interface between the human and the computer, the more information goes both ways,” he says.

Zhavoronkov sent data about several of Insilico’s compounds to Nanome, which re-created them in VR. Nanome’s chemist demonstrated chemical tweaks to potentially improve each compound. “It was a very good experience,” says Zhavoronkov.

Meanwhile, in March, Takeda Pharmaceutical Co., of Japan, invited Schrödinger, a New York–based company that develops chemical-simulation software, to join an alliance working on antivirals. Schrödinger’s AI focuses on the physics of how proteins interact with small molecules and one another.

The software sifts through billions of molecules per week to predict a compound’s properties, and it optimizes for multiple desired properties simultaneously, says Karen Akinsanya, chief biomedical scientist and head of discovery R&D at Schrödinger. “There’s a huge sense of urgency here to come up with a potent molecule, but also to come up with molecules that are going to be well tolerated” by the body, she says. Drug developers are seeking compounds that can be broadly used and easily administered, such as an oral drug rather than an intravenous drug, she adds.

Schrödinger evaluated four protein targets and performed virtual screens for two of them, a computing-intensive process. In June, Google Cloud donated the equivalent of 16 million hours of Nvidia GPU time for the company’s calculations. Next, the alliance’s drug companies will synthesize and test the most promising compounds identified by the virtual screens.

Other companies, including Amazon Web Services, IBM, and Intel, as well as several U.S. national labs are also donating time and resources to the COVID-19 High Performance Computing Consortium. The consortium is supporting 87 projects, which now have access to 6.8 million CPU cores, 50,000 GPUs, and 600 petaflops of computational resources.

While advanced technologies could transform early drug discovery, any new drug candidate still has a long road after that. It must be tested in animals, manufactured in large batches for clinical trials, then tested in a series of trials that, for antivirals, lasts an average of seven years.

In May, the BioMed X Institute in Germany launched a five-year project to build a Rapid Antiviral Response Platform, which would speed drug discovery all the way through manufacturing for clinical trials. The €40 million ($47 million) project, backed by drug companies, will identify outside-the-box proposals from young scientists, then provide space and funding to develop their ideas.

“We’ll focus on technologies that allow us to go from identification of a new virus to 10,000 doses of a novel potential therapeutic ready for trials in less than six months,” says BioMed X’s Tidona, who leads the project.

While a vaccine will likely arrive long before a bespoke antiviral does, experts expect COVID-19 to be with us for a long time, so the effort to develop a direct-acting, potent antiviral continues. Plus, having new antivirals—and tools to rapidly create more—can only help us prepare for the next pandemic, whether it comes next month or in another 102 years.

“We’ve got to start thinking differently about how to be more responsive to these kinds of threats,” says Collins. “It’s pushing us out of our comfort zones.”

This article appears in the October 2020 print issue as “Automating Antivirals.”

Why Modeling the Spread of COVID-19 Is So Damn Hard


Post Syndicated from Matthew Hutson original https://spectrum.ieee.org/artificial-intelligence/medical-ai/why-modeling-the-spread-of-covid19-is-so-damn-hard

If you wanted to “flatten the curve” in 2019, you might have been changing students’ grades or stamping down a rug ripple. Today, that phrase refers only to the vital task of reducing the peak number of people concurrently infected with the COVID-19 virus. Beginning in early 2020, graphs depicting the expected number of infections spread through social networks, much like the virus itself. We’ve all become consumers of epidemiological models, the mathematical entities that spit out these ominous trend lines.

Such models have existed for decades but have never received such widespread attention. They’re informing public policy, financial planning, health care allocation, doomsday speculation, and Twitter hot takes. In the first quarter of 2020, government leaders were publicly parsing these computational speculations, making huge decisions about whether to shut down schools, businesses, and travel. Would an unchecked outbreak kill millions, or fizzle out? Which interventions would help the most? How sure could we be of any forecast? Models disagreed, and some people pointed to whichever curve best supported their predilections. It didn’t help that the researchers building the models were still figuring out what the heck they were doing.

There’s more than one way to model an epidemic. Some approaches are pure mathematical abstraction, just trying to get the lines right. Some re-create society in silicon, down to the person. Some combine several techniques. As modelers—a mix of computer scientists, epidemiologists, physicians, statisticians—fumble their way through the darkness of this pandemic, they pull tools off shelves, modify them, and create new ones, adapting as they learn what works and as new information emerges.

They hope, of course, to help quell the current outbreak. But their larger goal is to have tools in place to model any future disease, whether it’s a seasonal flu or the next big bug. “In some ways, forecasting epidemics was still in its infancy when this pandemic started spreading,” said biologist Lauren Ancel Meyers in June. Meyers, the head of the COVID-19 Modeling Consortium at the University of Texas at Austin, added, “And it has matured quite a bit over the last two or three months.” So what has worked—and what hasn’t?

The most common approach to modeling an epidemic is what’s called a compartmental model. You divide the population into several categories and write mathematical rules that dictate how many people move from one category to another at each tick of the model’s clock. First, everyone is susceptible. They’re in the S compartment. Then some people become infected (I), and later they’re removed (R) from the pathogen’s path, through either recovery or death. These models are sometimes called SIR models. Variations include a group that’s exposed (E) to the pathogen but not yet contagious—SEIR models. If postrecovery immunity is temporary, you might recycle recovered people back to S, making it an SIRS (or SEIRS) model. At its most basic, the model is a handful of numbers indicating how many people are in each compartment, plus differential equations governing the transitions between compartments. Each equation has adjustable parameters—the knobs that set the flow rates.
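As a concrete illustration, here is a minimal SEIR model in Python, stepped forward with a simple Euler update. The parameter values are arbitrary placeholders, not estimates fitted to COVID-19.

def seir(days=180, population=1_000_000, beta=0.3, sigma=1/5, gamma=1/10):
    # beta: transmission rate, sigma: 1/incubation period, gamma: 1/infectious period
    S, E, I, R = population - 1.0, 0.0, 1.0, 0.0
    history = []
    for _ in range(days):
        new_exposed = beta * S * I / population   # S -> E
        new_infectious = sigma * E                # E -> I
        new_removed = gamma * I                   # I -> R (recovered or dead)
        S -= new_exposed
        E += new_exposed - new_infectious
        I += new_infectious - new_removed
        R += new_removed
        history.append((S, E, I, R))
    return history

trajectory = seir()
peak = max(step[2] for step in trajectory)
print(f"Peak number of people concurrently infected: {peak:,.0f}")

Real forecasting systems do not fix beta, sigma, and gamma by hand; they fit them to data, which is where most of the difficulty lies.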

A graph over time of the removed (R) population usually resembles a sigmoid or elongated S-curve, as the numbers of dead or recovered rise slowly at first, then more steeply, then gradually plateau. The susceptible population (S) follows the same trend but downward, falling slowly, then quickly, then slowly. Around where the lines cross, at their steepest sections, the line for currently infected (I) forms a hump. This is the curve we want to flatten, lowering the hump’s peak and stretching it out, to lighten the load on hospitals at any given time.

Forecasting the shapes of these lines requires getting the equations right. But their parameters—which can change over time—depend on such varied factors as biology, behavior, politics, the economy, and the weather. Compartmental models are the gold standard, says Sunetra Gupta, an epidemiologist at the University of Oxford, but “it’s a question of what do you strap onto it.”

A prominent group employing a compartmental model is the Institute for Health Metrics and Evaluation (IHME), at the University of Washington, in Seattle. The team actually started out early in the pandemic with a completely different approach called a curve-fitting model. Because the outbreak in the United States lagged behind those in some other countries, this model assumed that the U.S. curve would resemble those prior curves. According to Theo Vos, an epidemiologist at the University of Washington, the aim was to predict the peak in hospital use with curves for China, Italy, and Spain. In late March, with just a few thousand cumulative deaths in the United States, the IHME accurately predicted a rise to about 50,000 over the next four weeks. By April, policymakers and the media were lavishing the IHME model with attention. Dr. Deborah Birx, the White House’s Coronavirus Response Coordinator, and her team talked with the IHME group almost daily.

But the U.S. curve didn’t flatten as quickly as the IHME model anticipated. In mid-April, for example, it predicted that the death toll would reach 60,000 in mid-May, while the actual number turned out to be around 80,000. As the weeks went on, the model began to garner harsh criticism from some epidemiologists and biostatisticians for failing to account for all sources of uncertainty, and for being based on the unlikely assumption that social-distancing policies would be as extensive and effective in the United States as they were in other countries. (Vos notes, “If you read our documentation that we published on our website with model results at the time, you will see that this assumption was clearly stated.”) By the end of April, IHME director Christopher Murray was admitting that his model was “orders of magnitude more optimistic” than others, while still defending its usefulness. In early May, the IHME team added an SEIR model as a central component of their continually evolving system.

Instead of manually defining the parameters in their SEIR equations, the team let computers do it, using Bayesian statistical methods, which estimate the likelihood of various causes for a given outcome. The group regularly receives statistics on COVID-19’s course: how long it’s taking people to show symptoms, how many people are reporting to hospitals, how many people are dying. They also collect data on factors such as mask wearing (from online surveys) and, as a proxy for social distancing, mobility (from anonymized phone location tracking).

To tune the SEIR model, the system tests different model parameters to see which ones result in predictions that best match the recent data. Once the best parameters are chosen, the SEIR model uses them, along with expected changes in the other inputs, to forecast infections and deaths over the next several months. Bayesian techniques incorporate uncertainty, so the model runs a thousand times with slightly different control-knob settings, creating a range of possible outcomes.

One of the most important knobs is reproduction number, or R (not the same as the R in SEIR). R is the number of people each infected person is expected to infect. Typically, if R is above 1.0, the early epidemic grows exponentially. Below 1.0, it fades away. “We learned how to tame an SEIR model,” Vos says. “They’re very reactive to small changes. The tendency is to go exponential.” In a completely abstract model, slight differences in parameters such as R can cause wildly different outcomes, unbound by real-world social and environmental contingencies. Without using statistics to ground parameter setting in hard data, Vos says, “your cases go completely bonkers.”
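A couple of lines of arithmetic illustrate that reactivity: starting from the same 100 cases, a reproduction number just above or just below 1.0 leads to wildly different outcomes after 30 generations of spread.

for R in (0.9, 1.0, 1.1):
    cases = 100.0
    for generation in range(30):     # roughly 30 serial intervals
        cases *= R                   # each case infects R others on average
    print(f"R = {R}: about {cases:,.0f} new cases in generation 30")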

Others have also combined compartmental models with machine learning. One model called YYG has done well on a hub that feeds forecasts to the U.S. Centers for Disease Control and Prevention (CDC). The YYG model is run solely by Youyang Gu, an independent data scientist with a master’s degree in computer science from MIT. His model is very simple: The only data it uses is daily deaths. From this statistic, it sets parameters—including reproduction number, infection mortality rate, and lockdown fatigue—using a grid search. For each parameter, it considers several options, and it tests every possible combination before choosing the set that best matches the data. It’s like mixing and matching outfits—now let’s try the red shirt with the green pants and yellow socks.
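In spirit, the grid search works like the sketch below. The candidate parameter values, the toy forward model, and the squared-error score are placeholders chosen for brevity; they are not Gu’s actual code or data.

from itertools import product

observed_deaths = [5, 8, 13, 20, 31, 47, 70, 104, 152, 220]       # toy daily death counts

def simulate_deaths(growth, ifr, days=10, seed_infections=1000):
    # Extremely simplified forward model: infections grow by `growth` per day,
    # and a fraction `ifr` of each day's infections appears as deaths.
    deaths, infections = [], float(seed_infections)
    for _ in range(days):
        deaths.append(infections * ifr)
        infections *= growth
    return deaths

def squared_error(predicted, observed):
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

grid = product([1.3, 1.4, 1.5], [0.005, 0.01, 0.02])              # candidate (growth, ifr) pairs
best = min(grid, key=lambda params: squared_error(simulate_deaths(*params), observed_deaths))
print("Best-fitting (daily growth, infection fatality rate):", best)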

“I was frustrated at the quality of the models back in early April and late March,” Gu says. “Back then, one of the most frequently cited models in the media”—the IHME curve-fitting model—“had deaths going to zero by June. When I looked at the data, I could not see how that was possible, so I just wanted to take my own shot.” By 9 May, when the U.S. death toll almost exactly matched Gu’s prediction of 80,000 by that date, the physician and public-health leader Eric Topol praised the YYG model as “the most accurate #COVID19 model.”

“We’ve shown that a very simple model like ours can do a good job,” Gu says. One benefit of simplicity is agility, he adds: He forecasts 50 states and 70 countries, all in under 30 minutes on his laptop. “Because it’s so simple, it allows me to make changes quickly.” In addition, simpler models with fewer parameters are more likely to generalize to new situations and can also be easier to understand.

One alternative to SEIR models is data-driven models. These churn through data without explicitly accounting for separate categories of people, explains B. Aditya Prakash, a computer scientist at Georgia Tech. His team uses a set of deep-learning models—large neural networks, with tens of thousands of parameters. These networks infer complex relations between input data (such as mobility, testing, and social media) and pandemic outcomes (such as hospitalizations and deaths).

Prakash points out that data-driven models can be good for predicting “composite signals, signals which don’t have a clear epidemiological counterpart.” For instance, if you’re predicting medical visits, that’s a “noisy” signal that depends on not only the number of infections but also all the social and economic factors that might make someone visit a doctor or stay at home. But he concedes that compartmental models are better than deep-learning models for exploring hypotheticals—if we could enact policies that reduced R (reproduction number) by 20 percent, would that change the curve much?—because the model’s control knobs are more visible. And since SEIR models rely on epidemiological theory, they can make longer-term predictions. Deep learning is more tied to the data, so it can be more accurate in the short term, but it’s a black box, with thousands of incomprehensible parameters that are determined by the learning process—so it’s hard to know how well it will extrapolate to other situations or the distant future.
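A minimal sketch of the data-driven idea, assuming PyTorch and entirely synthetic data, is shown below: a small feed-forward network maps a window of recent signals (say, mobility, testing, and reported cases) to a near-term outcome such as next-week deaths. The features, architecture, and data are stand-ins, not any group’s production model.

import torch
import torch.nn as nn

torch.manual_seed(0)
features = torch.randn(500, 21)     # 500 samples, 7 days x 3 signals (mobility, tests, cases)
targets = torch.randn(500, 1)       # next-week deaths (synthetic values)

model = nn.Sequential(
    nn.Linear(21, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):            # fit the network to the historical data
    optimizer.zero_grad()
    loss = loss_fn(model(features), targets)
    loss.backward()
    optimizer.step()
print("Final training loss:", loss.item())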

While data-driven models occupy the abstract number-crunching end of the modeling spectrum, the opposite, hyperrealistic end is marked by agent-based models. These are much like the video game The Sims. Each individual in a population is represented by their own bit of code, called an agent, which interacts with other agents as it moves around the world. One of the most successful agent-based models was designed at the University of Sydney. The model has three layers, beginning with a layer of demographics. “We’re essentially creating a digital twin for every person represented in the census,” said Mikhail Prokopenko, a computer scientist at the university. He and his colleagues built a virtual Australia comprising 24 million agents, whose distribution matches the real thing in terms of age, household size, neighborhood size, school size, and so on. The second layer is mobility, in which agents are assigned to both a household and a school or workplace. On top of demographics and mobility, they add the disease, including transmission rates within households, schools, and workplaces, and how the disease progresses in individuals. In 2018, the group published a similar model for the flu that used older census data. They were building an updated model for further flu studies when the COVID-19 epidemic broke out, so they pivoted to capture its distinctive characteristics in their disease-transmission layer.

When set in motion, the model ticks twice a day: People come in contact at school or work in the daytime, then at home at night. It’s like throwing dice over and over. The model covers 180 days in a few hours. The team typically runs tens or hundreds of copies of the model in parallel on a computing cluster to generate a range of outcomes.
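A heavily simplified sketch of that mechanism might look like the following: each agent belongs to a household and a workplace, the clock ticks twice a day, and every contact is a dice roll. This illustrates the approach only, not the Sydney group’s model, and all parameter values are invented.

import random

random.seed(1)
N_AGENTS, HOUSEHOLD_SIZE, WORKPLACE_SIZE = 10_000, 4, 20
P_TRANSMIT, DAYS_INFECTIOUS = 0.03, 7    # invented values

class Agent:
    def __init__(self, idx):
        self.household = idx // HOUSEHOLD_SIZE
        self.workplace = idx // WORKPLACE_SIZE
        self.state = "S"                 # susceptible, infectious, or removed
        self.days_infected = 0

agents = [Agent(i) for i in range(N_AGENTS)]
for seed_case in random.sample(agents, 10):
    seed_case.state = "I"

def mix(group_attr):
    # Gather agents by household or workplace, then roll the dice for each contact.
    groups = {}
    for a in agents:
        groups.setdefault(getattr(a, group_attr), []).append(a)
    for members in groups.values():
        infectious = sum(1 for a in members if a.state == "I")
        for a in members:
            if a.state == "S" and random.random() < 1 - (1 - P_TRANSMIT) ** infectious:
                a.state = "I"

for day in range(180):
    mix("workplace")                     # daytime tick
    mix("household")                     # evening tick
    for a in agents:
        if a.state == "I":
            a.days_infected += 1
            if a.days_infected >= DAYS_INFECTIOUS:
                a.state = "R"

print("Agents ever infected:", sum(a.state == "R" for a in agents))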

The biggest insight reported by the Sydney group was that social distancing helps very little if only 70 percent of people practice it, but successfully squashes COVID-19 incidence if 80 percent of people can manage it over a span of a few months. And 90 percent compliance achieved the same effect in a faster time frame. The model informed both a report to the federal government from the Group of Eight Australian universities, and two reports from the World Health Organization. “We’re all pleased,” Prokopenko says, “that an agent-based model—which we’ve been trying to advocate for so long—at the time of need did a good job.”

Prokopenko says SEIR models have done a “rough job” in Australia, where some forecasts have been off by orders of magnitude. Further, they help you explore hypotheticals but don’t tell you exactly how to intervene. Let’s say the SEIR model tells you that reducing R by 20 percent will cut the speed of the pandemic’s spread in half. But how do you reduce R by 20 percent in the real world? With agent-based models, you can make everyone stay home one day a week and see the predicted effects of that policy.

To date, agent-based models haven’t been used extensively—possibly because they require massive computation power that hasn’t been widely available until recently. Also, they’re hard to calibrate. The Sydney model only started matching reality once the team made the ratio of ill people who were symptomatic much lower in children than in adults—one of COVID-19’s stark differences from flu. “Now that we have the technology and expertise to deploy large-scale agent-based models,” Prokopenko said, “it might make a real difference for the next pandemic.”

Researchers say they’ve learned a lot of lessons modeling this pandemic, lessons that will carry over to the next.

The first set of lessons is all about data. Garbage in, garbage out, they say. Jarad Niemi, an associate professor of statistics at Iowa State University who helps run the forecast hub used by the CDC, says it’s not clear what we should be predicting. Infections, deaths, and hospitalization numbers each have problems, which affect their usefulness not only as inputs for the model but also as outputs. It’s hard to know the true number of infections when not everyone is tested. Deaths are easier to count, but they lag weeks behind infections. Hospitalization numbers have immense practical importance for planning, but not all hospitals release those figures. How useful is it to predict those numbers if you never have the true numbers for comparison? What we need, he said, is systematized random testing of the population, to provide clear statistics of both the number of people currently infected and the number of people who have antibodies against the virus, indicating recovery. Prakash, of Georgia Tech, says governments should collect and release data quickly in centralized locations. He also advocates for central repositories of policy decisions, so modelers can quickly see which areas are implementing which distancing measures.

Researchers also talked about the need for a diversity of models. At the most basic level, averaging an ensemble of forecasts improves reliability. More important, each type of model has its own uses—and pitfalls. An SEIR model is a relatively simple tool for making long-term forecasts, but the devil is in the details of its parameters: How do you set those to match real-world conditions now and into the future? Get them wrong and the model can head off into fantasyland. Data-driven models can make accurate short-term forecasts, and machine learning may be good for predicting complicated factors. But will the inscrutable computations of, for instance, a neural network remain reliable when conditions change? Agent-based models look ideal for simulating possible interventions to guide policy, but they’re a lot of work to build and tricky to calibrate.

Finally, researchers emphasize the need for agility. Niemi of Iowa State says software packages have made it easier to build models quickly, and the code-sharing site GitHub lets people share and compare their models. COVID-19 is giving modelers a chance to try out all their newest tools, says Meyers, of the University of Texas. “The pace of innovation, the pace of development, is unlike ever before,” she says. “There are new statistical methods, new kinds of data, new model structures.”

“If we want to beat this virus,” Prokopenko says, “we have to be as adaptive as it is.”

This article appears in the October 2020 print issue as “The Mess Behind the Models.”

Amazon Transcribe Now Supports Automatic Language Identification

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-transcribe-now-supports-automatic-language-identification/

In 2017, we launched Amazon Transcribe, an automatic speech recognition service that makes it easy for developers to add a speech-to-text capability to their applications. Since then, we added support for more languages, enabling customers globally to transcribe audio recordings in 31 languages, including 6 in real-time.

A popular use case for Amazon Transcribe is transcribing customer calls. This allows companies to analyze the transcribed text using natural language processing techniques to detect sentiment or to identify the most common call causes. If you operate in a country with multiple official languages or across multiple regions, your audio files can contain different languages. Thus, files have to be tagged manually with the appropriate language before transcription can take place. This typically involves setting up teams of multi-lingual speakers, which creates additional costs and delays in processing audio files.

The media and entertainment industry often uses Amazon Transcribe to convert media content into accessible and searchable text files. Use cases include generating subtitles or transcripts, moderating content, and more. Amazon Transcribe is also used by operations teams for quality control, for example checking that audio and video are in sync thanks to the timestamps present in the extracted text. However, other problems couldn’t be solved as easily, such as verifying that the main spoken language in a video is correctly labeled, to avoid streaming it in the wrong language.

Today, I’m extremely happy to announce that Amazon Transcribe can now automatically identify the dominant language in an audio recording. This feature will help customers build more efficient transcription workflows by getting rid of manual tagging. In addition to the examples mentioned above, you can now also easily use Amazon Transcribe to automatically recognize and transcribe voicemails, meetings, and any form of recorded communication.

Introducing Automatic Language Identification
With a minimum of 30 seconds of audio, Amazon Transcribe can efficiently generate transcripts in the spoken language without wasting time and resources on manual tagging. Automatic identification of the dominant language is available in batch transcription mode for all 31 languages. Thanks to sampling techniques, language identification happens much faster than the transcription itself, in a matter of seconds.

If you’re already using Amazon Transcribe for speech recognition, you just need to enable the feature in the StartTranscriptionJob API. Even before your transcription job is complete, the response of the GetTranscriptionJob API will tell you the dominant language of the audio recording, along with a confidence score between 0 and 1. The transcript itself lists the top five languages and their respective confidence scores.

Of course, if you want to use Amazon Transcribe exclusively for automatic language identification, you can simply process the API response and ignore the transcript. In this case, you should stick to short 30-45 second audio recordings to minimize costs.

You can also restrict languages that Amazon Transcribe tries to identify, by passing a list of languages to the StartTranscriptionJob API. For example, if your company call center only receives calls in English, Spanish and French, then restricting identifiable languages to this list will increase language identification accuracy.
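If you are scripting the service with the AWS SDK for Python rather than the CLI, a minimal boto3 sketch of these calls looks like the following; the bucket, file, and job names are placeholders.

import time
import boto3

transcribe = boto3.client("transcribe")
transcribe.start_transcription_job(
    TranscriptionJobName="my-language-id-job",                      # placeholder name
    Media={"MediaFileUri": "s3://my-bucket/my-audio.m4a"},          # placeholder S3 object
    IdentifyLanguage=True,
    LanguageOptions=["fr-FR", "es-ES", "de-DE", "en-US", "en-GB"],  # optional restriction
)

while True:
    job = transcribe.get_transcription_job(
        TranscriptionJobName="my-language-id-job")["TranscriptionJob"]
    if job.get("LanguageCode"):   # the dominant language is reported before the job completes
        print(job["LanguageCode"], job.get("IdentifiedLanguageScore"))
        break
    if job["TranscriptionJobStatus"] == "FAILED":
        break
    time.sleep(5)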

Now, I’d like to show you how easy it is to use this new feature!

Detecting the Dominant Language With Amazon Transcribe
First, let’s try a high quality sample. I’ll use the audio track from one of my breakout sessions at AWS Summit Paris 2019. I can easily download it using the youtube-dl tool.

$ youtube-dl -f bestaudio https://www.youtube.com/watch?v=AFN5jaTurfA
$ mv AWS\ \&\ EarthCube\ _\ Deep\ learning\ démarrer\ avec\ MXNet\ et\ Tensorflow\ en\ 10\ minutes-AFN5jaTurfA.m4a video.m4a

Using ffmpeg, I shorten the audio clip to 1 minute.

$ ffmpeg -i video.m4a -ss 00:00:00.00 -t 00:01:00.00 video-1mn.m4a

Then, I upload the clip to an Amazon Simple Storage Service (S3) bucket.

$ aws s3 cp video-1mn.m4a s3://jsimon-transcribe-uswest2/

Next, I use the AWS CLI to run a transcription job on this audio clip, with language identification enabled.

$ aws transcribe start-transcription-job --transcription-job-name video-test --identify-language --media MediaFileUri=s3://jsimon-transcribe-uswest2/video-1mn.m4a

Waiting only a few seconds, I check the status of the job. I could also use an Amazon CloudWatch event to be notified that language identification is complete.

$ aws transcribe get-transcription-job --transcription-job-name video-test
{
    "TranscriptionJob": {
        "TranscriptionJobName": "video-test",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "fr-FR",
        "MediaSampleRateHertz": 44100,
        "MediaFormat": "mp4",
        "Media": {
            "MediaFileUri": "s3://jsimon-transcribe-uswest2/video-1mn.m4a"
        },
        "Transcript": {},
        "StartTime": 1593704323.312,
        "CreationTime": 1593704323.287,
        "Settings": {
            "ChannelIdentification": false,
            "ShowAlternatives": false
        },
        "IdentifyLanguage": true,
        "IdentifiedLanguageScore": 0.915885329246521
    }
}

As highlighted in the output, the dominant language has been correctly detected in seconds, with a high confidence score of 91.59%. A few more seconds later, the transcription job is complete. Running the same CLI call, I can retrieve a link to the transcription, which also includes the top 5 languages for the audio clip, sorted by decreasing score.

"language_identification":[{"score":"0.9159","code":"fr-FR"},{"score":"0.0839","code":"fr-CA"},{"score":"0.0001","code":"en-GB"},{"score":"0.0001","code":"pt-PT"},{"score":"0.0001","code":"de-CH"}]

Adding up French and Canadian French, we pretty much get a score of 100%, so there’s no doubt that this clip is in French. In some cases, you may not care for that level of detail, and you’ll see in the next example how to restrict the list of detected languages.
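If you do want that level of detail programmatically, a few lines of Python can group the per-variant scores by base language; the values below are simply copied from the output shown above.

from collections import defaultdict

language_identification = [
    {"score": "0.9159", "code": "fr-FR"}, {"score": "0.0839", "code": "fr-CA"},
    {"score": "0.0001", "code": "en-GB"}, {"score": "0.0001", "code": "pt-PT"},
    {"score": "0.0001", "code": "de-CH"},
]

totals = defaultdict(float)
for entry in language_identification:
    totals[entry["code"].split("-")[0]] += float(entry["score"])
print(dict(totals))   # French variants together account for ~99.98%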

Restricting the List of Detected Languages
As customer call transcription is a popular use case for Amazon Transcribe, here is a 40-second audio clip (WAV, 8 kHz, 16-bit), where I’m reading a paragraph from the French version of the Amazon Transcribe page. As you can hear, quality is pretty awful, and I added background music (Bach-ground, actually) for good measure.

Again, I upload the clip to an S3 bucket, and I use the AWS CLI to transcribe it. This time, I restrict the list of languages to French, Spanish, German, US English, and British English.

$ aws s3 cp speech-8k.wav s3://jsimon-transcribe-uswest2/
$ aws transcribe start-transcription-job --transcription-job-name speech-8k-test --identify-language --media MediaFileUri=s3://jsimon-transcribe-uswest2/speech-8k.wav --language-options fr-FR es-ES de-DE en-US en-GB

A few seconds later, I check the status of the job.

$ aws transcribe get-transcription-job --transcription-job-name speech-8k-test
{
    "TranscriptionJob": {
        "TranscriptionJobName": "speech-8k-test",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "fr-FR",
        "MediaSampleRateHertz": 8000,
        "MediaFormat": "wav",
        "Media": {
            "MediaFileUri": "s3://jsimon-transcribe-uswest2/speech-8k.wav"
        },
        "Transcript": {},
        "StartTime": 1593705151.446,
        "CreationTime": 1593705151.423,
        "Settings": {
            "ChannelIdentification": false,
            "ShowAlternatives": false
        },
        "IdentifyLanguage": true,
        "LanguageOptions": [
            "fr-FR", "es-ES", "de-DE", "en-US", "en-GB"
        ],
        "IdentifiedLanguageScore": 0.9995
    }
}

As highlighted in the output, the dominant language has been correctly detected with a very high confidence score in spite of the terrible audio quality. Restricting the list of languages certainly helps, and you should use it whenever possible.

Getting Started
Automatic Language Identification is available today in these regions:

  • US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), AWS GovCloud (US-West).
  • Canada (Central).
  • South America (São Paulo).
  • Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt).
  • Middle East (Bahrain).
  • Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney).

There is no additional charge on top of the existing pricing. Give it a try, and please send us feedback either through your usual AWS Support contacts, or on the AWS Forum for Amazon Transcribe.

– Julien

AWS Architecture Monthly Magazine: Robotics

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/architecture-monthly-magazine-robotics/

September’s issue of AWS Architecture Monthly is all about robotics. Discover why iRobot, the creator of your favorite (though maybe not your pet’s favorite) little robot vacuum, decided to move its mission-critical platform to the serverless architecture of AWS. Learn how and why you sometimes need to test in a virtual environment instead of a physical one. You’ll also have the opportunity to hear from technical experts from across the robotics industry who came together for the AWS Cloud Robotics Summit in August.

Our expert this month, Matt Hansen (who has dreamed of building robots since he was a teen), gives us his outlook for the industry and explains why cloud will be an essential part of that.

In September’s Robotics issue

  • Ask an Expert: Matt Hansen, Principal Solutions Architect
  • Blog: Testing a PR2 Robot in a Simulated Hospital
  • Case Study: iRobot
  • Blog: Introduction to Automatic Testing of Robotics Applications
  • Case Study: Multiply Labs Uses AWS RoboMaker to Manufacture Individualized Medicines
  • Demos & Videos: AWS Cloud Robotics Summit (August 18-19, 2020)
  • Related Videos: iRobot and ZS Associates

Survey opportunity

This month, we’re also asking you to take a 10-question survey about your experiences with this magazine. The survey is hosted by an external company (Qualtrics), so the below survey button doesn’t lead to our website. Please note that AWS will own the data gathered from this survey, and we will not share the results we collect with survey respondents. Your responses to this survey will be subject to Amazon’s Privacy Notice. Please take a few moments to give us your opinions.

How to access the magazine

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you—leave us a star rating and comment on the Amazon Kindle Newsstand page or contact us anytime at [email protected].

Can AI Make Bluetooth Contact Tracing Better?

Post Syndicated from Jeremy Hsu original https://spectrum.ieee.org/the-human-os/artificial-intelligence/machine-learning/ai-bluetooth-contact-tracing

During the COVID-19 pandemic, digital contact tracing apps based on the Bluetooth technology found in smartphones have been deployed by various countries despite the fact that Bluetooth’s baseline performance as a proximity detector remains mostly a mystery.

That is why the U.S. National Institute of Standards and Technology organized a months-long event that leveraged the talents of AI researchers around the world to help evaluate and potentially improve upon that baseline Bluetooth performance for helping detect when smartphone users are standing too close to one another.

The appropriately named Too Close for Too Long (TC4TL) Challenge has yielded a mixed bag for anyone looking to be optimistic about the performance of Bluetooth-based contact tracing and exposure notification apps. The challenge attracted a diverse array of AI research teams from around the world who showed how machine learning might help boost proximity detection by analyzing the patterns in Bluetooth signals and data from other phone sensors. But the teams’ testing results presented during a final evaluation workshop held on August 28 also showed that Bluetooth’s capability alone in detecting nearby phones is shaky at best.

“We showed that if you hold both phones in hand, you’re going to get relatively better proximity detection results on this pilot dataset,” says Omid Sadjadi, a computer scientist at the National Institute of Standards and Technology. “When it comes to everyday scenarios where you put your phones in your pocket or your purse or in other carrier states and in other locations, that’s when the performance of this proximity detection technology seems to start to degrade.”

Bluetooth Low Energy (BLE) technology was not originally designed to use Bluetooth signals from phones to accurately estimate the distance between phones. But the technology has been thrown into the breach to help hold the line for contact tracing during the pandemic. The main reason why many countries have gravitated toward Bluetooth-based apps is that they generally represent a more privacy-preserving option compared to using location-based technologies such as GPS.

Given the highly improvised nature of this Bluetooth-based solution, it made sense for the U.S. National Institute of Standards and Technology (NIST) to assist in evaluating the technology’s performance. NIST has previously helped establish testing benchmarks and international standards for evaluating widely-used modern technologies such as online search engines and facial recognition. And so the agency was more than willing to step up again when asked by researchers working on digital contact tracing technologies through the MIT PACT project.

But repeating the same evaluation process during a global public health emergency proved far from easy. NIST found itself condensing the typical full-year cycle for the evaluation challenge down into just five months starting in April and ending in August. “It was a tight timeline and could have been stressful at times,” Sadjadi says. “Hopefully we were able to make it easier for the participating teams.” 

But the sped-up schedule did put pressure on the research teams that signed up. A total of 14 groups hailing from six continents registered at the beginning, including seven teams from academia and seven teams from industry. Just four teams ended up meeting the challenge deadline: Contact-Tracing-Project from the Hong Kong University of Science and Technology, LCD from the National University of Córdoba in Argentina, PathCheck representing the MIT-based PathCheck Foundation, and the MITRE team representing the U.S. nonprofit MITRE Corporation.

The challenge did not specifically test Bluetooth-based app frameworks such as the Google Apple Exposure Notification (GAEN) protocol that are currently used in exposure notification or contact tracing apps. Instead, the challenge focused on evaluating whether teams’ machine learning models could improve on the process of detecting a phone’s proximity based on the combination of Bluetooth signal information and data from other common phone sensors such as accelerometers, gyroscopes, and magnetometers.
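Conceptually, that task is a binary classification problem: summary features derived from Bluetooth received signal strength and inertial sensors go in, and a too-close-for-too-long decision comes out. The sketch below illustrates that framing with synthetic data and scikit-learn; the features, labels, and model are placeholders, not the NIST challenge data or any team’s submission.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
rssi_mean = rng.normal(-70, 10, n)       # average Bluetooth signal strength (dBm)
rssi_std = rng.uniform(1, 8, n)          # variability of the signal
motion = rng.uniform(0, 2, n)            # rough activity level from the accelerometer
X = np.column_stack([rssi_mean, rssi_std, motion])
y = (rssi_mean > -72).astype(int)        # toy label: a stronger signal stands in for "too close"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))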

To provide the training data necessary for the teams’ machine learning models, MITRE Corporation and MIT Lincoln Laboratory staff members helped collect data from pairs of phones held at certain distances and heights near one another. They also included data from different scenarios such as both people holding the phones in their hands, as well as one or both people having the phones in their pockets. The latter is important given how Bluetooth signals can be weakened or deflected by a number of different materials.

“If you’re collecting data for the purpose of training and evaluating automated proximity detection technologies, you need to consider all possible scenarios and phone carriage states that could happen in everyday conditions, whether people are moving around and going shopping, or in nursing homes, or they’re sitting in a classroom or they are sitting at their desk at their work organization,” Sadjadi says. 

One unexpected hiccup occurred when the original NIST development data set—based on 10 different recording sessions with MIT Lincoln Laboratory researchers holding phone pairs in different positions—led to the classic “overfitting” problem where machine learning performance is tuned too specifically to the conditions in a particular data set. The machine learning models were able to identify specific recording sessions by using air pressure information from the altitude sensors of the iPhones. That gave the models a performance boost in phone proximity detection for that specific training data set, but their performance could fall when faced with new data in real-world situations.

Luckily, one of the teams participating in the challenge reported the issue to NIST when it noticed its machine learning model prioritizing data from the altitude sensors. Once Sadjadi and his colleagues figured out what happened, they enlisted the help of the MITRE Corporation to collect new data based on the same data collection protocol and released the new training data set within a few days.

The team results on the final TC4TL leaderboard reflect the machine learning models’ performances based on the new training data set. But NIST still included a second table below the leaderboard results showing how the overfitted models performed on the original training data set. Such results are presented as a normalized decision cost function (NDCF) that represents proximity detection performance when accounting for the combination of false negatives (failing to detect a nearby phone) and false positives (falsely saying a nearby phone has been detected).

If the machine learning models only performed as accurately as flipping a coin on those binary yes-or-no questions about false positives and false negatives, their NDCF values would be 1. The fact that most of the machine learning models seemed to get values significantly below 1 represents a good sign for the promise of applying machine learning to boosting digital contact tracing down the line.
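One way to picture such a metric is as a weighted combination of the miss rate and the false-alarm rate, scaled so that a chance-level system scores 1. The function below is a generic illustration of that idea, with assumed equal weights; the exact weighting in the NIST TC4TL metric may differ.

def normalized_dcf(p_miss, p_false_alarm, w_miss=1.0, w_false_alarm=1.0):
    # Weighted cost of errors, divided by the cost a coin-flip system would incur,
    # so that 1.0 means "no better than chance" and lower is better.
    cost = w_miss * p_miss + w_false_alarm * p_false_alarm
    chance_cost = w_miss * 0.5 + w_false_alarm * 0.5
    return cost / chance_cost

print(normalized_dcf(0.5, 0.5))   # 1.0: chance-level proximity detection
print(normalized_dcf(0.1, 0.2))   # below 1: better than chance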

However, it’s still unclear what these normalized DCF values would actually mean for a person’s infection risk in real life. For future evaluations, the NIST team may focus on figuring out the best way to weight both the false positive and false negative error measures. “The next question is ‘What’s the relative importance of false-positives and false-negatives?’” Sadjadi explains. “How can the metric be adjusted to better correlate with realistic conditions?” 

It’s also hard to tell which specific machine learning models perform the best for enhancing phone proximity detection. The four teams ended up trying out a variety of different approaches without necessarily finding the most optimal method. Still, Sadjadi seemed encouraged by the fact that even these early results suggest that machine learning can improve upon the baseline performance of Bluetooth signal detection alone.

“We hope that in the future the participants use our datasets and our metrics to drive the errors down further,” Sadjadi says. “But these results are far better than random.”

The fact that the baseline performance of Bluetooth signal detection for detecting nearby phones still seems quite weak may not bode well for many of the current digital contact tracing efforts using Bluetooth-based apps—especially given the much higher error rates for situations when one or both phones are in someone’s pocket or purse. But Sadjadi suggests that current Bluetooth-based apps could still provide some value for overwhelmed public health systems and humans doing manual contact tracing.

“It seems like we’re not there yet when you consider everyday scenarios and real-life scenarios,” Sadjadi says. “But again, even in case of not so strong performance, it can still be useful, and it can probably still be used to augment manual contact tracing, because as humans we don’t remember exactly who we were in contact with or where we were.”

Many future challenges remain before researchers can deliver enhanced Bluetooth-based proximity detection and a possible performance boost from machine learning. For example, Bluetooth-based proximity detection could likely become more accurate if phones spent more time listening for Bluetooth chirps from nearby phones, but tech companies such as Google and Apple have currently limited that listening time period in the interest of preserving phone battery life.

The NIST team is also thinking about how to collect more training data for what comes next beyond the TC4TL Challenge. Some groups such as the MIT Lincoln Laboratory have been testing the use of robots to conduct phone data collection sessions, which could improve the reliability of accurately-reported distances and other factors involved in tests. That may be useful for collecting training data. But Sadjadi believes that it would still be best to use humans in collecting the data used for the test data sets that measure machine learning models’ performances, so that the conditions match real life as closely as possible.

“This is not the first pandemic and it does not seem to be the last one,” Sadjadi says. “And given how important contact tracing is—either manual or digital contact tracing—for this kind of pandemic and health crisis, the next TC4TL challenge cycle is definitely going to be longer.”

Learn why AWS is the best cloud to run Microsoft Windows Server and SQL Server workloads

Post Syndicated from Fred Wurden original https://aws.amazon.com/blogs/compute/learn-why-aws-is-the-best-cloud-to-run-microsoft-windows-server-and-sql-server-workloads/

Fred Wurden, General Manager, AWS Enterprise Engineering (Windows, VMware, RedHat, SAP, Benchmarking)

For companies that rely on Windows Server but find it daunting to move those workloads to the cloud, there is no easier way to run Windows in the cloud than AWS. Customers as diverse as Expedia, Pearson, Seven West Media, and RepricerExpress have chosen AWS over other cloud providers to unlock the Microsoft products they currently rely on, including Windows Server and SQL Server. The reasons are several: by embracing AWS, they’ve achieved cost savings through forthright pricing options and expanded breadth and depth of capabilities. In this blog, we break down these advantages to understand why AWS is the simplest, most popular and secure cloud to run your business-critical Windows Server and SQL Server workloads.

AWS lowers costs and increases choice with flexible pricing options

Customers expect accurate and transparent pricing so you can make the best decisions for your business. When assessing which cloud to run your Windows workloads on, customers look at the total cost of ownership (TCO) of those workloads.

Not only does AWS provide cost-effective ways to run Windows and SQL Server workloads, we also regularly lower prices to make it even more affordable. Since launching in 2006, AWS has reduced prices 85 times. In fact, we recently dropped pricing by an average of 25% for Amazon RDS for SQL Server Enterprise Edition database instances in the Multi-AZ configuration, for both On-Demand Instance and Reserved Instance types on the latest generation hardware.

The AWS pricing approach makes it simple to understand your costs, even as we actively help you pay AWS less now and in the future. For example, AWS Trusted Advisor provides real-time guidance to provision your resources more efficiently. This means that you spend less money with us. We do this because we know that if we aren’t creating more and more value for you each year, you’ll go elsewhere.

In addition, we have several other industry-leading initiatives to help lower customer costs, including AWS Compute Optimizer, Amazon CodeGuru, and AWS Windows Optimization and Licensing Assessments (AWS OLA). AWS Compute Optimizer recommends optimal AWS Compute resources for your workloads by using machine learning (ML) to analyze historical utilization metrics. Customers who use Compute Optimizer can save up to 25% on applications running on Amazon Elastic Compute Cloud (Amazon EC2). Machine learning also plays a key role in Amazon CodeGuru, which provides intelligent recommendations for improving code quality and identifying an application’s most expensive lines of code. Finally, AWS OLA helps customers to optimize licensing and infrastructure provisioning based on actual resource consumption (ARC) to offer cost-effective Windows deployment options.

Cloud pricing shouldn’t be complicated

Other cloud providers bury key pricing information when making comparisons to other vendors, thereby incorrectly touting pricing advantages. Often those online “pricing calculators” that purport to clarify pricing neglect to include hidden fees, complicating costs through licensing rules (e.g., you can run this workload “for free” if you pay us elsewhere for “Software Assurance”). At AWS, we believe such pricing and licensing tricks are contrary to the fundamental promise of transparent pricing for cloud computing.

By contrast, AWS makes it straightforward for you to run Windows Server applications where you want. With our End-of-Support Migration Program (EMP) for Windows Server, you can easily move your legacy Windows Server applications—without needing any code changes. The EMP technology decouples the applications from the underlying OS. This enables AWS Partners or AWS Professional Services to migrate critical applications from legacy Windows Server 2003, 2008, and 2008 R2 to newer, supported versions of Windows Server on AWS. This allows you to avoid extra charges for extended support that other cloud providers charge.

Other cloud providers also may limit your ability to Bring-Your-Own-License (BYOL) for SQL Server to your preferred cloud provider. Meanwhile, AWS improves the BYOL experience using EC2 Dedicated Hosts and AWS License Manager. With EC2 Dedicated Hosts, you can save costs by moving existing Windows Server and SQL Server licenses that do not have Software Assurance to AWS. AWS License Manager simplifies how you manage your software licenses from software vendors such as Microsoft, SAP, Oracle, and IBM across AWS and on-premises environments. We also work hard to help our customers spend less.

How AWS helps customers save money on Windows Server and SQL Server workloads

The first way AWS helps customers save money is by delivering the most reliable global cloud infrastructure for your Windows workloads. Any downtime costs customers in terms of lost revenue, diminished customer goodwill, and reduced employee productivity.

With respect to pricing, AWS offers multiple pricing options to help our customers save. First, we offer AWS Savings Plans that provide you with a flexible pricing model to save up to 72 percent on your AWS compute usage. You can sign up for Savings Plans for a 1- or 3-year term. Our Savings Plans help you easily manage your plans by taking advantage of recommendations, performance reporting and budget alerts in AWS Cost Explorer, which is a unique benefit only AWS provides. Not only that, but we also offer Amazon EC2 Spot Instances that help you save up to 90 percent on your compute costs vs. On-Demand Instance pricing.

Customers don’t need to walk this migration path alone. In fact, AWS customers often make the most efficient use of cloud resources by working with assessment partners like Cloudamize, CloudChomp, or Migration Evaluator (formerly TSO Logic), which is now part of AWS. By running detailed assessments of their environments with Migration Evaluator before migration, customers can achieve an average of 36 percent savings using AWS over three years. So how do you get from an on-premises Windows deployment to the cloud? AWS makes it simple.

AWS has support programs and tools to help you migrate to the cloud

Though AWS Migration Acceleration Program (MAP) for Windows is a great way to reduce the cost of migrating Windows Server and SQL Server workloads, MAP is more than a cost savings tool. As part of MAP, AWS offers a number of resources to support and sustain your migration efforts. This includes an experienced APN Partner ecosystem to execute migrations, our AWS Professional Services team to provide best practices and prescriptive advice, and a training program to help IT professionals understand and carry out migrations successfully. We help you figure out which workloads to move first, then leverage the combined experience of our Professional Services and partner teams to guide you through the process. For customers who want to save even more (up to 72% in some cases) we are the leaders in helping customers transform legacy systems to modernized managed services.

Again, we are always available to help guide you in your Windows journey to the cloud. We guide you through our technologies like AWS Launch Wizard, which provides a guided way of sizing, configuring, and deploying AWS resources for Microsoft applications like Microsoft SQL Server Always On, or through our comprehensive ecosystem of tens of thousands of partners and third-party solutions, including many with deep expertise with Windows technologies.

Why run Windows Server and SQL Server anywhere else but AWS?

Not only does AWS offer significantly more services than any other cloud, with over 48 services without comparable equivalents on other clouds, but AWS also provides better ways to use Microsoft products than any other cloud. This includes Active Directory as a managed service and FSx for Windows File Server, the only fully managed file storage service for Windows. If you’re interested in learning more about how AWS improves the Windows experience, please visit this article on our Modernizing with AWS blog.

Bring your Windows Server and SQL Server workloads to AWS for the most secure, reliable, and performant cloud, providing you with the depth and breadth of capabilities at the lowest cost. To learn more, visit Windows on AWS. Contact us today to learn more on how we can help you move your Windows to AWS or innovate on open source solutions.

About the Author
Fred Wurden is the GM of Enterprise Engineering (Windows, VMware, Red Hat, SAP, benchmarking) working to make AWS the most customer-centric cloud platform on Earth. Prior to AWS, Fred worked at Microsoft for 17 years and held positions including EU/DOJ engineering compliance for Windows and Azure, interoperability principles and partner engagements, and open source engineering. He lives with his wife and a few four-legged friends since his kids are all in college now.

AWS announces AWS Contact Center Intelligence solutions

Post Syndicated from Alejandra Quetzalli original https://aws.amazon.com/blogs/aws/aws-announces-aws-contact-center-intelligence-solutions/

What was announced?

We’re announcing the availability of AWS Contact Center Intelligence (CCI) solutions, a combination of services that empowers customers to easily integrate AI into contact centers, made available through AWS Partner Network (APN) partners.

AWS CCI has solutions for self-service, live-call analytics & agent assist, and post-call analytics, making it possible for customers to quickly deploy AI into their existing workflows or build completely new ones.

Pricing and regional availability correspond to the underlying services (Amazon Comprehend, Amazon Kendra, Amazon Lex, Amazon Transcribe, Amazon Translate, and Amazon Polly) used.

What is AWS Contact Center Intelligence?

As mentioned above, AWS CCI brings AI-powered solutions to contact centers for before, during, and after customer interactions.

My colleague Swami Sivasubramanian (VP, Amazon Machine Learning, AWS) said: “We want to make it easy for our customers with contact centers to benefit from machine learning capabilities even if they have no machine learning expertise. By partnering with APN technology and consulting partners to bring AWS Contact Center Intelligence solutions to market, we are making it easier for customers to realize the benefits of cloud-based machine learning services while removing the heavy lifting and the need to hire specialized developers to integrate the ML capabilities into their existing contact centers.”

But what does that mean? 🤔

AWS CCI solutions let you add machine learning (ML) functionality such as text-to-speech, translation, enterprise search, chatbots, business intelligence, and language comprehension to current contact center environments. Customers can now implement contact center intelligence ML solutions to aid self-service, live-call analytics & agent assist, and post-call analytics. Currently, AWS CCI solutions are available through partners such as Genesys, Vonage, and UiPath for easy integration into existing enterprise contact center systems.

“We’re proud Genesys customers will be among the first to benefit from the off-the-shelf machine learning capabilities of AWS Contact Center Intelligence solutions. It’s now simpler and more cost-effective for organizations to combine AWS’s AI capabilities, including search, text-to-speech and natural language understanding, with the advanced contact center capabilities of Genesys Cloud to give customers outstanding self-service experiences.” ~ Olivier Jouve (Executive Vice President and General Manager of Genesys Cloud)

“More and more consumers are relying on automated methods to interact with brands, especially in today’s retail environment where online shopping is taking a front seat. The Genesys Cloud and Amazon Web Services (AWS) integration will make it easier to leverage conversational AI so we can provide more effective self-service experiences for our customers.” ~ Aarde Cosseboom (Senior Director of Global Member Services Technology, Analytics and Product at TechStyle Fashion Group)

 

How it works and who it’s for…

AWS Contact Center Intelligence solutions offer a variety of ways that organizations can quickly and cost-effectively add machine learning-based intelligence to their contact centers, via AWS pre-trained AI Services. AWS CCI is currently available through participating APN partners, and it is focused on three stages of the contact center workflow: Self-Service, Live Call Analytics and Agent Assist, and Post-Call Analytics. Let’s break each one of these up.

The Self-Service solution helps with the creation of chatbots and ML-driven IVRs (interactive voice response) to address the most common queries a contact center workforce often gets. This allows actual call center employees to focus on higher-value work. To implement this solution, you’ll want to work with Amazon Lex and/or Amazon Kendra. The novelty of this solution is that Lex + Kendra not only fulfills transactional queries (e.g., booking a hotel room or resetting a password), but also addresses the long tail of customer questions whose answers live in enterprise knowledge systems. Before, these Q&As had to be hard-coded in Lex, making the solution harder to implement and maintain. Today, you can implement this solution directly from your existing contact center platform with AWS CCI partners, such as Genesys.
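
To make that flow concrete, here is a minimal, hypothetical Python sketch of the Kendra side: when the Lex bot has no intent that matches an utterance, a fallback handler forwards the text to an Amazon Kendra index and returns the top answer. The index ID and function name are illustrative and are not taken from the Quick Start.

import boto3

# Hypothetical index ID; in the Quick Start this would come from the CloudFormation outputs.
KENDRA_INDEX_ID = "11111111-2222-3333-4444-555555555555"

kendra = boto3.client("kendra")

def answer_from_documents(utterance):
    """Fallback for questions the Lex bot has no intent for: query the Kendra index."""
    response = kendra.query(IndexId=KENDRA_INDEX_ID, QueryText=utterance)
    for item in response.get("ResultItems", []):
        if item["Type"] in ("ANSWER", "QUESTION_ANSWER"):
            return item["DocumentExcerpt"]["Text"]
    return "Sorry, I couldn't find an answer in the knowledge base."

print(answer_from_documents("How do I reset my password?"))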

The Live Call Analytics & Agent Assist solution enables the creation of real-time ML capabilities to increase staff productivity and engagement. Here, Amazon Transcribe is used to perform real-time speech transcription, while Amazon Comprehend can analyze interactions, detect the sentiment of the caller, and identify key words and phrases in the conversation. Amazon Translate can even be added to translate the conversation into a preferred language! Now, you can implement this solution directly from several leading contact center platforms with AWS CCI partners, like SuccessKPI.
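
As a rough illustration of the analytics half of that pipeline, the sketch below sends a single transcribed utterance to Amazon Comprehend for sentiment and key phrases. The utterance is made up; in the real solution the text would arrive from a streaming Amazon Transcribe session.

import boto3

comprehend = boto3.client("comprehend")

# In the real pipeline this text would come from a live Amazon Transcribe stream.
utterance = "I've been waiting forty minutes and my order still hasn't shipped."

sentiment = comprehend.detect_sentiment(Text=utterance, LanguageCode="en")
phrases = comprehend.detect_key_phrases(Text=utterance, LanguageCode="en")

print("Sentiment:", sentiment["Sentiment"])                      # e.g. NEGATIVE
print("Key phrases:", [p["Text"] for p in phrases["KeyPhrases"]])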

The Post-Call Analytics solution automatically analyzes contact center conversations, which tend to yield actionable data for product and service feedback loops. Similar to live-call analytics, this solution combines Amazon Transcribe, which performs speech recognition and creates a high-quality text transcription of each call, with Amazon Comprehend, which analyzes the interaction. Amazon Translate can be added to translate the conversation into your preferred language, and Amazon Kendra can be used for contextual natural language queries. Today, you can implement this solution directly from several leading contact center platforms with AWS CCI partners, such as Acqueon.

AWS helps partners integrate these solutions into their products. Some solutions also have a Quick Start, which includes CloudFormation templates and a deployment guide, to automate the deployments. The good news is that our AWS Partners landing pages will also provide additional implementation information specific to their products. 👌

Let’s see a demo…

For today’s post, we chose to focus on diving deeper into the Self-Service and Post-Call Analytics solutions, so let’s begin with Self-Service.

Self-Service
We have a public GitHub repository that has a complete Quick Start template plus a detailed deployment guide with architecture diagrams. (And the good news is that our APN partner landing pages will also reference this repo!)

This GitHub repo covers the Amazon Lex chatbot integration with Amazon Kendra. The main idea here is that customers can bring their own document repository into Amazon Kendra, and Amazon Lex can then query it while customers interact with the Lex chatbot.

The main thing to notice in this architecture is that customers can bring their existing documents and allow their chatbot to search those documents whenever someone interacts with it. The architecture below assumes our docs are in an S3 bucket, but it’s worth noting that Amazon Kendra can integrate with multiple kinds of data sources. If using an S3 bucket, customers must provide the name of the S3 bucket that holds their document repository. This is a prerequisite for deployment.

Let’s follow the instructions under the repo’s Deployment Steps, skipping ahead to Step #2, “Click Deploy to launch the CloudFormation template.”

Since this is a Quick Start template, you can see how everything is already filled out for us. We click Next and move on to Step 2, Specify stack details.

Notice how the S3 bucket section is blank. You can provide your own S3 bucket name if you want to test this out with your own docs. For today, I am going to use the S3 bucket name that was provided to us in the GitHub doc.

The next part to configure will be the Cross account role configuration section. For my demo, I will add my own AWS account ID under “Assuming Account ID.”

We click Next and move on to Step 3, Configure Stack options.

Nothing to configure here, so we can click Next again and move on to Step 4, Review. We click to accept these final acknowledgements and click Create Stack.

If we were to navigate over to our deployed AWS CloudFormation stacks, we can go to Outputs of this stack and see our Kendra index name and Lex bot name.

Now if we head over to Amazon Lex, we should be able to easily find our chatbot.

We click into it and we can see that our chatbot is ready. At this point, we can start interacting with it!

We can type something like “Hi,” for example.

Eventually we would also get a response that details the reply source. What this means is that it will tell you if this came from Amazon Lex or from Amazon Kendra and the documents we saved in our S3 bucket.

 

Live Call Analytics & Agent Assist
We have two public GitHub repositories for this solution too, and both have detailed deployment guides with architecture diagrams.

This GitHub repo provides a code example and a fully functional AWS Lambda function to get you started with capturing and transcribing Amazon Chime Voice Connector phone calls using Amazon Kinesis Video Streams and Amazon Transcribe. This solution shows how to use AI and ML services to connect to the customer’s existing environment, to drive agent assistance or analytics. We can take a real-time voice feed, transcribe that information, and then use Amazon Comprehend to pull out the key actions and sentiment.

We now also provide the Chime SIP req connector (a Chime component that allows you to connect a voice-over-IP compatible environment with Amazon voice services) to stream voice into Amazon Transcribe from virtually any contact center. Our partner Vonage can do the same through WebSocket.

👉🏽 Check out the GitHub developer docs:

And as we mentioned above, for today’s post, we chose to focus on diving deeper into the Self-Service and Post-Call Analytics solutions. So let’s move on to show an example for Post-Call Analytics.

 

Post-Call Analytics

We have a public GitHub repository for this solution too, with another complete Quick Start template and detailed deployment guide with architecture diagrams. This solution is used after the call has ended, so that our customers can review the analytics of those calls.

This GitHub repo covers how to look for insights and information about calls that have already happened. We call this Quality Management. We can use Amazon Transcribe and Amazon Comprehend to pull out key words, information, and data in order to better understand what is happening in our contact center calls. We can then review these insights in Amazon QuickSight.

Let’s look at the architecture diagram for this solution too. Our call recording gets stored in an S3 bucket and is then picked up by a Lambda function that transcribes it using Amazon Transcribe. The Lambda function puts the result in a different bucket, and the call’s metadata gets stored in DynamoDB. Amazon Comprehend then conducts text analysis on the call’s metadata and stores the result in a text-analysis output bucket. Finally, QuickSight is used to provide dashboards showing the resulting call analytics.
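
As an illustration of that first step, a stripped-down version of the transcription Lambda might look like the sketch below. The bucket names, environment variable, and job-naming scheme are placeholders, and the actual Quick Start template wires in additional metadata handling.

import os
import urllib.parse

import boto3

transcribe = boto3.client("transcribe")

# Illustrative output bucket; the Quick Start provisions its own buckets.
OUTPUT_BUCKET = os.environ.get("TRANSCRIPT_BUCKET", "my-call-transcripts")

def handler(event, context):
    """Triggered by an S3 put of a call recording; starts an Amazon Transcribe job."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    transcribe.start_transcription_job(
        TranscriptionJobName=key.replace("/", "-"),   # illustrative naming scheme
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat="wav",                            # assumption: recordings are WAV files
        LanguageCode="en-US",
        OutputBucketName=OUTPUT_BUCKET,
    )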

Just like in the previous example, we move down to the Deployment steps section. Just like before, we have a pre-made CloudFormation template that is ready to be deployed.

Step 1, Specify template is good to go, so we click Next.

In Step 2, Specify stack details, something important to note is that the User Pool Domain Name must be globally unique.

We click Next and move on to Step 3, Configure Stack options. Nothing additional to configure here either, so we can click Next again and move on to Step 4, Review.

We click to accept these final acknowledgements and click Create Stack.

And if we navigate over to our deployed AWS CloudFormation stacks again, we can go to the Outputs of this stack and see the PortalEndpoint key. After the stack creation has completed successfully, the portal website is available at the CloudFront distribution endpoint. This key is what will allow us to find the portal URL.

We will need to have a user created in Amazon Cognito for the next steps to work. (If you have never created one, visit this how-to guide.)

⚠ NOTE: Make sure to open the portal URL endpoint in a separate incognito window, because the portal attaches a QuickSight user role that can interfere with your actual role.

We go to the portal URL and login with our created Cognito user. We’re prompted to change the temporary password and are eventually directed to the QuickSight homepage.

Now we want to upload the audio files of our calls and we can do so with the Upload button.

After successfully uploading our audio files, the audio processing will run through transcription and text analysis. At this point we can click on the Call Analytics logo in the top left of the Navigation Bar to return to home page.

Now we can drill down into a call to see Amazon Comprehend’s result of the call classifications and turn-by-turn sentiments.

 

🌎 Lastly…

Regional availability for AWS Contact Center Intelligence (CCI) solutions corresponds to the underlying services (Amazon Comprehend, Amazon Kendra, Amazon Lex, Amazon Transcribe, Amazon Translate) used.

We are announcing AWS CCI availability with 12 APN partners: Genesys, UiPath, Vonage, Acqueon, SuccessKPI, and Inference Solutions (Technology partners), and Slalom, Onica/Rackspace, TensorIoT, Quantiphi, Accenture, and HGS Digital (Consulting partners).

Ready to get started? Contact one of the AWS CCI launch partners listed on the AWS CCI web page.

 

You may also want to see…

👉🏽AWS Quick Start links from post:

 

¡Gracias por tu tiempo!
~Alejandra 💁🏻‍♀️🤖 y Canela 🐾

AWS Architecture Monthly Magazine: Agriculture

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/aws-architecture-monthly-magazine-agriculture/

In this month’s issue of AWS Architecture Monthly, Worldwide Tech Lead for Agriculture, Karen Hildebrand (who’s also a fourth-generation farmer) refers to agriculture as “the connective tissue our world needs to survive.” As our expert for August’s Agriculture issue, she also talks about what role the cloud will play in future development efforts in this industry and why developing personal connections with our AWS agriculture customers is one of the most important aspects of our jobs.

You’ll also buzz through the world of high tech beehives, milk the information about data analytics-savvy cows, and see what the reference architecture of a Smart Farm looks like.

In August’s Agriculture issue

  • Ask an Expert: Karen Hildebrand, AWS WW Agriculture Tech Leader
  • Customer Success Story: Tine & Crayon: Revolutionizing the Norwegian Dairy Industry Using Machine Learning on AWS
  • Blog Post: Beewise Combines IoT and AI to Offer an Automated Beehive
  • Reference Architecture: Smart Farm: Enabling Sensor, Computer Vision, and Edge Inference in Agriculture
  • Customer Success Story: Farmobile: Empowering the Agriculture Industry Through Data
  • Blog Post: The Cow Collar Wearable: How Halter benefits from FreeRTOS
  • Related Videos: DuPont, mPrest & Netafirm, and Veolia

Survey opportunity

This month, we’re also asking you to take a 10-question survey about your experiences with this magazine. The survey is hosted by an external company (Qualtrics), so the below survey button doesn’t lead to our website. Please note that AWS will own the data gathered from this survey, and we will not share the results we collect with survey respondents. Your responses to this survey will be subject to Amazon’s Privacy Notice. Please take a few moments to give us your opinions.

How to access the magazine

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you. Leave us a star rating and comment on the Amazon Kindle Newsstand page or contact us anytime at [email protected].

Amazon ECS Now Supports EC2 Inf1 Instances

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-ecs-now-supports-ec2-inf1-instances/

As machine learning and deep learning models become more sophisticated, hardware acceleration is increasingly required to deliver fast predictions at high throughput. Today, we’re very happy to announce that AWS customers can now use the Amazon EC2 Inf1 instances on Amazon ECS, for high performance and the lowest prediction cost in the cloud. For a few weeks now, these instances have also been available on Amazon Elastic Kubernetes Service.

A primer on EC2 Inf1 instances
Inf1 instances were launched at AWS re:Invent 2019. They are powered by AWS Inferentia, a custom chip built from the ground up by AWS to accelerate machine learning inference workloads.

Inf1 instances are available in multiple sizes, with 1, 4, or 16 AWS Inferentia chips, with up to 100 Gbps network bandwidth and up to 19 Gbps EBS bandwidth. An AWS Inferentia chip contains four NeuronCores. Each one implements a high-performance systolic array matrix multiply engine, which massively speeds up typical deep learning operations such as convolution and transformers. NeuronCores are also equipped with a large on-chip cache, which helps cut down on external memory accesses, saving I/O time in the process. When several AWS Inferentia chips are available on an Inf1 instance, you can partition a model across them and store it entirely in cache memory. Alternatively, to serve multi-model predictions from a single Inf1 instance, you can partition the NeuronCores of an AWS Inferentia chip across several models.

Compiling Models for EC2 Inf1 Instances
To run machine learning models on Inf1 instances, you need to compile them to a hardware-optimized representation using the AWS Neuron SDK. All tools are readily available on the AWS Deep Learning AMI, and you can also install them on your own instances. You’ll find instructions in the Deep Learning AMI documentation, as well as tutorials for TensorFlow, PyTorch, and Apache MXNet in the AWS Neuron SDK repository.
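
For a rough sense of what that compilation step looks like with the TensorFlow 1.x flavor of the Neuron SDK (the tensorflow-neuron package), the sketch below compiles an existing SavedModel into a Neuron-optimized SavedModel. The paths are placeholders, and the exact API surface may vary between Neuron SDK releases, so treat this as an illustration rather than the exact commands used for the BERT model in this post.

# Assumes TensorFlow 1.x with the tensorflow-neuron package installed.
import tensorflow.neuron as tfn

# Placeholder paths; point these at your own SavedModel directories.
model_dir = "bert_savedmodel"
compiled_model_dir = "bert_savedmodel_neuron"

# Compile the SavedModel into a Neuron-optimized SavedModel that
# tensorflow-model-server-neuron can load on Inf1 instances.
tfn.saved_model.compile(model_dir, compiled_model_dir)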

In the demo below, I will show you how to deploy a Neuron-optimized model on an ECS cluster of Inf1 instances, and how to serve predictions with TensorFlow Serving. The model in question is BERT, a state of the art model for natural language processing tasks. This is a huge model with hundreds of millions of parameters, making it a great candidate for hardware acceleration.

Creating an Amazon ECS Cluster
Creating a cluster is the simplest thing: all it takes is a call to the CreateCluster API.

$ aws ecs create-cluster --cluster-name ecs-inf1-demo

Immediately, I see the new cluster in the console.

New cluster

Several prerequisites are required before we can add instances to this cluster:

  • An AWS Identity and Access Management (IAM) role for ECS instances: if you don’t have one already, you can find instructions in the documentation. Here, my role is named ecsInstanceRole.
  • An Amazon Machine Image (AMI) containing the ECS agent and supporting Inf1 instances. You could build your own, or use the ECS-optimized AMI for Inferentia. In the us-east-1 region, its id is ami-04450f16e0cd20356.
  • A Security Group, opening network ports for TensorFlow Serving (8500 for gRPC, 8501 for HTTP). The identifier for mine is sg-0994f5c7ebbb48270.
  • If you’d like to have ssh access, your Security Group should also open port 22, and you should pass the name of an SSH key pair. Mine is called admin.

We also need to create a small user data file in order to let instances join our cluster. This is achieved by storing the name of the cluster in an environment variable, itself written to the configuration file of the ECS agent.

#!/bin/bash
echo ECS_CLUSTER=ecs-inf1-demo >> /etc/ecs/ecs.config

We’re all set. Let’s add a couple of Inf1 instances with the RunInstances API. To minimize cost, we’ll request Spot Instances.

$ aws ec2 run-instances \
--image-id ami-04450f16e0cd20356 \
--count 2 \
--instance-type inf1.xlarge \
--instance-market-options '{"MarketType":"spot"}' \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ecs-inf1-demo}]' \
--key-name admin \
--security-group-ids sg-0994f5c7ebbb48270 \
--iam-instance-profile Name=ecsInstanceRole \
--user-data file://user-data.txt

Both instances appear right away in the EC2 console.

Inf1 instances

A couple of minutes later, they’re ready to run tasks on the cluster.

Inf1 instances

Our infrastructure is ready. Now, let’s build a container storing our BERT model.

Building a Container for Inf1 Instances
The Dockerfile is pretty straightforward:

  • Starting from an Amazon Linux 2 image, we open ports 8500 and 8501 for TensorFlow Serving.
  • Then, we add the Neuron SDK repository to the list of repositories, and we install a version of TensorFlow Serving that supports AWS Inferentia.
  • Finally, we copy our BERT model inside the container, and we load it at startup.

Here is the complete file.

FROM amazonlinux:2
EXPOSE 8500 8501
RUN echo $'[neuron] \n\
name=Neuron YUM Repository \n\
baseurl=https://yum.repos.neuron.amazonaws.com \n\
enabled=1' > /etc/yum.repos.d/neuron.repo
RUN rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB
RUN yum install -y tensorflow-model-server-neuron
COPY bert /bert
CMD ["/bin/sh", "-c", "/usr/local/bin/tensorflow_model_server_neuron --port=8500 --rest_api_port=8501 --model_name=bert --model_base_path=/bert/"]

Then, I build and push the container to a repository hosted in Amazon Elastic Container Registry. Business as usual.

$ docker build -t neuron-tensorflow-inference .

$ aws ecr create-repository --repository-name ecs-inf1-demo

$ aws ecr get-login-password | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

$ docker tag neuron-tensorflow-inference 123456789012.dkr.ecr.us-east-1.amazonaws.com/ecs-inf1-demo:latest

$ docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/ecs-inf1-demo:latest

Now, we need to create a task definition in order to run this container on our cluster.

Creating a Task Definition for Inf1 Instances
If you don’t have one already, you should first create an execution role, i.e. a role allowing the ECS agent to perform API calls on your behalf. You can find more information in the documentation. Mine is called ecsTaskExecutionRole.

The full task definition is visible below. As you can see, it holds two containers:

  • The BERT container that I built,
  • A sidecar container called neuron-rtd, that allows the BERT container to access NeuronCores present on the Inf1 instance. The AWS_NEURON_VISIBLE_DEVICES environment variable lets you control which ones may be used by the container. You could use it to pin a container on one or several specific NeuronCores.
{
  "family": "ecs-neuron",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "entryPoint": [
        "sh",
        "-c"
      ],
      "portMappings": [
        {
          "hostPort": 8500,
          "protocol": "tcp",
          "containerPort": 8500
        },
        {
          "hostPort": 8501,
          "protocol": "tcp",
          "containerPort": 8501
        },
        {
          "hostPort": 0,
          "protocol": "tcp",
          "containerPort": 80
        }
      ],
      "command": [
        "tensorflow_model_server_neuron --port=8500 --rest_api_port=8501 --model_name=bert --model_base_path=/bert"
      ],
      "cpu": 0,
      "environment": [
        {
          "name": "NEURON_RTD_ADDRESS",
          "value": "unix:/sock/neuron-rtd.sock"
        }
      ],
      "mountPoints": [
        {
          "containerPath": "/sock",
          "sourceVolume": "sock"
        }
      ],
      "memoryReservation": 1000,
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/ecs-inf1-demo:latest",
      "essential": true,
      "name": "bert"
    },
    {
      "entryPoint": [
        "sh",
        "-c"
      ],
      "portMappings": [],
      "command": [
        "neuron-rtd -g unix:/sock/neuron-rtd.sock"
      ],
      "cpu": 0,
      "environment": [
        {
          "name": "AWS_NEURON_VISIBLE_DEVICES",
          "value": "ALL"
        }
      ],
      "mountPoints": [
        {
          "containerPath": "/sock",
          "sourceVolume": "sock"
        }
      ],
      "memoryReservation": 1000,
      "image": "790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:latest",
      "essential": true,
      "linuxParameters": { "capabilities": { "add": ["SYS_ADMIN", "IPC_LOCK"] } },
      "name": "neuron-rtd"
    }
  ],
  "volumes": [
    {
      "name": "sock",
      "host": {
        "sourcePath": "/tmp/sock"
      }
    }
  ]
}

Finally, I call the RegisterTaskDefinition API to let the ECS backend know about it.

$ aws ecs register-task-definition --cli-input-json file://inf1-task-definition.json

We’re now ready to run our container, and predict with it.

Running a Container on Inf1 Instances
As this is a prediction service, I want to make sure that it’s always available on the cluster. Instead of simply running a task, I create an ECS Service that will make sure the required number of container copies is running, relaunching them should any failure happen.

$ aws ecs create-service --cluster ecs-inf1-demo \
--service-name bert-inf1 \
--task-definition ecs-neuron:1 \
--desired-count 1

A minute later, I see that both task containers are running on the cluster.

Running containers

Predicting with BERT on ECS and Inf1
The inner workings of BERT are beyond the scope of this post. This particular model expects a sequence of 128 tokens, encoding the words of two sentences we’d like to compare for semantic equivalence.

Here, I’m only interested in measuring prediction latency, so dummy data is fine. I build 100 prediction requests storing a sequence of 128 zeros. Using the IP address of the BERT container, I send them to the TensorFlow Serving endpoint via grpc, and I compute the average prediction time.

Here is the full code.

import numpy as np
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import time

if __name__ == '__main__':
    channel = grpc.insecure_channel('18.234.61.31:8500')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'bert'
    i = np.zeros([1, 128], dtype=np.int32)
    request.inputs['input_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))
    request.inputs['input_mask'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))
    request.inputs['segment_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))

    latencies = []
    for i in range(100):
        start = time.time()
        result = stub.Predict(request)
        latencies.append(time.time() - start)
        print("Inference successful: {}".format(i))
    print ("Ran {} inferences successfully. Latency average = {}".format(len(latencies), np.average(latencies)))

For convenience, I’m running this code on an EC2 instance based on the Deep Learning AMI. It comes pre-installed with a Conda environment for TensorFlow and TensorFlow Serving, saving me from installing any dependencies.

$ source activate tensorflow_p36
$ python predict.py

On average, prediction took 56.5ms. As far as BERT goes, this is pretty good!

Ran 100 inferences successfully. Latency average = 0.05647835493087769

Getting Started
You can now deploy Amazon Elastic Compute Cloud (EC2) Inf1 instances on Amazon ECS today in the US East (N. Virginia) and US West (Oregon) regions. As Inf1 deployment progresses, you’ll be able to use them with Amazon ECS in more regions.

Give this a try, and please send us feedback either through your usual AWS Support contacts, on the AWS Forum for Amazon ECS, or on the container roadmap on Github.

– Julien

New AI Dupes Humans into Believing Synthesized Sound Effects Are Real

Post Syndicated from Michelle Hampson original https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/new-ai-dupes-humans-into-believing-synthesized-sound-effects-are-real

Journal Watch report logo, link to report landing page

Imagine you are watching a scary movie: the heroine creeps through a dark basement, on high alert. Suspenseful music plays in the background, while some unseen, sinister creature creeps in the shadows… and then–BANG! It knocks over an object.

Such scenes would hardly be as captivating and scary without the intense, but perfectly timed sound effects, like the loud bang that sent our main character wheeling around in fear. Usually these sound effects are recorded by Foley artists in the studio, who produce the sounds using oodles of objects at their disposal. Recording the sound of glass breaking may involve actually breaking glass repeatedly, for example, until the sound closely matches the video clip.

In a more recent plot twist, researchers have created an automated program that analyzes the movement in video frames and creates its own artificial sound effects to match the scene. In a survey, the majority of people polled indicated that they believed the fake sound effects were real. The model, AutoFoley, is described in a study published June 25 in IEEE Transactions on Multimedia.

“Adding sound effects in post-production using the art of Foley has been an intricate part of movie and television soundtracks since the 1930s,” explains Jeff Prevost, a professor at the University of Texas at San Antonio who co-created AutoFoley. “Movies would seem hollow and distant without the controlled layer of a realistic Foley soundtrack. However, the process of Foley sound synthesis therefore adds significant time and cost to the creation of a motion picture.”

Intrigued by the thought of an automated Foley system, Prevost and his PhD student, Sanchita Ghose, set about creating a multi-layered machine learning program. They created two different models that could be used in the first step, which involves identifying the actions in a video and determining the appropriate sound.

The first machine learning model extracts image features (e.g., color and motion) from the frames of fast-moving action clips to determine an appropriate sound effect.

The second model analyzes the temporal relationship of an object in separate frames. By using relational reasoning to compare different frames across time, the second model can anticipate what action is taking place in the video.

In a final step, sound is synthesized to match the activity or motion predicted by one of the models. Prevost and Ghose used AutoFoley to create sound for 1,000 short movie clips capturing a number of common actions, like falling rain, a galloping horse, and a ticking clock.

Analysis shows–unsurprisingly–that AutoFoley is best at producing sounds where the timing doesn’t need to align perfectly with the video (e.g., falling rain, a crackling fire). But the program is more likely to be out of sync with the video when visual scenes contain random actions with variation in time (e.g., typing, thunderstorms).

Next, Prevost and Ghose surveyed 57 local college students on which movie clips they thought included original soundtracks. In assessing soundtracks generated by the first model, 73% of students surveyed chose the synthesized AutoFoley clip as the original piece, over the true original sound clip. In assessing the second model, 66% of respondents chose the AutoFoley clip over the original sound clip.

“One limitation in our approach is the requirement that the subject of classification is present in the entire video frame sequence,” says Prevost, also noting that AutoFoley currently relies on a dataset with limited Foley categories. While a patent for AutoFoley is still in the early stages, Prevost says these limitations will be addressed in future research.

TensorFlow Serving on Kubernetes with Amazon EC2 Spot Instances

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/tensorflow-serving-on-kubernetes-spot-instances/

This post is contributed by Kinnar Sen – Sr. Specialist Solutions Architect, EC2 Spot

TensorFlow (TF) is a popular choice for machine learning research and application development. It’s a machine learning (ML) platform, which is used to build (train) and deploy (serve) machine learning models. TF Serving is a part of TF framework and is used for deploying ML models in production environments. TF Serving can be containerized using Docker and deployed in a cluster with Kubernetes. It is easy to run production grade workloads on Kubernetes using Amazon Elastic Kubernetes Service (Amazon EKS), a managed service for creating and managing Kubernetes clusters. To cost optimize the TF serving workloads, you can use Amazon EC2 Spot Instances. Spot Instances are spare EC2 capacity available at up to a 90% discount compared to On-Demand Instance prices.

In this post I will illustrate deployment of TensorFlow Serving using Kubernetes via Amazon EKS and Spot Instances to build a scalable, resilient, and cost optimized machine learning inference service.

Overview

About TensorFlow Serving (TF Serving)

TensorFlow Serving is the recommended way to serve TensorFlow models. A flexible, high-performance system for serving models, TF Serving enables users to quickly deploy models to production environments. It provides out-of-the-box integration with TF models and can be extended to serve other kinds of models and data. TF Serving deploys a model server with gRPC/REST endpoints and can be used to serve multiple models (or versions). There are two ways that requests can be served: batching individual requests or handling them one by one. Batching is often used to unlock the high throughput of hardware accelerators (if used for inference) for offline high-volume inference jobs.
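
As a quick illustration of the REST endpoint (not specific to the deployment later in this post), a Python client request can be as small as the sketch below; the host name, model name, and input shape are placeholders for whatever model you serve.

import requests

# Placeholder endpoint and model name; TF Serving's REST API listens on port 8501 by default.
url = "http://tf-serving-host:8501/v1/models/my_model:predict"

# One batch of dummy inputs; the expected shape depends on the model being served.
payload = {"instances": [[0.0] * 128]}

response = requests.post(url, json=payload)
response.raise_for_status()
print(response.json()["predictions"])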

Amazon EC2 Spot Instances

Spot Instances are spare Amazon EC2 capacity that enables customers to save up to 90% over On-Demand Instance prices. The price of Spot Instances is determined by long-term trends in supply and demand of spare capacity pools. Capacity pools can be defined as a group of EC2 instances belonging to particular instance family, size, and Availability Zone (AZ). If EC2 needs capacity back for On-Demand usage, Spot Instances can be interrupted by EC2 with a two-minute notification. There are many graceful ways to handle the interruption to ensure that the application is well architected for resilience and fault tolerance. This can be automated via the application and/or infrastructure deployments. Spot Instances are ideal for stateless, fault tolerant, loosely coupled and flexible workloads that can handle interruptions.

TensorFlow Serving (TF Serving) and Kubernetes

Each pod in a Kubernetes cluster runs a TF Docker image with a TF Serving-based server and a model. The model contains the TensorFlow graph architecture, model weights, and assets. This is a deployment setup with a configurable number of replicas. The replicas are exposed externally by a service and an external load balancer that helps distribute the requests to the service endpoints. To keep up with demand, Kubernetes can scale the number of replicated pods using the Kubernetes Replication Controller.

Architecture

There are a couple of goals that we want to achieve through this solution.

  • Cost optimization – By using EC2 Spot Instances
  • High throughput – By using Application Load Balancer (ALB) created by Ingress Controller
  • Resilience – Ensuring high availability by replenishing nodes and gracefully handling the Spot interruptions
  • Elasticity – By using Horizontal Pod Autoscaler, Cluster Autoscaler, and EC2 Auto Scaling

This can be achieved by using the following components.

  • Cluster Autoscaler (open source): scales EC2 instances automatically according to the pods running in the cluster. Deployment method: a Deployment on On-Demand Instances.
  • EC2 Auto Scaling group (AWS): provisions and maintains EC2 instance capacity. Deployment method: AWS CloudFormation via eksctl.
  • AWS Node Termination Handler (open source): detects EC2 Spot interruptions and automatically drains nodes. Deployment method: a DaemonSet on Spot and On-Demand Instances.
  • AWS ALB Ingress Controller (open source): provisions and maintains the Application Load Balancer. Deployment method: a Deployment on On-Demand Instances.

You can find more details about each component in this AWS blog. Let’s go through the steps that allow the deployment to be elastic.

  1. HTTP requests flow in through the ALB and Ingress object.
  2. Horizontal Pod Autoscaler (HPA) monitors the metrics (CPU / RAM) and once the threshold is breached a Replica (pod) is launched.
  3. If there are sufficient cluster resources, the pod starts running, else it goes into pending state.
  4. If one or more pods are in pending state, the Cluster Autoscaler (CA) triggers a scale up request to Auto Scaling group.
    1. If HPA tries to schedule pods more than the current size of what the cluster can support, CA can add capacity to support that.
  5. The Auto Scaling group provisions a new node and the application scales up.
  6. A scale down happens in the reverse fashion when requests start tapering down.

AWS ALB Ingress controller and ALB

We will be using an ALB along with an Ingress resource instead of the default External Load Balancer created by the TF Serving deployment. The open source AWS ALB Ingress controller triggers the creation of an ALB and the necessary supporting AWS resources whenever a Kubernetes user declares an Ingress resource in the cluster. The Ingress resource uses the ALB to route HTTP(S) traffic to different endpoints within the cluster. ALB is ideal for advanced load balancing of HTTP and HTTPS traffic. ALB provides advanced request routing targeted at delivery of modern application architectures, including microservices and container-based applications. This allows the deployment to maintain a high throughput and improve load balancing.

Spot Instance interruptions

To gracefully handle interruptions, we will use the AWS node termination handler. This handler runs a DaemonSet (one pod per node) on each host to perform monitoring and react accordingly. When it receives the Spot Instance 2-minute interruption notification, it uses the Kubernetes API to cordon the node. This is done by tainting it to ensure that no new pods are scheduled there, then it drains it, removing any existing pods from the ALB.

One of the best practices for using Spot is diversification where instances are chosen from across different instance types, sizes, and Availability Zone. The capacity-optimized allocation strategy for EC2 Auto Scaling provisions Spot Instances from the most-available Spot Instance pools by analyzing capacity metrics, thus lowering the chance of interruptions.

Tutorial

Set up the cluster

We are using eksctl to create an Amazon EKS cluster with the name k8-tf-serving in combination with a managed node group. The managed node group has two On-Demand t3.medium nodes and it will bootstrap with the labels lifecycle=OnDemand and intent=control-apps. Be sure to replace <YOUR REGION> with the Region you are launching your cluster into.

eksctl create cluster --name=TensorFlowServingCluster --node-private-networking --managed --nodes=3 --alb-ingress-access --region=<YOUR REGION> --node-type t3.medium --node-labels="lifecycle=OnDemand,intent=control-apps" --asg-access

Check the nodes provisioned by using kubectl get nodes.

Create the NodeGroups now. First, create the eksctl configuration file: copy the nodegroup configuration below into a file named spot_nodegroups.yml. Then run the eksctl command that follows to add the new Spot nodes to the cluster.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
    name: TensorFlowServingCluster
    region: <YOUR REGION>
nodeGroups:
    - name: prod-4vcpu-16gb-spot
      minSize: 0
      maxSize: 15
      desiredCapacity: 10
      instancesDistribution:
        instanceTypes: ["m5.xlarge", "m5d.xlarge", "m4.xlarge","t3.xlarge","t3a.xlarge","m5a.xlarge","t2.xlarge"] 
        onDemandBaseCapacity: 0
        onDemandPercentageAboveBaseCapacity: 0
        spotAllocationStrategy: capacity-optimized
      labels:
        lifecycle: Ec2Spot
        intent: apps
        aws.amazon.com/spot: "true"
      tags:
        k8s.io/cluster-autoscaler/node-template/label/lifecycle: Ec2Spot
        k8s.io/cluster-autoscaler/node-template/label/intent: apps
      iam:
        withAddonPolicies:
          autoScaler: true
          albIngress: true
    - name: prod-8vcpu-32gb-spot
      minSize: 0
      maxSize: 15
      desiredCapacity: 10
      instancesDistribution:
        instanceTypes: ["m5.2xlarge", "m5n.2xlarge", "m5d.2xlarge", "m5dn.2xlarge","m5a.2xlarge", "m4.2xlarge"] 
        onDemandBaseCapacity: 0
        onDemandPercentageAboveBaseCapacity: 0
        spotAllocationStrategy: capacity-optimized
      labels:
        lifecycle: Ec2Spot
        intent: apps
        aws.amazon.com/spot: "true"
      tags:
        k8s.io/cluster-autoscaler/node-template/label/lifecycle: Ec2Spot
        k8s.io/cluster-autoscaler/node-template/label/intent: apps
      iam:
        withAddonPolicies:
          autoScaler: true
          albIngress: true
eksctl create nodegroup -f spot_nodegroups.yml

A few points to note here, for more technical details refer to the EC2 Spot workshop.

  • There are two diversified node groups created with a fixed vCPU:Memory ratio. This adheres to the Spot best practice of diversifying instances, and helps the Cluster Autoscaler function properly.
  • Capacity-optimized Spot allocation strategy is used in both the node groups.

Once the nodes are created, you can check the number of instances provisioned using the command below. It should display 20 as we configured each of our two node groups with a desired capacity of 10 instances.

kubectl get nodes --selector=lifecycle=Ec2Spot | expr $(wc -l) - 1

The cluster setup is complete.

Install the AWS Node Termination Handler

kubectl apply -f https://github.com/aws/aws-node-termination-handler/releases/download/v1.3.1/all-resources.yaml

This installs the Node Termination Handler on both Spot Instance and On-Demand Instance nodes. This helps the handler respond to both EC2 maintenance events and Spot Instance interruptions.

Deploy Cluster Autoscaler

For additional detail, see the Amazon EKS page here. Next, export the Cluster Autoscaler into a configuration file:

curl -o cluster_autoscaler.yml https://raw.githubusercontent.com/awslabs/ec2-spot-workshops/master/content/using_ec2_spot_instances_with_eks/scaling/deploy_ca.files/cluster_autoscaler.yml

Open the created file and edit it.

Add the AWS Region and the cluster name.

Run the commands below to deploy Cluster Autoscaler.

kubectl apply -f cluster_autoscaler.yml
kubectl -n kube-system annotate deployment.apps/cluster-autoscaler cluster-autoscaler.kubernetes.io/safe-to-evict="false"

Use this command to see into the Cluster Autoscaler (CA) logs to find NodeGroups auto-discovered. Use Ctrl + C to abort the log view.

kubectl logs -f deployment/cluster-autoscaler -n kube-system --tail=10

Deploy TensorFlow Serving

TensorFlow Model Server is deployed in pods, and the model is loaded from Amazon S3.

Amazon S3 access

We are using Kubernetes Secrets to store and manage the AWS Credentials for S3 Access.

Copy the following and create a file called kustomization.yml. Add the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY details in the file.

namespace: default
secretGenerator:
- name: s3-credentials
  literals:
  - AWS_ACCESS_KEY_ID=<<AWS_ACCESS_KEY_ID>>
  - AWS_SECRET_ACCESS_KEY=<<AWS_SECRET_ACCESS_KEY>>
generatorOptions:
  disableNameSuffixHash: true

Create the secret file and deploy.

kubectl kustomize . > secret.yaml
kubectl apply -f secret.yaml

We recommend using Sealed Secrets for production workloads. Sealed Secrets provides a mechanism to encrypt a Secret object, thus making it more secure. For further details, please take a look at the AWS workshop here.

ALB Ingress Controller

Deploy RBAC Roles and RoleBindings needed by the AWS ALB Ingress controller.

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.4/docs/examples/rbac-role.yaml

Download the AWS ALB Ingress controller YAML into a local file.

curl -sS "https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.4/docs/examples/alb-ingress-controller.yaml" > alb-ingress-controller.yaml

Change the --cluster-name flag to ‘TensorFlowServingCluster’ and add the Region details under --aws-region. Also add the lines below just before the ‘serviceAccountName’.

nodeSelector:
    lifecycle: OnDemand

Deploy the AWS ALB Ingress controller and verify that it is running.

kubectl apply -f alb-ingress-controller.yaml
kubectl logs -n kube-system $(kubectl get po -n kube-system | grep alb-ingress | awk '{print $1}')

Deploy the application

Next, download a model as explained in the official TF documentation, then upload it to Amazon S3.

mkdir /tmp/resnet

curl -s http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | \
tar --strip-components=2 -C /tmp/resnet -xvz

RANDOM_SUFFIX=$(cat /dev/urandom | tr -dc 'a-z0-9' | fold -w 10 | head -n 1)

S3_BUCKET="resnet-model-k8serving-${RANDOM_SUFFIX}"
aws s3 mb s3://${S3_BUCKET}
aws s3 sync /tmp/resnet/1538687457/ s3://${S3_BUCKET}/resnet/1/

Copy the following code and create a file named tf_deployment.yml. Don’t forget to replace <AWS_REGION> with the AWS Region you plan to use.

A few things to note here:

  • NodeSelector is used to route the TF Serving replica pods to Spot Instance nodes.
  • ServiceType LoadBalancer is used.
  • model_base_path points at Amazon S3. Replace <S3_BUCKET> with the S3_BUCKET name you created in the last instruction set.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: resnet-service
  name: resnet-service
spec:
  ports:
  - name: grpc
    port: 9000
    targetPort: 9000
  - name: http
    port: 8500
    targetPort: 8500
  selector:
    app: resnet-service
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: resnet-service
  name: resnet-v1
spec:
  replicas: 25
  selector:
    matchLabels:
      app: resnet-service
  template:
    metadata:
      labels:
        app: resnet-service
        version: v1
    spec:
      nodeSelector:
        lifecycle: Ec2Spot
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=resnet
        - --model_base_path=s3://<S3_BUCKET>/resnet/
        command:
        - /usr/bin/tensorflow_model_server
        env:
        - name: AWS_REGION
          value: <AWS_REGION>
        - name: S3_ENDPOINT
          value: s3.<AWS_REGION>.amazonaws.com
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: AWS_ACCESS_KEY_ID
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: AWS_SECRET_ACCESS_KEY        
        image: tensorflow/serving
        imagePullPolicy: IfNotPresent
        name: resnet-service
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "2"
            memory: 2Gi

Deploy the application.

kubectl apply -f tf_deployment.yml

Copy the code below and create a file named ingress.yml.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: "resnet-service"
  namespace: "default"
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
  labels:
    app: resnet-service
spec:
  rules:
    - http:
        paths:
          - path: "/v1/models/resnet:predict"
            backend:
              serviceName: "resnet-service"
              servicePort: 8500

Deploy the ingress.

kubectl apply -f ingress.yml

Deploy the Metrics Server and Horizontal Pod Autoscaler, which scales up when CPU/Memory exceeds 50% of the allocated container resource.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
kubectl autoscale deployment resnet-v1 --cpu-percent=50 --min=20 --max=100

Load testing

Download the Python helper file written for testing the deployed application.

curl -o submit_mc_tf_k8s_requests.py https://raw.githubusercontent.com/awslabs/ec2-spot-labs/master/tensorflow-serving-load-testing-sample/python/submit_mc_tf_k8s_requests.py

Get the address of the Ingress using the command below.

kubectl get ingress resnet-service

Create a Python virtual environment and install the library requirements.

pip3 install virtualenv
virtualenv venv
source venv/bin/activate
pip3 install tqdm
pip3 install requests

Run the following command to warm up the cluster after replacing the Ingress address. You will be running a Python application that predicts the class of a downloaded image against the ResNet model, which is being served by the TF Serving REST API. You are running multiple parallel processes for that purpose. Here “p” is the number of processes and “r” the number of requests for each process.

python submit_mc_tf_k8s_requests.py -p 100 -r 100 -u 'http://<INGRESS ADDRESS>:80/v1/models/resnet:predict'


You can use the command below to observe the scaling of the cluster.

kubectl get hpa -w

We ran the above test again with 10,000 requests per process, so as to send 1 million requests to the application in total. The results are below:

The deployment was able to serve ~300 requests per second with an average latency of ~320 ms per request.

Cleanup

Now that you’ve successfully deployed and run TensorFlow Serving using EC2 Spot Instances, it’s time to clean up your environment. Remove the Ingress, the deployment, and the ALB Ingress Controller.

kubectl delete -f ingress.yml
kubectl delete -f tf_deployment.yml
kubectl delete -f alb-ingress-controller.yaml

Remove the model files from Amazon S3.

aws s3 rb s3://${S3_BUCKET}/ --force 

Delete the node groups and the cluster.

eksctl delete nodegroup -f spot_nodegroups.yml --approve
eksctl delete cluster --name TensorFlowServingCluster

Conclusion

In this blog, we demonstrated how TensorFlow Serving can be deployed onto Spot Instances in a Kubernetes cluster, achieving both resilience and cost optimization. There are further optimizations you can apply to TensorFlow Serving to improve performance, and this deployment can be extended to serve multiple models with different versions. We hope you consider running TensorFlow Serving on EC2 Spot Instances to cost-optimize your solution.

New Records for AI Training

Post Syndicated from Samuel K. Moore original https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/new-records-for-ai-training

The most broadly accepted suite of seven standard tests for AI systems released its newest rankings Wednesday, and GPU-maker Nvidia swept all the categories for commercially-available systems with its new A100 GPU-based computers, breaking 16 records. It was, however, the only entrant in some of them.

The rankings are by MLPerf, a consortium with membership from both AI powerhouses like Facebook, Tencent, and Google and startups like Cerebras, Mythic, and Sambanova. MLPerf’s tests measure the time it takes a computer to train a particular set of neural networks to an agreed-upon accuracy. Since the previous round of results, released in July 2019, the fastest systems improved by an average of 2.7x, according to MLPerf.

This AI Can See the Forest and the Trees

Post Syndicated from Zack Parisa original https://spectrum.ieee.org/artificial-intelligence/machine-learning/this-ai-can-see-the-forest-and-the-trees

In 2007, one of us (Parisa) found himself standing alone in the woods of Armenia and fighting off a rising feeling of dread.

Armenia, a former Soviet-bloc country, is about the size of Maryland. Its forests provide residents with mushrooms and berries, habitat for game animals, and firewood to heat homes during the cold winters. The forests also shelter several endangered bird species.

Parisa, then a first-year graduate student studying forestry, was there to help the country figure out a plan for managing those forests. The decisions the Armenian people make about their forests must balance economic, cultural, and conservation values, and those decisions will have repercussions for years, decades, or even centuries to come. To plan properly, Armenians need to answer all sorts of questions. What level of firewood harvest is sustainable? How can those harvests be carried out while minimizing disruption to bird habitat? Can these logging operations open up spaces in a way that helps people to gather more berries?

Across the world, communities depend on expert foresters to help them manage forests in a way that best balances such competing needs. In turn, foresters depend on hard data—and have done so for a very long time.

In the early 1800s, foresters were at the forefront of a “big data” revolution of sorts. It wasn’t feasible to count every tree on every hectare, so foresters had to find another way to evaluate what the land held. The birth of scientific forestry early in the 19th century in Saxony ushered in rudimentary statistical sampling techniques that gave reliable estimates of the distribution of the sizes and species of trees across large swaths of land without someone having to measure every single tree.

A collection of this type of data is called a forest inventory, which foresters use to develop management plans and project what the forest will look like in the future. The techniques forged two centuries ago to create such inventories—laborious field sampling to arrive at population statistics—have remained largely unchanged to this day, and hundreds of foresters working in the United States still count trees with pencil and paper even now.

Parisa was excited to help communities in Armenia develop forest management plans. He had been assured that he’d have good data for the large area where he was to work, in and around Dilijan National Park. But the “forest inventory” he’d been promised turned out to be the translated field notes from Soviet foresters who had visited the area more than 30 years earlier—observations along the lines of “Went on walk on southern exposure of the mountain. Many pine, few beech.” Such casual observations couldn’t possibly provide a solid foundation on which to build a forest management plan.

Parisa needed to inventory hundreds of thousands of hectares of forest. He knew, though, that a single forester can assess roughly 20 ­hectares (about 50 acres) in a day. Unless he wanted to spend the next decade counting trees in Armenia, he had to find a way to get those numbers faster.

Parisa grew up in Huntsville, Ala., where his father worked for NASA. Once, when Parisa was 8 years old, he hit a baseball through a window, and his dad punished him by making him calculate the amount of force behind the ball. He got good at that sort of exercise and later came to study forestry with an unusually quantitative skill set.

In Armenia, Parisa put those skills to work figuring out how to compile a complete forest inventory using remote sensing, which has been the holy grail of forestry for decades. Within 18 months, he developed the core of the machine-learning approach that the two of us later used in founding SilviaTerra, a startup based in San Francisco that’s dedicated to producing forest inventories from remotely sensed data. Here’s an overview of some of the challenges we faced, how we overcame them, and what we’ve been able to do with this technology.

Most people rarely think about forests, yet they play a vital role in our lives. The lumber that was used to build your house, the paper cup that holds your morning coffee, and the cardboard box containing your latest online delivery all came from a tree growing in the woods.

Measuring the potential of forests to provide those things has historically been expensive, slow, and low tech. The biggest forestry companies in the United States spend millions each year paying people to laboriously count and measure trees. The forests owned by such companies make up a sizable fraction of the U.S. total. So it made sense for us to concentrate on such places after we launched SilviaTerra in 2010.

The next year, our fledgling startup won the Sabin Sustainable Venture Prize from the Yale Center for Business and the Environment, in New Haven, Conn. We spent some of the US $25,000 prize money driving around the southeast United States in a pickup truck, looking for companies that owned more than 10,000 acres of forest so we could set up meetings with their executives.

We soon found our first paying customers. Later, we signed contracts with companies elsewhere, eventually applying our technology to all of the major forest types in the United States.

For the most part, our service has proved very attractive—so it isn’t a hard sell. What we offer is analogous to what farmers require to practice precision agriculture, a general approach that often uses remote sensing to inform decisions about what to grow, how to fertilize it, when to harvest the crop, and so forth. You might say that SilviaTerra is enabling “precision forestry.”

Being precise about forests, however, is more difficult than being precise about farmland. For one thing, you almost always know what you’ve planted in your fields, and it’s almost invariably just one crop. But natural forests can have a bewildering mix of tree species. Often, the dominant tree species can hide other kinds of trees lower in the canopy. And while crops are generally planted in rows or other regular geometries, forests usually have a much more organic spatial arrangement (although some managed plantations do have trees growing neatly in rows). What’s more, forests tend to be, um, out in the woods, and their remoteness makes it hard to collect ground truth.

Another technical challenge for us has been dealing with a veritable tsunami of data. The Landsat satellite archive, for example, stretches back to 1972 and is enormously rich, with millions of images, for both optical and infrared bands. And the amount of nationwide high-­resolution aerial imagery, digital elevation maps, and so forth just keeps growing every day. There are now terabytes of relevant data to digest.

An even taller hurdle was finding a way to analyze the imagery in a way that gives reliable estimates. The executives of publicly owned timber companies are especially keen on having good estimates, because they have to report accurate numbers about their holdings to investors.

Another big challenge was dealing with the fact that most of the satellite imagery available to us was of quite limited resolution—typically 15 meters. That’s much too coarse to make out individual trees in an image. As a result, we had to use a statistical technique rather than computer vision per se here. (One benefit of this statistical approach is that it avoids the biases that commonly result with high-resolution tree-delineation methods.)

For all these reasons, creating an inventory of what’s growing in a forest is technically more difficult than creating an inventory of what’s growing on a farmer’s field. The economic stakes are also different: The value of the annual crop harvest in the United States is about $400 billion, while the annual timber harvest is only $10 billion.

That said, forests provide many benefits that nobody pays for, including wildlife habitat, carbon sequestration, and water filtration, not to mention nice places to camp for the weekend.

More than 20 years ago, the economist Robert Costanza and others examined the value of the various ecosystem services that forests deliver, even though no money changes hands. Based on those results, we estimate that U.S. forests provide about $100 billion worth of ecosystem services every year. Part of our mission at SilviaTerra is to help put real numbers on these ecosystem services for every acre of forest in the United States.

The output of our very complicated machine-learning system for processing remotely sensed forest imagery is actually very simple: For each 1/20 of an acre (0.02 hectare, or a little smaller than the footprint of an average U.S. home), the system builds a list of the trees standing there. The list includes the species of each tree and its diameter as measured 4.5 feet (1.4 meters) off the ground, following standard U.S. forestry practice. Other key metrics, such as tree height and total carbon storage, can be derived from these values. Things like wildfire risk or the suitability of the land as deer habitat can be modeled based on the types of trees present.

To create this giant list of trees, we combined thousands of field measurements with terabytes of satellite imagery. So we needed field data for the entire United States. Fortunately, for decades U.S. taxpayers have paid the U.S. Forest Service to establish a nationwide grid of forest measurements. This amazing collection of observations spans the continental United States, and it provided exactly what we needed to train our machine-learning system to gauge the number, size, and species of trees present in remote-sensing imagery.

In most remote-sensing forestry efforts, a human analyst starts with a single image that he or she hopes will document everything in the area of interest. For example, the analyst might use lidar data in the form of a high-resolution point cloud (the coordinates of a set of points in 3D space) to figure out the number of trees present, as well as their heights and species.

Lidar imagery is expensive to obtain, though, so there’s not much of it around. And what can be had is often sorely out of date or incomplete. For these reasons, we instead relied on a wide range of free satellite and aerial imagery. We used all kinds—visible light, near-infrared, radar—because each kind of image tells you about a different aspect of the forest. Landsat imagery stretching back decades is often great for picking up on the differences among species, while radar typically contains much more information about overall forest structure. The key is to combine these different types of imagery and analyze them in a statistically rigorous way.

Before we took on this problem, a single high-resolution inventory of all U.S. forests did not exist. But if society is going to prevent more wildfires, grow rural economies in a sustainable way, and manage climate change, a much better understanding of our forests is needed. We boosted that understanding in a unique way when we finished our nationwide forest Basemap project last year.

Although we had previously applied our methodology to many focused projects, compiling a forest inventory for the continental United States was an entirely new scale of undertaking. We were very fortunate to partner with Microsoft, which in 2017 launched its AI for Earth grant program to provide the company’s tools to outside teams working on conservation projects. We applied for and ultimately received a grant to expand the forest inventory work we had been doing.

Using Microsoft Azure, the company’s cloud-computing platform, we were able to process over 10 TB of satellite imagery. It wasn’t just a matter of needing more computing power. Modeling the particular kinds of forests present in different regions was a major challenge. So was recognizing issues with data integrity. We spent one confused weekend, for example, trying to sort out problems in the output before we realized that some ­high-resolution aerial imagery is blacked out over military bases!

While we weren’t expecting such artificial holes in the data, we knew from our prior work that it can be hard to find cloud-free images of a given area. For some regions—especially in the Pacific Northwest—you simply can’t find any such images that cover an appreciable area.

Luckily, Lin Yan, now of Michigan State University, published a method for dealing with just this problem in 2018. When an image is obscured by a cloud, his algorithm replaces the cloud, pixel by pixel, with pixels from another image obtained when the sky over that spot of land was clear. We applied Yan’s algorithm to produce a set of cloud-free images, which were much easier to analyze.

We unveiled our nationwide forest inventory last year, but we knew it was just a starting point: Having better information doesn’t do any good unless it actually affects the decisions that people are making about their land. So influencing those decisions is now our focus.

For that, we’ve again partnered with Microsoft, which intends to become carbon negative by 2030. Microsoft can’t cease emitting carbon dioxide entirely, but it plans to offset its emissions, at least in part by paying forest owners to defer their timber harvests and thus sequester carbon through the growth of trees.

Carbon markets are not new, but they’ve been notoriously ineffective because it’s very hard to monitor such carbon sequestration. Our Basemap, which is updated annually, now makes that monitoring straightforward.

New possibilities also open up. The California carbon market, for example, is accessible only to landowners with more than 2,000 hectares of trees—smaller forests are too expensive to monitor. It also requires forest owners to make a 100-year commitment to keep carbon stocks at a certain level. Yet the most important time to sequester carbon is now, not a century in the future. A shorter-term contract of one year would provide the same immediate benefit at a lower cost, allowing much larger areas to be protected, at least in the short term.

Our Basemap dramatically lowers the cost of monitoring forests over time, which will allow millions of small landowners to participate in such markets. And because the Basemap is updated every year, Microsoft and others can make payments to those landowners year after year, providing much greater value for the money spent combating climate change.

Markets work well for commodities like corn, because when you sign a futures contract to sell corn at a certain price, someone down the line has to deliver a quantity of corn to a warehouse. There, the corn will be weighed and examined, so it’s easy enough to measure what’s being bought.

Using markets to influence carbon sequestration or land conservation is much harder, in large part because these processes usually take place out of sight, somewhere out in the woods. It’s difficult enough to put a dollar value on what has been gained by not cutting trees down, but if you can’t even determine whether trees have been harvested from a given area, you’ll be very reluctant to pay a landowner for the promise not to cash in on his or her timber reserves.

SilviaTerra’s Basemap now gives people in the United States a way to measure and pay for trees that are allowed to remain standing so that these forests will continue to provide important ecosystem services. Being able to see the forest and the trees in this way, we believe, will help shape a more sustainable future.

About the Author

Zack Parisa and Max Nova are cofounders of the San Francisco–based precision-forestry startup SilviaTerra.

Amazon Translate now supports Office documents

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-translate-now-supports-office-documents/

Whether your organization is a multinational enterprise present in many countries, or a small startup hungry for global success, translating your content to local languages may be an enduring challenge. Indeed, text data often comes in many formats, and processing them may require several different tools. Also, as all these tools may not support the same language pairs, you may have to convert certain documents to intermediate formats, or even resort to manual translation. All these issues add extra cost, and create unnecessary complexity in building consistent and automated translation workflows.

Amazon Translate aims at solving these problems in a simple and cost effective fashion. Using either the AWS console or a single API call, Amazon Translate makes it easy for AWS customers to quickly and accurately translate text in 55 different languages and variants.
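
For plain text, that single API call is roughly the sketch below; the language codes and sample string are illustrative only.

import boto3

translate = boto3.client('translate')

# Translate a short string from English to French in one call.
response = translate.translate_text(
    Text='Hello, world!',
    SourceLanguageCode='en',
    TargetLanguageCode='fr'
)
print(response['TranslatedText'])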

Earlier this year, Amazon Translate introduced batch translation for plain text and HTML documents. Today, I’m very happy to announce that batch translation now also supports Office documents, namely .docx, .xlsx and .pptx files as defined by the Office Open XML standard.

Introducing Amazon Translate for Office Documents
The process is extremely simple. As you would expect, source documents have to be stored in an Amazon Simple Storage Service (S3) bucket. Please note that no document may be larger than 20 Megabytes, or have more than 1 million characters.

Each batch translation job processes a single file type and a single source language. Thus, we recommend that you organize your documents in a logical fashion in S3, storing each file type and each language under its own prefix.

Then, using either the AWS console or the StartTextTranslationJob API in one of the AWS language SDKs, you can launch a translation job, passing:

  • the input and output location in S3,
  • the file type,
  • the source and target languages.

Once the job is complete, you can collect translated files at the output location.
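
If you go the API route, a batch job for .docx files could look roughly like the boto3 sketch below; the bucket names, prefixes, job name, and IAM role ARN are placeholders.

import boto3

translate = boto3.client('translate')

# Launch a batch translation job for .docx files stored under a single prefix.
# All names and ARNs below are placeholders for this example.
response = translate.start_text_translation_job(
    JobName='docx-en-to-fr',
    InputDataConfig={
        'S3Uri': 's3://my-input-bucket/docx/en/',
        # MIME type for .docx; use the matching Office Open XML type for .xlsx or .pptx.
        'ContentType': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
    },
    OutputDataConfig={'S3Uri': 's3://my-output-bucket/docx/fr/'},
    DataAccessRoleArn='arn:aws:iam::123456789012:role/TranslateBatchRole',
    SourceLanguageCode='en',
    TargetLanguageCodes=['fr']
)
print(response['JobId'], response['JobStatus'])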

Let’s do a quick demo!

Translating Office Documents
Using the S3 console, I first upload a few .docx documents to one of my buckets.

S3 files

Then, moving to the Translate console, I create a new batch translation job, giving it a name, and selecting both the source and target languages.

Creating a batch job

Then, I define the location of my documents in S3, and their format, .docx in this case. Optionally, I could apply a custom terminology, to make sure specific words are translated exactly the way that I want.

Likewise, I define the output location for translated files. Please make sure that this path exists, as Translate will not create it for you.

Creating a batch job

Finally, I set the AWS Identity and Access Management (IAM) role, giving my Translate job the appropriate permissions to access S3. Here, I use an existing role that I created previously, and you can also let Translate create one for you. Then, I click on ‘Create job’ to launch the batch job.

Creating a batch job

The job starts immediately.

Batch job running

A little while later, the job is complete. All three documents have been translated successfully.

Viewing a completed job

Translated files are available at the output location, as visible in the S3 console.

Viewing translated files

Downloading one of the translated files, I can open it and compare it to the original version.

Comparing files

For small scale use, it’s extremely easy to use the AWS console to translate Office files. Of course, you can also use the Translate API to build automated workflows.

Automating Batch Translation
In a previous post, we showed you how to automate batch translation with an AWS Lambda function. You could expand on this example, and add language detection with Amazon Comprehend. For instance, here’s how you could combine the DetectDominantLanguage API with the Python-docx open source library to detect the language of .docx files.

import boto3
from docx import Document

# Read the first paragraph of the .docx file to use as a language sample.
document = Document('blog_post.docx')
text = document.paragraphs[0].text

# Ask Amazon Comprehend for the dominant language of that sample.
comprehend = boto3.client('comprehend')
response = comprehend.detect_dominant_language(Text=text)
top_language = response['Languages'][0]
code = top_language['LanguageCode']
score = top_language['Score']
print("%s, %f" % (code, score))

Pretty simple! You could also detect the type of each file based on its extension, and move it to the proper input location in S3. Then, you could schedule a Lambda function with CloudWatch Events to periodically translate files, and send a notification by email. Of course, you could use AWS Step Functions to build more elaborate workflows. Your imagination is the limit!
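
As a quick illustration of routing by extension, here is a small sketch that copies an uploaded file to a per-format input prefix; the bucket name and prefix layout are placeholders.

import os
import boto3

s3 = boto3.client('s3')

# Map file extensions to per-format input prefixes (placeholder layout).
PREFIXES = {'.docx': 'input/docx/', '.xlsx': 'input/xlsx/', '.pptx': 'input/pptx/'}

def route_document(bucket, key):
    """Copy an uploaded document to the input prefix matching its extension."""
    extension = os.path.splitext(key)[1].lower()
    prefix = PREFIXES.get(extension)
    if prefix is None:
        return  # not an Office document type we handle
    destination = prefix + os.path.basename(key)
    s3.copy_object(Bucket=bucket, Key=destination,
                   CopySource={'Bucket': bucket, 'Key': key})

route_document('my-translate-bucket', 'uploads/report.docx')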

Getting Started
You can start translating Office documents today in the following regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (London), Europe (Frankfurt), and Asia Pacific (Seoul).

If you’ve never tried Amazon Translate, did you know that the free tier offers 2 million characters per month for the first 12 months, starting from your first translation request?

Give it a try, and let us know what you think. We’re looking forward to your feedback: please post it to the AWS Forum for Amazon Translate, or send it to your usual AWS support contacts.

– Julien

Amazon Fraud Detector is now Generally Available

Post Syndicated from Alejandra Quetzalli original https://aws.amazon.com/blogs/aws/amazon-fraud-detector-is-now-generally-available/

What was announced?

Amazon Fraud Detector is now Generally Available! 🥳

In case you missed the announcement during 2019 re:Invent week, Amazon Fraud Detector was originally released in preview mode on December 3rd, 2019. Today, it is Generally Available for customers to check out.

What is Amazon Fraud Detector?

Amazon Fraud Detector is a fully managed service that makes it easy to identify potentially fraudulent online activities such as online payment fraud and the creation of fake accounts.

Did you know that each year, tens of billions of dollars are lost to online fraud world-wide?

Companies with online businesses have to constantly be on guard for fraudulent activity such as fake accounts and payments made with stolen credit cards.  One way they try to identify fraudsters is by using fraud detection apps, some of which use Machine Learning (ML).

Enter Amazon Fraud Detector! It uses your data, ML, and more than 20 years of fraud detection expertise from Amazon to automatically identify potentially fraudulent online activity so you can catch more fraud faster. You can create a fraud detection model with just a few clicks and no prior ML experience because Fraud Detector handles all of the ML heavy lifting for you.

How it works…

“But how does it work?” you ask. 🤷🏻‍♀️

I’m so glad you asked! Let’s summarize this into 5 main steps. 👩🏻‍💻

  • Step 1: Define the event you want to assess for fraud.
  • Step 2: Upload your historical event dataset to Amazon S3 and select a fraud detection model type.
  • Step 3: Amazon Fraud Detector uses your historical data as input to build a custom model. The service automatically inspects and enriches data, performs feature engineering, selects algorithms, trains and tunes your model, and hosts the model.
  • Step 4: Create rules to either accept, review, or collect more information based on model predictions.
  • Step 5: Call the Amazon Fraud Detector API from your online application to receive real-time fraud predictions and take action based on your configured detection rules. (Example: an e-commerce application can send an email address and IP address and receive a fraud score, as well as the output from your rule, e.g., review.)

Let’s see a demo…

Let’s have a demo to better understand how it all fits together. In today’s post, we will walk you through two main components: Building an Amazon Fraud Detector model and Generating real-time fraud predictions.

Part A: Building an Amazon Fraud Detector model

We begin by uploading fictitious generated training data to an S3 bucket. In fact, our user guide has a sample data set that we can use. Once we have downloaded that CSV file, we need to put this training data into an S3 bucket.

For context, let’s also go ahead and open that CSV file and see what’s inside…

👉🏾NOTE: With Amazon Fraud Detector, you’re able to choose a minimum of 2 variables to train a model, not just the email and IP address. (In fact, the model supports up to 100 inputs!)

We continue by defining (creating) an event. An event is essentially a set of attributes describing a particular activity, such as a registration or a transaction. We define the structure of the event we want to evaluate for fraud. (Amazon Fraud Detector evaluates ‘events’ for fraud.)

Let’s create a New Entity. This entity represents the person or thing that is triggering the event.

event_details

create_entity

We move on to Event Variables. We will select variables from a training dataset. This will allow us to use the earlier mentioned CSV file and pull in the headers.

For the IAM role section, we create a new one. I am going to use the same name as the bucket I just created, ‘fraud-detector-training-data’.

And now we can upload the earlier mentioned CSV file to pull in the headers.


Because we are going to define a model, we must define at least two labels.

Let’s finalize creating our event!

If all goes well, we get a happy green bar that alerts us to the fact that our event was successfully created!

event_detail_page

Now it’s time to create our Model.

Let’s take a moment to Define model details. We make sure to select our previously created event type.

create_model_step_1

We move on to Configure training and make sure to select the labels under Fraud and Legitimate labels. (This allows us to separate our classifications so that the model can learn to distinguish between these two labels.)

Training a model takes anywhere from about 30-40 minutes up to a couple of hours, depending on the dataset size. With this example dataset, training takes around 40 minutes.

For the purpose of this blog post, let’s pretend we’ve already skipped ahead 40 minutes in time to a training model that is complete. 🙌🏾

model_detail_page

You can also check out your model’s performance metrics!

model_performance

We can now proceed to deploy our Model.

deploy_model_1

A pop-up modal asks us to confirm if this is the version we wish to Deploy.

deploy_model_confirmation

Part B: Generate real-time fraud predictions

It’s time to generate real-time fraud predictions! Ready?

At this point you have a deployed model that you’re happy with and want to use to get predictions.

We must build a Detector, which is a container for your models and rules. It’s your detection logic that you want to apply to evaluate the event.

We go on to define the Detector details.

We also make sure to select our previously created Event.

detector_wizard_step_1

Now we select a Model.

add_model_to_detector

We move on to establish some threshold rules.

The rules interpret the output of the Model. They also determine the output of the Detector.

high_fraud_risk_rule

Let’s do two more rules.

Besides a high_fraud_risk label, we also want to add low_fraud_risk and medium_fraud_risk labels.

low_fraud_risk_rule

medium_fraud_risk_rule

Remember that these rule threshold values are examples only. When creating rules for your own detector, you should use values that are appropriate based on your model, data and business.

Now in our example for this post, these particular threshold rules are never going to match at the same time.

three_rules_created

This means that either Rule Execution mode is fine to use in our current example.

Yay! We’ve created our Detector.

detector_created_banner

Now let’s click on the Rules tab.

detector_rules_tab

We can also check out what models we have under the Models tab.

detector_models_tab

If we go back to the Overview tab, we can even run a quick test! We can run tests to sample the output from our Detector. 

run_test

Once we’re ready, we can publish this version of the detector to make it the Active version. Each detector can have one Active version at a time.

publish_detector

A pop-up modal asks us to confirm if we’re ready to publish this version.

The next step is to run real-time predictions! Let’s show a sample one-off prediction with an Amazon SageMaker notebook and see what that looks like.

We move to the Amazon SageMaker console, and go to Notebook instances.

In this case you can see I already have a Jupyter Notebook ready to go.

We’re going to run the get_event_prediction block. This is our main runtime API and customers can call it using a script to run a batch of sample predictions. Alternatively, customers can also integrate this API into their applications to generate real-time predictions, and adjust user experiences dynamically based on risk.
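
For reference, a one-off call to that API from a script might look roughly like the sketch below; the detector ID, event type, entity, and variable values are placeholders for this walkthrough.

import datetime
import boto3

frauddetector = boto3.client('frauddetector')

# A rough sketch of a single real-time prediction call. The detector ID,
# event type, entity, and event variables below are placeholders.
response = frauddetector.get_event_prediction(
    detectorId='sample_detector',
    eventId='802454d3-f7d8-482d-97e8-c4b6db9a0428',
    eventTypeName='sample_registration',
    eventTimestamp=datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ'),
    entities=[{'entityType': 'customer', 'entityId': 'customer-123'}],
    eventVariables={
        'email_address': 'fake_bakermichael@example.net',
        'ip_address': '203.0.113.50'
    }
)
print(response['modelScores'])
print(response['ruleResults'])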

After running this block, here are the model score results we receive.

We had 1 model in this Detector and it returned a score of 933. According to the rules we created, this means we consider this transaction to return as a high_fraud_risk.

get_prediction

Let’s head back to the Amazon Fraud Detector console and check out the Rules in our Detector.

We can see from the Rules of our Detector that if the risk score is over 900, the Outcome should be verify_customer.

This completes the loop!

We now have confirmation that you can call this Detector in real time and get your Fraud Predictions.

🌎 Lastly…
Amazon Fraud Detector is now globally available to our customers and is integrated with many AWS services, such as Amazon CloudWatch, AWS CloudTrail, and AWS PrivateLink.

To learn more about Amazon Fraud Detector, visit the website and the developer guide.

Thanks for your time!
~Alejandra 💁🏻‍♀️🤖 and Canela 🐾

Powerful AI Can Now Be Trained on a Single Computer

Post Syndicated from Edd Gent original https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/powerful-ai-can-now-be-trained-on-a-single-computer

The enormous computing resources required to train state-of-the-art artificial intelligence systems means well-heeled tech firms are leaving academic teams in the dust. But a new approach could help balance the scales, allowing scientists to tackle cutting-edge AI problems on a single computer.

A 2018 report from OpenAI found the processing power used to train the most powerful AI is increasing at an incredibly fast pace, doubling every 3.4 months. One of the most data-hungry approaches is deep reinforcement learning, where AI learns through trial and error by iterating through millions of simulations. Impressive recent advances on videogames like StarCraft and Dota 2 have relied on servers packed with hundreds of CPUs and GPUs.

Specialized hardware such as the Cerebras Systems’ Wafer Scale Engine promises to replace these racks of processors with a single large chip perfectly optimized for training AI. But with a price tag running into the millions, it’s not much solace for under-funded researchers.

Now a team from the University of Southern California and Intel Labs has created a way to train deep reinforcement learning (RL) algorithms on hardware commonly available in academic labs. In a paper presented at the 2020 International Conference on Machine Learning (ICML) this week, they describe how they were able to use a single high-end workstation to train AI with state-of-the-art performance on the first-person shooter videogame Doom. They also tackle a suite of 30 diverse 3D challenges created by DeepMind using a fraction of the normal computing power.

“Inventing ways to do deep RL on commodity hardware is a fantastic research goal,” says Peter Stone, a professor at the University of Texas at Austin who specializes in deep RL. As well as leaving smaller research groups behind, the computing resources normally required to carry out this kind of research have a significant carbon footprint, he adds. “Any progress towards democratizing RL and reducing the energy needs for doing research is a step in the right direction,” he says.

The inspiration for the project was a classic case of necessity being the mother of invention, says lead author Aleksei Petrenko, a graduate student at USC. As a summer internship at Intel came to an end, Petrenko lost access to the company’s supercomputing cluster, putting unfinished deep RL projects in jeopardy. So he and colleagues decided to find a way to continue the work on simpler systems.

“From my experience, a lot of researchers don’t have access to cutting-edge, fancy hardware,” says Petrenko. “We realized that just by rethinking in terms of maximizing the hardware utilization you can actually approach the performance you will usually squeeze out of a big cluster even on a single workstation.”

The leading approach to deep RL places an AI agent in a simulated environment that provides rewards for achieving certain goals, which the agent uses as feedback to work out the best strategy. This involves three main computational jobs: simulating the environment and the agent; deciding what to do next based on learned rules called a policy; and using the results of those actions to update the policy.

Training is always limited by the slowest process, says Petrenko, but these three jobs are often intertwined in standard deep RL approaches, making it hard to optimize them individually. The researchers’ new approach, dubbed Sample Factory, splits them up so resources can be dedicated to get them all running at peak speeds.

Piping data between processes is another major bottleneck as these can often be spread across multiple machines, Petrenko explains. His group took advantage of working on a single machine by simply cramming all the data to shared memory where all processes can access it instantaneously.

This resulted in significant speed-ups compared to leading deep RL approaches. Using a single machine equipped with a 36-core CPU and one GPU, the researchers were able to process roughly 140,000 frames per second while training on Atari videogames and Doom, or double the next best approach. On the 3D training environment DeepMind Lab, they clocked 40,000 frames per second—about 15 percent better than second place.

To check how frame rate translated into training time the team pitted Sample Factory against an algorithm Google Brain open-sourced in March that is designed to dramatically increase deep RL efficiency. Sample Factory trained on two simple tasks in Doom in a quarter of the time it took the other algorithm. The team also tested their approach on a collection of 30 challenges in DeepMind Lab using a more powerful 36-core 4-GPU machine. The resulting AI significantly outperformed the original AI that DeepMind used to tackle the challenge, which was trained on a large computing cluster.

Edward Beeching, a graduate student working on deep RL at the Institut National des Sciences Appliquées de Lyon, in France, says the approach might struggle with memory intensive challenges like the photo-realistic 3D simulator Habitat released by Facebook last year.

But he adds that these kinds of efficient training approaches are vitally important for smaller research teams. “A four-fold increase compared to the state of the art implementation is huge,” he says. “This means in the same time you can run four times as many experiments.”

While the computers used in the paper are still high-end workstations designed for AI research, Petrenko says he and his collaborators have also been using Sample Factory on much simpler devices. He’s even been able to run some advanced deep RL experiments on his mid-range gaming laptop, he says. “This is unheard of.”

New – Label Videos with Amazon SageMaker Ground Truth

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/new-label-videos-with-amazon-sagemaker-ground-truth/

Launched at AWS re:Invent 2018, Amazon SageMaker Ground Truth is a capability of Amazon SageMaker that makes it easy to annotate machine learning datasets. Customers can efficiently and accurately label image, text and 3D point cloud data with built-in workflows, or any other type of data with custom workflows. Data samples are automatically distributed to a workforce (private, third-party, or Amazon Mechanical Turk), and annotations are stored in Amazon Simple Storage Service (S3). Optionally, automated data labeling may also be enabled, reducing both the amount of time required to label the dataset, and the associated costs.

As models become more sophisticated, AWS customers are increasingly applying machine learning prediction to video content. Autonomous driving is perhaps the most well-known use case, as safety demands that road condition and moving objects be correctly detected and tracked in real-time. Video prediction is also a popular application in Sports, tracking players or racing vehicles to compute all kinds of statistics that fans are so fond of. Healthcare organizations also use video prediction to identify and track anatomical objects in medical videos. Manufacturing companies do the same to track objects on the assembly line, parcels for logistics, and more. The list goes on, and amazing applications keep popping up in many different industries.

Of course, this requires building and labeling video datasets, where objects of interest need to be labeled manually. At 30 frames per second, one minute of video translates to 1,800 individual images, so the amount of work can quickly become overwhelming. In addition, specific tools have to be built to label images, manage workflows, and so on. All this work takes valuable time and resources away from an organization’s core business.

AWS customers have asked us for a better solution, and today I’m very happy to announce that Amazon SageMaker Ground Truth now supports video labeling.

Customer use case: the National Football League
The National Football League (NFL) has already put this new feature to work. Says Jennifer Langton, SVP of Player Health and Innovation, NFL: “At the National Football League (NFL), we continue to look for new ways to use machine learning (ML) to help our fans, broadcasters, coaches, and teams benefit from deeper insights. Building these capabilities requires large amounts of accurately labeled training data. Amazon SageMaker Ground Truth was truly a force multiplier in accelerating our project timelines. We leveraged the new video object tracking workflow in addition to other existing computer vision (CV) labeling workflows to develop labels for training a computer vision system that tracks all 22 players as they move on the field during plays. Amazon SageMaker Ground Truth reduced the timeline for developing a high quality labeling dataset by more than 80%”.

Courtesy of the NFL, here are a couple of predicted frames, showing helmet detection in a Seattle Seahawks video. This particular video has 353 frames. This first picture is frame #100.

Object tracking

This second picture is frame #110.

Object tracking

Introducing Video Labeling
With the addition of video task types, customers can now use Amazon SageMaker Ground Truth for:

  • Video clip classification
  • Video multi-frame object detection
  • Video multi-frame object tracking

The multi-frame task types support multiple labels, so that you may label different object classes present in the video frames. You can create labeling jobs to annotate frames from scratch, as well as adjustment jobs to review and fine tune frames that have already been labeled. These jobs may be distributed either to a private workforce, or to a vendor workforce you picked on AWS Marketplace.

Using the built-in GUI, workers can then easily label and track objects across frames. Once they’ve annotated a frame, they can use an assistive labeling feature to predict the location of bounding boxes in the next frame, as you will see in the demo below. This significantly simplifies labeling work, saves time, and improves the quality of annotations. Last but not least, work is saved automatically.

Preparing Input Data for Video Object Detection and Tracking
As you would expect, input data must be located in S3. You may bring either video files, or sequences of video frames.

The first option is the simplest, as Amazon SageMaker Ground Truth includes a tool that automatically extracts frames from your video files. Optionally, you can sample frames (1 in ‘n’), in order to reduce the amount of labeling work. The extraction tool also builds a manifest file describing sequences and frames. You can learn more about it in the documentation.

The second option requires two steps: extracting frames, and building the manifest file. Extracting frames can easily be performed with the popular ffmpeg open source tool. Here’s how you could convert the first 60 seconds of a video to a frame sequence.

$ ffmpeg -ss 00:00:00.00 -t 00:01:0.00 -i basketball.mp4 frame%04d.jpg

Each frame sequence should be uploaded to S3 under a different prefix, for example s3://my-bucket/my-videos/sequence1, s3://my-bucket/my-videos/sequence2, and so on, as explained in the documentation.

Once you have uploaded your frame sequences, you may then either bring your own JSON files to describe them, or let Ground Truth crawl your sequences and build the JSON files and the manifest file for you automatically. Please note that a video sequence cannot be longer than 2,000 frames, which corresponds to about a minute of video at 30 frames per second.

Each sequence should be described by a simple sequence file:

  • A sequence number, an S3 prefix, and a number of frames.
  • A list of frames: number, file name, and creation timestamp.

Here’s an example of a sequence file.

{"version": "2020-06-01",
"seq-no": 1, "prefix": "s3://jsimon-smgt/videos/basketball", "number-of-frames": 1800, 
	"frames": [
		{"frame-no": 1, "frame": "frame0001.jpg", "unix-timestamp": 1594111541.71155},
		{"frame-no": 2, "frame": "frame0002.jpg", "unix-timestamp": 1594111541.711552},
		{"frame-no": 3, "frame": "frame0003.jpg", "unix-timestamp": 1594111541.711553},
		{"frame-no": 4, "frame": "frame0004.jpg", "unix-timestamp": 1594111541.711555},
. . .

Finally, the manifest file should point at the sequence files you’d like to include in the labeling job. Here’s an example.

{"source-ref": "s3://jsimon-smgt/videos/seq1.json"}
{"source-ref": "s3://jsimon-smgt/videos/seq2.json"}
. . .
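
If you build these files yourself, a small Python helper along the lines of the sketch below can generate a sequence file and manifest from frames extracted with ffmpeg. The local directory, bucket, and prefix names are placeholders; the field names follow the examples above.

import json
import os
import time

frames_dir = 'frames'  # local directory holding frame0001.jpg, frame0002.jpg, ...
s3_prefix = 's3://my-bucket/my-videos/sequence1'  # placeholder S3 prefix

# List the extracted frames in order.
frame_files = sorted(f for f in os.listdir(frames_dir) if f.endswith('.jpg'))

# Build the sequence file describing this set of frames.
sequence = {
    'version': '2020-06-01',
    'seq-no': 1,
    'prefix': s3_prefix,
    'number-of-frames': len(frame_files),
    'frames': [
        {'frame-no': i + 1, 'frame': name, 'unix-timestamp': time.time()}
        for i, name in enumerate(frame_files)
    ]
}
with open('seq1.json', 'w') as f:
    json.dump(sequence, f)

# The manifest lists the sequence files, one JSON object per line.
with open('manifest.jsonl', 'w') as f:
    f.write(json.dumps({'source-ref': 's3://my-bucket/my-videos/seq1.json'}) + '\n')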

Just like for other task types, the augmented manifest is available in S3 once labeling is complete. It contains annotations and labels, which you can then feed to your machine learning training job.

Labeling Videos with Amazon SageMaker Ground Truth
Here’s a sample video where I label the first ten frames of a sequence. You can see a screenshot below.

I first use the Ground Truth GUI to carefully label the first frame, drawing bounding boxes for basketballs and basketball players. Then, I use the “Predict next” assistive labeling tool to predict the location of the boxes in the next nine frames, applying only minor adjustments to some boxes. Although this was my first try, I found the process easy and intuitive. With a little practice, I could certainly go much faster!

Getting Started
Now, it’s your turn. You can start labeling videos with Amazon SageMaker Ground Truth today in the following regions:

  • US East (N. Virginia), US East (Ohio), US West (Oregon),
  • Canada (Central),
  • Europe (Ireland), Europe (London), Europe (Frankfurt),
  • Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Seoul), Asia Pacific (Sydney), Asia Pacific (Tokyo).

We’re looking forward to reading your feedback. You can send it through your usual support contacts, or in the AWS Forum for Amazon SageMaker.

– Julien

Find Your Most Expensive Lines of Code – Amazon CodeGuru Is Now Generally Available

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/find-your-most-expensive-lines-of-code-amazon-codeguru-is-now-generally-available/

Bringing new applications into production, maintaining their code base as they grow and evolve, and at the same time responding to operational issues, is a challenging task. For this reason, you can find many ideas on how to structure your teams, on which methodologies to apply, and how to safely automate your software delivery pipeline.

At re:Invent last year, we introduced Amazon CodeGuru in preview, a developer tool powered by machine learning that helps you improve your applications and troubleshoot issues with automated code reviews and performance recommendations based on runtime data. During the last few months, many improvements have been launched, including a more cost-effective pricing model, support for Bitbucket repositories, and the ability to start the profiling agent using a command line switch, so that you no longer need to modify the code of your application, or add dependencies, to run the agent.

You can use CodeGuru in two ways:

  • CodeGuru Reviewer uses program analysis and machine learning to detect potential defects that are difficult for developers to find, and recommends fixes in your Java code. The code can be stored in GitHub (now also in GitHub Enterprise), AWS CodeCommit, or Bitbucket repositories. When you submit a pull request on a repository that is associated with CodeGuru Reviewer, it provides recommendations for how to improve your code. Each pull request corresponds to a code review, and each code review can include multiple recommendations that appear as comments on the pull request.
  • CodeGuru Profiler provides interactive visualizations and recommendations that help you fine-tune your application performance and troubleshoot operational issues using runtime data from your live applications. It currently supports applications written in Java virtual machine (JVM) languages such as Java, Scala, Kotlin, Groovy, Jython, JRuby, and Clojure. CodeGuru Profiler can help you find the most expensive lines of code, in terms of CPU usage or introduced latency, and suggest ways you can improve efficiency and remove bottlenecks. You can use CodeGuru Profiler in production, and when you test your application with a meaningful workload, for example in a pre-production environment.

Today, Amazon CodeGuru is generally available with the addition of many new features.

In CodeGuru Reviewer, we included the following:

  • Support for GitHub Enterprise – You can now scan your pull requests and get recommendations against your source code on GitHub Enterprise on-premises repositories, together with a description of what’s causing the issue and how to remediate it.
  • New types of recommendations to solve defects and improve your code – For example, checking input validation, to avoid issues that can compromise security and performance, and looking for multiple copies of code that do the same thing.

In CodeGuru Profiler, you can find these new capabilities:

  • Anomaly detection – We automatically detect anomalies in the application profile for those methods that represent the highest proportion of CPU time or latency.
  • Lambda function support – You can now profile AWS Lambda functions just like applications hosted on Amazon Elastic Compute Cloud (EC2) and containerized applications running on Amazon ECS and Amazon Elastic Kubernetes Service, including those using AWS Fargate.
  • Cost of issues in the recommendation report – Recommendations contain actionable resolution steps which explain what the problem is, the CPU impact, and how to fix the issue. To help you better prioritize your activities, you now have an estimation of the savings introduced by applying the recommendation.
  • Color-my-code – In the visualizations, to help you easily find your own code, we are coloring your methods differently from frameworks and other libraries you may use.
  • CloudWatch metrics and alerts – To keep track and monitor efficiency issues that have been discovered.

Let’s see some of these new features at work!

Using CodeGuru Reviewer with a Lambda Function
I create a new repo in my GitHub account, and leave it empty for now. Locally, where I am developing a Lambda function using the Java 11 runtime, I initialize my Git repo and add only the README.md file to the master branch. In this way, I can add all the code as a pull request later and have it go through a code review by CodeGuru.

git init
git add README.md
git commit -m "First commit"

Now, I add the GitHub repo as origin, and push my changes to the new repo:

git remote add origin https://github.com/<my-user-id>/amazon-codeguru-sample-lambda-function.git
git push -u origin master

I associate the repository in the CodeGuru console:

When the repository is associated, I create a new dev branch, add all my local files to it, and push it remotely:

git checkout -b dev
git add .
git commit -m "Code added to the dev branch"
git push --set-upstream origin dev

In the GitHub console, I open a new pull request by comparing changes across the two branches, master and dev. I verify that the pull request is able to merge, then I create it.

Since the repository is associated with CodeGuru, a code review is listed as Pending in the Code reviews section of the CodeGuru console.

After a few minutes, the code review status is Completed, and CodeGuru Reviewer issues a recommendation on the same GitHub page where the pull request was created.

Oops! I am creating the Amazon DynamoDB service object inside the function invocation method. In this way, it cannot be reused across invocations. This is not efficient.

To improve the performance of my Lambda function, I follow the CodeGuru recommendation, and move the declaration of the DynamoDB service object to a static final attribute of the Java application object, so that it is instantiated only once, during function initialization. Then, I follow the link in the recommendation to learn more best practices for working with Lambda functions.

Using CodeGuru Profiler with a Lambda Function
In the CodeGuru console, I create a MyServerlessApp-Development profiling group and select the Lambda compute platform.

Next, I give the AWS Identity and Access Management (IAM) role used by my Lambda function permissions to submit data to this profiling group.

Now, the console is giving me all the info I need to profile my Lambda function. To configure the profiling agent, I use a couple of environment variables:

  • AWS_CODEGURU_PROFILER_GROUP_ARN to specify the ARN of the profiling group to use.
  • AWS_CODEGURU_PROFILER_ENABLED to enable (TRUE) or disable (FALSE) profiling.
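
If you set these with the AWS SDK rather than the console, a minimal sketch could look like the following; the function name, region, account ID, and profiling group ARN are placeholders.

import boto3

lambda_client = boto3.client('lambda')

# Point the CodeGuru Profiler agent at the profiling group and enable it.
# The function name and profiling group ARN below are placeholders.
lambda_client.update_function_configuration(
    FunctionName='my-serverless-app-function',
    Environment={
        'Variables': {
            'AWS_CODEGURU_PROFILER_GROUP_ARN':
                'arn:aws:codeguru-profiler:us-east-1:123456789012:profilingGroup/MyServerlessApp-Development',
            'AWS_CODEGURU_PROFILER_ENABLED': 'TRUE'
        }
    }
)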

I follow the instructions (for Maven and Gradle) to add a dependency, and include the profiling agent in the build. Then, I update the code of the Lambda function to wrap the handler function inside the LambdaProfiler provided by the agent.

To generate some load, I start a few scripts invoking my function using the Amazon API Gateway as trigger. After a few minutes, the profiling group starts to show visualizations describing the runtime behavior of my Lambda function.

For example, I can see how much CPU time is spent in the different methods of my function. At the bottom, there are the entry point methods. As I scroll up, I find methods that are called deeper in the stack trace. I right-click and hide the LambdaRuntimeClient methods to focus on my code. Note that my methods are colored differently than those in the packages I am using, such as the AWS SDK for Java.

I am mostly interested in what happens in the handler method invoked by the Lambda platform. I select the handler method, and now it becomes the new “base” of the visualization.

As I move my pointer on each of my methods, I get more information, including an estimation of the yearly cost of running that specific part of the code in production, based on the load experienced by the profiling agent during the selected time window. In my case, the handler function cost is estimated to be $6. If I select the two main functions above, I have an estimation of $3 each. The cost estimation works for code running on Lambda functions, EC2 instances, and containerized applications.

Similarly, I can visualize Latency, to understand how much time is spent inside the methods in my code. I keep the Lambda function handler method selected to drill down into what is under my control, and see where time is being spent the most.

The CodeGuru Profiler is also providing a recommendation based on the data collected. I am spending too much time (more than 4%) in managing encryption. I can use a more efficient crypto provider, such as the open source Amazon Corretto Crypto Provider, described in this blog post. This should lower the time spent to what is expected, about 1% of my profile.

Finally, I edit the profiling group to enable notifications. In this way, if CodeGuru detects an anomaly in the profile of my application, I am notified in one or more Amazon Simple Notification Service (SNS) topics.

Available Now
Amazon CodeGuru is available today in 10 regions, and we are working to add more regions in the coming months. For regional availability, please see the AWS Region Table.

CodeGuru helps you improve your application code and reduce compute and infrastructure costs with an automated code reviewer and application profiler that provide intelligent recommendations. Using visualizations based on runtime data, you can quickly find the most expensive lines of code of your applications. With CodeGuru, you pay only for what you use. Pricing is based on the lines of code analyzed by CodeGuru Reviewer, and on sampling hours for CodeGuru Profiler.

To learn more, please see the documentation.

Danilo

Amazon EKS Now Supports EC2 Inf1 Instances

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-eks-now-supports-ec2-inf1-instances/

Amazon Elastic Kubernetes Service (EKS) has quickly become a leading choice for machine learning workloads. It combines the developer agility and the scalability of Kubernetes, with the wide selection of Amazon Elastic Compute Cloud (EC2) instance types available on AWS, such as the C5, P3, and G4 families.

As models become more sophisticated, hardware acceleration is increasingly required to deliver fast predictions at high throughput. Today, we’re very happy to announce that AWS customers can now use the Amazon EC2 Inf1 instances on Amazon Elastic Kubernetes Service, for high performance and the lowest prediction cost in the cloud.

A primer on EC2 Inf1 instances
Inf1 instances were launched at AWS re:Invent 2019. They are powered by AWS Inferentia, a custom chip built from the ground up by AWS to accelerate machine learning inference workloads.

Inf1 instances are available in multiple sizes, with 1, 4, or 16 AWS Inferentia chips, with up to 100 Gbps network bandwidth and up to 19 Gbps EBS bandwidth. An AWS Inferentia chip contains four NeuronCores. Each one implements a high-performance systolic array matrix multiply engine, which massively speeds up typical deep learning operations such as convolution and transformers. NeuronCores are also equipped with a large on-chip cache, which helps cut down on external memory accesses, saving I/O time in the process. When several AWS Inferentia chips are available on an Inf1 instance, you can partition a model across them and store it entirely in cache memory. Alternatively, to serve multi-model predictions from a single Inf1 instance, you can partition the NeuronCores of an AWS Inferentia chip across several models.

Compiling Models for EC2 Inf1 Instances
To run machine learning models on Inf1 instances, you need to compile them to a hardware-optimized representation using the AWS Neuron SDK. All tools are readily available on the AWS Deep Learning AMI, and you can also install them on your own instances. You’ll find instructions in the Deep Learning AMI documentation, as well as tutorials for TensorFlow, PyTorch, and Apache MXNet in the AWS Neuron SDK repository.
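
As a minimal sketch, here is what compilation could look like for a TensorFlow 1.x SavedModel using the tensorflow-neuron package; the directory names are placeholders, and the exact API can vary between Neuron SDK versions, so check the tutorials above for your framework and model.

import tensorflow.neuron as tfn  # installed as part of the AWS Neuron SDK (tensorflow-neuron)

# Compile a regular SavedModel into a Neuron-optimized SavedModel.
# 'bert_saved_model' and 'bert_saved_model_neuron' are placeholder directories.
tfn.saved_model.compile(
    'bert_saved_model',          # input: SavedModel exported by TensorFlow
    'bert_saved_model_neuron'    # output: model compiled for AWS Inferentia
)

The compiled model can then be uploaded to S3, which is where the deployment below fetches it from.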

In the demo below, I will show you how to deploy a Neuron-optimized model on an EKS cluster of Inf1 instances, and how to serve predictions with TensorFlow Serving. The model in question is BERT, a state-of-the-art model for natural language processing tasks. This is a huge model with hundreds of millions of parameters, making it a great candidate for hardware acceleration.

Building an EKS Cluster of EC2 Inf1 Instances
First of all, let’s build a cluster with two inf1.2xlarge instances. I can easily do this with eksctl, the command-line tool to provision and manage EKS clusters. You can find installation instructions in the EKS documentation.

Here is the configuration file for my cluster. Eksctl detects that I’m launching a node group with an Inf1 instance type, and will start the worker nodes using the EKS-optimized Accelerated AMI.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cluster-inf1
  region: us-west-2
nodeGroups:
  - name: ng1-public
    instanceType: inf1.2xlarge
    minSize: 0
    maxSize: 3
    desiredCapacity: 2
    ssh:
      allow: true

Then, I use eksctl to create the cluster. This process will take approximately 10 minutes.

$ eksctl create cluster -f inf1-cluster.yaml

Eksctl automatically installs the Neuron device plugin in the cluster. This plugin advertises Neuron devices to the Kubernetes scheduler, so that containers can request them in a deployment spec. I can check with kubectl that the device plugin container is running fine on both Inf1 instances.

$ kubectl get pods -n kube-system
NAME                                  READY STATUS  RESTARTS AGE
aws-node-tl5xv                        1/1   Running 0        14h
aws-node-wk6qm                        1/1   Running 0        14h
coredns-86d5cbb4bd-4fxrh              1/1   Running 0        14h
coredns-86d5cbb4bd-sts7g              1/1   Running 0        14h
kube-proxy-7px8d                      1/1   Running 0        14h
kube-proxy-zqvtc                      1/1   Running 0        14h
neuron-device-plugin-daemonset-888j4  1/1   Running 0        14h
neuron-device-plugin-daemonset-tq9kc  1/1   Running 0        14h

Next, I define AWS credentials in a Kubernetes secret. They will allow me to fetch my BERT model stored in S3. Please note that both keys need to be base64-encoded, as shown after the manifest below.

apiVersion: v1 
kind: Secret 
metadata: 
  name: aws-s3-secret 
type: Opaque 
data: 
  AWS_ACCESS_KEY_ID: <base64-encoded value> 
  AWS_SECRET_ACCESS_KEY: <base64-encoded value>
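
For example, each value can be encoded on the command line before being pasted into the manifest. The access key shown here is the placeholder from the AWS documentation, not a real credential.

$ echo -n 'AKIAIOSFODNN7EXAMPLE' | base64
QUtJQUlPU0ZPRE5ON0VYQU1QTEU=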

Finally, I store these credentials on the cluster.

$ kubectl apply -f secret.yaml

The cluster is correctly set up. Now, let’s build an application container storing a Neuron-enabled version of TensorFlow Serving.

Building an Application Container for TensorFlow Serving
The Dockerfile is very simple. We start from an Amazon Linux 2 base image, then install the AWS CLI and the TensorFlow Serving package available in the Neuron repository.

FROM amazonlinux:2
RUN yum install -y awscli
RUN echo $'[neuron] \n\
name=Neuron YUM Repository \n\
baseurl=https://yum.repos.neuron.amazonaws.com \n\
enabled=1' > /etc/yum.repos.d/neuron.repo
RUN rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB
RUN yum install -y tensorflow-model-server-neuron

I build the image, create an Amazon Elastic Container Registry repository, and push the image to it.

$ docker build . -f Dockerfile -t tensorflow-model-server-neuron
$ docker tag tensorflow-model-server-neuron 123456789012.dkr.ecr.us-west-2.amazonaws.com/inf1-demo
$ aws ecr create-repository --repository-name inf1-demo
$ docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/inf1-demo

Our application container is ready. Now, let’s define a Kubernetes service that will use this container to serve BERT predictions. I’m using a model that has already been compiled with the Neuron SDK. You can compile your own using the instructions available in the Neuron SDK repository.

Deploying BERT as a Kubernetes Service
The deployment manages two containers: the Neuron runtime container and my application container. The Neuron runtime runs as a sidecar container and is used to interact with the AWS Inferentia chips. At startup, the application container configures the AWS CLI with the appropriate security credentials, fetches the BERT model from S3, and launches TensorFlow Serving, which loads the BERT model and waits for prediction requests. For this purpose, the HTTP and gRPC ports are open. Here is the full manifest.

kind: Service
apiVersion: v1
metadata:
  name: eks-neuron-test
  labels:
    app: eks-neuron-test
spec:
  ports:
  - name: http-tf-serving
    port: 8500
    targetPort: 8500
  - name: grpc-tf-serving
    port: 9000
    targetPort: 9000
  selector:
    app: eks-neuron-test
    role: master
  type: ClusterIP
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: eks-neuron-test
  labels:
    app: eks-neuron-test
    role: master
spec:
  replicas: 2
  selector:
    matchLabels:
      app: eks-neuron-test
      role: master
  template:
    metadata:
      labels:
        app: eks-neuron-test
        role: master
    spec:
      volumes:
        - name: sock
          emptyDir: {}
      containers:
      - name: eks-neuron-test
        image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/inf1-demo:latest
        command: ["/bin/sh","-c"]
        args:
          - "mkdir ~/.aws/ && \
           echo '[eks-test-profile]' > ~/.aws/credentials && \
           echo AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID >> ~/.aws/credentials && \
           echo AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY >> ~/.aws/credentials; \
           /usr/bin/aws --profile eks-test-profile s3 sync s3://jsimon-inf1-demo/bert /tmp/bert && \
           /usr/local/bin/tensorflow_model_server_neuron --port=9000 --rest_api_port=8500 --model_name=bert_mrpc_hc_gelus_b4_l24_0926_02 --model_base_path=/tmp/bert/"
        ports:
        - containerPort: 8500
        - containerPort: 9000
        imagePullPolicy: Always
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              key: AWS_ACCESS_KEY_ID
              name: aws-s3-secret
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: AWS_SECRET_ACCESS_KEY
              name: aws-s3-secret
        - name: NEURON_RTD_ADDRESS
          value: unix:/sock/neuron.sock

        resources:
          limits:
            cpu: 4
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi
        volumeMounts:
          - name: sock
            mountPath: /sock

      - name: neuron-rtd
        image: 790709498068.dkr.ecr.us-west-2.amazonaws.com/neuron-rtd:1.0.6905.0
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
            - IPC_LOCK

        volumeMounts:
          - name: sock
            mountPath: /sock
        resources:
          limits:
            hugepages-2Mi: 256Mi
            aws.amazon.com/neuron: 1
          requests:
            memory: 1024Mi

I use kubectl to create the service.

$ kubectl create -f bert_service.yml

A few seconds later, the pods are up and running.

$ kubectl get pods
NAME                             READY STATUS  RESTARTS AGE
eks-neuron-test-5d59b55986-7kdml 2/2   Running 0        14h
eks-neuron-test-5d59b55986-gljlq 2/2   Running 0        14h

Finally, I redirect service port 9000 to local port 9000, to let my prediction client connect locally.

$ kubectl port-forward svc/eks-neuron-test 9000:9000 &

Now, everything is ready for prediction, so let’s invoke the model.

Predicting with BERT on EKS and Inf1
The inner workings of BERT are beyond the scope of this post. This particular model expects a sequence of 128 tokens, encoding the words of two sentences we’d like to compare for semantic equivalence.

Here, I’m only interested in measuring prediction latency, so dummy data is fine. I build a prediction request storing a sequence of 128 zeros, send it 100 times to the TensorFlow Serving endpoint over gRPC, and compute the average prediction time.

import numpy as np
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import time

if __name__ == '__main__':
    # Connect to the TensorFlow Serving endpoint forwarded to localhost.
    channel = grpc.insecure_channel('localhost:9000')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    # Build a single prediction request containing a dummy sequence of 128 zeros.
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'bert_mrpc_hc_gelus_b4_l24_0926_02'
    data = np.zeros([1, 128], dtype=np.int32)
    request.inputs['input_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(data, shape=data.shape))
    request.inputs['input_mask'].CopyFrom(tf.contrib.util.make_tensor_proto(data, shape=data.shape))
    request.inputs['segment_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(data, shape=data.shape))

    # Send the request 100 times and measure the latency of each call.
    latencies = []
    for i in range(100):
        start = time.time()
        result = stub.Predict(request)
        latencies.append(time.time() - start)
        print("Inference successful: {}".format(i))
    print("Ran {} inferences successfully. Latency average = {}".format(len(latencies), np.average(latencies)))

On average, prediction took 59.2 ms. As far as BERT goes, this is pretty good!

Ran 100 inferences successfully. Latency average = 0.05920819044113159

In real life, we would certainly batch prediction requests in order to increase throughput. If needed, we could also scale to larger Inf1 instances with several Inferentia chips, and deliver even more prediction performance at low cost.
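
As a quick sketch of what batching could look like with this client, and assuming the model has been compiled for a batch size of 4 (the tensor names and model name are taken from the script above; the batch size is an assumption), a batched request simply stacks several sequences into a single tensor.

import numpy as np
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Reuse the same local endpoint as the script above.
channel = grpc.insecure_channel('localhost:9000')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Hypothetical batched request: four 128-token sequences sent in a single call.
# This assumes the model was compiled with the Neuron SDK for a batch size of 4.
batch = np.zeros([4, 128], dtype=np.int32)
request = predict_pb2.PredictRequest()
request.model_spec.name = 'bert_mrpc_hc_gelus_b4_l24_0926_02'
for name in ['input_ids', 'input_mask', 'segment_ids']:
    request.inputs[name].CopyFrom(
        tf.contrib.util.make_tensor_proto(batch, shape=batch.shape))
result = stub.Predict(request)

Amortizing the per-request overhead over several sequences like this usually increases throughput, at the cost of slightly higher per-call latency.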

Getting Started
Kubernetes users can deploy Amazon Elastic Compute Cloud (EC2) Inf1 instances on Amazon Elastic Kubernetes Service today in the US East (N. Virginia) and US West (Oregon) regions. As the Inf1 rollout progresses, you’ll be able to use these instances with Amazon Elastic Kubernetes Service in more regions.

Give this a try, and please send us feedback either through your usual AWS Support contacts, on the AWS Forum for Amazon Elastic Kubernetes Service, or on the container roadmap on GitHub.

– Julien