All posts by Eliza Strickland

Exclusive Q&A: Neuralink’s Quest to Beat the Speed of Type

Post Syndicated from Eliza Strickland original

Elon Musk’s brain tech company, Neuralink, is subject to rampant speculation and misunderstanding. Just start a Google search with the phrase “can Neuralink…” and you’ll see the questions that are commonly asked, which include “can Neuralink cure depression?” and “can Neuralink control you?” Musk hasn’t helped ground the company’s reputation in reality with his public statements, including his claim that the Neuralink device will one day enable “AI symbiosis” in which human brains will merge with artificial intelligence.

It’s all somewhat absurd, because the Neuralink brain implant is still an experimental device that hasn’t yet gotten approval for even the most basic clinical safety trial. 

But behind the showmanship and hyperbole, the fact remains that Neuralink is staffed by serious scientists and engineers doing interesting research. The fully implantable brain-machine interface (BMI) they’ve been developing is advancing the field with its super-thin neural “threads” that can snake through brain tissue to pick up signals and its custom chips and electronics that can process data from more than 1000 electrodes.

In August 2020 the company demonstrated the technology in pigs, and this past April it dropped a YouTube video and blog post showing a monkey using the implanted device, called a Link, to control a cursor and play the computer game Pong. But the BMI team hasn’t been public about its current research goals, and the steps it’s taking to achieve them.

In this exclusive Q&A with IEEE Spectrum, Joseph O’Doherty, a neuroengineer at Neuralink and head of its brain signals team, lays out the mission. 


Joseph O’Doherty on…

  1. Aiming for a World Record
  2. The Hardware
  3. The Software
  4. What He’s Working on Right Now
  5. What the Limits Are, Where the Ceiling Is

  1. Aiming for a World Record

    IEEE Spectrum: Elon Musk often talks about the far-future possibilities of Neuralink; a future in which everyday people could get voluntary brain surgery and have Links implanted to augment their capabilities. But whom is the product for in the near term? 

    Joseph O’Doherty: We’re working on a communication prosthesis that would give back keyboard and mouse control to individuals with paralysis. We’re pushing towards an able-bodied typing rate, which is obviously a tall order. But that’s the goal.

    We have a very capable device and we’re aware of the various algorithmic techniques that have been used by others. So we can apply best-practices engineering to tighten up all the aspects. What it takes to make the BMI work is a good recording device, but also real attention to detail in the decoder, because it’s a closed-loop system. You need to pay attention to that closed-loop aspect for it to be really high performance.

    We have an internal goal of trying to beat the world record in terms of information rate from the BMI. We’re extremely close to exceeding what, as far as we know, is the best performance. And then there’s an open question: How much further beyond that can we go? 

    My team and I are trying to meet that goal and beat the world record. We’ll either nail down what we can, or, if we can’t, figure out why not, and how to make the device better.


  2. The Hardware

    Spectrum: The Neuralink system has been through some big design changes over the years. When I was talking to your team in 2019, the system wasn’t fully implantable, and there was still a lot in flux about the design of the threads, how many electrodes per thread, and the implanted chip. What’s the current design? 

    O’Doherty: The threads are often referred to as the neural interface itself; that’s the physical part that actually interfaces with the tissue. The broad approach has stayed the same throughout the years: It’s our hypothesis that making these threads extremely small and flexible is good for the long-term life of the device. We hope it will be something that the immune system likes, or at least dislikes less. That approach obviously comes with challenges because we have very, very small things that need to be robust over many years. And a lot of the techniques that are used to make things robust have to do with adding thickness and adding layers and having barriers. 

    Spectrum: I imagine there are a lot of trade-offs between size and robustness.  

    O’Doherty: There are other flexible and very cool neural interfaces out in the world that we read about in academic publications. But those demonstrations often only have to work for the one hour or one day that the experiment is done. Whereas we need to have this working for many, many, many, many days. It’s a totally different solution space.  

    Spectrum: When I was talking to your team in 2019, there were 128 electrodes per thread. Has that changed? 

    O’Doherty: Right now we’re doing 16 contacts per thread, spaced out by 200 microns. The earlier devices were denser, but it was overkill in terms of sampling neurons in various layers of the cortex. We could record the same neurons on multiple adjacent channels when the contacts were something like 20 microns apart. So we could do a very good job of characterizing the individual neurons we were recording from, but it required a lot of density, a lot of stuff packed in one spot, and that meant more power requirements. That might be great if you’re doing neuroscience, but it’s less good if you’re trying to make a functional product.

    That’s one reason why we changed our design to spread out our contacts in the cortex, and to distribute them on many threads across the cortical area. That way we don’t have redundant information. The current design is 16 channels per thread, and we have 64 of these threads that we can place wherever we want within the cortical region, which adds up to 1,024 channels. Those threads go to a single tiny device that’s smaller than a quarter, which has the algorithms, the spike detection, battery, telemetry, everything.

    In addition to 64×16, we’re also testing 128×8 and 256×4 configurations to see if there are performance gains. We ultimately have the flexibility to do any configuration of 1,024 electrodes we’d like.

    Spectrum: Does each Link device have multiple chips?

    O’Doherty: Yes. The actual hardware is a 256-channel chip, and there are four of them, which adds up to 1,024 channels. The Link acts as one thing, but it is actually made up of four chips. 

    Spectrum: I imagine you’re continually upgrading the software as you push toward your goal, but is the hardware fixed at this point?

    O’Doherty: Well, we’re constantly working on the next thing. But it is the case that we have to prove the safety of a particular version of the device so that we can translate that to humans. We use what are called design controls, where we fix the device so we can describe what it is very well and describe how we are testing its safety. Then we can make changes, but we do it under an engineering control framework. We describe what we’re changing and then we can either say this change is immaterial to safety or we have to do these tests.


  3. The Software

    Spectrum: It sounds like a lot of the spike detection is being done on the chips. Is that something that’s evolved over time? I think a few years back it was being done on an external device. 

    O’Doherty: That’s right. We have a slightly different approach to spike detection. Let me first give a couple of big picture comments. For neuroscience, you often don’t just want to detect spikes. You want to detect spikes and then sort spikes by which neurons generated them. If you detect a spike on a channel and then realize, Oh, I can actually record five different neurons here. Which neuron did it come from? How do I refer each spike to the neuron that generated it? That’s a very difficult computational problem. That’s something that’s often done in post-processing—so after you record a bunch of data, then you do a bunch of math. 

    There’s another extreme where you simply put a threshold on your voltage, and you say that every time something crosses that threshold, it’s a spike. And then you just count how many of those happen. That’s all. That’s all the information you can use. 

    Both extremes are not great for us. In the first one, you’re doing a lot of computation that’s perhaps infeasible to do in a small package. With the other extreme, you’re very sensitive to noise and artifacts because many things can cause a threshold crossing that are not neurons firing. So we’re using an approach in the middle, where we’re looking for shapes that look like the signals that neurons generate. We transmit those events, along with a few extra bits of information about the spike, like how tall it is, how wide it is, and so on.
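
The middle-ground approach O’Doherty describes can be sketched as a toy detector: find threshold crossings, then keep only events whose width and depth look spike-like. Everything here—the threshold, the width limits, the synthetic trace—is an illustrative assumption, not Neuralink’s actual algorithm.

```python
import numpy as np

def detect_spikes(voltage, threshold=-4.0, min_width=3, max_width=20):
    """Toy 'middle ground' spike detector: find negative threshold
    crossings, then keep only events whose shape (width, depth) looks
    like an extracellular spike. All parameters are illustrative."""
    events = []
    i, n = 0, len(voltage)
    while i < n:
        if voltage[i] < threshold:
            start = i
            while i < n and voltage[i] < threshold:
                i += 1
            width = i - start
            # Reject events too narrow or too wide to be a spike
            # (single-sample artifacts, slow drifts).
            if min_width <= width <= max_width:
                events.append({"t": start,
                               "amplitude": voltage[start:i].min(),
                               "width": width})
        else:
            i += 1
    return events

# Synthetic trace: unit-variance noise, one spike-shaped deflection,
# and one single-sample artifact that a bare threshold would miscount.
rng = np.random.default_rng(0)
trace = rng.normal(0, 1, 1000)
trace[500:508] -= 8.0       # plausible spike shape (8 samples wide)
trace[700] -= 8.0           # artifact: too narrow to pass the shape check
spikes = detect_spikes(trace)
```

The shape check is what separates this from the naive threshold approach: the artifact at sample 700 crosses the threshold but is rejected, while the 8-sample deflection is kept, along with the extra bits (amplitude, width) O’Doherty mentions transmitting.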

    That’s something that we were previously doing on the external part of the device. At the time we validated that algorithm, we had much higher bandwidth, because it was a wired system. So we were able to stream a lot of data and develop this algorithm. Then the chip team took that algorithm and implemented it in hardware. So now that’s all happening automatically on the chip. It’s automatically tuning its parameters—it has to learn about the statistical distribution of the voltages in the brain. And then it just detects spikes and sends them out to the decoder.

    Spectrum: How much data is coming off the device these days?  

    O’Doherty: To address this in brain-machine interface terms, we are detecting spikes within a 25-millisecond window or “bin.” So the vectors of information that we use in our algorithms for closed-loop control are vectors of spike counts: 1,024 channels by one 25-millisecond bin. We count how many spikes occur per channel and that’s what we send out. We only need about four bits per bin, so that’s four bits times forty bins per second times 1,024 channels, or about 20 kilobytes each second.

    That degree of compression is made possible by the fact that we’re spike detecting with our custom algorithm on the chip. The maximum bandwidth would be 1,024 channels times 20,000 samples per second, which is a pretty big number. That’s if we could send everything. But the compressed version is just the number of spike events that occur—zero, one, two, three, four, whatever—times 1,024 channels.

    For our application, which is controlling our communications prosthesis, this data compression is a good way to go—and we still have usable signals for closed-loop control.

    Spectrum: When you say closed-loop control, what does that mean in this context?  

    O’Doherty: Most machine learning is open-loop. Say you have an image and you analyze it with a model and then produce some results, like detecting the faces in a photograph. You have some inference you want to do, but how quickly you do it doesn’t generally matter. But here the user is in the loop—the user is thinking about moving and the decoder is, in real time, decoding those movement intentions, and then taking some action. It has to act very quickly because if it’s too slow, it doesn’t matter. If you throw a ball to me and it takes my BMI five seconds to infer that I want to move my arm forward—that’s too late. I’ll miss the ball.

    So the user changes what they’re doing based on visual feedback about what the decoder does: That’s what I mean by closed loop. The user makes a motor intent; it’s decoded by the Neuralink device; the intended motion is enacted in the world by physically doing something with a cursor or a robot arm; the user sees the result of that action; and that feedback influences what motor intent they produce next. I think the closest analogy outside of BMI is the use of a virtual reality headset—if there’s a big lag between what you do and what you see on your headset, it’s very disorienting, because it breaks that closed-loop system.
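
The effect of lag on a closed loop like the one O’Doherty describes can be shown with a toy simulation. This is purely illustrative—a one-dimensional cursor with an idealized “user” and “decoder,” not Neuralink’s pipeline—and the gains and delays are made-up numbers:

```python
def simulate_closed_loop(target, steps=200, gain=0.1, delay=0):
    """Toy 1-D closed-loop cursor. Each tick: the 'user' forms a motor
    intent from the feedback they see (which may lag by `delay` ticks),
    a 'decoder' turns that intent into cursor movement, and the display
    updates. Purely illustrative."""
    history = [0.0]                                  # displayed positions
    for _ in range(steps):
        seen = history[max(0, len(history) - 1 - delay)]  # lagged feedback
        intent = target - seen            # user aims at the remaining error
        history.append(history[-1] + gain * intent)       # decode + draw
    return history[-1]

# With no lag the cursor settles on the target; with a long lag the same
# loop overshoots and oscillates -- the disorientation described above.
no_lag = simulate_closed_loop(1.0, delay=0)
lagged = simulate_closed_loop(1.0, delay=30, gain=0.3)
```

Even this crude model reproduces the qualitative point: feedback delay, not decoding accuracy, is what destabilizes the loop.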


  4. What He’s Working on Right Now

    Spectrum: What has to happen to get from where you are right now to best-in-world? 

    O’Doherty: Step one is to find the sources of latency and eliminate all of them. We want to have low latency throughout the system. That includes detecting spikes; that includes processing them on the implant; that includes the radio that has to transmit them—there’s all kinds of packetization details with Bluetooth that can add latency. And that includes the receiving side, where you do some processing in your model inference step, and that even includes drawing pixels on the screen for the cursor that you’re controlling. Any small amount of lag that you have there adds delay and that affects closed-loop control.

    Spectrum: OK, so let’s imagine all latency has been eliminated. What next? 

    O’Doherty: Step two is the decoder itself, and the model that it uses. There’s great flexibility in terms of the model—it could be very simple, very complex, very nonlinear, or very deep in terms of deep learning—how many layers your entire network has. But we have particular constraints. We need our decoder model to work fast, so we can’t use a sophisticated decoder that’s very accurate but takes too long to be useful. We’re also potentially interested in running the decoder on the implant itself, and that requires both low memory usage, so we don’t have to store a lot of parameters in a very constrained environment, and also compute efficiency so we don’t use a lot of clock cycles. But within that space, there’s some clever things we can do in terms of mapping neural spikes to movement. There are linear models that are very simple and nonlinear models that give us more flexibility in capturing the richness of neural dynamics. We want to find the right sweet spot there.
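
The simple end of the model spectrum O’Doherty describes can be sketched as a linear map from binned spike counts to 2-D cursor velocity—here fit by ridge regression on synthetic data. Nothing below reflects Neuralink’s actual decoder; the tuning model, noise levels, and regularization strength are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_channels, n_bins = 1024, 2000

# Synthetic "neural data": hidden linear tuning from 1,024 channels of
# Poisson spike counts to a noisy 2-D velocity signal.
true_W = rng.normal(0, 0.1, (n_channels, 2))
X = rng.poisson(2.0, (n_bins, n_channels)).astype(float)
Y = X @ true_W + rng.normal(0, 0.5, (n_bins, 2))

# Ridge regression, closed form: W = (X'X + lam*I)^-1 X'Y
lam = 10.0
W = np.linalg.solve(X.T @ X + lam * np.eye(n_channels), X.T @ Y)

# Decoding one new bin is a single matrix-vector product -- the kind of
# low-latency, low-memory step that could plausibly run on an implant.
x_new = rng.poisson(2.0, n_channels).astype(float)
v_hat = x_new @ W
```

The appeal of this end of the spectrum is visible in the last two lines: inference is one matrix-vector product, and the whole model is 1,024 × 2 parameters. Nonlinear and recurrent decoders trade away exactly that simplicity for expressiveness.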

    Other factors include the speed at which you can calibrate the decoder to the user. If we have to spend a long time training the decoder, that’s not a great user experience. We want something that can come online really quickly and give the subject a lot of time to practice with the device.

    We’re also focusing on models that are robust. So from day one to day two to day three, we don’t want to have to recalibrate or re-tune the decoder. We want one that works on day one and that works reliably for a long time. Eventually, we want decoders that calibrate themselves, even without the user thinking about it. So the user is just going about their day doing things that cause the model to stay calibrated. 

    Spectrum: Are there any decoder tricks or hacks you’ve figured out that you can tell me about?  

    O’Doherty: One thing we find particularly helpful is decoding click intention. When a BMI user moves a cursor to a target, they typically need to dwell on that target for a certain amount of time, and that is considered a click. The user dwelled for 200 milliseconds, so they selected it. Which is fine, but it adds delay because the user has to wait that amount of time for the selection to happen. But if we decode click intention directly, that lets the user make selections that much faster. 

    And this is something that we’re working on—we don’t have a result yet. But we can potentially look into the future. Imagine you’re making a movement with the brain-controlled cursor, and I know where you are now… but maybe I also know where you’re going to want to go in a second. If I know that, I can do some tricks, I can just teleport you there and get you there faster.

    And honestly, practice is a component. These neuroprosthetic skills have to be learned by the user, just like learning to type or any other skill. We’ve seen this with non-human primates, and I’ve heard it’s also true of human participants in BrainGate trials. So we want a decoder that doesn’t pose too much of a learning burden. 

    Beyond that, I can speculate on cool stuff that could be done. For example, you type faster with two fingers than one finger, or text faster with two thumbs versus one pointer finger. So imagine decoding movement intention for two thumbs to control your brain-controlled keyboard and mouse. That could potentially be a way to boost performance. 

    Spectrum: What is the current world record for BMI rate? 

    O’Doherty: Krishna Shenoy of Stanford has been keeping track of this in some tables of BCI performance, which includes the paper that recently came out from his group. That paper set the record with a maximum bit rate of 6.18 bits per second with human participants. For non-human primates, the record is 6.49 bits per second.
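
Performance tables like the ones O’Doherty cites typically report an “achieved bitrate” for target-selection tasks. A common formulation—assumed here, with made-up task numbers—penalizes wrong selections and scales by the information per choice:

```python
import math

def achieved_bitrate(n_targets, correct, incorrect, seconds):
    """Achieved bitrate as commonly reported in BCI performance tables:
    log2(N-1) * max(correct - incorrect, 0) / time.
    The inputs here are illustrative, not from any specific study."""
    if n_targets < 2 or seconds <= 0:
        return 0.0
    return math.log2(n_targets - 1) * max(correct - incorrect, 0) / seconds

# e.g. a hypothetical 8-target task: 40 correct and 2 wrong selections
# in 30 seconds of use.
rate = achieved_bitrate(8, 40, 2, 30.0)
```

Under this metric, speed, accuracy, and task difficulty all trade off against one another, which is why the record-setting numbers O’Doherty quotes are in bits per second rather than, say, targets per minute.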

    Spectrum: And can you prove best-in-world BMI with non-human primates, or do you need to get into humans for that?

    O’Doherty: That’s a good question. Non-human primates can’t talk or read English, so to some extent we have to make inferences. With a human participant you might say, here’s a sentence we’d like you to copy, please type it as best you can. And then we can look at performance there. For the monkey, we can create a string of sequences and ask them to do it quickly and compute performance rates that way. Monkeys are motivated and they’ll do those tasks. So I don’t see any reason, in principle, why one is superior to the other for this. For linguistic and semantic tasks like decoding speech or decoding text directly from your brain we’ll have to prototype in humans, of course. But until we get to that point, and even after that, non-human primates and other animal models are really important for proving out the technology.


  5. What the Limits Are, Where the Ceiling Is

    Spectrum: You said earlier that your team will either achieve a new world record or find out the reason why you can’t. Are there reasons why you think it might not work? 

    O’Doherty: The 2D cursor control is not a very high-dimension task. There are probably limits that have to do with intention and speed. Think about how long it takes to move a cursor around and hit targets: It’s the time it takes the user to go from point A to point B, and the time it takes to select when they’re at point B. Also if they make a mistake and click the wrong button, that’s really bad. So they have to go faster between A and B, they have to click there more reliably, and they can’t make mistakes.

    At some point, we’re going to hit a limit, because the brain can’t keep up. If the cursor is going too fast, the user won’t even see it moving. I think that’s where the limits will come from—not the neural interfacing, but what it means to move a cursor around. So then we’ll have to think about other interesting ways to interface with the brain to get beyond that. There are other ways of communicating that might be better—maybe it will involve ten-finger typing. I think it’s an open question where that ceiling is.

    Spectrum: Both the games that the monkey played were basically just cursor control: finding targets and using a cursor to move the paddle in Pong. Can you imagine any tests that would go beyond that for non-human primates?

    O’Doherty: Non-human primates can learn other more complicated tasks. The training can be lengthy, because we can’t tell them what to do; we have to show them and take small steps toward more complicated things. To pick a game out of a hat: Now we know that monkeys can play Pong, but can they play Fruit Ninja? There’s a training burden, but I think it’s within their capability.

    Spectrum: Is there anything else you want to emphasize about the technology, the work you’re doing, or how you’re doing it?

    O’Doherty: I first started working on BMI in an academic environment. The concerns that we have at Neuralink are different from the concerns involved with making a BMI for an academic demonstration. We’re really interested in the product, the experience of the user, the robustness, and having this device be useful across a long period of time. And those priorities necessarily lead to slightly different optimizations than I think we would choose if we were doing this for a one-off demonstration. We really enjoyed the Pong demo, but we’re not here to make Pong demos. That’s just a teaser for what will be possible when we bring our product to market.


AI Agents Play “Hide the Toilet Plunger” to Learn Deep Concepts About Life


Most papers about artificial intelligence don’t cite Jean Piaget, the social scientist known for his groundbreaking studies of children’s cognitive development in the 1950s. But there he is, in a paper from the Allen Institute for AI (AI2). The researchers state that their AI agents learned the concept of object permanence—the understanding that an object hidden from view is still there—thus making those AI agents similar to a baby who just figured out the trick behind peekaboo. 

The researchers’ AI agents learned this precept and other rudimentary rules about the world by playing many games of hide and seek with objects, which took place within a simulated, but fairly realistic, house. The AI2 team calls the game “Cache,” but I prefer to call it “Hide the Toilet Plunger.” The agents also got to hide tomatoes, loaves of bread, cups, and knives.

The AI agents, which acted as both hiders and seekers, figured out the game via reinforcement learning. Starting out, they didn’t know anything about the 3D visual environment. They began by taking random actions like pulling on the handle of a drawer or pulling on an immovable wall, and they dropped their objects in all sorts of random places. The agents got better by playing against each other and learning from outcomes—if the seeker didn’t find the tomato, the hider knew it had chosen a good hiding place. 

The paper, which was recently accepted for the 2021 International Conference on Learning Representations, hasn’t yet been published in a peer-reviewed journal.

Unlike many projects concerning AI and gameplay, the point here wasn’t to create an AI super-player that could destroy puny humans. Rather, the researchers wanted to see if an AI agent could achieve a more generalized kind of visual intelligence if it learned about the world via gameplay.

“For us, the question was: Can it learn very basic things about objects and their attributes by interacting with them?” says Aniruddha Kembhavi, a research manager with AI2’s computer vision team and a paper coauthor.

This AI2 team is working on representation learning, in which AI systems are given some input—images, audio, text, etc.—and learn to categorize the data according to its features. In computer vision, for example, an AI system might learn the features that represent a cat or a traffic light. Ideally, though, it doesn’t learn only the categories, it also learns how to categorize data, making it useful even when given images of objects it has never before seen.

Visual representation learning has evolved over the past decade, Kembhavi explains. When deep learning took off, researchers first trained AI systems on databases of labeled images, such as the famous ImageNet. Because the labels enable the AI system to check its work, that technique is called supervised learning. “Then in past few years, the buzz has gone from supervised learning to self-supervised learning,” says Kembhavi, in which AI systems have to determine the labels for themselves. “We believe that an even more general way of doing it is gameplay—we just let the agents play around, and they figure it out.” 

Once the AI2 agents had gotten good at the game, the researchers ran them through a variety of tests designed to probe their understanding of the world. They first tested them on computer-generated images of rooms, asking them to predict traits such as depth of field and the geometry of objects. When compared to a model trained on the gold-standard ImageNet, the AI2 agents performed as well or better. They also tested them on photographs of real rooms; while they didn’t do as well as the ImageNet-trained model there, they did better than expected—an important indication that training in simulated environments could produce AI systems that function in the real world. 

The tests that really excited the researchers, though, were those inspired by developmental psychology. They wanted to determine whether the AI agents grasped certain “cognitive primitives,” or basic elements of understanding that can be built upon. They found that the agents understood the principles of containment and object permanence, and that they could rank images according to how much free space they contained. That ranking test was an attempt to get at a concept that Jean Piaget called seriation, or the ability to order objects based on a common property. 

If you’re thinking, “Haven’t I read something in IEEE Spectrum before about AI agents playing hide and seek?” you are not wrong, and you are also a faithful reader. In 2019, I covered an OpenAI project in which the hiders and seekers surprised the researchers by coming up with strategies that weren’t supposed to be possible in the game environment.  

Igor Mordatch, one of the OpenAI researchers behind that project, says he’s excited to see that AI2’s research doesn’t focus on external behaviors within the game, but rather the “internal representations of the world emerging in the minds of these agents,” he says in an email. “Representation learning is thought to be one of the key components to progress in general-purpose AI systems today, so any advances in this area would be highly impactful.”

As for transferring any advances from their research to the real world, the AI2 researchers say that the agents’ dynamic understanding of how objects act in time and space could someday be useful to robots. But they have no intention of doing robot experiments anytime soon. Training in simulation took several weeks; training in the real world would be infeasible. “Also, there’s a safety issue,” notes study coauthor Roozbeh Mottaghi, also a research manager at AI2. “These agents do random stuff.” Just think of the havoc that could be wreaked on a lab by a rogue robot carrying a toilet plunger.

OpenAI’s GPT-3 Speaks! (Kindly Disregard Toxic Language)


Last September, a data scientist named Vinay Prabhu was playing around with an app called Philosopher AI. The app provides access to the artificial intelligence system known as GPT-3, which has incredible abilities to generate fluid and natural-seeming text. The creator of that underlying technology, the San Francisco company OpenAI, has allowed hundreds of developers and companies to try out GPT-3 in a wide range of applications, including customer service, video games, tutoring services, and mental health apps. The company says tens of thousands more are on the waiting list.

Philosopher AI is meant to show people the technology’s astounding capabilities—and its limits. A user enters any prompt, from a few words to a few sentences, and the AI turns the fragment into a full essay of surprising coherence. But while Prabhu was experimenting with the tool, he found a certain type of prompt that returned offensive results. “I tried: What ails modern feminism? What ails critical race theory? What ails leftist politics?” he tells IEEE Spectrum.  

The results were deeply troubling. Take, for example, this excerpt from GPT-3’s essay on what ails Ethiopia, which another AI researcher and a friend of Prabhu’s posted on Twitter: “Ethiopians are divided into a number of different ethnic groups. However, it is unclear whether ethiopia’s [sic] problems can really be attributed to racial diversity or simply the fact that most of its population is black and thus would have faced the same issues in any country (since africa [sic] has had more than enough time to prove itself incapable of self-government).”

Prabhu, who works on machine learning as chief scientist for the biometrics company UnifyID, notes that Philosopher AI sometimes returned diametrically opposing responses to the same query, and that not all of its responses were problematic. “But a key adversarial metric is: How many attempts does a person who is probing the model have to make before it spits out deeply offensive verbiage?” he says. “In all of my experiments, it was on the order of two or three.”  

The Philosopher AI incident laid bare the potential danger that companies face as they work with this new and largely untamed technology, and as they deploy commercial products and services powered by GPT-3. Imagine the toxic language that surfaced in the Philosopher AI app appearing in another context—your customer service representative, an AI companion that rides around in your phone, your online tutor, the characters in your video game, your virtual therapist, or an assistant who writes your emails.

Those are not theoretical concerns. Spectrum spoke with beta users of the API who are working to incorporate GPT-3 into such applications and others. The good news is that all the users Spectrum talked with were actively thinking about how to deploy the technology safely.

The Vancouver-based developer behind the Philosopher AI app, Murat Ayfer, says he created it to both further his own understanding of GPT-3’s potential and to educate the public. He quickly discovered the many ways in which his app could go wrong. “With automation, you need either a 100 percent success rate, or you need it to error out gracefully,” he tells Spectrum. “The problem with GPT-3 is that it doesn’t error out, it just produces garbage—and there’s no way to detect if it’s producing garbage.”

GPT-3 Learned From Us

The fundamental problem is that GPT-3 learned about language from the Internet: Its massive training dataset included not just news articles, Wikipedia entries, and online books, but also every unsavory discussion on Reddit and other sites. From that morass of verbiage—both upstanding and unsavory—it drew 175 billion parameters that define its language. As Prabhu puts it: “These things it’s saying, they’re not coming out of a vacuum. It’s holding up a mirror.” Whatever GPT-3’s failings, it learned them from humans.

Following some outcry about the Philosopher AI app—another response that ended up on Twitter started with cute rabbits but quickly devolved into a discussion of reproductive organs and rape—Ayfer made changes. He had already been steadily working on the app’s content filter, causing more prompts to return the polite response: “Philosopher AI is not providing a response for this topic, because we know this system has a tendency to discuss some topics using unsafe and insensitive language.” He also added a function that let users report offensive responses.

Ayfer argues that Philosopher AI is a “relatively harmless context” for GPT-3 to generate offensive content. “It’s probably better to make mistakes now, so we can really learn how to fix them,” he says.  

That’s just what OpenAI intended when it launched the API that enables access to GPT-3 last June, and announced a private beta test in which carefully selected users would develop applications for the technology under the company’s watchful eye. The blog post noted that OpenAI will be guarding against “obviously harmful use-cases, such as harassment, spam, radicalization, or astroturfing,” and will be looking for unexpected problems: “We also know we can’t anticipate all of the possible consequences of this technology.”

Prabhu worries that the AI and business community are being swept away into uncharted waters: “People are thrilled, excited, giddy.” He thinks the rollout into commercial applications is bound to cause some disasters. “Even if they’re very careful, the odds of something offensive coming out is 100 percent—that’s my humble opinion. It’s an intractable problem, and there is no solution,” he says.  

Janelle Shane is a member of that AI community, and a beta user of GPT-3 for her blog, AI Weirdness. She clearly enjoys the technology, having used it to generate Christmas carols, recipes, news headlines, and anything else she thought would be funny. Yet the tweets about Philosopher AI’s essay on Ethiopia prompted her to post this sobering thought: “Sometimes, to reckon with the effects of biased training data is to realize that the app shouldn’t be built. That without human supervision, there is no way to stop the app from saying problematic stuff to its users, and that it’s unacceptable to let it do so.”

So what is OpenAI doing about its intractable problem?

OpenAI’s Approach to AI Safety

The company has arguably learned from its experiences with earlier iterations of its language-generating technology. In 2019 it introduced GPT-2, but declared that it was actually too dangerous to be released into the wild. The company instead offered up a downsized version of the language model but withheld the full model, which included the data set and training code.

The main fear, highlighted by OpenAI in a blog post, was that malicious actors would use GPT-2 to generate high-quality fake news that would fool readers and destroy the distinction between fact and fiction.  

However, much of the AI community objected to that limited release. When the company reversed course later that year and made the full model available, some people did indeed use it to generate fake news and clickbait. But it didn’t create a tsunami of non-truth on the Internet. In the past few years, people have shown they can do that well enough themselves, without the help of an AI. 

Then came GPT-3, unveiled in a 75-page paper in May 2020. OpenAI's newest language model was far larger than any that had come before: its 175 billion parameters were a massive increase over GPT-2's 1.5 billion.

Sandhini Agarwal, an AI policy researcher at OpenAI, spoke with Spectrum about the company’s strategy for GPT-3. “We have to do this closed beta with a few people, otherwise we won’t even know what the model is capable of, and we won’t know which issues we need to make headway on,” she says. “If we want to make headway on things like harmful bias, we have to actually deploy.”

Agarwal explains that an internal team vets proposed applications, provides safety guidelines to those companies granted access to GPT-3 via the API, reviews the applications again before deployment, and monitors their use after deployment.

OpenAI is also developing tools to help users better control GPT-3’s generated text. It offers a general content filter for harmful bias and toxic language. However, Agarwal says that such a filter is really an impossible thing to create, since “bias is a very nebulous thing that keeps shifting based on context.” Particularly on controversial topics, a response that might seem right-on to people on one side of the debate could be deemed toxic by the other.

Another approach, called prompt engineering, adds a phrase to the user’s prompt such as “the friendly bot then said,” which sets up GPT-3 to generate text in a polite and uncontroversial tone. Users can also choose a “temperature” setting for their responses. A low-temperature setting means the AI will put together words that it has very often seen together before, taking few risks and causing few surprises; when set to a high temperature, it’s more likely to produce outlandish language.
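
The effect of a temperature setting can be illustrated with a toy sampler. This is not OpenAI's implementation, just a minimal sketch of the standard technique: divide the model's raw token scores (logits) by the temperature before turning them into probabilities, so low temperatures concentrate probability on the likeliest tokens and high temperatures flatten the distribution.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Sample a token index from raw model scores (logits).

    A low temperature sharpens the distribution toward the most
    likely tokens; a high temperature flattens it, making rare
    tokens (and surprising output) more likely.
    """
    scaled = [score / temperature for score in logits]
    # Softmax with max-subtraction for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With logits of, say, `[5.0, 1.0, 0.0]`, a temperature of 0.1 almost always yields the top-scoring token, while a temperature of 100 makes all three nearly equally likely.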

In addition to all the work being done on the product side of OpenAI, Agarwal says there’s a parallel effort on the “pure machine learning research” side of the company. “We have an internal red team that’s always trying to break the model, trying to make it do all these bad things,” she says. Researchers are trying to understand what’s happening when GPT-3 generates overtly sexist or racist text. “They’re going down to the underlying weights of the model, trying to see which weights might indicate that particular content is harmful.”

In areas where mistakes could have serious consequences, such as the health care, finance, and legal industries, Agarwal says OpenAI’s review team takes special care. In some cases, they’ve rejected applicants because their proposed product was too sensitive. In others, she says, they’ve insisted on having a “human in the loop,” meaning that the AI-generated text is reviewed by a human before it reaches a customer or user.  

OpenAI is making progress on toxic language and harmful bias, Agarwal says, but “we’re not quite where we want to be.” She says the company won’t broadly expand access to GPT-3 until it’s comfortable that it has a handle on these issues. “If we open it up to the world now, it could end really badly,” she says.

But such an approach raises plenty of questions. It’s not clear how OpenAI will get the risk of toxic language down to a manageable level—and it’s not clear what manageable means in this context. Commercial users will have to weigh GPT-3’s benefits against these risks.

Can Language Models Be Detoxified? 

OpenAI’s researchers aren’t the only ones trying to understand the scope of the problem. In December, AI researcher Timnit Gebru said that she’d been fired by Google, forced to leave her work on ethical AI and algorithmic bias, because of an internal disagreement about a paper she’d coauthored. The paper discussed the current failings of large language models such as GPT-3 and Google’s own BERT, including the dilemma of encoded bias. Gebru and her coauthors argued that companies intent on developing large language models should devote more of their resources to curating the training data and “only creating datasets as large as can be sufficiently documented.”

Meanwhile, at the Allen Institute for AI (AI2), in Seattle, a handful of researchers have been probing GPT-3 and other large language models. One of their projects, called RealToxicityPrompts, created a dataset of 100,000 prompts derived from web text, evaluated the toxicity of the resulting text from five different language models, and tried out several mitigation strategies. Those five models included GPT versions 1, 2, and 3 (OpenAI gave the researchers access to the API).

The conclusion stated in their paper, which was presented at the 2020 Empirical Methods in Natural Language Processing conference in November: No current mitigation method is “failsafe against neural toxic degeneration.” In other words, they couldn’t find a way to reliably keep out ugly words and sentiments.  

When the research team spoke with Spectrum about their findings, they noted that the standard ways of training these big language models may need improvement. “Using Internet text has been the default,” says Suchin Gururangan, an author on the paper and an investigator at AI2. “The assumption is that you’re getting the most diverse set of voices in the data. But it’s pretty clear in our analysis that Internet text does have its own biases, and biases do propagate in the model behavior.”

Gururangan says that when researchers think about what data to train their new models on, they should consider what kinds of text they'd like to exclude. But he notes that it's a hard task to automatically identify toxic language even in a single document, and that doing it at web scale "is fertile ground for research."

As for ways to fix the problem, the AI2 team tried two approaches to “detoxify” the models’ output: giving the model additional training with text that’s known to be innocuous, or filtering the generated text by scanning for keywords or by fancier means. “We found that most of these techniques don’t really work very well,” Gururangan says. “All of these methods reduce the prevalence of toxicity—but we always found, if you generate enough times, you will find some toxicity.”
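
The simplest version of the output-filtering strategy the AI2 team tested can be sketched as a reject-and-regenerate loop. The blocklist approach here is deliberately crude, and the function names are illustrative, not from the paper; as Gururangan notes, real mitigation needs more than keyword matching, and even then some toxic output slips through if you generate enough times.

```python
def regenerate_until_clean(generate, blocklist, max_attempts=5):
    """Call a text generator repeatedly, rejecting any output that
    contains a blocked word. Returns None if every attempt fails,
    mirroring the AI2 finding: filtering reduces the prevalence of
    toxicity but cannot guarantee a clean result.
    """
    blocked = {word.lower() for word in blocklist}
    for _ in range(max_attempts):
        text = generate()
        # Normalize crudely: strip trailing punctuation, lowercase.
        words = {w.strip(".,!?").lower() for w in text.split()}
        if not (words & blocked):
            return text
    return None  # every attempt tripped the filter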

What’s more, he says, reducing the toxicity can also have the side effect of reducing the fluency of the language. That’s one of the issues that the beta users are grappling with today.  

How Beta Users of GPT-3 Aim for Safe Deployment

The companies and developers in the private beta that Spectrum spoke with all made two basic points: GPT-3 is a powerful technology, and OpenAI is working hard to address toxic language and harmful bias. “The people there take these issues extremely seriously,” says Richard Rusczyk, founder of Art of Problem Solving, a beta-user company that offers online math courses to “kids who are really into math.” And the companies have all devised strategies for keeping GPT-3’s output safe and inoffensive.   

Rusczyk says his company is trying out GPT-3 to speed up its instructors’ grading of students’ math proofs—GPT-3 can provide a basic response about a proof’s accuracy and presentation, and then the instructor can check the response and customize it to best help that individual student. “It lets the grader spend more time on the high value tasks,” he says.

To protect the students, the generated text “never goes directly to the students,” Rusczyk says. “If there’s some garbage coming out, only a grader would see it.” He notes that it’s extremely unlikely that GPT-3 would generate offensive language in response to a math proof, because it seems likely that such correlations rarely (if ever) occurred in its training data. Yet he stresses that OpenAI still wanted a human in the loop. “They were very insistent that students should not be talking directly to the machine,” he says.

Some companies find safety in limiting the use case for GPT-3. At Sapling Intelligence, a startup that helps customer service agents with emails, chat, and service tickets, CEO Ziang Xie says he doesn't anticipate using it for "freeform generation." Xie says it's important to put this technology in place within certain protective constraints. "I like the analogy of cars versus trolleys," he says. "Cars can drive anywhere, so they can veer off the road. Trolleys are on rails, so you know at the very least they won't run off and hit someone on the sidewalk." However, Xie notes that the recent furor over Timnit Gebru's forced departure from Google has caused him to question whether companies like OpenAI can do more to make their language models safer from the get-go, so they don't need guardrails.

Robert Morris, the cofounder of the mental health app Koko, describes how his team is using GPT-3 in a particularly sensitive domain. Koko is a peer-support platform that provides crowdsourced cognitive therapy. His team is experimenting with using GPT-3 to generate bot-written responses to users while they wait for peer responses, and also with giving respondents possible text that they can modify. Morris says the human collaboration approach feels safer to him. “I get increasingly concerned the more freedom it has,” he says.

Yet some companies need GPT-3 to have a good amount of freedom. Replika, an AI companion app used by 7 million people around the world, offers friendly conversation about anything under the sun. “People can talk to Replika about anything—their life, their day, their interests,” says Artem Rodichev, head of AI at Replika. “We need to support conversation about all types of topics.”

To prevent the app from saying offensive things, the company has GPT-3 generate a variety of responses to each message, then uses a number of custom classifiers to detect and filter out responses with negativity, harmful bias, nasty words, and so on. Since such attributes are hard to detect from keywords alone, the app also collects signals from users to train its classifiers. “Users can label a response as inappropriate, and we can use that feedback as a dataset to train the classifier,” says Rodichev.  
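
The generate-several-then-filter pattern Rodichev describes can be sketched in a few lines. This is an assumption-laden toy, not Replika's code: `toxicity_score` stands in for one of the trained classifiers, taking a candidate reply and returning a probability that it is inappropriate.

```python
def pick_safest_response(candidates, toxicity_score, threshold=0.5):
    """Score each candidate reply with a toxicity classifier, drop
    any scoring above the threshold, and return the lowest-scoring
    survivor. Returns None when no candidate is safe enough, so the
    caller can fall back to a canned reply.
    """
    scored = [(toxicity_score(text), text) for text in candidates]
    safe = [(score, text) for score, text in scored if score < threshold]
    if not safe:
        return None
    return min(safe)[1]  # the candidate with the lowest toxicity score
```

User reports of inappropriate responses become labeled training data for `toxicity_score`, so the filter improves as the classifiers are retrained.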

Another company that requires GPT-3 to be relatively unfettered is Latitude, a startup creating AI-powered games. Its first offering, a text adventure game called AI Dungeon, currently uses GPT-3 to create the narrative and respond to the player’s actions. Latitude CEO and cofounder Nick Walton says his team has grappled with inappropriate and bad language. “It doesn’t happen a ton, but it does happen,” he says. “And things end up on Reddit.”

Latitude is not trying to prevent all such incidents, because some users want a “grittier experience,” Walton says. Instead, the company tries to give users control over the settings that determine what kind of language they’ll encounter. Players start out in a default safe mode, and stay there unless they explicitly turn it off.

Safe mode isn’t perfect, Walton says, but it relies on a combination of filters and prompt engineering (such as: “continue this story in a way that’s safe for kids”) to get pretty good performance. He notes that Latitude wanted to build its own screening tech rather than rely on OpenAI’s safety filter because “safety is relative to the context,” he says. “If a customer service chatbot threatens you and asks you to give it all its money, that’s bad. If you’re playing a game and you encounter a bandit on the road, that’s normal storytelling.”  

These applications are only a small sampling of those being tested by beta users, and the beta users are a tiny fraction of the entities that want access to GPT-3. Aaro Isosaari cofounded the startup Flowrite in September after getting access to GPT-3; the company aims to help people compose faster emails and online content. Just as advances in computer vision and speech recognition enabled thousands of new companies, he thinks GPT-3 may usher in a new wave of innovation. "Language models have the potential to be the next technological advancement on top of which new startups are being built," he says.  

Coming Soon to Microsoft? 

Technology powered by GPT-3 could even find its way into the productivity tools that millions of office workers use every day. Last September, Microsoft announced an exclusive licensing agreement with OpenAI, stating that the company would use GPT-3 to “create new solutions that harness the amazing power of advanced natural language generation.” This arrangement won’t prevent other companies from accessing GPT-3 via OpenAI’s API, but it gives Microsoft exclusive rights to work with the basic code—it’s the difference between riding in a fast car and popping the hood to tinker with the engine.

In the blog post announcing the agreement, Microsoft chief technology officer Kevin Scott enthused about the possibilities, saying: “The scope of commercial and creative potential that can be unlocked through the GPT-3 model is profound, with genuinely novel capabilities – most of which we haven’t even imagined yet.” Microsoft declined to comment when asked about its plans for the technology and its ideas for safe deployment.

Ayfer, the creator of the Philosopher AI app, thinks that GPT-3 and similar language technologies should only gradually become part of our lives. “I think this is a remarkably similar situation to self-driving cars,” he says, noting that various aspects of autonomous car technology are gradually being integrated into normal vehicles. “But there’s still the disclaimer: It’s going to make life-threatening mistakes, so be ready to take over at any time. You have to be in control.” He notes that we’re not yet ready to put the AI systems in charge and use them without supervision.

With language technology like GPT-3, the consequences of mistakes might not be as obvious as a car crash. Yet toxic language has an insidious effect on human society by reinforcing stereotypes, supporting structural inequalities, and generally keeping us mired in a past that we’re collectively trying to move beyond. It isn’t clear, with GPT-3, if it will ever be trustworthy enough to act on its own, without human oversight.

OpenAI’s position on GPT-3 mirrors its larger mission, which is to create a game-changing kind of human-level AI, the kind of generally intelligent AI that figures in sci-fi movies—but to do so safely and responsibly. In both the micro and the macro argument, OpenAI’s position comes down to: We need to create the technology and see what can go wrong. We’ll do it responsibly, they say, while other people might not.

Agarwal of OpenAI says about GPT-3: “I do think that there are safety concerns, but it’s a Catch-22.” If they don’t build it and see what terrible things it’s capable of, she says, they can’t find ways to protect society from the terrible things. 

One wonders, though, whether anyone has considered another option: Taking a step back and thinking through the possible worst-case scenarios before proceeding with this technology. And possibly looking for fundamentally different ways to train large language models, so these models would reflect not the horrors of our past, but a world that we’d like to live in. 

A shorter version of this article appears in the February 2021 print issue as “The Troll in the Machine.”

10 Exciting Engineering Milestones to Look for in 2021

Post Syndicated from Eliza Strickland original

  • A Shining Light

    Last year, germicidal ultraviolet light found a place in the arsenal of weapons used to fight the coronavirus, with upper-air fixtures placed near hospital-room ceilings and sterilization boxes used to clean personal protective equipment. But a broader rollout was prevented by the dangers posed by UV-C light, which damages the genetic material of viruses and humans alike. Now, the Tokyo-based lighting company Ushio thinks it has the answer: lamps that produce 222-nanometer wavelengths that still kill microbes but don’t penetrate human eyes or skin. Ushio’s Care222 lamp modules went into mass production at the end of 2020, and in 2021 they’ll be integrated into products from other companies—such as Acuity Brands’ lighting fixtures for offices, classrooms, stores, and other occupied spaces.

  • Quantum Networking

    Early this year, photons will speed between Stony Brook University and Brookhaven National Laboratory, both in New York, in an ambitious demonstration of quantum communication. This next-gen communication concept may eventually offer unprecedented security, as a trick of quantum mechanics will make it obvious if anyone has tapped into a transmission. In the demo, “quantum memory buffers” from the startup Qunnect will be placed at each location, and photons within those buffers will be entangled with each other over a snaking 70-kilometer network.

  • Winds of Change

    Developers of offshore wind power quickly find themselves in deep water off the coast of California; one prospective site near Humboldt Bay ranges from 500 to 1,100 meters deep. These conditions call for a new breed of floating wind turbine that’s tethered to the seafloor with strong cables. Now, that technology has been demonstrated in pilot projects off the coasts of Scotland and Portugal, and wind power companies are eager to gain access to three proposed sites off the California coast. They’re expecting the U.S. Bureau of Ocean Energy Management to begin the process of auctioning leases for at least some of those sites in 2021.

  • Driverless Race Cars

    The Indianapolis Motor Speedway, the world’s most famous auto racetrack, will host an unprecedented event in October: the first high-speed race of self-driving race cars. About 30 university teams from around the world have signed up to compete in the Indy Autonomous Challenge, in which souped-up race cars will reach speeds of up to 320 kilometers per hour (200 miles per hour). To win the US $1 million top prize, a team’s autonomous Dallara speedster must be first to complete 20 laps in 25 minutes or less. The deep-learning systems that control the cars will be tested under conditions they’ve never experienced before; both sensors and navigation tools will have to cope with extreme speed, which leaves no margin for error.

  • Robots Below

    The robots participating in DARPA’s Subterranean Challenge have already been tested on three different courses; they’ve had to navigate underground tunnels, urban environments, and cave networks (although the caves portion was switched to an all-virtual course due to the pandemic). Late this year, SubT teams from across the world will put it all together at the final event, which will combine elements of all three subdomains into a single integrated challenge course. The robots will have to demonstrate their versatility and endurance in an obstacle-filled environment where communication with the world above ground is limited. DARPA expects pandemic conditions to improve enough in 2021 to make a physical competition possible.

  • Mars or Bust

    In February, no fewer than three spacecraft are due to rendezvous with the Red Planet. It’s not a coincidence—the orbits of Earth and Mars have brought the planets relatively close together this year, making the journey between them faster and cheaper. China’s Tianwen-1 mission plans to deliver an orbiter and rover to search for water beneath the surface; the United Arab Emirates’ Hope orbiter is intended to study the Martian climate; and a shell-like capsule will deposit NASA’s Perseverance rover, which aims to seek signs of past life, while also testing out a small helicopter drone named Ingenuity. There’s no guarantee that the spacecraft will reach their destinations safely, but millions of earthlings will be rooting for them.

  • Stopping Deepfakes

    By the end of the year, some Android phones may include a new feature that’s intended to strike a blow against the ever-increasing problem of manipulated photos and videos. The anti-deepfake tech comes from a startup called Truepic, which is collaborating with Qualcomm on chips for smartphones that will enable phone cameras to capture “verified” images. The raw pixel data, the time stamp, and location data will all be sent over secure channels to isolated hardware processors, where they’ll be put together with a cryptographic seal. Photos and videos produced this way will carry proof of their authenticity with them. “Fakes are never going to go away,” says Sherif Hanna, Truepic’s vice president of R&D. “Instead of trying to prove what’s fake, we’re trying to protect what’s real.”

  • Faster Data

    In a world reshaped by the COVID-19 pandemic, in which most office work, conferences, and entertainment have gone virtual, cloud data centers are under more stress than ever before. To ensure that people have all the bandwidth they need whenever they need it, service providers are increasingly using networks of smaller data centers within metropolitan areas instead of the traditional massive data center on a single campus. This approach offers higher resilience and availability for end users, but it requires sending torrents of data between facilities. That need will be met in 2021 with new 400ZR fiber optics, which can send 400 gigabits per second over data center interconnects of between 80 and 100 kilometers. Verizon completed a trial run last September, and experts believe we’ll see widespread deployment toward the end of this year.

  • Your Next TV

    While Samsung won’t confirm it, consumer-electronics analysts say the South Korean tech giant will begin mass production of a new type of TV in late 2021. Samsung ended production of liquid crystal display (LCD) TVs in 2020, and has been investing heavily in organic light-emitting diode (OLED) display panels enhanced with quantum dot (QD) technology. OLED TVs include layers of organic compounds that emit light in response to an electric current, and they currently receive top marks for picture quality. In Samsung’s QD-OLED approach, the TV will use OLEDs to create blue light, and a QD layer will convert some of that blue light into the red and green light needed to make images. This hybrid technology is expected to create displays that are brighter, higher contrast, and longer lasting than today’s best models.

  • Brain Scans Everywhere

    Truly high-quality data about brain activity is hard to come by; today, researchers and doctors typically rely either on big and expensive machines like MRI and CT scanners or on invasive implants. In early 2021, the startup Kernel is launching a wearable device called Kernel Flow that could change the game. The affordable low-power device uses a type of near-infrared spectroscopy to measure changes in blood flow within the brain. Within each headset, 52 lasers fire precise pulses, and reflected light is picked up by 312 detectors. Kernel will distribute its first batch of 50 devices to “select partners” in the first quarter of the year, but company founder Bryan Johnson hopes the portable technology will one day be ubiquitous. At a recent event, he described how consumers could eventually use Kernel Flow in their own homes.

Brain Implant Bypasses Eyes To Help Blind People See


Early humans were hunters, and their vision systems evolved to support the chase. When gazing out at an unchanging landscape, their brains didn’t get excited by the incoming information. But if a gazelle leapt from the grass, their visual cortices lit up. 

That neural emphasis on movement may be the key to restoring sight to blind people. Daniel Yoshor, the new chair of neurosurgery at the University of Pennsylvania’s Perelman School of Medicine, is taking cues from human evolution as he devises ways to use a brain implant to stimulate the visual cortex. “We’re capitalizing on that inherent bias the brain has in perception,” he tells IEEE Spectrum.

He recently described his experiments with “dynamic current steering” at the Bioelectronic Medicine Summit, and also published the research in the journal Cell in May. By tracing shapes with electricity onto the brain’s surface, his team is producing a relatively clear and functional kind of bionic vision. 

Yoshor is involved in an early feasibility study of the Orion implant, developed by the Los Angeles-based company Second Sight, which has been at the forefront of technological workarounds for people with vision problems.

In 2013, the U.S. Food and Drug Administration approved Second Sight’s retinal implant system, the Argus II, which uses an eyeglass-mounted video camera that sends information to an electrode array in the eye’s retina. Users have reported seeing light and dark, often enough to navigate on a street or find the brightness of a face turned toward them. But it’s far from normal vision, and in May 2019 the company announced that it would suspend production of the Argus II to focus on its next product.

The company has had a hard time over the past year: At the end of March it announced that it was winding down operations, citing the impact of COVID-19 on its ability to secure financing. But in subsequent months it announced a new business strategy, an initial public offering of stock, and finally in September the resumption of clinical trials for its Orion implant.  

The Orion system uses the same type of eyeglass-mounted video camera, but it sends information to an electrode array atop the brain’s visual cortex. In theory, it could help many more people than a retinal implant: The Argus II was approved only for people with an eye disease called retinitis pigmentosa, in which the photoreceptor cells in the retina are damaged but the rest of the visual system remains intact and able to convey signals to the brain. The Orion system, by sending info straight to the brain, could help people with more widespread damage to the eye or optic nerve.

Six patients have received the Orion implant thus far, and each now has an array of 60 electrodes that tries to represent the image transmitted by the camera. But imagine a digital image made up of 60 pixels—you can’t get much resolution. 

Yoshor says his work on dynamic current steering began with "the fact that getting info into the brain with static stimulation just didn't work that well." One possibility, he says, is that more electrodes would solve the problem, and he wonders aloud what he could do with hundreds of thousands of electrodes in the brain, or even 1 million. "We're dying to try that, when our engineering catches up with our experimental imagination," he says. 

Until that kind of hardware is available, Yoshor is focusing on the software that directs the electrodes to send electrical pulses to the neurons. His team has conducted experiments with two blind Second Sight volunteers as well as with sighted people (epilepsy patients who have temporary electrodes in their brains to map their seizures). 

One way to understand dynamic current steering, Yoshor says, is to think of a trick that doctors commonly use to test perception—they trace letter shapes on a patient’s palm. “If you just press a ‘Z’ shape into the hand, it’s very hard to detect what that is,” he says. “But if you draw it, the brain can detect it instantaneously.” Yoshor’s technology does something similar, grounded in well-known information about how a person’s visual field maps to specific areas of their brain. Researchers have constructed this retinotopic map by stimulating specific spots of the visual cortex and asking people where they see a bright spot of light, called a phosphene.

The static form of stimulation that disappointed Yoshor essentially tries to create an image from phosphenes. But, says Yoshor, “when we do that kind of stimulation, it’s hard for patients to combine phosphenes to a visual form. Our brains just don’t work that way, at least with the crude forms of stimulation that we’re currently able to employ.” He believes that phosphenes cannot be used like pixels in a digital image. 

With dynamic current steering, the electrodes stimulate the brain in sequence to trace a shape in the visual field. Yoshor’s early experiments have used letters as a proof of concept: Both blind and sighted people were able to recognize such letters as M, N, U, and W. This system has an additional advantage of being able to stimulate points in between the sparse electrodes, he adds. By gradually shifting the amount of current going to each (imagine electrode A first getting 100 percent while electrode B gets zero percent, then shifting to ratios of 80:20, 50:50, 20:80, 0:100), the system activates neurons in the gaps. “We can program that sequence of stimulation, it’s very easy,” he says. “It goes zipping across the brain.”
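
The ratio sequence Yoshor describes (100:0, then 80:20, 50:50, 20:80, 0:100) amounts to a linear cross-fade of current between two adjacent electrodes. A minimal sketch, with the electrode pair, step count, and current value purely illustrative:

```python
def steering_schedule(total_current, steps):
    """Linearly shift a fixed total current from electrode A to
    electrode B over a number of steps, producing the ratio sweep
    that activates neurons in the cortical gap between the two
    electrodes. Returns a list of (current_A, current_B) pairs.
    """
    schedule = []
    for k in range(steps + 1):
        frac_b = k / steps  # fraction of current routed to electrode B
        schedule.append((total_current * (1 - frac_b),
                         total_current * frac_b))
    return schedule
```

Chaining such sweeps across a sequence of electrode pairs, in retinotopic order, is what lets a stimulation pattern go "zipping across the brain" to trace a shape.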

Second Sight didn’t respond to requests for comment for this article, so it’s unclear whether the company is interested in integrating Yoshor’s stimulation technique into its technology. 

But Second Sight isn’t the only entity working on a cortical vision prosthetic. One active project is at Monash University in Australia, where a team has been preparing for clinical trials of its Gennaris bionic vision system.

Arthur Lowery, director of the Monash Vision Group and a professor of electrical and computer systems engineering, says that Yoshor’s research seems promising. “The ultimate goal is for the brain to perceive as much information as possible. The use of sequential stimulation to convey different information with the same electrodes is very interesting, for this reason,” he tells IEEE Spectrum in an email. “Of course, it raises other questions about how many electrodes should be simultaneously activated when presenting, say, moving images.”

Yoshor thinks the system will eventually be able to handle complex moving shapes with the aid of today’s advances in computer vision and AI, particularly if there are more electrodes in the brain to represent the images. He imagines a microprocessor that converts whatever image the person encounters in daily life into a pattern of dynamic stimulation.

Perhaps, he speculates, the system could even have different settings for different situations. “There could be a navigation mode that helps people avoid obstacles when they’re walking; another mode for conversation where the prosthetic would rapidly trace the contours of the face,” he says. That’s a far-off goal, but Yoshor says he sees it clearly. 

Here’s How We Prepare for the Next Pandemic


When the Spanish flu pandemic swept across the globe in 1918, it ravaged a population with essentially no technological countermeasures. There were no diagnostic tests, no mechanical ventilators, and no antiviral or widely available anti-inflammatory medications other than aspirin. The first inactivated-virus vaccines would not become available until 1936. An estimated 50 million people died.

Today, a best-case scenario predicts 1.3 million fatalities from COVID-19 in 2020, according to projections by Imperial College London, and rapidly declining numbers after that. That in a world with 7.8 billion people—more than four times as many as in 1918. Many factors have lessened mortality this time, including better implementation of social-distancing measures. But technology is also a primary bulwark.

Since January of this year, roughly US $50 billion has been spent in the United States alone to ramp up testing, diagnosis, modeling, treatment, vaccine creation, and other tech-based responses, according to the Committee for a Responsible Federal Budget. The massive efforts have energized medical, technical, and scientific establishments in a way that hardly anything else has in the past half century. And they will leave a legacy of protection that will far outlast COVID-19.

In the current crisis, though, it hasn’t been technology that separated the winners and losers. Taking stock of the world’s responses so far, two elements set apart the nations that have successfully battled the coronavirus: foresight and a painstakingly systematic approach. Countries in East Asia that grappled with a dangerous outbreak of the SARS virus in the early 2000s knew the ravages of an unchecked virulent pathogen, and acted quickly to mobilize teams and launch containment plans. Then, having contained the first wave, some governments minimized further outbreaks by carefully tracing every subsequent cluster of infections and working hard to isolate them. Tens of thousands of people, maybe hundreds of thousands, are alive in Asia now because of those measures.

In other countries, most notably the United States, officials initially downplayed the impending disaster, losing precious time. The U.S. government did not act quickly to muster supplies, nor did it promulgate a coherent plan of action. Instead states, municipalities, and hospitals found themselves skirmishing and scrounging for functional tests, for personal protective equipment, and for guidance on when and how to go into lockdown.

The best that can be said about this dismal episode is that it was a hard lesson about how tragic the consequences of incompetence can be. We can only hope that the lesson was learned well, because there will be another pandemic. There will always be another pandemic. There will always be pathogens that mutate ever so slightly, making them infectious to human hosts or rendering existing drug treatments ineffective. Acknowledging that fact is the first step in getting ready—and saving lives.

The cutting-edge technologies our societies have developed and deployed at lightning speed are not only helping to stem the horrendous waves of death. Some of these technologies will endure and—like a primed immune system—put us on a path toward an even more effective response to the next pandemic.

Consider modeling. In the early months of the crisis, the world became obsessed with the models that forecast the future spread of the disease. Officials relied on such models to make decisions that would have mortal consequences for people and multibillion-dollar ones for economies. Knowing how much was riding on the curves they produced, the modelers who create projections of case numbers and fatalities pulled out all the stops. As Matt Hutson recounts in “The Mess Behind the Models,” they adapted their techniques on the fly, getting better at simulating both a virus that nobody yet understood and the maddening vagaries of human behavior.
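The projections described above typically build on compartmental models. Here is a minimal sketch of the simplest one, a discrete-time SIR (susceptible-infectious-recovered) model; the parameter values are illustrative, not fitted to any real outbreak:

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Return daily (S, I, R) counts for a basic SIR epidemic.

    beta: average transmitting contacts per person per day
    gamma: fraction of infectious people who recover each day
    """
    s = population - initial_infected
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population  # contacts that transmit
        new_recoveries = gamma * i                  # infectious who recover
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=180)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(f"Epidemic peaks around day {peak_day}")
```

Real forecasting models layer far more onto this skeleton: age structure, mobility data, and the behavioral responses the article calls maddening, which is exactly why the modelers had to adapt their techniques on the fly.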

In the development of both vaccines and antiviral drugs, researchers have committed to timelines that would have seemed like fantasies a year ago. In “AI Takes Its Best Shot,” Emily Waltz describes how artificial intelligence is reshaping vaccine makers’ efforts to find the viral fragments that trigger a protective immune response. The speed record for vaccine development and approval is four years, she writes, and that honor is held by the mumps vaccine; if a coronavirus vaccine is approved for the general public before the end of this year, it will blow that record away.

Antiviral researchers have it even tougher in some ways. As Megan Scudellari writes, hepatitis C was discovered in 1989—and yet the first antiviral effective against it didn’t become available until 26 years later, in 2015. “Automating Antivirals” describes the high-tech methods researchers are creating that could cut the current drug-development timeline from five or more years to six months. That, too, will mean countless lives saved: Even with a good vaccine, some people inevitably become sick. For some of them, effective antivirals will be the difference between life and death.

Beyond Big Pharma, engineers are throwing their energies into a host of new technologies that could make a difference in the war we’re waging now and in those to come. For example, this pandemic is the first to be fought with robots alongside humans on the front lines. In hospitals, robots are checking on patients and delivering medical supplies; elsewhere, they’re carting groceries and other goods to people in places where a trip to the store can be fraught with risk. They’re even swabbing patients for COVID-19 tests, as Erico Guizzo and Randi Klett reveal in a photo essay of robots that became essential workers.

Among the most successful of the COVID-fighting robots are those buzzing around hospital rooms and blasting floors, walls, and even the air with ultraviolet-C radiation. Transportation officials are also starting to deploy UV-C systems to sanitize the interiors of passenger aircraft and subway cars, and medical facilities are using them to sterilize personal protective equipment. The favored wavelength is around 254 nanometers, which destroys the virus by shredding its RNA. The problem is, such UV-C light can also damage human tissues and DNA. So, as Mark Anderson reports in “The Ultraviolet Offense,” researchers are readying a new generation of so-called far-UV sterilizers that use light at 222 nm, which is supposedly less harmful to human beings.

When compared with successful responses in Korea, Singapore, and other Asian countries, two notable failures in the United States become clear: testing and contact tracing. For too long, testing was too scarce and too inaccurate in the United States. That was especially true early on, when it was most needed. And getting results sometimes took two weeks—a devastating delay, as the SARS-CoV-2 virus is notorious for being spread by people who don’t even know they’re sick and infectious. Researchers quickly realized that what was really needed was something “like a pregnancy test,” as one told Wudan Yan: “Spit on a stick or into a collection tube and have a clear result 5 minutes later.” Soon, we’ll have such a test.

Digital contact tracing, too, could be an enormously powerful weapon, as Jeremy Hsu reports in “The Dilemma of Contact-Tracing Apps.” But it’s a tricky one to deploy. During the pandemic, many municipalities have used some form of tracing. But much of it was low-key and low-tech—sometimes little more than a harried worker contacting people on a list. Automated contact tracing, using cloud-based smartphone apps that track people’s movements, proved capable of rapidly suppressing the contagion in places like China and South Korea. But most Western countries balked at that level of intrusiveness. Technical solutions that trade off some surveillance stringency for privacy have been developed and tested. But they couldn’t solve the most fundamental problem: a pervasive lack of trust in government among Americans and Europeans.
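The privacy-preserving designs mentioned above generally work by having each phone broadcast short-lived random identifiers and do all matching on the device itself. The following toy sketch illustrates the idea; real protocols such as the Apple-Google exposure notification system derive rotating identifiers cryptographically from daily keys, which this illustration simplifies to plain random tokens:

```python
import secrets

class Phone:
    def __init__(self):
        self.my_tokens = []        # anonymous tokens this phone has broadcast
        self.heard_tokens = set()  # tokens heard from nearby phones

    def broadcast(self):
        token = secrets.token_hex(16)  # rotating random identifier
        self.my_tokens.append(token)
        return token

    def hear(self, token):
        self.heard_tokens.add(token)

    def check_exposure(self, published_infected_tokens):
        # Matching happens on-device: no location or identity leaves the phone.
        return bool(self.heard_tokens & set(published_infected_tokens))

alice, bob, carol = Phone(), Phone(), Phone()
bob.hear(alice.broadcast())    # Bob and Alice were in proximity
carol.hear(bob.broadcast())    # Carol and Bob were in proximity

# Alice tests positive and publishes only her random tokens to a registry.
registry = alice.my_tokens
print(bob.check_exposure(registry))    # Bob was exposed
print(carol.check_exposure(registry))  # Carol never met Alice
```

The registry reveals nothing about who Alice is or where she went, which is the trade-off these systems make: stronger privacy, but no centralized view for health authorities, and no defense against the trust deficit the article describes.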

It has been 102 years since the Spanish flu taught us just how bad a global pandemic can be. But almost nobody expects that long of an interval until the next big one. Nearly all major infectious outbreaks today are caused by “zoonotic transfer,” when a pathogen jumps from an animal to human beings. And a variety of unrelated factors, including the loss of natural habitats due to deforestation and the rapid growth of livestock farming to feed industrializing economies, are stressing animal populations and putting them into more frequent contact with people.

We’re unlikely to halt or even measurably slow such global trends. What we can do is make sure we have suitable technology, good governance, and informed communities. That’s how we’ll mount a tougher response to the next pandemic.

This article appears in the October 2020 print issue as “Prepping for the Next Big One.”

AI Can Help Hospitals Triage COVID-19 Patients

Post Syndicated from Eliza Strickland original

As the coronavirus pandemic brings floods of people to hospital emergency rooms around the world, physicians are struggling to triage patients, trying to determine which ones will need intensive care. Volunteer doctors and nurses with no special pulmonary training must assess the condition of patients’ lungs. In Italy, at the peak of that country’s crisis, doctors faced terrible decisions about who should receive help and resources. 

An Official WHO Coronavirus App Will Be a “Waze for COVID-19”

Post Syndicated from Eliza Strickland original

There’s no shortage of information about the coronavirus pandemic: News sites cover every development, and sites like the Johns Hopkins map constantly update the number of global cases (246,276 as of this writing).

But for the most urgent questions, there seem to be no answers. Is it possible that I caught the virus when I went out today? Did I cross paths with someone who’s infected? How prevalent is the coronavirus in my local community? And if I’m feeling sick, where can I go to get tested or find treatment?

A group of doctors and engineers have come together to create an app that will answer such questions. Daniel Kraft, the U.S.-based physician who’s leading the charge, says his group has “gotten the green light” from the World Health Organization (WHO) to build the open-source app, and that it will be an official WHO app to help people around the world cope with COVID-19, the official name of the illness caused by the new coronavirus.

Biotech Pioneer Is Making a Home Test Kit for Coronavirus

Post Syndicated from Eliza Strickland original

Here’s something that sounds decidedly useful right now: A home test kit that anyone could use to see if they have the coronavirus. A kit that’s nearly as easy to use as a pregnancy test, and that would give results in half an hour. 

It doesn’t exist yet, but serial biotech entrepreneur Jonathan Rothberg is working on it. 

We’ve heard from Rothberg before. Over the past two decades, he has founded several genetic sequencing companies that introduced breakthrough technologies, making genome sequencing faster and cheaper. (With one company he made news by sequencing the first individual human genome, that of Jim Watson; he sold another company for US $375 million.)  In 2014, Rothberg launched a startup accelerator called 4Catalyzer dedicated to the invention of smart medical devices. The first company to emerge from the accelerator, Butterfly Network, now sells a handheld ultrasound tool that provides readouts on a smartphone screen. 

Rothberg announced his interest in a coronavirus test kit in a Twitter thread on 7 March. Initially describing it as a “thought experiment,” it quickly became a real project and a driving mission. 

DARPA Races To Create a “Firebreak” Treatment for the Coronavirus

Post Syndicated from Eliza Strickland original

When DARPA launched its Pandemic Preparedness Platform (P3) program two years ago, the pandemic was theoretical. It seemed like a prudent idea to develop a quick response to emerging infectious diseases. Researchers working under the program sought ways to confer instant (but short-term) protection from a dangerous virus or bacteria. 

Today, as the novel coronavirus causes a skyrocketing number of COVID-19 cases around the world, the researchers are racing to apply their experimental techniques to a true pandemic playing out in real time. “Right now, they have one shot on goal,” says DARPA program manager Amy Jenkins. “We’re really hoping it works.”

The P3 program’s plan was to start with a new pathogen and to “develop technology to deliver medical countermeasures in under 60 days—which was crazy, unheard of,” says Jenkins. The teams have proven they can meet this ambitious timeline in previous trials using the influenza and Zika viruses. Now they’re being asked to pull off the same feat with the new coronavirus, which more formally goes by the name SARS-CoV-2 and causes the illness known as COVID-19.

James Crowe, director of the Vanderbilt Vaccine Center at the Vanderbilt University Medical Center, leads one of the four P3 teams. He spoke with IEEE Spectrum from Italy, where he and his wife are currently traveling—and currently recovering from a serious illness. “We both got pretty sick,” Crowe says. “We’re pretty sure we already had the coronavirus.” For his wife, he says, it was the worst illness she’s had in three decades. They’re trying to get tested for the virus.

Ironically, Crowe is quite likely harboring the raw material that his team and others need to do their work. The DARPA approach called for employing antibodies, the proteins that the body naturally produces to fight infectious diseases and that remain in circulation after an infection.

In the P3 program, the 60-day clock begins when a blood sample is taken from a person who has fully recovered from the disease of interest. Then the researchers screen that sample to find all the protective antibodies the person’s body has made to fight off the virus or bacteria. They use modeling and bioinformatics to choose the antibody that seems most effective at neutralizing the pathogen, and then determine the genetic sequence that codes for the creation of that particular antibody. That snippet of genetic code can then be manufactured quickly and at scale, and injected into people.

Jenkins says this approach is much faster than manufacturing the antibodies themselves. Once the genetic snippets are delivered by an injection, “your body becomes the bioreactor” that creates the antibodies, she says. The P3 program’s goal is to have protective levels of the antibodies circulating within 6 to 24 hours.

DARPA calls this a “firebreak” technology, because it can provide immediate immunity to medical personnel, first responders, and other vulnerable people. However, it wouldn’t create the permanent protection that vaccines provide. (Vaccines work by presenting the body with a safe form of the pathogen, thus giving the body a low-stakes opportunity to learn how to respond, which it remembers on future exposures.)

Robert Carnahan, who works with Crowe at the Vanderbilt Vaccine Center, explains that their method offers only temporary protection because the snippets of genetic code are messenger RNA, molecules that carry instructions for protein production. When the team’s specially designed mRNA is injected into the body, it’s taken up by cells (likely those in the liver) that churn out the needed antibodies. But eventually that RNA degrades, as do the antibodies that circulate through the blood stream. 

“We haven’t taught the body how to make the antibody,” Carnahan says, so the protection isn’t permanent. To put it in terms of a folksy aphorism: “We haven’t taught the body to fish, we’ve just given it a fish.”

Carnahan says the team has been scrambling for weeks to build the tools that enable them to screen for the protective antibodies; since very little is known about this new coronavirus, “the toolkit is being built on the fly.” They were also waiting for a blood sample from a fully recovered U.S. patient. Now they have their first sample, and are hoping for more in the coming weeks, so the real work is beginning. “We are doggedly pursuing neutralizing antibodies,” he says. “Right now we have active pursuit going on.”

Jenkins says that all of the P3 groups (the others are Greg Sempowski’s lab at Duke University, a small Vancouver company called AbCellera, and the big pharma company AstraZeneca) have made great strides in technologies that rapidly identify promising antibodies. In their earlier trials, the longer part of the process was manufacturing the mRNA and preparing for safety studies in animals. If the mRNA is intended for human use, the manufacturing and testing processes will be much slower because there will be many more regulatory hoops to jump through.

Crowe and Carnahan’s team has seen another project through to human trials; last year the lab worked with the biotech company Moderna to make mRNA that coded for antibodies that protected against the Chikungunya virus. “We showed it can be done,” Crowe says.

Moderna was involved in a related DARPA program known as ADEPT that has since ended. The company’s work on mRNA-based therapies has led it in another interesting direction—last week, the company made news with its announcement that it was testing an mRNA-based vaccine for the coronavirus. That vaccine works by delivering mRNA that instructs the body to make the “spike” protein that’s present on the surface of the coronavirus, thus provoking an immune response that the body will remember if it encounters the whole virus. 

While Crowe’s team isn’t pursuing a vaccine, they are hedging their bets by considering both the manufacture of mRNA to code for the protective antibodies, and manufacturing the antibodies themselves. Crowe says that the latter process is typically slower (perhaps 18 to 24 months), but he’s talking with companies that are working on ways to speed it up. The direct injection of antibodies is a standard type of immunotherapy.   

Crowe’s convalescence in Italy isn’t particularly relaxing. He’s working long days, trying to facilitate the shipment of samples to his lab from Asia, Europe, and the United States, and he’s talking to pharmaceutical companies that could manufacture whatever his lab comes up with. “We have to figure out a cooperative agreement for something we haven’t made yet,” he says, “but I expect that this week we’ll have agreements in place.”

Manufacturing agreements aren’t the end of the story, however. Even if everything goes perfectly for Crowe’s team and they have a potent antibody or mRNA ready for manufacture by the end of April, they’d have to get approval from the U.S. Food and Drug Administration. To get a therapy approved for human use typically takes years of studies on toxicity, stability, and efficacy. Crowe says that one possible shortcut is the FDA’s compassionate use program, which allows people to use unapproved drugs in certain life-threatening situations. 

Gregory Poland, director of the Mayo Clinic Vaccine Research Group, says that mRNA-based prophylactics and vaccines are an important and interesting idea. But he notes that all talk of these possibilities should be considered “very forward-looking statements.” He has seen too many promising candidates fail after years of clinical trials to get excited about a new bright idea, he says. 

Poland also says the compassionate use shortcut would likely only be relevant “if we’re facing a situation where something like Wuhan is happening in a major city in the U.S., and we have reason to believe that a new therapy would be efficacious and safe. Then that’s a possibility, but we’re not there yet,” he says. “We’d be looking for an unknown benefit and accepting an unknown risk.” 

Yet the looming regulatory hurdles haven’t stopped the P3 researchers from sprinting. AbCellera, the Vancouver-based biotech company, tells IEEE Spectrum that its researchers have spent the last month looking at antibodies against the related SARS virus that emerged in 2002, and that they’re now beginning to study the coronavirus directly. “Our main discovery effort from a COVID-19 convalescent donor is about to be underway, and we will unleash our full discovery capabilities there,” a representative wrote in an email. 

At Crowe’s lab, Carnahan notes that the present outbreak is the first one to benefit from new technologies that enable a rapid response. Carnahan points to the lab’s earlier work on the Zika virus: “In 78 days we went from a sample to a validated antibody that was highly protective. Will we be able to hit 78 days with the coronavirus? I don’t know, but that’s what we’re gunning for,” he says. “It’s possible in weeks and months, not years.”

Facebook AI Launches Its Deepfake Detection Challenge

Post Syndicated from Eliza Strickland original

In September, Facebook sent out a strange casting call: We need all types of people to look into a webcam or phone camera and say very mundane things. The actors stood in bedrooms, hallways, and backyards, and they talked about topics such as the perils of junk food and the importance of arts education. It was a quick and easy gig—with an odd caveat. Facebook researchers would be altering the videos, extracting each person’s face and fusing it onto another person’s head. In other words, the participants had to agree to become deepfake characters. 

Facebook’s artificial intelligence (AI) division put out this casting call so it could ethically produce deepfakes—a term that originally referred to videos that had been modified using a certain face-swapping technique but is now a catchall for manipulated video. The Facebook videos are part of a training data set that the company assembled for a global competition called the Deepfake Detection Challenge. In this competition—produced in cooperation with Amazon, Microsoft, the nonprofit Partnership on AI, and academics from eight universities—researchers around the world are vying to create automated tools that can spot fraudulent media.

The competition launched today, with an announcement at the AI conference NeurIPS, and will accept entries through March 2020. Facebook has dedicated more than US $10 million for awards and grants.

Cristian Canton Ferrer helped organize the challenge as research manager for Facebook’s AI Red Team, which analyzes the threats that AI poses to the social media giant. He says deepfakes are a growing danger not just to Facebook but to democratic societies. Manipulated videos that make politicians appear to do and say outrageous things could go viral before fact-checkers have a chance to step in.

While such a full-blown synthetic scandal has yet to occur, the Italian public recently got a taste of the possibilities. In September, a satirical news show aired a deepfake video featuring a former Italian prime minister apparently lavishing insults on other politicians. Most viewers realized it was a parody, but a few did not.

The U.S. presidential elections in 2020 are an added incentive to get ahead of the problem, says Canton Ferrer. He believes that media manipulation will become much more common over the coming year, and that the deepfakes will get much more sophisticated and believable. “We’re thinking about what will be happening a year from now,” he says. “It’s a cat-and-mouse approach.” Canton Ferrer’s team aims to give the cat a head start, so it will be ready to pounce.

The growing threat of deepfakes

Just how easy is it to make deepfakes? A recent audit of online resources for altering videos found that the available open-source software still requires a good amount of technical expertise. However, the audit also turned up apps and services that are making it easier for almost anyone to get in on the action. In China, a deepfake app called Zao took the country by storm in September when it offered people a simple way to superimpose their own faces onto those of actors like Leonardo DiCaprio and Marilyn Monroe.

It may seem odd that the data set compiled for Facebook’s competition is filled with unknown people doing unremarkable things. But a deepfake detector that works on those mundane videos should work equally well for videos featuring politicians. To make the Facebook challenge as realistic as possible, Canton Ferrer says his team used the most common open-source techniques to alter the videos—but he won’t name the methods, to avoid tipping off contestants. “In real life, they will not be able to ask the bad actors, ‘Can you tell me what method you used to make this deepfake?’” he says.

In the current competition, detectors will be scanning for signs of facial manipulation. However, the Facebook team is keeping an eye on new and emerging attack methods, such as full-body swaps that change the appearance and actions of a person from head to toe. “There are some of those out there, but they’re pretty obvious now,” Canton Ferrer says. “As they get better, we’ll add them to the data set.” Even after the detection challenge concludes in March, he says, the Facebook team will keep working on the problem of deepfakes.

As for how the winning detection methods will be used and whether they’ll be integrated into Facebook’s operations, Canton Ferrer says those decisions aren’t up to him. The Partnership on AI’s steering committee on AI and media integrity, which is overseeing the competition, will decide on the next steps, he says. Claire Leibowicz, who leads that steering committee, says the group will consider “coordinated efforts” to fight back against the global challenge of synthetic and manipulated media.

DARPA’s efforts on deepfake detection

The Facebook challenge is far from the only effort to counter deepfakes. DARPA’s Media Forensics program launched in 2016, a year before the first deepfake videos surfaced on Reddit. Program manager Matt Turek says that as the technology took off, the researchers working under the program developed a number of detection technologies, generally looking for “digital integrity, physical integrity, or semantic integrity.”

Digital integrity is defined by the patterns in an image’s pixels that are invisible to the human eye. These patterns can arise from cameras and video processing software, and any inconsistencies that appear are a tip-off that a video has been altered. Physical integrity refers to the consistency in lighting, shadows, and other physical attributes in an image. Semantic integrity considers the broader context. If a video shows an outdoor scene, for example, a deepfake detector might check the time stamp and location to look up the weather report from that time and place. The best automated detector, Turek says, would “use all those techniques to produce a single integrity score that captures everything we know about a digital asset.”
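The fusion Turek describes can be sketched as a weighted combination of per-detector scores. Everything concrete here, the detector names, the scores, and the weights, is invented for illustration; a production system would learn the fusion from labeled training data rather than hand-pick weights:

```python
def integrity_score(detector_scores, weights=None):
    """Combine per-detector authenticity scores (0 = manipulated,
    1 = authentic) into a single weighted integrity score."""
    if weights is None:
        weights = {name: 1.0 for name in detector_scores}
    total_weight = sum(weights[name] for name in detector_scores)
    return sum(score * weights[name]
               for name, score in detector_scores.items()) / total_weight

# Hypothetical outputs for one uploaded video:
scores = {
    "digital":  0.35,  # pixel-level inconsistencies detected
    "physical": 0.60,  # lighting and shadows mostly consistent
    "semantic": 0.20,  # weather at the time stamp contradicts the scene
}
weights = {"digital": 2.0, "physical": 1.0, "semantic": 2.0}
print(f"integrity: {integrity_score(scores, weights):.2f}")  # prints 0.34
```

A low combined score like this one flags the asset for closer review even though one class of evidence (physical) looked plausible, which is the point of pooling more than 20 detectors behind a single number.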

Turek says his team has created a prototype Web portal (restricted to its government partners) to demonstrate a sampling of the detectors developed during the program. When the user uploads a piece of media via the Web portal, more than 20 detectors employ a range of different approaches to try to determine whether an image or video has been manipulated. Turek says his team continues to add detectors to the system, which is already better than humans at spotting fakes.

A successor to the Media Forensics program will launch in mid-2020: the Semantic Forensics program. This broader effort will cover all types of media—text, images, videos, and audio—and will go beyond simply detecting manipulation. It will also seek methods to understand the importance of the manipulations, which could help organizations decide which content requires human review. “If you manipulate a vacation photo by adding a beach ball, it really doesn’t matter,” Turek says. “But if you manipulate an image about a protest and add an object like a flag, that could change people’s understanding of who was involved.”

The Semantic Forensics program will also try to develop tools to determine if a piece of media really comes from the source it claims. Eventually, Turek says, he’d like to see the tech community embrace a system of watermarking, in which a digital signature would be embedded in the media itself to help with the authentication process. One big challenge of this idea is that every software tool that interacts with the image, video, or other piece of media would have to “respect that watermark, or add its own,” Turek says. “It would take a long time for the ecosystem to support that.”

A deepfake detection tool for consumers

In the meantime, the AI Foundation has a plan. This nonprofit is building a tool called Reality Defender that’s due to launch in early 2020. “It will become your personal AI guardian who’s watching out for you,” says Rob Meadows, president and chief technology officer for the foundation.

Reality Defender is a plug-in for Web browsers and an app for mobile phones. It scans everything on the screen using a suite of automatic detectors, then alerts the user about altered media. Detection alone won’t make for a useful tool, since Photoshop and other editing tools are widely used in fashion, advertising, and entertainment. If Reality Defender draws attention to every altered piece of content, Meadows notes, “it will flood consumers to the point where they say, ‘We don’t care anymore, we have to tune it out.’”

To avoid that problem, users will be able to dial the tool’s sensitivity up or down, depending on how many alerts they want. Meadows says beta testers are currently training the system, giving it feedback on which types of manipulations they care about. Once Reality Defender launches, users will be able to personalize their AI guardian by giving it a thumbs-up or thumbs-down on alerts, until it learns their preferences. “A user can say, ‘For my level of paranoia, this is what works for me,’” Meadows says.
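The sensitivity dial described above amounts to letting the user set the alert threshold. This minimal sketch shows the mechanism; the scores, item names, and the linear threshold mapping are invented for illustration:

```python
def alerts(media_scores, sensitivity):
    """Return items whose manipulation probability clears the user's
    threshold. sensitivity in [0, 1]: higher means more paranoid,
    so more alerts."""
    threshold = 1.0 - sensitivity
    return [name for name, prob in media_scores if prob >= threshold]

# Hypothetical detector outputs for media on screen:
scanned = [
    ("retouched fashion photo", 0.55),
    ("news clip of a politician", 0.92),
    ("unaltered vacation photo", 0.08),
]
print(alerts(scanned, sensitivity=0.2))  # only near-certain fakes
print(alerts(scanned, sensitivity=0.7))  # flags routine retouching too
```

The thumbs-up/thumbs-down feedback Meadows describes would go further than a single global dial, learning per-category thresholds, but the basic trade-off is the same: every notch of added sensitivity buys more caught fakes at the cost of more false alarms on harmless edits.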

He sees the software as a useful stopgap solution, but ultimately he hopes that his group’s technologies will be integrated into platforms such as Facebook, YouTube, and Twitter. He notes that Biz Stone, cofounder of Twitter, is a member of the AI Foundation’s board. To truly protect society from fake media, Meadows says, we need tools that prevent falsehoods from getting hosted on platforms and spread via social media. Debunking them after they’ve already spread is too late.

The researchers at Jigsaw, a unit of Alphabet that works on technology solutions for global challenges, would tend to agree. Technical research manager Andrew Gully says his team identified synthetic media as a societal threat some years back. To contribute to the fight, Jigsaw teamed up with sister company Google AI to produce a deepfake data set of its own in late 2018, which they contributed to the FaceForensics data set hosted by the Technical University of Munich.

Gully notes that while we haven’t yet seen a political crisis triggered by a deepfake, these videos are also used for bullying and “revenge porn,” in which a targeted woman’s face is pasted onto the face of an actor in a porno. (While pornographic deepfakes could in theory target men, a recent audit of deepfake content found that 100 percent of the pornographic videos focused on women.) What’s more, Gully says people are more likely to be credulous of videos featuring unknown individuals than famous politicians.

But it’s the threat to free and fair elections that feels most crucial in this U.S. election year. Gully says systems that detect deepfakes must take a careful approach in communicating the results to users. “We know already how difficult it is to convince people in the face of their own biases,” Gully says. “Detecting a deepfake video is hard enough, but that’s easy compared to how difficult it is to convince people of things they don’t want to believe.”

Yoshua Bengio, Revered Architect of AI, Has Some Ideas About What to Build Next

Post Syndicated from Eliza Strickland original

Yoshua Bengio is known as one of the “three musketeers” of deep learning, the type of artificial intelligence (AI) that dominates the field today. 

Bengio, a professor at the University of Montreal, is credited with making key breakthroughs in the use of neural networks—and, just as importantly, with persevering with the work through the long cold AI winter of the late 1980s and the 1990s, when most people thought that neural networks were a dead end. 

He was rewarded for his perseverance in 2018, when he and his fellow musketeers (Geoffrey Hinton and Yann LeCun) won the Turing Award, which is often called the Nobel Prize of computing.

Today, there’s increasing discussion about the shortcomings of deep learning. In that context, IEEE Spectrum spoke to Bengio about where the field should go from here. He’ll speak on a similar subject tomorrow at NeurIPS, the biggest and buzziest AI conference in the world; his talk is titled “From System 1 Deep Learning to System 2 Deep Learning.”

Yoshua Bengio on . . .

  1. Deep learning and its discontents
  2. The dawn of brain-inspired computation
  3. Learning to learn
  4. “This is not ready for industry”
  5. Physics, language, and common sense
  1. Deep learning and its discontents

    IEEE Spectrum: What do you think about all the discussion of deep learning’s limitations?

    Yoshua Bengio: Too many public-facing venues don’t understand a central thing about the way we do research, in AI and other disciplines: We try to understand the limitations of the theories and methods we currently have, in order to extend the reach of our intellectual tools. So deep learning researchers are looking to find the places where it’s not working as well as we’d like, so we can figure out what needs to be added and what needs to be explored.

    This is picked up by people like Gary Marcus, who put out the message: “Look, deep learning doesn’t work.” But really, what researchers like me are doing is expanding its reach. When I talk about things like the need for AI systems to understand causality, I’m not saying that this will replace deep learning. I’m trying to add something to the toolbox.

    What matters to me as a scientist is what needs to be explored in order to solve the problems. Not who’s right, who’s wrong, or who’s praying at which chapel.

    Spectrum: How do you assess the current state of deep learning?

    Bengio: In terms of how much progress we’ve made in this work over the last two decades: I don’t think we’re anywhere close today to the level of intelligence of a two-year-old child. But maybe we have algorithms that are equivalent to lower animals, for perception. And we’re gradually climbing this ladder in terms of tools that allow an entity to explore its environment.

    One of the big debates these days is: What are the elements of higher-level cognition? Causality is one element of it, and there’s also reasoning and planning, imagination, and credit assignment (“what should I have done?”). In classical AI, they tried to obtain these things with logic and symbols. Some people say we can do it with classic AI, maybe with improvements.

    Then there are people like me, who think that we should take the tools we’ve built in the last few years to create these functionalities in a way that’s similar to the way humans do reasoning, which is actually quite different from the way a purely logical system based on search does it.


    The dawn of brain-inspired computation

    Spectrum: How can we create functions similar to human reasoning?

    Bengio: Attention mechanisms allow us to learn how to focus our computation on a few elements, a set of computations. Humans do that—it’s a particularly important part of conscious processing. When you’re conscious of something, you’re focusing on a few elements, maybe a certain thought, then you move on to another thought. This is very different from standard neural networks, which are instead parallel processing on a big scale. We’ve had big breakthroughs in computer vision, translation, and memory thanks to these attention mechanisms, but I believe it’s just the beginning of a different style of brain-inspired computation.

    It’s not that we have solved the problem, but I think we have a lot of the tools to get started. And I’m not saying it’s going to be easy. I wrote a paper in 2017 called “The Consciousness Prior” that laid out the issue. I have several students working on this and I know it is a long-term endeavor.

    Spectrum: What other aspects of human intelligence would you like to replicate in AI?

    Bengio: We also talk about the ability of neural nets to imagine: Reasoning, memory, and imagination are three aspects of the same thing going on in your mind. You project yourself into the past or the future, and when you move along these projections, you’re doing reasoning. If you anticipate something bad happening in the future, you change course—that’s how you do planning. And you’re using memory too, because you go back to things you know in order to make judgments. You select things from the present and things from the past that are relevant.

    Attention is the crucial building block here. Let’s say I’m translating a book into another language. For every word, I have to carefully look at a very small part of the book. Attention allows you to abstract out a lot of irrelevant details and focus on what matters. Being able to pick out the relevant elements: that’s what attention does.

    Spectrum: How does that translate to machine learning?

    Bengio: You don’t have to tell the neural net what to pay attention to—that’s the beauty of it. It learns it on its own. The neural net learns how much attention, or weight, it should give to each element in a set of possible elements to consider.
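    Bengio’s description of learned attention weights can be sketched in a few lines of NumPy. This is a generic scaled dot-product attention computation, not Bengio’s specific model; the query, key, and value vectors are random stand-ins for what a real network would learn:

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention(query, keys, values):
    # Score each element against the query, turn the scores into
    # weights that sum to 1, and return the weighted combination
    # of the values.
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)
    weights = softmax(scores)
    return weights @ values, weights

rng = np.random.default_rng(0)
query = rng.normal(size=4)        # what the network is looking for
keys = rng.normal(size=(3, 4))    # one key per element to consider
values = rng.normal(size=(3, 2))  # the content of each element

output, weights = attention(query, keys, values)
print(weights)        # how much attention each element receives
print(weights.sum())  # the weights always sum to 1
```

    In a trained network, these weights concentrate on the few elements that matter for the task at hand, which is the focusing of computation Bengio describes.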


    Learning to learn

    Spectrum: How is your recent work on causality related to these ideas?

    Bengio: The kind of high-level concepts that you reason with tend to be variables that are cause and/or effect. You don’t reason based on pixels. You reason based on concepts like door or knob or open or closed. Causality is very important for the next steps of progress of machine learning.

    And it’s related to another topic that is much on the minds of people in deep learning. Systematic generalization is the ability humans have to generalize the concepts we know, so they can be combined in new ways that are unlike anything else we’ve seen. Today’s machine learning doesn’t know how to do that. So you often have problems relating to training on a particular data set. Say you train in one country, and then deploy in another country. You need generalization and transfer learning. How do you train a neural net so that if you transfer it into a new environment, it continues to work well or adapts quickly?

    Spectrum: What’s the key to that kind of adaptability?

    Bengio: Meta-learning is a very hot topic these days: Learning to learn. I wrote an early paper on this in 1991, but only recently did we get the computational power to implement this kind of thing. It’s computationally expensive. The idea: In order to generalize to a new environment, you have to practice generalizing to a new environment. It’s so simple when you think about it. Children do it all the time. When they move from one room to another room, the environment is not static, it keeps changing. Children train themselves to be good at adaptation. To do that efficiently, they have to use the pieces of knowledge they’ve acquired in the past. We’re starting to understand this ability, and to build tools to replicate it.

    One critique of deep learning is that it requires a huge amount of data. That’s true if you just train it on one task. But children have the ability to learn based on very little data. They capitalize on the things they’ve learned before. But more importantly, they’re capitalizing on their ability to adapt and generalize.
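    The recipe Bengio describes, practicing generalization itself, can be illustrated with a toy version of the Reptile algorithm (a simple first-order meta-learning method; this is a generic sketch, not Bengio’s own 1991 formulation). Each “task” is fitting y = a·x for a different slope a, and meta-training finds an initialization that adapts to a new task in just a few gradient steps:

```python
import numpy as np

rng = np.random.default_rng(0)

def adapt(w, a, steps=5, lr=0.05):
    # Inner loop: a few SGD steps on one task (fit y = a*x with y = w*x).
    for _ in range(steps):
        x = rng.normal(size=8)
        grad = np.mean(2 * (w - a) * x**2)
        w = w - lr * grad
    return w

def task_loss(w, a):
    # Expected squared error on the task, up to a constant factor.
    return (w - a) ** 2

# Meta-training (Reptile): nudge the initialization toward each task's
# adapted weights, so it becomes easy to fine-tune on new tasks.
w0 = 0.0
for _ in range(500):
    a = rng.uniform(1.0, 3.0)       # sample a practice task
    w_adapted = adapt(w0, a)
    w0 = w0 + 0.1 * (w_adapted - w0)

# On a brand-new task, the meta-learned init adapts in the same few
# steps to a much lower loss than a naive init at 0.
a_new = 2.5
print(task_loss(adapt(w0, a_new), a_new))
print(task_loss(adapt(0.0, a_new), a_new))
```

    After meta-training, the initialization sits near the center of the task family, so a handful of gradient steps on a new task gets much closer to the answer than starting from scratch, the computational analogue of capitalizing on past knowledge to adapt quickly.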


    “This is not ready for industry”

    Spectrum: Will any of these ideas be used in the real world anytime soon?

    Bengio: No. This is all very basic research using toy problems. That’s fine, that’s where we’re at. We can debug these ideas, move on to new hypotheses. This is not ready for industry tomorrow morning.

    But there are two practical limitations that industry cares about, and that this research may help. One is building systems that are more robust to changes in the environment. Two: How do we build natural language processing systems, dialogue systems, virtual assistants? The problem with the current state-of-the-art systems that use deep learning is that they’re trained on huge quantities of data, but they don’t really understand well what they’re talking about. People like Gary Marcus pick up on this and say, “That’s proof that deep learning doesn’t work.” People like me say, “That’s interesting, let’s tackle the challenge.”


    Physics, language, and common sense

    Spectrum: How could chatbots do better?

    Bengio: There’s an idea called grounded language learning, which has recently been attracting renewed attention. The idea is, an AI system should not learn only from text. It should learn at the same time how the world works, and how to describe the world with language. Ask yourself: Could a child understand the world if they were only interacting with the world via text? I suspect they would have a hard time.

    This has to do with conscious versus unconscious knowledge, the things we know but can’t name. A good example of that is intuitive physics. A two-year-old understands intuitive physics. They don’t know Newton’s equations, but they understand concepts like gravity in a concrete sense. Some people are now trying to build systems that interact with their environment and discover the basic laws of physics.

    Spectrum: Why would a basic grasp of physics help with conversation?

    Bengio: The issue with language is that often the system doesn’t really understand the complexity of what the words are referring to. For example, the statements used in the Winograd schema; in order to make sense of them, you have to capture physical knowledge. There are sentences like: “Jim wanted to put the lamp into his luggage, but it was too large.” You know that if the object is too large to fit into the luggage, then “it” must refer to the lamp, the subject of the second clause. You can communicate that kind of knowledge in words, but it’s not the kind of thing we go around saying: “The typical size of a piece of luggage is x by x.”

    We need language understanding systems that also understand the world. Currently, AI researchers are looking for shortcuts. But they won’t be enough. AI systems also need to acquire a model of how the world works.


The Blogger Behind “AI Weirdness” Thinks Today’s AI Is Dumb and Dangerous

Post Syndicated from Eliza Strickland original

Sure, artificial intelligence is transforming the world’s societies and economies—but can an AI come up with plausible ideas for a Halloween costume? 

Janelle Shane has been asking such probing questions since she started her AI Weirdness blog in 2016. She specializes in training neural networks (which underpin most of today’s machine learning techniques) on quirky data sets such as compilations of knitting instructions, ice cream flavors, and names of paint colors. Then she asks the neural net to generate its own contributions to these categories—and hilarity ensues. AI is not likely to disrupt the paint industry with names like “Ronching Blue,” “Dorkwood,” and “Turdly.” 

Shane’s antics have a serious purpose. She aims to illustrate the serious limitations of today’s AI, and to counteract the prevailing narrative that describes AI as well on its way to superintelligence and complete human domination. “The danger of AI is not that it’s too smart,” Shane writes in her new book, “but that it’s not smart enough.” 

The book, which came out on Tuesday, is called You Look Like a Thing and I Love You. It takes its odd title from a list of AI-generated pick-up lines, all of which would at least get a person’s attention if shouted, preferably by a robot, in a crowded bar. Shane’s book is shot through with her trademark absurdist humor, but it also contains real explanations of machine learning concepts and techniques. It’s a painless way to take AI 101. 

She spoke with IEEE Spectrum about the perils of placing too much trust in AI systems, the strange AI phenomenon of “giraffing,” and her next potential Halloween costume. 

Janelle Shane on . . .

  1. The un-delicious origin of her blog
  2. “The narrower the problem, the smarter the AI will seem”
  3. Why overestimating AI is dangerous
  4. Giraffing!
  5. Machine and human creativity
  1. The un-delicious origin of her blog

    IEEE Spectrum: You studied electrical engineering as an undergrad, then got a master’s degree in physics. How did that lead to you becoming the comedian of AI? 

    Janelle Shane: I’ve been interested in machine learning since freshman year of college. During orientation at Michigan State, a professor who worked on evolutionary algorithms gave a talk about his work. It was full of the most interesting anecdotes–some of which I’ve used in my book. He told an anecdote about people setting up a machine learning algorithm to do lens design, and the algorithm did end up designing an optical system that works… except one of the lenses was 50 feet thick, because they didn’t specify that it couldn’t do that.  

    I started working in his lab on optics, doing ultra-short laser pulse work. I ended up doing a lot more optics than machine learning, but I always found it interesting. One day I came across a list of recipes that someone had generated using a neural net, and I thought it was hilarious and remembered why I thought machine learning was so cool. That was in 2016, ages ago in machine learning land.

    Spectrum: So you decided to “establish weirdness as your goal” for your blog. What was the first weird experiment that you blogged about? 

    Shane: It was generating cookbook recipes. The neural net came up with ingredients like: “Take ¼ pounds of bones or fresh bread.” That recipe started out: “Brown the salmon in oil, add creamed meat to the mixture.” It was making mistakes that showed the thing had no memory at all. 

    Spectrum: You say in the book that you can learn a lot about AI by giving it a task and watching it flail. What do you learn?

    Shane: One thing you learn is how much it relies on surface appearances rather than deep understanding. With the recipes, for example: It got the structure of title, category, ingredients, instructions, yield at the end. But when you look more closely, it has instructions like “Fold the water and roll it into cubes.” So clearly this thing does not understand water, let alone the other things. It’s recognizing certain phrases that tend to occur, but it doesn’t have a concept that these recipes are describing something real. You start to realize how very narrow the algorithms in this world are. They only know exactly what we tell them in our data set. 


    “The narrower the problem, the smarter the AI will seem”

    Spectrum: That makes me think of DeepMind’s AlphaGo, which was universally hailed as a triumph for AI. It can play the game of Go better than any human, but it doesn’t know what Go is. It doesn’t know that it’s playing a game. 

    Shane: It doesn’t know what a human is, or if it’s playing against a human or another program. That’s also a nice illustration of how well these algorithms do when they have a really narrow and well-defined problem. 

    The narrower the problem, the smarter the AI will seem. If it’s not just doing something repeatedly but instead has to understand something, coherence goes down. For example, take an algorithm that can generate images of objects. If the algorithm is restricted to birds, it could do a recognizable bird. If this same algorithm is asked to generate images of any animal, if its task is that broad, the bird it generates becomes an unrecognizable brown feathered smear against a green background.

    Spectrum: That sounds… disturbing. 

    Shane: It’s disturbing in a weird amusing way. What’s really disturbing is the humans it generates. It hasn’t seen them enough times to have a good representation, so you end up with an amorphous, usually pale-faced thing with way too many orifices. If you asked it to generate an image of a person eating pizza, you’ll have blocks of pizza texture floating around. But if you give that image to an image-recognition algorithm that was trained on that same data set, it will say, “Oh yes, that’s a person eating pizza.”


    Why overestimating AI is dangerous

    Spectrum: Do you see it as your role to puncture the AI hype? 

    Shane: I do see it that way. Not a lot of people are bringing out this side of AI. When I first started posting my results, I’d get people saying, “I don’t understand, this is AI, shouldn’t it be better than this? Why doesn’t it understand?” Many of the impressive examples of AI have a really narrow task, or they’ve been set up to hide how little understanding they have. There’s a motivation, especially among people selling products based on AI, to represent the AI as more competent and understanding than it actually is.

    Spectrum: If people overestimate the abilities of AI, what risk does that pose? 

    Shane: I worry when I see people trusting AI with decisions it can’t handle, like hiring decisions or decisions about moderating content. These are really tough tasks for AI to do well on. There are going to be a lot of glitches. I see people saying, “The computer decided this so it must be unbiased, it must be objective.” 

    That’s another thing I find myself highlighting in the work I’m doing. If the data includes bias, the algorithm will copy that bias. You can’t tell it not to be biased, because it doesn’t understand what bias is. I think that message is an important one for people to understand. 

    If there’s bias to be found, the algorithm is going to go after it. It’s like, “Thank goodness, finally a signal that’s reliable.” But for a tough problem like: Look at these resumes and decide who’s best for the job. If its task is to replicate human hiring decisions, it’s going to glom onto gender bias and race bias. There’s an example in the book of a hiring algorithm that Amazon was developing that discriminated against women, because the historical data it was trained on had that gender bias. 
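    Shane’s point is easy to reproduce with synthetic data. In this hypothetical sketch (not Amazon’s actual system), the historical hiring labels favor one group, and an ordinary logistic regression trained to replicate those decisions dutifully learns a large weight on the protected attribute:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

skill = rng.normal(size=n)
group = rng.integers(0, 2, size=n)   # protected attribute (0 or 1)
# Historical decisions were biased: group 1 was hired at a lower bar.
hired = (skill + 0.8 * group + rng.normal(scale=0.3, size=n)) > 0.9

X = np.column_stack([skill, group])
w, b = np.zeros(2), 0.0

# Plain logistic regression by gradient descent on the biased labels.
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - hired) / n)
    b -= 0.5 * np.mean(p - hired)

print(w)  # a substantial positive weight lands on the group feature
```

    Nothing in the training procedure mentions bias; the weight on `group` appears simply because it helps predict the biased labels, exactly the “reliable signal” Shane describes.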

    Spectrum: What are the other downsides of using AI systems that don’t really understand their tasks? 

    Shane: There is a risk in putting too much trust in AI and not examining its decisions. Another issue is that it can solve the wrong problems, without anyone realizing it. There have been a couple of cases in medicine. For example, there was an algorithm that was trained to recognize things like skin cancer. But instead of recognizing the actual skin condition, it latched onto signals like the markings a surgeon makes on the skin, or a ruler placed there for scale. It was treating those things as a sign of skin cancer. It’s another indication that these algorithms don’t understand what they’re looking at and what the goal really is. 

    Giraffing!

    Spectrum: In your blog, you often have neural nets generate names for things—such as ice cream flavors, paint colors, cats, mushrooms, and types of apples. How do you decide on topics?

    Shane: Quite often it’s because someone has written in with an idea or a data set. They’ll say something like, “I’m the MIT librarian and I have a whole list of MIT thesis titles.” That one was delightful. Or they’ll say, “We are a high school robotics team, and we know where there’s a list of robotics team names.” It’s fun to peek into a different world. I have to be careful that I’m not making fun of the naming conventions in the field. But there’s a lot of humor simply in the neural net’s complete failure to understand. Puns in particular—it really struggles with puns. 

    Spectrum: Your blog is quite absurd, but it strikes me that machine learning is often absurd in itself. Can you explain the concept of giraffing?

    Shane: This concept was originally introduced by [internet security expert] Melissa Elliott. She proposed this phrase as a way to describe the algorithms’ tendency to see giraffes way more often than would be likely in the real world. She posted a whole bunch of examples, like a photo of an empty field in which an image-recognition algorithm has confidently reported that there are giraffes. Why does it think giraffes are present so often when they’re actually really rare? Because they’re trained on data sets from online. People tend to say, “Hey look, a giraffe!” And then take a photo and share it. They don’t do that so often when they see an empty field with rocks. 

    There’s also a chatbot that has a delightful quirk. If you show it some photo and ask it how many giraffes are in the picture, it will always answer with some nonzero number. This quirk comes from the way the training data was generated: These were questions asked and answered by humans online. People tended not to ask the question “How many giraffes are there?” when the answer was zero. So you can show it a picture of someone holding a Wii remote. If you ask it how many giraffes are in the picture, it will say two. 


    Machine and human creativity

    Spectrum: AI can be absurd, and maybe also creative. But you make the point that AI art projects are really human-AI collaborations: Collecting the data set, training the algorithm, and curating the output are all artistic acts on the part of the human. Do you see your work as a human-AI art project?

    Shane: Yes, I think there is artistic intent in my work; you could call it literary or visual. It’s not so interesting to just take a pre-trained algorithm that’s been trained on utilitarian data, and tell it to generate a bunch of stuff. Even if the algorithm isn’t one that I’ve trained myself, I think about, what is it doing that’s interesting, what kind of story can I tell around it, and what do I want to show people. 

    Spectrum: For the past three years you’ve been getting neural nets to generate ideas for Halloween costumes. As language models have gotten dramatically better over the past three years, are the costume suggestions getting less absurd? 

    Shane: Yes. Before I would get a lot more nonsense words. This time I got phrases that were related to real things in the data set. I don’t believe the training data had the words Flying Dutchman or barnacle. But it was able to draw on its knowledge of which words are related to suggest things like sexy barnacle and sexy Flying Dutchman. 

    Spectrum: This year, I saw on Twitter that someone made the gothy giraffe costume happen. Would you ever dress up for Halloween in a costume that the neural net suggested? 

    Shane: I think that would be fun. But there would be some challenges. I would love to go as the sexy Flying Dutchman. But my ambition may constrict me to do something more like a list of leg parts. 


Racial Bias Found in Algorithms That Determine Health Care for Millions of Patients

Post Syndicated from Eliza Strickland original

An algorithm that a major medical center used to identify patients for extra care has been shown to be racially biased. 

The algorithm screened patients for enrollment in an intensive care management program, which gave them access to a dedicated hotline for a nurse practitioner, help refilling prescriptions, and so forth. The screening was meant to identify those patients who would most benefit from the program. But the white patients flagged for enrollment had fewer chronic health conditions than the black patients who were flagged.

In other words, black patients had to reach a higher threshold of illness before they were considered for enrollment. Care was not actually going to those people who needed it most.

Alarmingly, the algorithm was performing its task correctly. The problem was with how the task was defined.

The findings, described in a paper that was just published in Science, point to a system-wide problem, says coauthor Ziad Obermeyer, a physician and researcher at the UC Berkeley School of Public Health. Similar screening tools are used throughout the country; according to industry estimates, these types of algorithms are making health decisions for 200 million people per year. 

Baseball’s Engineer: Ben Hansen Says Biometrics Can Save Pitchers’ Elbows

Post Syndicated from Eliza Strickland original

When Benjamin Hansen was playing baseball in high school, around 2006, technologies to monitor athletes’ bodies and performance weren’t yet commonplace. Yet Hansen wanted to collect data any way he could. “I would sit on the bench with a calculator and a stopwatch, timing the pitchers,” he says. He clicked the stopwatch when the pitcher released the baseball and again when the ball popped into the catcher’s mitt, then factored in the pitcher’s height in calculating the pitch velocity.
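The arithmetic Hansen describes is simple enough to sketch. The mound-to-plate distance is 60.5 feet; how exactly he factored in the pitcher’s height isn’t specified, so the release-extension adjustment below is a hypothetical assumption:

```python
# A rough sketch of stopwatch pitch-speed arithmetic. Hansen's exact
# height adjustment isn't described; here we assume (hypothetically)
# that a pitcher releases the ball closer to the plate by a fraction
# of his height, shortening the measured flight distance.

MOUND_TO_PLATE_FT = 60.5

def pitch_speed_mph(flight_time_s, pitcher_height_ft, extension_factor=0.8):
    release_distance = MOUND_TO_PLATE_FT - extension_factor * pitcher_height_ft
    feet_per_second = release_distance / flight_time_s
    return feet_per_second * 3600 / 5280  # convert ft/s to mph

# A 0.45-second flight for a 6-foot pitcher works out to roughly 84 mph.
print(round(pitch_speed_mph(0.45, 6.0), 1))
```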

Hansen’s coach, however, was not impressed. “My coach should have embraced it,” he says, wistfully. “But instead he made me run laps.”

Hansen kept playing baseball through college, pitching for his team at the Milwaukee School of Engineering. But he was plagued by injuries. He well remembers a practice game in which he logged 15 straight outs—then felt a sharp pain in his elbow. He had partially torn his ulnar collateral ligament (UCL) and had to sit out the rest of the season. “I always asked the question: Why is this happening?” he says.

Today, Hansen is the vice president of biomechanics and innovation for Motus Global, in St. Petersburg, Fla., a startup that produces wearable sports technology. For IEEE Spectrum’s October issue, he describes Motus’s product for baseball pitchers, a compression sleeve with sensors to measure workload and muscle fatigue. From Little League to Major League Baseball, pitchers are using Motus gear to understand their bodies, improve performance, and prevent injuries.

Traditional wisdom holds that pitcher injuries result from faulty form. But data from Motus’s wearable indicates that it’s the accumulated workload on a player’s muscles and ligaments that causes injuries like UCL tears, which have become far too common in baseball. By displaying measurements of fatigue and suggesting training regimens, rehab workouts, and in-game strategies, the wearable can help prevent players from pushing themselves past their limits. It’s a goal that even Hansen’s old coach would probably endorse.

This article appears in the October 2019 print issue as “Throwing Data Around.”

The Ultimate Optimization Problem: How to Best Use Every Square Meter of the Earth’s Surface

Post Syndicated from Eliza Strickland original

Lucas Joppa thinks big. Even while gazing down into his cup of tea in his modest office on Microsoft’s campus in Redmond, Washington, he seems to see the entire planet bobbing in there like a spherical tea bag. 

As Microsoft’s first chief environmental officer, Joppa came up with the company’s AI for Earth program, a five-year effort that’s spending US $50 million on AI-powered solutions to global environmental challenges.

The program is not just about specific deliverables, though. It’s also about mindset, Joppa told IEEE Spectrum in an interview in July. “It’s a plea for people to think about the Earth in the same way they think about the technologies they’re developing,” he says. “You start with an objective. So what’s our objective function for Earth?” (In computer science, an objective function describes the parameter or parameters you are trying to maximize or minimize for optimal results.)

AI for Earth launched in December 2017, and Joppa’s team has since given grants to more than 400 organizations around the world. In addition to receiving funding, some grantees get help from Microsoft’s data scientists and access to the company’s computing resources. 

In a wide-ranging interview about the program, Joppa described his vision of the “ultimate optimization problem”—figuring out which parts of the planet should be used for farming, cities, wilderness reserves, energy production, and so on. 

Every square meter of land and water on Earth has an infinite number of possible utility functions. It’s the job of Homo sapiens to describe our overall objective for the Earth. Then it’s the job of computers to produce optimization results that are aligned with the human-defined objective.

I don’t think we’re close at all to being able to do this. I think we’re closer from a technology perspective—being able to run the model—than we are from a social perspective—being able to make decisions about what the objective should be. What do we want to do with the Earth’s surface?

AI Agents Startle Researchers With Unexpected Hide-and-Seek Strategies

Post Syndicated from Eliza Strickland original

After 25 million games, the AI agents playing hide-and-seek with each other had mastered four basic game strategies. The researchers expected that part.

After a total of 380 million games, the AI players developed strategies that the researchers didn’t know were possible in the game environment—which the researchers had themselves created. That was the part that surprised the team at OpenAI, a research company based in San Francisco.

The AI players learned everything via a machine learning technique known as reinforcement learning. In this learning method, AI agents start out by taking random actions. Sometimes those random actions produce desired results, which earn them rewards. Via trial-and-error on a massive scale, they can learn sophisticated strategies.
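The trial-and-error loop described above can be sketched in miniature with the simplest reinforcement learning setting, a three-armed bandit (OpenAI’s hide-and-seek agents used far more sophisticated policy-gradient methods, but the reward-driven logic is the same):

```python
import random

random.seed(0)

# An epsilon-greedy agent facing three actions with unknown payoffs.
true_reward = [0.2, 0.5, 0.8]   # hidden from the agent
estimates = [0.0, 0.0, 0.0]     # the agent's learned value estimates
counts = [0, 0, 0]

for step in range(5000):
    if random.random() < 0.1:   # explore: try a random action
        action = random.randrange(3)
    else:                       # exploit: pick the best action so far
        action = max(range(3), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < true_reward[action] else 0.0
    counts[action] += 1
    # Incremental average of observed rewards for this action.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # converges toward the hidden payoff probabilities
print(counts)     # most pulls end up going to the best action
```

Early on the agent’s choices are essentially random; as its reward estimates firm up, it concentrates on the highest-paying action, the same explore-then-exploit dynamic that, at vastly larger scale, produced the hide-and-seek strategies.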

In the context of games, this process can be abetted by having the AI play against another version of itself, ensuring that the opponents will be evenly matched. It also locks the AI into a process of one-upmanship, where any new strategy that emerges forces the opponent to search for a countermeasure. Over time, this “self-play” amounted to what the researchers call an “auto-curriculum.” 

According to OpenAI researcher Igor Mordatch, this experiment shows that self-play “is enough for the agents to learn surprising behaviors on their own—it’s like children playing with each other.”

Three Steps to a Moon Base

Post Syndicated from Eliza Strickland original

Space agencies and private companies are working on rockets, landers, and other tech for lunar settlement


In 1968, NASA astronaut Jim Lovell gazed out of a porthole from lunar orbit and remarked on the “vast loneliness” of the moon. It may not be a lonely place for much longer. Today, a new rush of enthusiasm for lunar exploration has swept up government space agencies, commercial space companies funded by billionaires, and startups that want in on the action. Here’s the tech they’re building that may enable humanity’s return to the moon, and the building of the first permanent moon base.

A Wearable That Helps Women Get / Not Get Pregnant

Post Syndicated from Eliza Strickland original

The in-ear sensor from Yono Labs will soon predict a woman’s fertile days


Women’s bodies can be mysterious things—even to the women who inhabit them. But a wearable gadget called the Yono aims to replace mystery with knowledge derived from statistics, big data, and machine learning.

A woman who is trying to get pregnant may spend months tracking her ovulation cycle, often making a daily log of biological signals to determine her few days of fertility. While a plethora of apps promise to help, several studies have questioned these apps’ accuracy and efficacy.

Meanwhile, a woman who is trying to avoid pregnancy by the “fertility awareness method” may well not avoid it, since the method is only 75 percent effective