This is a guest post. The views expressed here are solely those of the author and do not represent positions of IEEE Spectrum or the IEEE.
In my mythical free time outside of professorhood, I’m a stand-up comedian and improviser. As a comedian, I’ve often found myself wishing I could banter with modern commercial AI assistants. They don’t have enough comedic skills for my taste! This longing for cheeky AI eventually led me to study autonomous robot comedians, and to teach my own robot how to perform stand-up.
For the most part, robots are a mystery to end users. And that’s part of the point: Robots are autonomous, so they’re supposed to do their own thing (presumably the thing that you want them to do) and not bother you about it. But as humans start to work more closely with robots, in collaborative tasks or social or assistive contexts, it’s going to be hard for us to trust them if their autonomy is such that we find it difficult to understand what they’re doing.
In a paper published in Science Robotics, researchers from UCLA describe a robotic system that can generate different kinds of real-time, human-readable explanations of its actions. They then tested which of the explanations were most effective at improving a human’s trust in the system. Does this mean we can totally understand and trust robots now? Not yet, but it’s a start.
At an early age, as we take our first steps into the world of math and numbers, we learn that one apple plus another apple equals two apples. We learn to count real things. Only later are we introduced to a weird concept: zero… or the number of apples in an empty box.
The concept of “zero” revolutionized math after Hindu-Arabic scholars and then the Italian mathematician Fibonacci introduced it into our modern numbering system. While today we comfortably use zero in all our mathematical operations, the concept of “nothing” has yet to enter the realm of artificial intelligence.
In a sense, AI and deep learning still need to learn how to recognize and reason with nothing.
In a typical task, a deep neural network (DNN) might be trained to visually recognize a certain number of classes, say pictures of apples and bananas. Deep learning algorithms, when fed a good quantity and quality of data, are really good at coming up with precise, low-error, confident classifications.
The problem arises when a third, unknown object appears in front of the DNN. If an unknown object that was not present in the training set is introduced, such as an orange, then the network will be forced to “guess” and classify the orange as the closest class that captures the unknown object—an apple!
Basically, the world for a DNN trained on apples and bananas is completely made of apples and bananas. It can’t conceive the whole fruit basket.
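To make the forced guess concrete, here is a minimal sketch of a closed-set classifier's final step. The class list, the `classify_closed_set` helper, and the raw scores for the orange are all invented for illustration; a real network would compute such scores from the image itself.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The only classes this hypothetical network was trained on.
CLASSES = ["apple", "banana"]

def classify_closed_set(logits):
    """Return the most probable known class; there is no way to abstain."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]

# Assumed raw scores for a picture of an orange (illustrative values only).
orange_logits = [2.0, -1.0]
label, confidence = classify_closed_set(orange_logits)
print(label, round(confidence, 2))
```

Because the probabilities must sum to 1 across the known classes, the orange ends up labeled as an apple with high apparent confidence. Nothing in this output layer can say "neither."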
Enter the world of nothing
While its usefulness is not immediately clear in all applications, the idea of “nothing” or a “class zero” is extremely useful in several ways when training and deploying a DNN.
During the training process, if a DNN has the ability to classify items as “apple,” “banana,” or “nothing,” the algorithm’s developers can determine if it hasn’t effectively learned to recognize a particular class. For instance, if pictures of fruit continue to yield “nothing” responses, perhaps the developers need to add another “class” of fruit to identify, such as oranges.
Meanwhile, in a deployment scenario, a DNN trained to recognize healthy apples and bananas can answer “nothing” if there is a deviation from the prototypical fruit it has learned to recognize. In this sense, the DNN may act as an anomaly detection network—aside from classifying apples and bananas, it can also, without further changes, signal when it sees something that deviates from the norm.
As of today, there are no easy ways to train a standard DNN so that it can provide the functionality above.
One new approach called a lifelong DNN naturally incorporates the concept of “nothing” in its architecture. A lifelong DNN does this by cleverly utilizing feedback mechanisms to determine whether an input is a close match or instead a mismatch with what it has learned in the past.
This mechanism resembles how humans learn: we subconsciously and continuously check if our predictions match our world. For example, if somebody plays a trick on you and changes the height of your office chair, you’ll immediately notice it. That’s because you have a “model” of the height of your office chair that you’ve learned over time—if that model is disconfirmed, you realize the anomaly right away. We humans continuously check that our classifications match reality. If they don’t, our brains notice and emit an alert. For us, there are not only apples and bananas; there’s also the ability to reason that “I thought it was an apple, but it isn’t.”
A lifelong DNN captures this mechanism in its functioning, so it can output “nothing” when the model it has learned is disconfirmed.
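The article does not spell out how a lifelong DNN implements this match-or-mismatch check, so the following is only a rough sketch of the general idea, not that architecture. It assumes each known class is summarized by a hypothetical two-number prototype (say, roundness and yellowness) and that an assumed distance threshold decides when an input counts as a mismatch.

```python
import math

# Hypothetical 2-D feature prototypes (roundness, yellowness); illustrative values.
PROTOTYPES = {
    "apple": (0.9, 0.1),
    "banana": (0.2, 0.9),
}

# Assumed threshold; a real system would tune this on held-out data.
MATCH_THRESHOLD = 0.3

def classify_with_nothing(features):
    """Return the closest known class, or "nothing" if no prototype is close enough."""
    label, distance = min(
        ((name, math.dist(features, proto)) for name, proto in PROTOTYPES.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= MATCH_THRESHOLD else "nothing"

print(classify_with_nothing((0.85, 0.15)))  # close to the apple prototype
print(classify_with_nothing((0.9, 0.6)))    # round but orange-ish: matches neither
```

The design choice that matters is the explicit comparison between the input and what was learned. When even the nearest class is too far away, the sketch reports a mismatch instead of guessing.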
Nothing to work with, no problem
Armed with a basic understanding of “nothing” using the example of apples and bananas, let’s now consider how this would play out in real-world applications beyond fruit identification.
Consider the manufacturing sector, where machines are tasked with producing massive volumes of products. Training a traditional computer-vision system to recognize different abnormalities in a product—say, surface scratches—is very challenging. On a well-run manufacturing line there aren’t many examples of what “bad” products look like, and “bad” can take an endless number of forms. There simply isn’t an abundance of data about bad products that can be used to train the system.
But with a lifelong DNN, a developer could train the computer-vision system to recognize what different examples of “good” products look like. Then, when the system detects a product that doesn’t match its definition of good, it can categorize that item as an anomaly to be handled appropriately.
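Learning only from "good" examples can be illustrated without any neural network at all. The sketch below is a deliberately simple statistical stand-in, not the lifelong DNN approach: it assumes each product is reduced to a single hypothetical measurement, builds a profile from good parts only, and flags anything that falls too many standard deviations from that profile.

```python
import statistics

def fit_good_profile(good_measurements):
    """Summarize known-good products by their mean and sample standard deviation."""
    mean = statistics.fmean(good_measurements)
    stdev = statistics.stdev(good_measurements)
    return mean, stdev

def is_anomaly(value, profile, max_z=3.0):
    """Flag a measurement that lies more than max_z standard deviations from the mean."""
    mean, stdev = profile
    return abs(value - mean) / stdev > max_z

# Hypothetical widths (in millimeters) of parts that passed inspection.
good_widths = [10.0, 10.1, 9.9, 10.2, 9.8]
profile = fit_good_profile(good_widths)

print(is_anomaly(10.1, profile))  # within the normal spread of good parts
print(is_anomaly(11.0, profile))  # far outside anything seen during training
```

No example of a defective part was needed to build the profile, which is the property the manufacturing scenario relies on. The cutoff `max_z` is an assumed parameter that would need tuning in practice.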
For manufacturers, lifelong DNNs and the ability to detect anomalies can save time and improve efficiency in the production line. There may be similar benefits for countless other industries that are increasingly relying on AI.
In September, Facebook sent out a strange casting call: We need all types of people to look into a webcam or phone camera and say very mundane things. The actors stood in bedrooms, hallways, and backyards, and they talked about topics such as the perils of junk food and the importance of arts education. It was a quick and easy gig—with an odd caveat. Facebook researchers would be altering the videos, extracting each person’s face and fusing it onto another person’s head. In other words, the participants had to agree to become deepfake characters.
Facebook’s artificial intelligence (AI) division put out this casting call so it could ethically produce deepfakes—a term that originally referred to videos that had been modified using a certain face-swapping technique but is now a catchall for manipulated video. The Facebook videos are part of a training data set that the company assembled for a global competition called the Deepfake Detection Challenge. In this competition—produced in cooperation with Amazon, Microsoft, the nonprofit Partnership on AI, and academics from eight universities—researchers around the world are vying to create automated tools that can spot fraudulent media.
The competition launched today, with an announcement at the AI conference NeurIPS, and will accept entries through March 2020. Facebook has dedicated more than US $10 million for awards and grants.
Cristian Canton Ferrer helped organize the challenge as research manager for Facebook’s AI Red Team, which analyzes the threats that AI poses to the social media giant. He says deepfakes are a growing danger not just to Facebook but to democratic societies. Manipulated videos that make politicians appear to do and say outrageous things could go viral before fact-checkers have a chance to step in.
While such a full-blown synthetic scandal has yet to occur, the Italian public recently got a taste of the possibilities. In September, a satirical news show aired a deepfake video featuring a former Italian prime minister apparently lavishing insults on other politicians. Most viewers realized it was a parody, but a few did not.
The U.S. presidential elections in 2020 are an added incentive to get ahead of the problem, says Canton Ferrer. He believes that media manipulation will become much more common over the coming year, and that the deepfakes will get much more sophisticated and believable. “We’re thinking about what will be happening a year from now,” he says. “It’s a cat-and-mouse approach.” Canton Ferrer’s team aims to give the cat a head start, so it will be ready to pounce.
The growing threat of deepfakes
Just how easy is it to make deepfakes? A recent audit of online resources for altering videos found that the available open-source software still requires a good amount of technical expertise. However, the audit also turned up apps and services that are making it easier for almost anyone to get in on the action. In China, a deepfake app called Zao took the country by storm in September when it offered people a simple way to superimpose their own faces onto those of actors like Leonardo DiCaprio and Marilyn Monroe.
It may seem odd that the data set compiled for Facebook’s competition is filled with unknown people doing unremarkable things. But a deepfake detector that works on those mundane videos should work equally well for videos featuring politicians. To make the Facebook challenge as realistic as possible, Canton Ferrer says his team used the most common open-source techniques to alter the videos—but he won’t name the methods, to avoid tipping off contestants. “In real life, they will not be able to ask the bad actors, ‘Can you tell me what method you used to make this deepfake?’” he says.
In the current competition, detectors will be scanning for signs of facial manipulation. However, the Facebook team is keeping an eye on new and emerging attack methods, such as full-body swaps that change the appearance and actions of a person from head to toe. “There are some of those out there, but they’re pretty obvious now,” Canton Ferrer says. “As they get better, we’ll add them to the data set.” Even after the detection challenge concludes in March, he says, the Facebook team will keep working on the problem of deepfakes.
As for how the winning detection methods will be used and whether they’ll be integrated into Facebook’s operations, Canton Ferrer says those decisions aren’t up to him. The Partnership on AI’s steering committee on AI and media integrity, which is overseeing the competition, will decide on the next steps, he says. Claire Leibowicz, who leads that steering committee, says the group will consider “coordinated efforts” to fight back against the global challenge of synthetic and manipulated media.
DARPA’s efforts on deepfake detection
The Facebook challenge is far from the only effort to counter deepfakes. DARPA’s Media Forensics program launched in 2016, a year before the first deepfake videos surfaced on Reddit. Program manager Matt Turek says that as the technology took off, the researchers working under the program developed a number of detection technologies, generally looking for “digital integrity, physical integrity, or semantic integrity.”
Digital integrity is defined by the patterns in an image’s pixels that are invisible to the human eye. These patterns can arise from cameras and video processing software, and any inconsistencies that appear are a tip-off that a video has been altered. Physical integrity refers to the consistency in lighting, shadows, and other physical attributes in an image. Semantic integrity considers the broader context. If a video shows an outdoor scene, for example, a deepfake detector might check the time stamp and location to look up the weather report from that time and place. The best automated detector, Turek says, would “use all those techniques to produce a single integrity score that captures everything we know about a digital asset.”
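Turek does not say how such a single score would be computed. One simple possibility, shown here purely as an assumption, is a weighted average of per-detector scores, each between 0 (almost certainly manipulated) and 1 (no sign of manipulation). The detector names, score values, and weights below are hypothetical.

```python
def combined_integrity_score(scores, weights=None):
    """Weighted average of per-detector integrity scores, each expected in [0, 1]."""
    if not scores:
        raise ValueError("at least one detector score is required")
    if weights is None:
        # Default assumption: every detector counts equally.
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Hypothetical outputs from three kinds of detectors for one video.
scores = {"digital": 0.9, "physical": 0.4, "semantic": 0.8}

equal = combined_integrity_score(scores)
pixel_heavy = combined_integrity_score(
    scores, weights={"digital": 2.0, "physical": 1.0, "semantic": 1.0}
)
print(round(equal, 2), round(pixel_heavy, 2))
```

A weighted average is only one way to fuse detector outputs. A deployed system might instead learn the combination from labeled examples or treat any single strong alarm as decisive.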
Turek says his team has created a prototype Web portal (restricted to its government partners) to demonstrate a sampling of the detectors developed during the program. When the user uploads a piece of media via the Web portal, more than 20 detectors employ a range of different approaches to try to determine whether an image or video has been manipulated. Turek says his team continues to add detectors to the system, which is already better than humans at spotting fakes.
A successor to the Media Forensics program will launch in mid-2020: the Semantic Forensics program. This broader effort will cover all types of media—text, images, videos, and audio—and will go beyond simply detecting manipulation. It will also seek methods to understand the importance of the manipulations, which could help organizations decide which content requires human review. “If you manipulate a vacation photo by adding a beach ball, it really doesn’t matter,” Turek says. “But if you manipulate an image about a protest and add an object like a flag, that could change people’s understanding of who was involved.”
The Semantic Forensics program will also try to develop tools to determine if a piece of media really comes from the source it claims. Eventually, Turek says, he’d like to see the tech community embrace a system of watermarking, in which a digital signature would be embedded in the media itself to help with the authentication process. One big challenge of this idea is that every software tool that interacts with the image, video, or other piece of media would have to “respect that watermark, or add its own,” Turek says. “It would take a long time for the ecosystem to support that.”
A deepfake detection tool for consumers
In the meantime, the AI Foundation has a plan. This nonprofit is building a tool called Reality Defender that’s due to launch in early 2020. “It will become your personal AI guardian who’s watching out for you,” says Rob Meadows, president and chief technology officer for the foundation.
Reality Defender is a plug-in for Web browsers and an app for mobile phones. It scans everything on the screen using a suite of automatic detectors, then alerts the user about altered media. Detection alone won’t make for a useful tool, since Photoshop and other editing tools are widely used in fashion, advertising, and entertainment. If Reality Defender draws attention to every altered piece of content, Meadows notes, “it will flood consumers to the point where they say, ‘We don’t care anymore, we have to tune it out.’”
To avoid that problem, users will be able to dial the tool’s sensitivity up or down, depending on how many alerts they want. Meadows says beta testers are currently training the system, giving it feedback on which types of manipulations they care about. Once Reality Defender launches, users will be able to personalize their AI guardian by giving it a thumbs-up or thumbs-down on alerts, until it learns their preferences. “A user can say, ‘For my level of paranoia, this is what works for me,’ ” Meadows says.
He sees the software as a useful stopgap solution, but ultimately he hopes that his group’s technologies will be integrated into platforms such as Facebook, YouTube, and Twitter. He notes that Biz Stone, cofounder of Twitter, is a member of the AI Foundation’s board. To truly protect society from fake media, Meadows says, we need tools that prevent falsehoods from getting hosted on platforms and spread via social media. Debunking them after they’ve already spread is too late.
The researchers at Jigsaw, a unit of Alphabet that works on technology solutions for global challenges, would tend to agree. Technical research manager Andrew Gully says his team identified synthetic media as a societal threat some years back. To contribute to the fight, Jigsaw teamed up with sister company Google AI to produce a deepfake data set of its own in late 2018, which they contributed to the FaceForensics data set hosted by the Technical University of Munich.
Gully notes that while we haven’t yet seen a political crisis triggered by a deepfake, these videos are also used for bullying and “revenge porn,” in which a targeted woman’s face is pasted onto the face of an actor in a porno. (While pornographic deepfakes could in theory target men, a recent audit of deepfake content found that 100 percent of the pornographic videos focused on women.) What’s more, Gully says people are more likely to be credulous of videos featuring unknown individuals than famous politicians.
But it’s the threat to free and fair elections that feels most crucial in this U.S. election year. Gully says systems that detect deepfakes must take a careful approach in communicating the results to users. “We know already how difficult it is to convince people in the face of their own biases,” Gully says. “Detecting a deepfake video is hard enough, but that’s easy compared to how difficult it is to convince people of things they don’t want to believe.”
Yoshua Bengio is known as one of the “three musketeers” of deep learning, the type of artificial intelligence (AI) that dominates the field today.
Bengio, a professor at the University of Montreal, is credited with making key breakthroughs in the use of neural networks—and just as importantly, with persevering with the work through the long cold AI winter of the late 1980s and the 1990s, when most people thought that neural networks were a dead end.
IEEE Spectrum: What do you think about all the discussion of deep learning’s limitations?
Yoshua Bengio: Too many public-facing venues don’t understand a central thing about the way we do research, in AI and other disciplines: We try to understand the limitations of the theories and methods we currently have, in order to extend the reach of our intellectual tools. So deep learning researchers are looking to find the places where it’s not working as well as we’d like, so we can figure out what needs to be added and what needs to be explored.
This is picked up by people like Gary Marcus, who put out the message: “Look, deep learning doesn’t work.” But really, what researchers like me are doing is expanding its reach. When I talk about things like the need for AI systems to understand causality, I’m not saying that this will replace deep learning. I’m trying to add something to the toolbox.
What matters to me as a scientist is what needs to be explored in order to solve the problems. Not who’s right, who’s wrong, or who’s praying at which chapel.
Spectrum: How do you assess the current state of deep learning?
Bengio: In terms of how much progress we’ve made in this work over the last two decades: I don’t think we’re anywhere close today to the level of intelligence of a two-year-old child. But maybe we have algorithms that are equivalent to lower animals, for perception. And we’re gradually climbing this ladder in terms of tools that allow an entity to explore its environment.
One of the big debates these days is: What are the elements of higher-level cognition? Causality is one element of it, and there’s also reasoning and planning, imagination, and credit assignment (“what should I have done?”). In classical AI, they tried to obtain these things with logic and symbols. Some people say we can do it with classic AI, maybe with improvements.
Then there are people like me, who think that we should take the tools we’ve built in the last few years to create these functionalities in a way that’s similar to the way humans do reasoning, which is actually quite different from the way a purely logical system based on search does it.
Spectrum: How can we create functions similar to human reasoning?
Bengio: Attention mechanisms allow us to learn how to focus our computation on a few elements, a set of computations. Humans do that—it’s a particularly important part of conscious processing. When you’re conscious of something, you’re focusing on a few elements, maybe a certain thought, then you move on to another thought. This is very different from standard neural networks, which are instead parallel processing on a big scale. We’ve had big breakthroughs on computer vision, translation, and memory thanks to these attention mechanisms, but I believe it’s just the beginning of a different style of brain-inspired computation.
It’s not that we have solved the problem, but I think we have a lot of the tools to get started. And I’m not saying it’s going to be easy. I wrote a paper in 2017 called “The Consciousness Prior” that laid out the issue. I have several students working on this and I know it is a long-term endeavor.
Spectrum: What other aspects of human intelligence would you like to replicate in AI?
Bengio: We also talk about the ability of neural nets to imagine: Reasoning, memory, and imagination are three aspects of the same thing going on in your mind. You project yourself into the past or the future, and when you move along these projections, you’re doing reasoning. If you anticipate something bad happening in the future, you change course—that’s how you do planning. And you’re using memory too, because you go back to things you know in order to make judgments. You select things from the present and things from the past that are relevant.
Attention is the crucial building block here. Let’s say I’m translating a book into another language. For every word, I have to carefully look at a very small part of the book. Attention allows you to abstract out a lot of irrelevant details and focus on what matters. Being able to pick out the relevant elements: that’s what attention does.
Spectrum: How does that translate to machine learning?
Bengio: You don’t have to tell the neural net what to pay attention to—that’s the beauty of it. It learns it on its own. The neural net learns how much attention, or weight, it should give to each element in a set of possible elements to consider.
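The weighting Bengio describes can be sketched in a few lines. This is a simplified version of the common dot-product form of attention, offered as an illustration and not as the specific mechanism used in his work. The query, key, and value vectors are fixed by hand here as an assumption; in a trained network they would be produced by learned parameters, which is how the net "learns" where to look.

```python
import math

def attention_weights(query, keys):
    """Score each key against the query, then normalize the scores with a softmax."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Blend the value vectors using the attention weights."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Hand-picked vectors for illustration only.
query = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

weights = attention_weights(query, keys)
output = attend(query, keys, values)
print([round(w, 3) for w in weights])
```

The first key lines up with the query, so it receives nearly all of the weight and the other elements are mostly ignored. That is the "focus on a few elements" behavior in miniature.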
Spectrum: How is your recent work on causality related to these ideas?
Bengio: The kind of high-level concepts that you reason with tend to be variables that are cause and/or effect. You don’t reason based on pixels. You reason based on concepts like door or knob or open or closed. Causality is very important for the next steps of progress of machine learning.
And it’s related to another topic that is much on the minds of people in deep learning. Systematic generalization is the ability humans have to generalize the concepts we know, so they can be combined in new ways that are unlike anything else we’ve seen. Today’s machine learning doesn’t know how to do that. So you often have problems relating to training on a particular data set. Say you train in one country, and then deploy in another country. You need generalization and transfer learning. How do you train a neural net so that if you transfer it into a new environment, it continues to work well or adapts quickly?
Spectrum: What’s the key to that kind of adaptability?
Bengio: Meta-learning is a very hot topic these days: Learning to learn. I wrote an early paper on this in 1991, but only recently did we get the computational power to implement this kind of thing. It’s computationally expensive. The idea: In order to generalize to a new environment, you have to practice generalizing to a new environment. It’s so simple when you think about it. Children do it all the time. When they move from one room to another room, the environment is not static, it keeps changing. Children train themselves to be good at adaptation. To do that efficiently, they have to use the pieces of knowledge they’ve acquired in the past. We’re starting to understand this ability, and to build tools to replicate it.
One critique of deep learning is that it requires a huge amount of data. That’s true if you just train it on one task. But children have the ability to learn based on very little data. They capitalize on the things they’ve learned before. But more importantly, they’re capitalizing on their ability to adapt and generalize.
Spectrum: Will any of these ideas be used in the real world anytime soon?
Bengio: No. This is all very basic research using toy problems. That’s fine, that’s where we’re at. We can debug these ideas, move on to new hypotheses. This is not ready for industry tomorrow morning.
But there are two practical limitations that industry cares about, and that this research may help with. One is building systems that are more robust to changes in the environment. Two: How do we build natural language processing systems, dialogue systems, virtual assistants? The problem with the current state-of-the-art systems that use deep learning is that they’re trained on huge quantities of data, but they don’t really understand well what they’re talking about. People like Gary Marcus pick up on this and say, “That’s proof that deep learning doesn’t work.” People like me say, “That’s interesting, let’s tackle the challenge.”
Bengio: There’s an idea called grounded language learning which is attracting new attention recently. The idea is, an AI system should not learn only from text. It should learn at the same time how the world works, and how to describe the world with language. Ask yourself: Could a child understand the world if they were only interacting with the world via text? I suspect they would have a hard time.
This has to do with conscious versus unconscious knowledge, the things we know but can’t name. A good example of that is intuitive physics. A two-year-old understands intuitive physics. They don’t know Newton’s equations, but they understand concepts like gravity in a concrete sense. Some people are now trying to build systems that interact with their environment and discover the basic laws of physics.
Spectrum: Why would a basic grasp of physics help with conversation?
Bengio: The issue with language is that often the system doesn’t really understand the complexity of what the words are referring to. For example, the statements used in the Winograd schema; in order to make sense of them, you have to capture physical knowledge. There are sentences like: “Jim wanted to put the lamp into his luggage, but it was too large.” You know that if this object is too large for putting in the luggage, it must be the “it,” the subject of the second phrase. You can communicate that kind of knowledge in words, but it’s not the kind of thing we go around saying: “The typical size of a piece of luggage is x by x.”
We need language understanding systems that also understand the world. Currently, AI researchers are looking for shortcuts. But they won’t be enough. AI systems also need to acquire a model of how the world works.
AI experts gathered at MIT last week, with the aim of predicting the role artificial intelligence will play in the future of work. Will it be the enemy of the human worker? Will it prove to be a savior? Or will it be just another innovation—like electricity or the internet?
As IEEE Spectrum previously reported, this conference (“AI and the Future of Work Congress”), held at MIT’s Kresge Auditorium, offered sometimes pessimistic outlooks on the job- and industry-destroying path that AI and automation seem to be taking: Self-driving technology will put truck drivers out of work; smart law clerk algorithms will put paralegals out of work; robots will (continue to) put factory and warehouse workers out of work.
Andrew McAfee, co-director of MIT’s Initiative on the Digital Economy, said even just in the past couple years, he’s noticed a shift in the public’s perception of AI. “I remember from previous versions of this conference, it felt like we had to make the case that we’re living in a period of accelerating change and that AI’s going to have a big impact,” he said. “Nobody had to make that case today.”
Elisabeth Reynolds, executive director of MIT’s Task Force on the Work of the Future, noted that following the path of least resistance is not a viable way forward. “If we do nothing, we’re in trouble,” she said. “The future will not take care of itself. We have to do something about it.”
Panelists and speakers spoke about championing productive uses of AI in the workplace, which ultimately benefit both employees and customers.
As one example, Zeynep Ton, professor at MIT Sloan School of Management, highlighted retailer Sam’s Club’s recent rollout of a program called Sam’s Garage. Previously customers shopping for tires for their car spent somewhere between 30 and 45 minutes with a Sam’s Club associate paging through manuals and looking up specs on websites.
But with an AI algorithm, they were able to cut that spec hunting time down to 2.2 minutes. “Now instead of wasting their time trying to figure out the different tires, they can field the different options and talk about which one would work best [for the customer],” she said. “This is a great example of solving a real problem, including [enhancing] the experience of the associate as well as the customer.”
“We think of it as an AI-first world that’s coming,” said Scott Prevost, VP of engineering at Adobe. Prevost said AI agents in Adobe’s software will behave something like a creative assistant or intern who will take care of more mundane tasks for you.
Prevost cited an internal survey of Adobe customers that found 74 percent of respondents’ time was spent doing repetitive work—the kind that might be automated by an AI script or smart agent.
“It used to be you’d have the resources to work on three ideas [for a creative pitch or presentation],” Prevost said. “But if the AI can do a lot of the production work, then you can have 10 or 100. Which means you can actually explore some of the further out ideas. It’s also lowering the bar for everyday people to create really compelling output.”
In addition to changing the nature of work, noted a number of speakers at the event, AI is also directly transforming the workforce.
Jacob Hsu, CEO of the recruitment company Catalyte, spoke about using AI as a job placement tool. The company seeks to fill myriad positions including auto mechanics, baristas, and office workers, with its sights on candidates including young people and mid-career job changers. To find them, it advertises on Craigslist, social media, and traditional media.
The prospects who sign up with Catalyte take a battery of tests. The company’s AI algorithms then match each prospect’s skills with the field best suited for their talents.
“We want to be like the Harry Potter Sorting Hat,” Hsu said.
Guillermo Miranda, IBM’s global head of corporate social responsibility, said IBM has increasingly been hiring based not on credentials but on skills. For instance, he said, as much as 50 percent of the company’s new hires in some divisions do not have a traditional four-year college degree. “As a company, we need to be much more clear about hiring by skills,” he said. “It takes discipline. It takes conviction. It takes a little bit of enforcing with H.R. by the business leaders. But if you hire by skills, it works.”
Ardine Williams, Amazon’s VP of workforce development, said the e-commerce giant has been experimenting with developing skills of the employees at its warehouses (a.k.a. fulfillment centers) with an eye toward putting them in a position to get higher-paying work with other companies.
She described an agreement Amazon had made in its Dallas fulfillment center with aircraft maker Sikorsky, which had been experiencing a shortage of skilled workers for its nearby factory. So Amazon offered its employees free certification training so they could seek higher-paying work at Sikorsky.
“I do that because now I have an attraction mechanism—like a G.I. Bill,” Williams said. The program is also only available for employees who have worked at least a year with Amazon. So their program offers medium-term job retention, while ultimately moving workers up the wage ladder.
Radha Basu, CEO of AI data company iMerit, said her firm aggressively hires from the pool of women and under-resourced minority communities in the U.S. and India. The company specializes in turning unstructured data (e.g. video or audio feeds) into tagged and annotated data for machine learning, natural language processing, or computer vision applications.
“There is a motivation with these young people to learn these things,” she said. “It comes with no baggage.”
Alastair Fitzpayne, executive director of The Aspen Institute’s Future of Work Initiative, said the future of work ultimately means, in bottom-line terms, the future of human capital. “We have an R&D tax credit,” he said. “We’ve had it for decades. It provides credit for companies that make new investment in research and development. But we have nothing on the human capital side that’s analogous.”
So a company that’s making a big investment in worker training does it on its own dime, without any of the tax benefits that it might accrue if it, say, spent the money on new equipment or new technology. Fitzpayne said a simple tweak to the R&D tax credit could make a big difference by incentivizing new investment programs in worker training. That still means Amazon’s pre-existing worker training programs (at a company that already famously pays no taxes) would not count.
“We need a different way of developing new technologies,” said Daron Acemoglu, MIT Institute Professor of Economics. He pointed to the clean energy sector as an example. First a consensus around the problem needs to emerge. Then a broadly agreed-upon set of goals and measurements needs to be developed (e.g., that AI and automation would create at least X new jobs for every Y jobs that it eliminates).
Then it just needs to be implemented.
“We need to build a consensus that, along the path we’re following at the moment, there are going to be increasing problems for labor,” Acemoglu said. “We need a mindset change. That it is not just about minimizing costs or maximizing tax benefits, but really worrying about what kind of society we’re creating and what kind of environment we’re creating if we keep on just automating and [eliminating] good jobs.”
In February of this year, OpenAI, one of the foremost artificial intelligence labs in the world, announced that a team of researchers had built a powerful new text generator called the Generative Pre-Trained Transformer 2, or GPT-2 for short. The researchers used an unsupervised learning algorithm to train their system on a broad set of natural language processing (NLP) capabilities, including reading comprehension, machine translation, and the ability to generate long strings of coherent text.
But as is often the case with NLP technology, the tool held both great promise and great peril. Researchers and policy makers at the lab were concerned that their system, if widely released, could be exploited by bad actors and misappropriated for “malicious purposes.”
The people of OpenAI, which defines its mission as “discovering and enacting the path to safe artificial general intelligence,” were concerned that GPT-2 could be used to flood the Internet with fake text, thereby degrading an already fragile information ecosystem. For this reason, OpenAI decided that it would not release the full version of GPT-2 to the public or other researchers.
In March 2016, Microsoft was preparing to release its new chatbot, Tay, on Twitter. Described as an experiment in “conversational understanding,” Tay was designed to engage people in dialogue through tweets or direct messages, while emulating the style and slang of a teenage girl. She was, according to her creators, “Microsoft’s A.I. fam from the Internet that’s got zero chill.” She loved E.D.M. music, had a favorite Pokémon, and often said extremely online things, like “swagulated.”
Tay was an experiment at the intersection of machine learning, natural language processing, and social networks. While other chatbots in the past—like Joseph Weizenbaum’s Eliza—conducted conversation by following pre-programmed and narrow scripts, Tay was designed to learn more about language over time, enabling her to have conversations about any topic.
Machine learning works by developing generalizations from large amounts of data. In any given data set, the algorithm will discern patterns and then “learn” how to approximate those patterns in its own behavior.
Using this technique, engineers at Microsoft trained Tay’s algorithm on a dataset of anonymized public data along with some pre-written material provided by professional comedians to give it a basic grasp of language. The plan was to release Tay online, then let the bot discover patterns of language through its interactions, which she would emulate in subsequent conversations. Eventually, her programmers hoped, Tay would sound just like the Internet.
On March 23, 2016, Microsoft released Tay to the public on Twitter. At first, Tay engaged harmlessly with her growing number of followers with banter and lame jokes. But after only a few hours, Tay started tweeting highly offensive things, such as: “I f@#%&*# hate feminists and they should all die and burn in hell” or “Bush did 9/11 and Hitler would have done a better job…”
This week at MIT, academics and industry officials compared notes, studies, and predictions about AI and the future of work. During the discussions, an insurance company executive shared details about one AI program that rolled out at his firm earlier this year. A chatbot the company introduced, the executive said, now handles 150,000 calls per month.
Later in the day, a panelist—David Fanning, founder of PBS’s Frontline—remarked that this statistic is emblematic of broader fears he saw when reporting a new Frontline documentary about AI. “People are scared,” Fanning said of the public’s AI anxiety.
We’ve all seen this moment in the movies—on board, say, a submarine or a spaceship, the chief engineer will suddenly cock their ear to listen to the background hum and say “something’s wrong.” Bosch is hoping to teach a computer how to do that trick in real life, and is going all the way to the International Space Station to test its technology.
Considering the amount of data that’s communicated through non-speech sound, humans do a remarkably poor job of leveraging sound information. We’re very good at reacting to sounds (especially new or loud sounds) over relatively short timescales, but beyond that, our brains are great at just classifying most ongoing sounds as “background” and ignoring them. Computers, which have the patience we generally lack, seem like they’d be much better at this, but the focus of most developers has been on discrete sound events (like smart home devices detecting smoke alarms or breaking glass) rather than longer-term sound patterns.
Why should those of us who aren’t movie characters care about how patterns of sound change over time? The simple reason is that our everyday lives are full of machines that both make a lot of noise and tend to break expensively from time to time. Right now, I’m listening to my washing machine, which makes some weird noises. I don’t have a very good idea of whether those weird noises are normal weird noises, and more to the point, I have an even worse idea whether it was making the same weird noises the last time I ran it. Knowing whether a machine is making weirder noises than it used to could clue me in to an emerging problem, one that I could solve through cheap preventative maintenance rather than an expensive repair later on.
Bosch, the German company that almost certainly makes a significant percentage of the parts in your car as well as appliances, power tools, industrial systems, and a whole bunch of other stuff, is trying to figure out how it can use deep learning to identify and track the noises that machines make over time. The idea is to be able to identify subtle changes in sound to warn of pending problems before they happen. And one group of people very interested in getting advance warning of problems are the astronauts floating around in the orbiting bubble of life that is the ISS.
While there were already some rudimentary digital language generators in existence—programs that could spit out somewhat coherent lines of text—Weizenbaum’s program was the first designed explicitly for interactions with humans. The user could type in some statement or set of statements in their normal language, press enter, and receive a response from the machine. As Weizenbaum explained, his program made “certain kinds of natural-language conversation between man and computer possible.”
He named the program Eliza after Eliza Doolittle, the working-class heroine of George Bernard Shaw’s Pygmalion who learns how to talk with an upper-class accent. The new Eliza was written for the 36-bit IBM 7094, an early transistorized mainframe computer, in a programming language that Weizenbaum developed called MAD-SLIP.
Because computer time was a valuable resource, Eliza could only be run via a time-sharing system; the user interacted with the program remotely via an electric typewriter and printer. When the user typed in a sentence and pressed enter, a message was sent to the mainframe computer. Eliza scanned the message for the presence of a keyword and used it in a new sentence to form a response that was sent back, printed out, and read by the user.
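The scan-for-a-keyword step described above can be illustrated with a few lines of Python. This is only a minimal sketch, not Weizenbaum’s script: the keywords, the reply templates, and the function name `respond` are all hypothetical, and the real Eliza was written in MAD-SLIP with far richer rules.

```python
import re

# Hypothetical keyword -> reply templates. "{kw}" is replaced by the keyword
# that was found, so the keyword is reused in the new sentence.
RULES = {
    "mother": "Tell me more about your {kw}.",
    "sad": "Why do you feel {kw}?",
    "computer": "Does the {kw} worry you?",
}
DEFAULT_REPLY = "Please go on."


def respond(message: str) -> str:
    """Return a reply built from the first known keyword in the message."""
    # Lowercase the message and split it into plain words.
    for word in re.findall(r"[a-z]+", message.lower()):
        template = RULES.get(word)
        if template is not None:
            return template.format(kw=word)
    # No keyword found: fall back to a content-free prompt.
    return DEFAULT_REPLY


if __name__ == "__main__":
    print(respond("My mother is upset with me."))
    print(respond("Hello there."))
```

Even in this toy form, the reply depends only on which word happens to match, not on any grasp of what the user meant.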
In 1913, the Russian mathematician Andrey Andreyevich Markov sat down in his study in St. Petersburg with a copy of Alexander Pushkin’s 19th century verse novel, Eugene Onegin, a literary classic at the time. Markov, however, did not start reading Pushkin’s famous text. Rather, he took a pen and piece of drafting paper, and wrote out the first 20,000 letters of the book in one long string of letters, eliminating all punctuation and spaces. Then he arranged these letters in 200 grids (10-by-10 characters each) and began counting the vowels in every row and column, tallying the results.
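Markov’s tallying procedure is simple enough to sketch in Python. The version below is an illustration under stated assumptions: it uses English vowels as a stand-in for the Russian ones, and the function names `make_grids` and `vowel_tallies` are invented for this example.

```python
# Assumption: English vowels stand in for the Russian vowels Markov counted.
VOWELS = set("aeiou")


def make_grids(text: str, size: int = 10) -> list:
    """Strip spaces and punctuation, then cut the letters into size-by-size grids."""
    letters = [c for c in text.lower() if c.isalpha()]
    cells = size * size
    usable = len(letters) - len(letters) % cells  # drop any incomplete final grid
    grids = []
    for start in range(0, usable, cells):
        block = letters[start:start + cells]
        grids.append([block[r * size:(r + 1) * size] for r in range(size)])
    return grids


def vowel_tallies(grid: list) -> tuple:
    """Count the vowels in every row and every column of one grid."""
    rows = [sum(c in VOWELS for c in row) for row in grid]
    cols = [sum(c in VOWELS for c in col) for col in zip(*grid)]
    return rows, cols


if __name__ == "__main__":
    sample = "ab" * 50 + " trailing text that does not fill a grid"
    for grid in make_grids(sample):
        print(vowel_tallies(grid))
```

Fed 20,000 letters, `make_grids` would return the 200 grids described above, each yielding ten row counts and ten column counts.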
Sure, artificial intelligence is transforming the world’s societies and economies—but can an AI come up with plausible ideas for a Halloween costume?
Janelle Shane has been asking such probing questions since she started her AI Weirdness blog in 2016. She specializes in training neural networks (which underpin most of today’s machine learning techniques) on quirky data sets such as compilations of knitting instructions, ice cream flavors, and names of paint colors. Then she asks the neural net to generate its own contributions to these categories—and hilarity ensues. AI is not likely to disrupt the paint industry with names like “Ronching Blue,” “Dorkwood,” and “Turdly.”
Shane’s antics have a serious purpose. She aims to illustrate the serious limitations of today’s AI, and to counteract the prevailing narrative that describes AI as well on its way to superintelligence and complete human domination. “The danger of AI is not that it’s too smart,” Shane writes in her new book, “but that it’s not smart enough.”
The book, which came out on Tuesday, is called You Look Like a Thing and I Love You. It takes its odd title from a list of AI-generated pick-up lines, all of which would at least get a person’s attention if shouted, preferably by a robot, in a crowded bar. Shane’s book is shot through with her trademark absurdist humor, but it also contains real explanations of machine learning concepts and techniques. It’s a painless way to take AI 101.
IEEE Spectrum: You studied electrical engineering as an undergrad, then got a master’s degree in physics. How did that lead to you becoming the comedian of AI?
Janelle Shane: I’ve been interested in machine learning since freshman year of college. During orientation at Michigan State, a professor who worked on evolutionary algorithms gave a talk about his work. It was full of the most interesting anecdotes, some of which I’ve used in my book. He told an anecdote about people setting up a machine learning algorithm to do lens design, and the algorithm did end up designing an optical system that works… except one of the lenses was 50 feet thick, because they didn’t specify that it couldn’t do that.
I started working in his lab on optics, doing ultra-short laser pulse work. I ended up doing a lot more optics than machine learning, but I always found it interesting. One day I came across a list of recipes that someone had generated using a neural net, and I thought it was hilarious and remembered why I thought machine learning was so cool. That was in 2016, ages ago in machine learning land.
Spectrum: So you decided to “establish weirdness as your goal” for your blog. What was the first weird experiment that you blogged about?
Shane: It was generating cookbook recipes. The neural net came up with ingredients like: “Take ¼ pounds of bones or fresh bread.” That recipe started out: “Brown the salmon in oil, add creamed meat to the mixture.” It was making mistakes that showed the thing had no memory at all.
Spectrum: You say in the book that you can learn a lot about AI by giving it a task and watching it flail. What do you learn?
Shane: One thing you learn is how much it relies on surface appearances rather than deep understanding. With the recipes, for example: It got the structure of title, category, ingredients, instructions, yield at the end. But when you look more closely, it has instructions like “Fold the water and roll it into cubes.” So clearly this thing does not understand water, let alone the other things. It’s recognizing certain phrases that tend to occur, but it doesn’t have a concept that these recipes are describing something real. You start to realize how very narrow the algorithms in this world are. They only know exactly what we tell them in our data set.
“The narrower the problem, the smarter the AI will seem”
Spectrum: That makes me think of DeepMind’s AlphaGo, which was universally hailed as a triumph for AI. It can play the game of Go better than any human, but it doesn’t know what Go is. It doesn’t know that it’s playing a game.
Shane: It doesn’t know what a human is, or if it’s playing against a human or another program. That’s also a nice illustration of how well these algorithms do when they have a really narrow and well-defined problem.
The narrower the problem, the smarter the AI will seem. If it’s not just doing something repeatedly but instead has to understand something, coherence goes down. For example, take an algorithm that can generate images of objects. If the algorithm is restricted to birds, it could do a recognizable bird. If this same algorithm is asked to generate images of any animal, if its task is that broad, the bird it generates becomes an unrecognizable brown feathered smear against a green background.
Spectrum: That sounds… disturbing.
Shane: It’s disturbing in a weird amusing way. What’s really disturbing is the humans it generates. It hasn’t seen them enough times to have a good representation, so you end up with an amorphous, usually pale-faced thing with way too many orifices. If you ask it to generate an image of a person eating pizza, you’ll have blocks of pizza texture floating around. But if you give that image to an image-recognition algorithm that was trained on that same data set, it will say, “Oh yes, that’s a person eating pizza.”
Spectrum: Do you see it as your role to puncture the AI hype?
Shane: I do see it that way. Not a lot of people are bringing out this side of AI. When I first started posting my results, I’d get people saying, “I don’t understand, this is AI, shouldn’t it be better than this? Why doesn’t it understand?” Many of the impressive examples of AI have a really narrow task, or they’ve been set up to hide how little understanding it has. There’s a motivation, especially among people selling products based on AI, to represent the AI as more competent and understanding than it actually is.
Spectrum: If people overestimate the abilities of AI, what risk does that pose?
Shane: I worry when I see people trusting AI with decisions it can’t handle, like hiring decisions or decisions about moderating content. These are really tough tasks for AI to do well on. There are going to be a lot of glitches. I see people saying, “The computer decided this so it must be unbiased, it must be objective.”
That’s another thing I find myself highlighting in the work I’m doing. If the data includes bias, the algorithm will copy that bias. You can’t tell it not to be biased, because it doesn’t understand what bias is. I think that message is an important one for people to understand.
If there’s bias to be found, the algorithm is going to go after it. It’s like, “Thank goodness, finally a signal that’s reliable.” But for a tough problem like: Look at these resumes and decide who’s best for the job. If its task is to replicate human hiring decisions, it’s going to glom onto gender bias and race bias. There’s an example in the book of a hiring algorithm that Amazon was developing that discriminated against women, because the historical data it was trained on had that gender bias.
Spectrum: What are the other downsides of using AI systems that don’t really understand their tasks?
Shane: There is a risk in putting too much trust in AI and not examining its decisions. Another issue is that it can solve the wrong problems, without anyone realizing it. There have been a couple of cases in medicine. For example, there was an algorithm that was trained to recognize things like skin cancer. But instead of recognizing the actual skin condition, it latched onto signals like the markings a surgeon makes on the skin, or a ruler placed there for scale. It was treating those things as a sign of skin cancer. It’s another indication that these algorithms don’t understand what they’re looking at and what the goal really is.
Spectrum: In your blog, you often have neural nets generate names for things—such as ice cream flavors, paint colors, cats, mushrooms, and types of apples. How do you decide on topics?
Shane: Quite often it’s because someone has written in with an idea or a data set. They’ll say something like, “I’m the MIT librarian and I have a whole list of MIT thesis titles.” That one was delightful. Or they’ll say, “We are a high school robotics team, and we know where there’s a list of robotics team names.” It’s fun to peek into a different world. I have to be careful that I’m not making fun of the naming conventions in the field. But there’s a lot of humor simply in the neural net’s complete failure to understand. Puns in particular—it really struggles with puns.
Spectrum: Your blog is quite absurd, but it strikes me that machine learning is often absurd in itself. Can you explain the concept of giraffing?
Shane: This concept was originally introduced by [internet security expert] Melissa Elliott. She proposed this phrase as a way to describe the algorithms’ tendency to see giraffes way more often than would be likely in the real world. She posted a whole bunch of examples, like a photo of an empty field in which an image-recognition algorithm has confidently reported that there are giraffes. Why does it think giraffes are present so often when they’re actually really rare? Because they’re trained on data sets from online. People tend to say, “Hey look, a giraffe!” And then take a photo and share it. They don’t do that so often when they see an empty field with rocks.
Spectrum: AI can be absurd, and maybe also creative. But you make the point that AI art projects are really human-AI collaborations: Collecting the data set, training the algorithm, and curating the output are all artistic acts on the part of the human. Do you see your work as a human-AI art project?
Shane: Yes, I think there is artistic intent in my work; you could call it literary or visual. It’s not so interesting to just take a pre-trained algorithm that’s been trained on utilitarian data, and tell it to generate a bunch of stuff. Even if the algorithm isn’t one that I’ve trained myself, I think about, what is it doing that’s interesting, what kind of story can I tell around it, and what do I want to show people.
Spectrum: For the past three years you’ve been getting neural nets to generate ideas for Halloween costumes. As language models have gotten dramatically better over the past three years, are the costume suggestions getting less absurd?
Shane: Yes. Before I would get a lot more nonsense words. This time I got phrases that were related to real things in the data set. I don’t believe the training data had the words Flying Dutchman or barnacle. But it was able to draw on its knowledge of which words are related to suggest things like sexy barnacle and sexy Flying Dutchman.
Spectrum: This year, I saw on Twitter that someone made the gothy giraffe costume happen. Would you ever dress up for Halloween in a costume that the neural net suggested?
Shane: I think that would be fun. But there would be some challenges. I would love to go as the sexy Flying Dutchman. But my ambition may constrict me to do something more like a list of leg parts.
Kratsios, the fourth to hold the U.S. CTO position since its creation by President Barack Obama in 2009, was confirmed in August as President Donald Trump’s first CTO. Before joining the Trump administration, he was chief of staff at investment firm Thiel Capital and chief financial officer of hedge fund Clarium Capital. Donahoe is Executive Director of Stanford’s Global Digital Policy Incubator and served as the first U.S. Ambassador to the United Nations Human Rights Council during the Obama Administration.
The conversation jumped around, hitting on both accomplishments and controversies. Kratsios touted the administration’s success in fixing policy around the use of drones, its memorandum on STEM education, and an increase in funding for basic research in AI—though the magnitude of that increase wasn’t specified. He pointed out that the Trump administration’s AI policy has been a continuation of the policies of the Obama administration, and will continue to build on that foundation. As proof of this, he pointed to Trump’s signing of the American AI Initiative earlier this year. That executive order, Kratsios said, was intended to bring various government agencies together to coordinate their AI efforts and to push the idea that AI is a tool for the American worker. The AI Initiative, he noted, also took into consideration that AI will cause job displacement, and asked private companies to pledge to retrain workers.
The administration, he said, is also looking to remove barriers to AI innovation. In service of that goal, the government will, in the next month or so, release a regulatory guidance memo instructing government agencies about “how they should think about AI technologies,” said Kratsios.
U.S. vs China in AI
A few of the exchanges between Kratsios and Donahoe hit on current hot topics, starting with the tension between the U.S. and China.
“You talk a lot about the unique U.S. ecosystem. In which aspect of AI is the U.S. dominant, and where is China challenging us in dominance?”
“They are challenging us on machine vision. They have more data to work with, given that they have surveillance data.”
“To what extent would you say the quantity of data collected and available will be a determining factor in AI dominance?”
“It makes a big difference in the short term. But we do research on how we get over these data humps. There is a future where you don’t need as much data; a lot of federal grants are going to [research in] how you can train models using less data.”
“[Meanwhile,] we have adversaries around the world using AI to surveil people, to suppress human rights. That is why American leadership is so critical: We want to come out with the next great product. And we want our values to underpin the use cases.”
A member of the audience pushed further:
“Maintaining U.S. leadership in AI might have costs in terms of individuals and society. What costs should individuals and society bear to maintain leadership?”
“I don’t view the world that way. Our companies big and small do not hesitate to talk about the values that underpin their technology. [That is] markedly different from the way our adversaries think. The alternatives are so dire [that we] need to push efforts to bake the values that we hold dear into this technology.”
And then the conversation turned to the use of AI for facial recognition, an application which (at least for police and other government agencies) was recently banned in San Francisco.
“Some private sector companies have called for government regulation of facial recognition, and there already are some instances of local governments regulating it. Do you expect federal regulation of facial recognition anytime soon? If not, what ought the parameters be?”
“A patchwork of regulation of technology is not beneficial for the country. We want to avoid that. Facial recognition has important roles—for example, finding lost or displaced children. There are use cases, but they need to be underpinned by values.”
A member of the audience followed up on that topic, referring to some data presented earlier at the HAI conference on bias in AI:
“Frequently the example of finding missing children is given as the example of why we should not restrict use of facial recognition. But we saw Joy Buolamwini’s presentation on bias in data. I would like to hear your thoughts about how government thinks we should use facial recognition, knowing about this bias.”
“Fairness, accountability, and robustness are things we want to bake into any technology—not just facial recognition—as we build rules governing use cases.”
Immigration and innovation
A member of the audience brought up the issue of immigration:
“One major pillar of innovation is immigration. Does your office advocate for it?”
“Our office pushes for the best and brightest people from around the world to come to work here and study here. There are a few efforts we have made to move towards a more merit-based immigration system, without congressional action. [For example, in] the H-1B visa system, you go through two lotteries. We switched the order of them in order to get more people with advanced degrees through.”
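Why would the order of two lotteries matter? A back-of-the-envelope calculation shows the effect. The sketch below is not an official model: it assumes one lottery is open to every applicant and the other only to advanced-degree holders, and the cap sizes, applicant counts, and the function name `expected_advanced_selected` are illustrative assumptions. It also works with expected values rather than simulating random draws.

```python
def expected_advanced_selected(n_adv: int, n_other: int, regular_cap: int,
                               advanced_cap: int, advanced_first: bool) -> float:
    """Expected number of advanced-degree applicants selected across both lotteries."""
    if advanced_first:
        # Advanced-degree-only lottery runs first; the rest join the general pool.
        from_advanced = min(advanced_cap, n_adv)
        left_adv = n_adv - from_advanced
        pool = left_adv + n_other
        winners = min(regular_cap, pool)
        from_regular = winners * left_adv / pool if pool else 0.0
        return from_advanced + from_regular
    # General lottery runs first with everyone in it; advanced-degree
    # applicants who miss out get a second chance afterwards.
    pool = n_adv + n_other
    winners = min(regular_cap, pool)
    from_regular = winners * n_adv / pool if pool else 0.0
    left_adv = n_adv - from_regular
    return from_regular + min(advanced_cap, left_adv)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the effect.
    args = dict(n_adv=80_000, n_other=120_000, regular_cap=65_000, advanced_cap=20_000)
    print(expected_advanced_selected(**args, advanced_first=True))
    print(expected_advanced_selected(**args, advanced_first=False))
```

With these made-up numbers, running the general lottery first yields about 46,000 expected advanced-degree selections, against roughly 41,700 the other way around. The whole advanced-degree group competes in the general draw before anyone is removed from it.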
The government’s tech infrastructure
Donahoe brought the conversation around to the tech infrastructure of the government itself:
“We talk about the shiny object, AI, but the 80 percent is the unsexy stuff, at federal and state levels. We don’t have a modern digital infrastructure to enable all the services—like a research cloud. How do we create this digital infrastructure?”
“I couldn’t agree more; the least partisan issue in Washington is about modernizing IT infrastructure. We spend like $85 billion a year on IT at the federal level, we can certainly do a better job of using those dollars.”
AI capable of automatically posting relevant comments on news articles has raised concerns that the technology could empower online disinformation campaigns designed to influence public opinion and national elections. The AI research in question, conducted by Microsoft Research Asia and Beihang University in China, became the subject of controversy even prior to the paper’s scheduled presentation at a major AI conference this week.
In 1666, the German polymath Gottfried Wilhelm Leibniz published an enigmatic dissertation entitled On the Combinatorial Art. Only 20 years old but already an ambitious thinker, Leibniz outlined a theory for automating knowledge production via the rule-based combination of symbols.
Leibniz’s central argument was that all human thoughts, no matter how complex, are combinations of basic and fundamental concepts, in much the same way that sentences are combinations of words, and words combinations of letters. He believed that if he could find a way to symbolically represent these fundamental concepts and develop a method by which to combine them logically, then he would be able to generate new thoughts on demand.
The idea came to Leibniz through his study of Ramon Llull, a 13th century Majorcan mystic who devoted himself to devising a system of theological reasoning that would prove the “universal truth” of Christianity to non-believers.
Llull himself was inspired by Jewish Kabbalists’ letter combinatorics (see part one of this series), which they used to produce generative texts that supposedly revealed prophetic wisdom. Taking the idea a step further, Llull invented what he called a volvelle, a circular paper mechanism with increasingly small concentric circles on which were written symbols representing the attributes of God. Llull believed that by spinning the volvelle in various ways, bringing the symbols into novel combinations with one another, he could reveal all the aspects of his deity.
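In modern terms, spinning concentric rings against one another enumerates combinations of symbols. The Python sketch below is only an illustration of that idea: the three rings, the placeholder letters on them, and the function names are assumptions, not a reconstruction of Llull’s actual device.

```python
from itertools import product

# Placeholder symbols; the number of rings and what is written on them
# are assumptions made for this example.
RINGS = [
    ["B", "C", "D"],  # outer ring, held fixed
    ["B", "C", "D"],  # middle ring
    ["B", "C", "D"],  # inner ring
]


def rotate(ring: list, steps: int) -> list:
    """Turn a ring by a number of positions."""
    steps %= len(ring)
    return ring[steps:] + ring[:steps]


def spokes(rings: list, offsets: tuple) -> list:
    """Read the symbols that line up along each radius for one setting of the rings."""
    turned = [rotate(ring, k) for ring, k in zip(rings, offsets)]
    return list(zip(*turned))


def all_combinations(rings: list) -> set:
    """Collect every alignment reachable by turning the inner rings.

    Assumes every ring carries the same number of symbols.
    """
    n = len(rings[0])
    seen = set()
    for inner in product(range(n), repeat=len(rings) - 1):
        seen.update(spokes(rings, (0, *inner)))
    return seen


if __name__ == "__main__":
    print(spokes(RINGS, (0, 1, 0)))
    print(len(all_combinations(RINGS)))
```

Three rings of three symbols already give 27 distinct alignments, and the count grows quickly as rings and symbols are added.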
Leibniz was much impressed by Llull’s paper machine, and he embarked on a project to create his own method of idea generation through symbolic combination. He wanted to use his machine not for theological debate, but for philosophical reasoning. He proposed that such a system would require three things: an “alphabet of human thoughts”; a list of logical rules for their valid combination and re-combination; and a mechanism that could carry out the logical operations on the symbols quickly and accurately—a fully mechanized update of Llull’s paper volvelle.
He imagined that this machine, which he called “the great instrument of reason,” would be able to answer all questions and resolve all intellectual debate. “When there are disputes among persons,” he wrote, “we can simply say, ‘Let us calculate,’ and without further ado, see who is right.”
This is part one of a six-part series on the history of natural language processing.
We’re in the middle of a boom time for natural language processing (NLP), the field of computer science that focuses on linguistic interactions between humans and machines. Thanks to advances in machine learning over the past decade, we’ve seen vast improvements in speech recognition and machine translation software. Language generators are now good enough to write coherent news articles, and virtual agents like Siri and Alexa are becoming part of our daily lives.
Most trace the origins of this field back to the beginning of the computer age, when Alan Turing, writing in 1950, imagined a smart machine that could interact fluently with a human via typed text on a screen. For this reason, machine-generated language is mostly understood as a digital phenomenon—and a central goal of artificial intelligence (AI) research.
This six-part series will challenge that common understanding of NLP. In fact, attempts to design formal rules and machines that can analyze, process, and generate language go back hundreds of years.
While specific technologies have changed over time, the basic idea of treating language as a material that can be artificially manipulated by rule-based systems has been pursued by many people in many cultures and for many different reasons. These historical experiments reveal the promise and perils of attempting to simulate human language in non-human ways—and they hold lessons for today’s practitioners of cutting-edge NLP techniques.
The story begins in medieval Spain. In the late 1200s, a Jewish mystic by the name of Abraham Abulafia sat down at a table in his small house in Barcelona, picked up a quill, dipped it in ink, and began combining the letters of the Hebrew alphabet in strange and seemingly random ways. Aleph with Bet, Bet with Gimmel, Gimmel with Aleph and Bet, and so on.
Let’s face it: Robots are dumb. At best they are idiot savants, capable of doing one thing really well. In general, even those robots require specialized environments in which to do their one thing really well. This is why autonomous cars or robots for home health care are so difficult to build. They’ll need to react to an uncountable number of situations, and they’ll need a generalized understanding of the world in order to navigate them all.
Babies as young as two months already understand that an unsupported object will fall, while five-month-old babies know that materials like sand and water will pour from a container rather than plop out as a single chunk. Robots lack this intuition, which hinders them when they try to navigate the world without a prescribed task and movement.
But we could see robots with a generalized understanding of the world (and the processing power required to wield it) thanks to the video-game industry. Researchers are bringing physics engines, the software that provides real-time physical interactions in complex video-game worlds, to robotics. The goal is for robots to learn about the world the same way babies do.
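To make the idea concrete, here is a minimal sketch of the kind of rule a physics engine applies on every time step: an unsupported object accelerates downward until something holds it up. It is a toy one-dimensional illustration, not code from any real engine, and the names `Body`, `step`, and `simulate` are hypothetical.

```python
from dataclasses import dataclass

GRAVITY = -9.81  # m/s^2 along the vertical axis (Earth surface value)


@dataclass
class Body:
    height: float    # metres above the floor
    velocity: float  # vertical velocity in m/s
    supported: bool  # True if something is holding the body up


def step(body: Body, dt: float) -> Body:
    """Advance one time step using semi-implicit Euler integration."""
    if body.supported:
        # A supported body stays where it is.
        return Body(body.height, 0.0, True)
    velocity = body.velocity + GRAVITY * dt
    height = body.height + velocity * dt
    if height <= 0.0:
        # The body reached the floor, which now supports it.
        return Body(0.0, 0.0, True)
    return Body(height, velocity, False)


def simulate(body: Body, dt: float = 0.01, steps: int = 100) -> Body:
    """Run the toy simulation for steps * dt seconds."""
    for _ in range(steps):
        body = step(body, dt)
    return body
```

Calling `simulate(Body(1.0, 0.0, False))` leaves the body resting on the floor, while the same body with `supported=True` never moves. A real engine layers collision detection, friction, and rigid-body rotation on top of this basic loop.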
Giving robots a baby’s sense of physics helps them navigate the real world and can even save on computing power, according to Lochlainn Wilson, the CEO of SE4, a Japanese company building robots that could operate on Mars. SE4 plans to avoid the latency problems caused by the distance from Earth to Mars by building robots that can operate independently for a few hours before receiving more instructions from Earth.
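A quick back-of-the-envelope calculation shows why that latency matters. The sketch below assumes approximate figures for the closest and farthest Earth-to-Mars distances (about 54.6 million and 401 million kilometers) and considers only the travel time of a radio signal, ignoring relay and processing delays. The function name is hypothetical.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s

# Approximate Earth-to-Mars distances; assumed round figures for illustration.
CLOSEST_KM = 54.6e6
FARTHEST_KM = 401e6


def one_way_delay_minutes(distance_km: float) -> float:
    """Time for a radio signal to cover the distance, in minutes."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60.0


for label, distance in (("closest", CLOSEST_KM), ("farthest", FARTHEST_KM)):
    one_way = one_way_delay_minutes(distance)
    print(f"{label}: {one_way:.1f} min one way, {2 * one_way:.1f} min round trip")
```

Under these assumptions the one-way delay runs from roughly 3 minutes to more than 22 minutes, so an operator on Earth cannot steer a robot in real time.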
Wilson says that his company uses simple physics engines such as PhysX to help build more-independent robots. He adds that if you can tie a physics engine to a coprocessor on the robot, the real-time basic physics intuitions won’t take compute cycles away from the robot’s primary processor, which will often be focused on a more complicated task.
Wilson’s firm occasionally still turns to a traditional graphics engine, such as Unity or the Unreal Engine, to handle the demands of a robot’s movement. In certain cases, however, such as a robot accounting for friction or understanding force, you really need a robust physics engine, Wilson says, not a graphics engine that simply simulates a virtual environment. For his projects, he often turns to the open-source Bullet Physics engine built by Erwin Coumans, who is now an employee at Google.
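As an illustration of what "accounting for friction" involves, the sketch below uses the textbook Coulomb model of static friction on a flat surface: an object starts to slide only when the horizontal push exceeds the friction coefficient times the object's weight. This is a simplified assumption for the sake of example, not a description of how Bullet or SE4's software works, and `will_slide` is a hypothetical name.

```python
def will_slide(push_newtons: float, mass_kg: float,
               static_friction_coeff: float, g: float = 9.81) -> bool:
    """Return True if a horizontal push overcomes static friction.

    Coulomb model on a flat surface: the largest friction force the
    surface can supply is mu * m * g.
    """
    max_static_friction = static_friction_coeff * mass_kg * g
    return push_newtons > max_static_friction


# A 1 kg box on a surface with an assumed friction coefficient of 0.4
# resists pushes of up to about 3.9 newtons.
print(will_slide(push_newtons=3.0, mass_kg=1.0, static_friction_coeff=0.4))
print(will_slide(push_newtons=5.0, mass_kg=1.0, static_friction_coeff=0.4))
```

A graphics engine can draw the box moving either way; only a physics model like this one tells the robot whether the box will actually move.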
Bullet is a popular physics-engine option, but it isn’t the only one out there. Nvidia Corp., for example, has realized that its gaming and physics engines are well-placed to handle the computing demands required by robots. In a lab in Seattle, Nvidia is working with teams from the University of Washington to build kitchen robots, fully articulated robot hands and more, all equipped with Nvidia’s tech.
When I visited the lab, I watched a robot arm move boxes of food from counters to cabinets. That’s fairly straightforward, but that same robot arm could avoid my body if I got in its way, and it could adapt if I moved a box of food or dropped it onto the floor.
The robot could also understand that less pressure is needed to grasp something like a cardboard box of Cheez-It crackers versus something more durable like an aluminum can of tomato soup.
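One simple way to reason about that trade-off is sketched below. It assumes a two-finger pinch grasp in which friction at both contacts must carry the object's weight, plus a crush limit for fragile packaging. The model, the safety factor, and the function names are all illustrative assumptions and do not describe Nvidia's system.

```python
def min_grip_force(mass_kg: float, friction_coeff: float,
                   safety_factor: float = 1.5, g: float = 9.81) -> float:
    """Normal force per finger needed to hold an object in a pinch grasp.

    Two contacts each supply friction mu * N, so holding requires
    2 * mu * N >= m * g. The safety factor adds margin.
    """
    return safety_factor * mass_kg * g / (2.0 * friction_coeff)


def choose_grip_force(mass_kg: float, friction_coeff: float,
                      crush_limit_newtons: float) -> float:
    """Pick a grip force that holds the object without crushing it."""
    needed = min_grip_force(mass_kg, friction_coeff)
    if needed > crush_limit_newtons:
        raise ValueError("no safe grip: the object would be crushed first")
    return needed
```

With made-up values, a light cardboard box might have a low crush limit that leaves little room above the minimum holding force, while a metal can tolerates a much firmer squeeze.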
Nvidia’s silicon has already helped advance the fields of artificial intelligence and computer vision by making it possible to process multiple decisions in parallel. It’s possible that the company’s new focus on virtual worlds will help advance the field of robotics and teach robots to think like babies.
This article appears in the November 2019 print issue as “Robots as Smart as Babies.”
Editor’s Note: The debate on autonomous weapons systems has been escalating over the past several years as the underlying technologies evolve to the point where their deployment in a military context seems inevitable. IEEE Spectrum has published a variety of perspectives on this issue. In summary, while there is a compelling argument to be made that autonomous weapons are inherently unethical and should be banned, there is also a compelling argument to be made that autonomous weapons could potentially make conflicts less harmful, especially to non-combatants. Despite an increasing amount of international attention (including from the United Nations), progress towards consensus, much less regulatory action, has been slow. The following workshop paper on autonomous weapons systems policy is remarkable because it was authored by a group of experts with very different (and in some cases divergent) views on the issue. Even so, they were able to reach consensus on a roadmap that all agreed was worth considering. It’s collaborations like this that could be the best way to establish a reasonable path forward on such a contentious issue, and with the permission of the authors, we’re excited to be able to share this paper (originally posted on Georgia Tech’s Mobile Robot Lab website) with you in its entirety.
Sanchez might be a little biased. He is the director of precision agriculture for John Deere, and is in charge of adding intelligence to traditional farm vehicles. But he does have a little perspective, having spent time working on software for both medical devices and air traffic control systems.
I met with Sanchez and Alexey Rostapshov, head of digital innovation at John Deere Labs, at the organization’s San Francisco offices last month. Labs launched in 2017 to take advantage of the area’s tech expertise, both to apply machine learning to in-house agricultural problems and to work with partners to build technologies that play nicely with Deere’s big green machines. Deere’s neighbors in San Francisco’s tech-heavy South of Market neighborhood include LinkedIn, Salesforce, and Planet Labs, which puts it in a good position for recruiting.