In February 2019, OpenAI, one of the foremost artificial intelligence labs in the world, announced that a team of researchers had built a powerful new text generator called the Generative Pre-Trained Transformer 2, or GPT-2 for short. The researchers trained their system on a massive corpus of Web text to predict the next word in a sequence, and the resulting model exhibited a broad set of natural language processing (NLP) capabilities, including reading comprehension, machine translation, and the ability to generate long strings of coherent text.
But as is often the case with NLP technology, the tool held both great promise and great peril. Researchers and policy makers at the lab were concerned that their system, if widely released, could be exploited by bad actors and misappropriated for “malicious purposes.”
The people of OpenAI, which defines its mission as “discovering and enacting the path to safe artificial general intelligence,” were concerned that GPT-2 could be used to flood the Internet with fake text, thereby degrading an already fragile information ecosystem. For this reason, OpenAI decided that it would not release the full version of GPT-2 to the public or other researchers.
In March 2016, Microsoft was preparing to release its new chatbot, Tay, on Twitter. Described as an experiment in “conversational understanding,” Tay was designed to engage people in dialogue through tweets or direct messages, while emulating the style and slang of a teenage girl. She was, according to her creators, “Microsoft’s A.I. fam from the Internet that’s got zero chill.” She loved E.D.M. music, had a favorite Pokémon, and often said extremely online things, like “swagulated.”
Tay was an experiment at the intersection of machine learning, natural language processing, and social networks. While other chatbots in the past—like Joseph Weizenbaum’s Eliza—conducted conversation by following pre-programmed and narrow scripts, Tay was designed to learn more about language over time, enabling her to have conversations about any topic.
Machine learning works by developing generalizations from large amounts of data. In any given data set, the algorithm will discern patterns and then “learn” how to approximate those patterns in its own behavior.
Using this technique, engineers at Microsoft trained Tay’s algorithm on anonymized public data, along with some pre-written material provided by professional comedians, to give her a basic grasp of language. The plan was to release Tay online, then let the bot discover patterns of language through her interactions, which she would emulate in subsequent conversations. Eventually, her programmers hoped, Tay would sound just like the Internet.
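The learn-then-emulate loop described above can be illustrated with a toy sketch (this is not Microsoft's actual system, and the corpus here is invented): tally which word tends to follow which in observed text, then generate new text by sampling from those tallies.

```python
import random
from collections import defaultdict

# Toy illustration of "learning" language patterns from data:
# record which words follow which, then emulate those patterns.
def learn_patterns(corpus):
    follows = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current].append(nxt)
    return follows

def emulate(follows, start, length=5, seed=0):
    # Generate text by repeatedly sampling a word seen to follow the last one.
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        options = follows.get(words[-1])
        if not options:
            break
        words.append(rng.choice(options))
    return " ".join(words)

corpus = ["the bot reads the tweets", "the tweets shape the bot"]
model = learn_patterns(corpus)
print(emulate(model, "the"))
```

The sketch also hints at Tay's failure mode: the model reproduces whatever patterns dominate its input, with no notion of whether those patterns are benign.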
On March 23, 2016, Microsoft released Tay to the public on Twitter. At first, Tay engaged harmlessly with her growing number of followers with banter and lame jokes. But after only a few hours, Tay started tweeting highly offensive things, such as: “I f@#%&*# hate feminists and they should all die and burn in hell” or “Bush did 9/11 and Hitler would have done a better job…”
While there were already some rudimentary digital language generators in existence—programs that could spit out somewhat coherent lines of text—Weizenbaum’s program was the first designed explicitly for interactions with humans. The user could type in some statement or set of statements in their normal language, press enter, and receive a response from the machine. As Weizenbaum explained, his program made “certain kinds of natural-language conversation between man and computer possible.”
He named the program Eliza after Eliza Doolittle, the working-class hero of George Bernard Shaw’s Pygmalion who learns how to talk with an upper-class accent. The new Eliza was written for the 36-bit IBM 7094, an early transistorized mainframe computer, in MAD-SLIP, a version of the MAD programming language extended with a list-processing package that Weizenbaum himself had developed.
Because computer time was a valuable resource, Eliza could only be run via a time-sharing system; the user interacted with the program remotely via an electric typewriter and printer. When the user typed in a sentence and pressed enter, a message was sent to the mainframe computer. Eliza scanned the message for the presence of a keyword and used it in a new sentence to form a response that was sent back, printed out, and read by the user.
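The keyword-and-response mechanism described above can be sketched in a few lines of modern Python. The keywords and canned replies here are invented for illustration; the real program used ranked keywords and elaborate transformation rules, and ran in MAD-SLIP on the 7094.

```python
# A minimal sketch of Eliza-style keyword matching (illustrative only).
RULES = {
    "mother": "Tell me more about your mother.",
    "sad": "Why do you feel sad?",
    "always": "Can you think of a specific example?",
}
DEFAULT = "Please go on."

def respond(message):
    # Scan the message for the first known keyword and return its template.
    for word in message.lower().replace(".", "").split():
        if word in RULES:
            return RULES[word]
    return DEFAULT

print(respond("I am sad about my mother."))  # matches "sad" first
```

Even this crude version shows why Eliza felt conversational: a handful of keyword-triggered templates, plus a neutral fallback, can sustain the illusion of understanding for surprisingly long.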
In 1913, the Russian mathematician Andrey Andreyevich Markov sat down in his study in St. Petersburg with a copy of Alexander Pushkin’s 19th century verse novel Eugene Onegin, by then already a literary classic. Markov, however, did not start reading Pushkin’s famous text. Rather, he took a pen and a piece of drafting paper, and wrote out the first 20,000 letters of the book in one long string, eliminating all punctuation and spaces. Then he arranged these letters in 200 grids (10-by-10 characters each) and began counting the vowels in every row and column, tallying the results.
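Markov's hand tallies are easy to reproduce mechanically. A sketch of the counting step, using a repeated English stand-in for the text (Markov, of course, worked by hand on the Russian original):

```python
# Reproduce the counting step of Markov's experiment: strip punctuation
# and spaces, then tally vowels within fixed-size blocks of letters.
def vowel_tallies(text, block_size=100, vowels="aeiou"):
    letters = [c for c in text.lower() if c.isalpha()]
    blocks = [letters[i:i + block_size] for i in range(0, len(letters), block_size)]
    return [sum(c in vowels for c in block) for block in blocks]

# Stand-in text, repeated to fill several blocks (illustrative only).
sample = "My uncle, man of firm convictions... " * 20
print(vowel_tallies(sample))
```

Each tally is a sample of how vowels are distributed through the text, which is exactly the kind of statistical regularity Markov was after.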
In 1666, the German polymath Gottfried Wilhelm Leibniz published an enigmatic dissertation entitled On the Combinatorial Art. Only 20 years old but already an ambitious thinker, Leibniz outlined a theory for automating knowledge production via the rule-based combination of symbols.
Leibniz’s central argument was that all human thoughts, no matter how complex, are combinations of basic and fundamental concepts, in much the same way that sentences are combinations of words, and words combinations of letters. He believed that if he could find a way to symbolically represent these fundamental concepts and develop a method by which to combine them logically, then he would be able to generate new thoughts on demand.
The idea came to Leibniz through his study of Ramon Llull, a 13th century Majorcan mystic who devoted himself to devising a system of theological reasoning that would prove the “universal truth” of Christianity to non-believers.
Llull himself was inspired by Jewish Kabbalists’ letter combinatorics (see part one of this series), which they used to produce generative texts that supposedly revealed prophetic wisdom. Taking the idea a step further, Llull invented what he called a volvelle, a circular paper mechanism with increasingly small concentric circles on which were written symbols representing the attributes of God. Llull believed that by spinning the volvelle in various ways, bringing the symbols into novel combinations with one another, he could reveal all the aspects of his deity.
Leibniz was much impressed by Llull’s paper machine, and he embarked on a project to create his own method of idea generation through symbolic combination. He wanted to use his machine not for theological debate, but for philosophical reasoning. He proposed that such a system would require three things: an “alphabet of human thoughts”; a list of logical rules for their valid combination and re-combination; and a mechanism that could carry out the logical operations on the symbols quickly and accurately—a fully mechanized update of Llull’s paper volvelle.
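The combinatorial core shared by Llull's volvelle and Leibniz's proposal, enumerating every pairing drawn from a fixed alphabet of symbols, can be sketched directly. The nine attributes below follow Llull's scheme of divine "dignities"; any symbol set would do.

```python
from itertools import combinations

# Enumerate every pairing of a fixed alphabet of symbols, the mechanical
# heart of Llull's volvelle and Leibniz's proposed combinatorial method.
attributes = ["goodness", "greatness", "eternity", "power", "wisdom",
              "will", "virtue", "truth", "glory"]

pairs = list(combinations(attributes, 2))
print(len(pairs))  # 9 choose 2 = 36 distinct pairings
```

The point of the sketch is how fast the space grows: with richer alphabets and longer combinations, exhaustive enumeration quickly outruns any paper wheel, which is why Leibniz wanted a fully mechanized instrument.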
He imagined that this machine, which he called “the great instrument of reason,” would be able to answer all questions and resolve all intellectual debate. “When there are disputes among persons,” he wrote, “we can simply say, ‘Let us calculate,’ and without further ado, see who is right.”
This is part one of a six-part series on the history of natural language processing.
We’re in the middle of a boom time for natural language processing (NLP), the field of computer science that focuses on linguistic interactions between humans and machines. Thanks to advances in machine learning over the past decade, we’ve seen vast improvements in speech recognition and machine translation software. Language generators are now good enough to write coherent news articles, and virtual agents like Siri and Alexa are becoming part of our daily lives.
Most trace the origins of this field back to the beginning of the computer age, when Alan Turing, writing in 1950, imagined a smart machine that could converse fluently with a human via typed text. For this reason, machine-generated language is mostly understood as a digital phenomenon—and a central goal of artificial intelligence (AI) research.
This six-part series will challenge that common understanding of NLP. In fact, attempts to design formal rules and machines that can analyze, process, and generate language go back hundreds of years.
While specific technologies have changed over time, the basic idea of treating language as a material that can be artificially manipulated by rule-based systems has been pursued by many people in many cultures and for many different reasons. These historical experiments reveal the promise and perils of attempting to simulate human language in non-human ways—and they hold lessons for today’s practitioners of cutting-edge NLP techniques.
The story begins in medieval Spain. In the late 1200s, a Jewish mystic by the name of Abraham Abulafia sat down at a table in his small house in Barcelona, picked up a quill, dipped it in ink, and began combining the letters of the Hebrew alphabet in strange and seemingly random ways. Aleph with Bet, Bet with Gimmel, Gimmel with Aleph and Bet, and so on.
Today’s unseen digital laborers resemble the human who powered the 18th-century Mechanical Turk
The history of AI is often told as the story of machines getting smarter over time. What’s lost is the human element in the narrative, how intelligent machines are designed, trained, and powered by human minds and bodies.
In this six-part series, we explore that human history of AI—how innovators, thinkers, workers, and sometimes hucksters have created algorithms that can replicate human thought and behavior (or at least appear to). While it can be exciting to be swept up by the idea of superintelligent computers that have no need for human input, the true history of smart machines shows that our AI is only as good as we are.
Part 6: Mechanical Turk Revisited
At the turn of the millennium, Amazon began expanding its services beyond bookselling. As the variety of products on the site grew, the company had to figure out new ways to categorize and organize them. Part of this task was removing tens of thousands of duplicate products that were popping up on the website.
Engineers at the company tried to write software that could automatically eliminate all duplicates across the site. Identifying and deleting duplicate listings seemed to be a simple task, one well within the capacities of a machine. Yet the engineers soon gave up, describing the data-processing challenge as “insurmountable.” The task, which presupposed the ability to notice subtle differences and similarities between pictures and text, actually required human intelligence.
Amazon was left with a conundrum. Deleting duplicate products from the site was a trivial task for humans, but the sheer number of duplicates would require a huge workforce. Coordinating that many workers on one task was not a trivial problem.
An Amazon manager named Venky Harinarayan came up with a solution. His patent described a “hybrid machine/human computing arrangement,” which would break down tasks into small units, or “subtasks,” and distribute them to a network of human workers.
In the case of deleting duplicates, a central computer could divide Amazon’s site into small sections—say, 100 product pages for can openers—and send the sections to human workers over the Internet. The workers could then identify duplicates in these small units and send their pieces of the puzzle back.
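The divide step of that hybrid arrangement can be sketched generically (the chunk size and page names here are invented for illustration, not drawn from the patent):

```python
# Generic sketch of the divide step in a hybrid machine/human arrangement:
# split a long list of product pages into fixed-size subtasks that
# independent workers can claim and complete in parallel.
def make_subtasks(pages, size=100):
    return [pages[i:i + size] for i in range(0, len(pages), size)]

pages = [f"can-opener-{n}" for n in range(250)]
subtasks = make_subtasks(pages)
print(len(subtasks), [len(s) for s in subtasks])
```

Because each subtask is self-contained, no worker needs to see the whole catalog, and the central computer only has to merge the returned pieces.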
This distributed system offered a crucial advantage: The workers didn’t have to be centralized in one place but could instead complete the subtasks on their own personal computers wherever they happened to be, whenever they chose. Essentially, what Harinarayan developed was an effective way to distribute low-skill yet difficult-to-automate work to a broad network of humans who could work in parallel.
The method proved so effective in Amazon’s internal operations that Jeff Bezos decided it could be sold as a service to other companies. Bezos turned Harinarayan’s technology into a marketplace for laborers, where businesses with tasks that were easy for humans (but hard to automate) could be matched with a network of freelance workers, who would do the tasks for small amounts of money.
Thus was born Amazon Mechanical Turk, or mTurk for short. The service launched in 2005, and the user base quickly grew. Businesses and researchers around the globe began uploading thousands of so-called “human intelligence tasks” onto the platform, such as transcribing audio or captioning images. These tasks were dutifully carried out by an internationally dispersed and anonymous group of workers for a small fee (one aggrieved worker reported an average fee of 20 cents per task).
The name of this new service was a wink at the chess-playing machine of the 18th century, the Mechanical Turk invented by the huckster Wolfgang von Kempelen. And just like that faux automaton, inside which hid a human chess player, the mTurk platform was designed to make human labor invisible. Workers on the platform are not represented with names, but with numbers, and communication between the requester and the worker is entirely depersonalized. Bezos himself has called these dehumanized workers “artificial artificial intelligence.”
Today, mTurk is a thriving marketplace with hundreds of thousands of workers around the world. While the online platform provides a source of income for people who otherwise might not have access to jobs, the labor conditions are highly questionable. Some critics have argued that by keeping the workers invisible and atomized, Amazon has made it easier for them to be exploited. A research paper published in December 2017 found that workers earned a median wage of approximately US $2 per hour, and only 4 percent earned more than $7.25 per hour.
Interestingly, mTurk has also become crucial for the development of machine-learning applications. In machine learning, an AI program is given a large data set, then learns on its own how to find patterns and draw conclusions. MTurk workers are frequently used to build and label these training data sets, yet their role in machine learning is often overlooked.
The dynamic now playing out between the AI community and mTurk is one that has been ever-present throughout the history of machine intelligence. We eagerly admire the visage of the autonomous “intelligent machine,” while ignoring, or even actively concealing, the human labor that makes it possible.
Perhaps we can take a lesson from the author Edgar Allan Poe. When he viewed von Kempelen’s Mechanical Turk, he was not fooled by the illusion. Instead, he wondered what it would be like for the chess player trapped inside, the concealed laborer “tightly compressed” among cogs and levers in “exceedingly painful and awkward positions.”
In our current moment, when headlines about AI breakthroughs populate our news feeds, it’s important to remember Poe’s forensic attitude. It can be entertaining—if sometimes alarming—to be swept up in the hype over AI, and to be carried away by the vision of machines that have no need for mere mortals. But if you look closer, you’ll likely see the traces of human labor.