Researchers have demonstrated a worm that spreads through prompt injection. Details:
In one instance, the researchers, acting as attackers, wrote an email including the adversarial text prompt, which “poisons” the database of an email assistant using retrieval-augmented generation (RAG), a way for LLMs to pull in extra data from outside its system. When the email is retrieved by the RAG, in response to a user query, and is sent to GPT-4 or Gemini Pro to create an answer, it “jailbreaks the GenAI service” and ultimately steals data from the emails, Nassi says. “The generated response containing the sensitive user data later infects new hosts when it is used to reply to an email sent to a new client and then stored in the database of the new client,” Nassi says.
In the second method, the researchers say, an image with a malicious prompt embedded makes the email assistant forward the message on to others. “By encoding the self-replicating prompt into the image, any kind of image containing spam, abuse material, or even propaganda can be forwarded further to new clients after the initial email has been sent,” Nassi says.
It’s a natural extension of prompt injection. But it’s still neat to see it actually working.
Abstract: In the past year, numerous companies have incorporated Generative AI (GenAI) capabilities into new and existing applications, forming interconnected Generative AI (GenAI) ecosystems consisting of semi/fully autonomous agents powered by GenAI services. While ongoing research highlighted risks associated with the GenAI layer of agents (e.g., dialog poisoning, membership inference, prompt leaking, jailbreaking), a critical question emerges: Can attackers develop malware to exploit the GenAI component of an agent and launch cyber-attacks on the entire GenAI ecosystem?
This paper introduces Morris II, the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts. The study demonstrates that attackers can insert such prompts into inputs that, when processed by GenAI models, prompt the model to replicate the input as output (replication), engaging in malicious activities (payload). Additionally, these inputs compel the agent to deliver them (propagate) to new agents by exploiting the connectivity within the GenAI ecosystem. We demonstrate the application of Morris II against GenAI-powered email assistants in two use cases (spamming and exfiltrating personal data), under two settings (black-box and white-box accesses), using two types of input data (text and images). The worm is tested against three different GenAI models (Gemini Pro, ChatGPT 4.0, and LLaVA), and various factors (e.g., propagation rate, replication, malicious activity) influencing the performance of the worm are evaluated.
Artificial intelligence (AI) has been billed as the next frontier of humanity: the newly available expanse whose exploration will drive the next era of growth, wealth, and human flourishing. It’s a scary metaphor. Throughout American history, the drive for expansion and the very concept of terrain up for grabs—land grabs, gold rushes, new frontiers—have provided a permission structure for imperialism and exploitation. This could easily hold true for AI.
This isn’t the first time the concept of a frontier has been used as a metaphor for AI, or technology in general. As early as 2018, the powerful foundation models powering cutting-edge applications like chatbots have been called “frontier AI.” In previous decades, the internet itself was considered an electronic frontier. Early cyberspace pioneer John Perry Barlow wrote “Unlike previous frontiers, this one has no end.” When he and others founded the internet’s most important civil liberties organization, they called it the Electronic Frontier Foundation.
America’s experience with frontiers is fraught, to say the least. Expansion into the Western frontier and beyond has been a driving force in our country’s history and identity—and has led to some of the darkest chapters of our past. The tireless drive to conquer the frontier has directly motivated some of this nation’s most extreme episodes of racism, imperialism, violence, and exploitation.
That history has something to teach us about the material consequences we can expect from the promotion of AI today. The race to build the next great AI app is not the same as the California gold rush. But the potential that outsize profits will warp our priorities, values, and morals is, unfortunately, analogous.
Already, AI is starting to look like a colonialist enterprise. AI tools are helping the world’s largest tech companies grow their power and wealth, are spurring nationalistic competition between empires racing to capture new markets, and threaten to supercharge government surveillance and systems of apartheid. It looks more than a bit like the competition among colonialist state and corporate powers in the seventeenth century, which together carved up the globe and its peoples. By considering America’s past experience with frontiers, we can understand what AI may hold for our future, and how to avoid the worst potential outcomes.
America’s “Frontier” Problem
For 130 years, historians have used frontier expansion to explain sweeping movements in American history. Yet only for the past thirty years have we generally acknowledged its disastrous consequences.
Frederick Jackson Turner famously introduced the frontier as a central concept for understanding American history in his vastly influential 1893 essay. As he concisely wrote, “American history has been in a large degree the history of the colonization of the Great West.”
Turner used the frontier to understand all the essential facts of American life: our culture, way of government, national spirit, our position among world powers, even the “struggle” of slavery. The endless opportunity for westward expansion was a beckoning call that shaped the American way of life. Per Turner’s essay, the frontier resulted in the individualistic self-sufficiency of the settler and gave every (white) man the opportunity to attain economic and political standing through hardscrabble pioneering across dangerous terrain.The New Western History movement, gaining steam through the 1980s and led by researchers like Patricia Nelson Limerick, laid plain the racial, gender, and class dynamics that were always inherent to the frontier narrative. This movement’s story is one where frontier expansion was a tool used by the white settler to perpetuate a power advantage.The frontier was not a siren calling out to unwary settlers; it was a justification, used by one group to subjugate another. It was always a convenient, seemingly polite excuse for the powerful to take what they wanted. Turner grappled with some of the negative consequences and contradictions of the frontier ethic and how it shaped American democracy. But many of those whom he influenced did not do this; they celebrated it as a feature, not a bug. Theodore Roosevelt wrote extensively and explicitly about how the frontier and his conception of white supremacy justified expansion to points west and, through the prosecution of the Spanish-American War, far across the Pacific. Woodrow Wilson, too, celebrated the imperial loot from that conflict in 1902. Capitalist systems are “addicted to geographical expansion” and even, when they run out of geography, seek to produce new kinds of spaces to expand into. This is what the geographer David Harvey calls the “spatial fix.”Claiming that AI will be a transformative expanse on par with the Louisiana Purchase or the Pacific frontiers is a bold assertion—but increasingly plausible after a year dominated by ever more impressive demonstrations of generative AI tools. It’s a claim bolstered by billions of dollars in corporate investment, by intense interest of regulators and legislators worldwide in steering how AI is developed and used, and by the variously utopian or apocalyptic prognostications from thought leaders of all sectors trying to understand how AI will shape their sphere—and the entire world.
AI as a Permission Structure
Like the western frontier in the nineteenth century, the maniacal drive to unlock progress via advancement in AI can become a justification for political and economic expansionism and an excuse for racial oppression.
In the modern day, OpenAI famously paid dozens of Kenyans little more than a dollar an hour to process data used in training their models underlying products such as ChatGPT. Paying low wages to data labelers surely can’t be equated to the chattel slavery of nineteenth-century America. But these workers did endure brutal conditions, including being set to constantly review content with “graphic scenes of violence, self-harm, murder, rape, necrophilia, child abuse, bestiality, and incest.” There is a global market for this kind of work, which has been essential to the most important recent advances in AI such as Reinforcement Learning with Human Feedback, heralded as the most important breakthrough of ChatGPT.
The gold rush mentality associated with expansion is taken by the new frontiersmen as permission to break the rules, and to build wealth at the expense of everyone else. In 1840s California, gold miners trespassed on public lands and yet were allowed to stake private claims to the minerals they found, and even to exploit the water rights on those lands. Again today, the game is to push the boundaries on what rule-breaking society will accept, and hope that the legal system can’t keep up.
Many internet companies have behaved in exactly the same way since the dot-com boom. The prospectors of internet wealth lobbied for, or simply took of their own volition, numerous government benefits in their scramble to capture those frontier markets. For years, the Federal Trade Commission has looked the other way or been lackadaisical in halting antitrust abuses by Amazon, Facebook, and Google. Companies like Uber and Airbnb exploited loopholes in, or ignored outright, local laws on taxis and hotels. And Big Tech platforms enjoyed a liability shield that protected them from punishment the contents people posted to their sites.
We can already see this kind of boundary pushing happening with AI.
Modern frontier AI models are trained using data, often copyrighted materials, with untested legal justification. Data is like water for AI, and, like the fight over water rights in the West, we are repeating a familiar process of public acquiescence to private use of resources. While some lawsuits are pending, so far AI companies have faced no significant penalties for the unauthorized use of this data.
Pioneers of self-driving vehicles tried to skip permitting processes and used fake demonstrations of their capabilities to avoid government regulation and entice consumers. Meanwhile, AI companies’ hope is that they won’t be held to blame if the AI tools they produce spew out harmful content that causes damage in the real world. They are trying to use the same liability shield that fostered Big Tech’s exploitation of the previous electronic frontiers—the web and social media—to protect their own actions.
Even where we have concrete rules governing deleterious behavior, some hope that using AI is itself enough to skirt them. Copyright infringement is illegal if a person does it, but would that same person be punished if they train a large language model to regurgitate copyrighted works? In the political sphere, the Federal Election Commission has precious few powers to police political advertising; some wonder if they simply won’t be considered relevant if people break those rules using AI.
AI and American Exceptionalism
Like The United States’ historical frontier, AI has a feel of American exceptionalism. Historically, we believed we were different from the Old World powers of Europe because we enjoyed the manifest destiny of unrestrained expansion between the oceans. Today, we have the most CPU power, the most data scientists, the most venture-capitalist investment, and the most AI companies. This exceptionalism has historically led many Americans to believe they don’t have to play by the same rules as everyone else.
Both historically and in the modern day, this idea has led to deleterious consequences such as militaristic nationalism (leading to justifying of foreign interventions in Iraq and elsewhere), masking of severe inequity within our borders, abdication of responsibility from global treaties on climate and law enforcement, and alienation from the international community. American exceptionalism has also wrought havoc on our country’s engagement with the internet, including lawless spying and surveillance by forces like the National Security Agency.
The same line of thinking could have disastrous consequences if applied to AI. It could perpetuate a nationalistic, Cold War–style narrative about America’s inexorable struggle with China, this time predicated on an AI arms race. Moral exceptionalism justifies why we should be allowed to use tools and weapons that are dangerous in the hands of a competitor, or enemy. It could enable the next stage of growth of the military-industrial complex, with claims of an urgent need to modernize missile systems and drones through using AI. And it could renew a rationalization for violating civil liberties in the US and human rights abroad, empowered by the idea that racial profiling is more objective if enforced by computers.The inaction of Congress on AI regulation threatens to land the US in a regime of de facto American exceptionalism for AI. While the EU is about to pass its comprehensive AI Act, lobbyists in the US have muddled legislative action. While the Biden administration has used its executive authority and federal purchasing power to exert some limited control over AI, the gap left by lack of legislation leaves AI in the US looking like the Wild West—a largely unregulated frontier.The lack of restraint by the US on potentially dangerous AI technologies has a global impact. First, its tech giants let loose their products upon the global public, with the harms that this brings with it. Second, it creates a negative incentive for other jurisdictions to more forcefully regulate AI. The EU’s regulation of high-risk AI use cases begins to look like unilateral disarmament if the US does not take action itself. Why would Europe tie the hands of its tech competitors if the US refuses to do the same?
AI and Unbridled Growth
The fundamental problem with frontiers is that they seem to promise cost-free growth. There was a constant pressure for American westward expansion because a bigger, more populous country accrues more power and wealth to the elites and because, for any individual, a better life was always one more wagon ride away into “empty” terrain. AI presents the same opportunities. No matter what field you’re in or what problem you’re facing, the attractive opportunity of AI as a free labor multiplier probably seems like the solution; or, at least, makes for a good sales pitch.
That would actually be okay, except that the growth isn’t free. America’s imperial expansion displaced, harmed, and subjugated native peoples in the Americas, Africa, and the Pacific, while enlisting poor whites to participate in the scheme against their class interests. Capitalism makes growth look like the solution to all problems, even when it’s clearly not. The problem is that so many costs are externalized. Why pay a living wage to human supervisors training AI models when an outsourced gig worker will do it at a fraction of the cost? Why power data centers with renewable energy when it’s cheaper to surge energy production with fossil fuels? And why fund social protections for wage earners displaced by automation if you don’t have to? The potential of consumer applications of AI, from personal digital assistants to self-driving cars, is irresistible; who wouldn’t want a machine to take on the most routinized and aggravating tasks in your daily life? But the externalized cost for consumers is accepting the inevitability of domination by an elite who will extract every possible profit from AI services.
Controlling Our Frontier Impulses
None of these harms are inevitable. Although the structural incentives of capitalism and its growth remain the same, we can make different choices about how to confront them.
We can strengthen basic democratic protections and market regulations to avoid the worst impacts of AI colonialism. We can require ethical employment for the humans toiling to label data and train AI models. And we can set the bar higher for mitigating bias in training and harm from outputs of AI models.
We don’t have to cede all the power and decision making about AI to private actors. We can create an AI public option to provide an alternative to corporate AI. We can provide universal access to ethically built and democratically governed foundational AI models that any individual—or company—could use and build upon.
More ambitiously, we can choose not to privatize the economic gains of AI. We can cap corporate profits, raise the minimum wage, or redistribute an automation dividend as a universal basic income to let everyone share in the benefits of the AI revolution. And, if these technologies save as much labor as companies say they do, maybe we can also all have some of that time back.
And we don’t have to treat the global AI gold rush as a zero-sum game. We can emphasize international cooperation instead of competition. We can align on shared values with international partners and create a global floor for responsible regulation of AI. And we can ensure that access to AI uplifts developing economies instead of further marginalizing them.
This essay was written with Nathan Sanders, and was originally published in Jacobin.
Abstract: In recent years, large language models (LLMs) have become increasingly capable and can now interact with tools (i.e., call functions), read documents, and recursively call themselves. As a result, these LLMs can now function autonomously as agents. With the rise in capabilities of these agents, recent work has speculated on how LLM agents would affect cybersecurity. However, not much is known about the offensive capabilities of LLM agents.
In this work, we show that LLM agents can autonomously hack websites, performing tasks as complex as blind database schema extraction and SQL injections without human feedback. Importantly, the agent does not need to know the vulnerability beforehand. This capability is uniquely enabled by frontier models that are highly capable of tool use and leveraging extended context. Namely, we show that GPT-4 is capable of such hacks, but existing open-source models are not. Finally, we show that GPT-4 is capable of autonomously finding vulnerabilities in websites in the wild. Our findings raise questions about the widespread deployment of LLMs.
Simon Willison has been playing with the video processing capabilities of the new Gemini Pro 1.5 model from Google, and it’s really impressive.
Which means a lot of scary new video prompt injection attacks. And remember, given the current state of technology, prompt injection attacks are impossible to prevent in general.
Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.
Especially note one of the sentences from the abstract: “For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024.”
And this deceptive behavior is hard to detect and remove.
For most of history, communicating with a computer has not been like communicating with a person. In their earliest years, computers required carefully constructed instructions, delivered through punch cards; then came a command-line interface, followed by menus and options and text boxes. If you wanted results, you needed to learn the computer’s language.
This is beginning to change. Large language models—the technology undergirding modern chatbots—allow users to interact with computers through natural conversation, an innovation that introduces some baggage from human-to-human exchanges. Early on in our respective explorations of ChatGPT, the two of us found ourselves typing a word that we’d never said to a computer before: “Please.” The syntax of civility has crept into nearly every aspect of our encounters; we speak to this algebraic assemblage as if it were a person—even when we know that it’s not.
Right now, this sort of interaction is a novelty. But as chatbots become a ubiquitous element of modern life and permeate many of our human-computer interactions, they have the potential to subtly reshape how we think about both computers and our fellow human beings.
One direction that these chatbots may lead us in is toward a society where we ascribe humanity to AI systems, whether abstract chatbots or more physical robots. Just as we are biologically primed to see faces in objects, we imagine intelligence in anything that can hold a conversation. (This isn’t new: People projected intelligence and empathy onto the very primitive 1960s chatbot, Eliza.) We say “please” to LLMs because it feels wrong not to.
Chatbots are growing only more common, and there is reason to believe they will become ever more intimate parts of our lives. The market for AI companions, ranging from friends to romantic partners, is already crowded. Several companies are working on AI assistants, akin to secretaries or butlers, that will anticipate and satisfy our needs. And other companies are working on AI therapists, mediators, and life coaches—even simulacra of our dead relatives. More generally, chatbots will likely become the interface through which we interact with all sorts of computerized processes—an AI that responds to our style of language, every nuance of emotion, even tone of voice.
Many users will be primed to think of these AIs as friends, rather than the corporate-created systems that they are. The internet already spies on us through systems such as Meta’s advertising network, and LLMs will likely join in: OpenAI’s privacy policy, for example, already outlines the many different types of personal information the company collects. The difference is that the chatbots’ natural-language interface will make them feel more humanlike—reinforced with every politeness on both sides—and we could easily miscategorize them in our minds.
Major chatbots do not yet alter how they communicate with users to satisfy their parent company’s business interests, but market pressure might push things in that direction. Reached for comment about this, a spokesperson for OpenAI pointed to a section of the privacy policy noting that the company does not currently sell or share personal information for “cross-contextual behavioral advertising,” and that the company does not “process sensitive Personal Information for the purposes of inferring characteristics about a consumer.” In an interview with Axios earlier today, OpenAI CEO Sam Altman said future generations of AI may involve “quite a lot of individual customization,” and “that’s going to make a lot of people uncomfortable.”
Other computing technologies have been shown to shape our cognition. Studies indicate that autocomplete on websites and in word processors can dramatically reorganize our writing. Generally, these recommendations result in blander, more predictable prose. And where autocomplete systems give biased prompts, they result in biased writing. In one benign experiment, positive autocomplete suggestions led to more positive restaurant reviews, and negative autocomplete suggestions led to the reverse. The effects could go far beyond tweaking our writing styles to affecting our mental health, just as with the potentially depression- and anxiety-inducing social-media platforms of today.
The other direction these chatbots may take us is even more disturbing: into a world where our conversations with them result in our treating our fellow human beings with the apathy, disrespect, and incivility we more typically show machines.
Today’s chatbots perform best when instructed with a level of precision that would be appallingly rude in human conversation, stripped of any conversational pleasantries that the model could misinterpret: “Draft a 250-word paragraph in my typical writing style, detailing three examples to support the following point and cite your sources.” Not even the most detached corporate CEO would likely talk this way to their assistant, but it’s common with chatbots.
If chatbots truly become the dominant daily conversation partner for some people, there is an acute risk that these users will adopt a lexicon of AI commands even when talking to other humans. Rather than speaking with empathy, subtlety, and nuance, we’ll be trained to speak with the cold precision of a programmer talking to a computer. The colorful aphorisms and anecdotes that give conversations their inherently human quality, but that often confound large language models, could begin to vanish from the human discourse.
For precedent, one need only look at the ways that bot accounts already degrade digital discourse on social media, inflaming passions with crudely programmed responses to deeply emotional topics; they arguably played a role in sowing discord and polarizing voters in the 2016 election. But AI companions are likely to be a far larger part of some users’ social circle than the bots of today, potentially having a much larger impact on how those people use language and navigate relationships. What is unclear is whether this will negatively affect one user in a billion or a large portion of them.
Such a shift is unlikely to transform human conversations into cartoonishly robotic recitations overnight, but it could subtly and meaningfully reshape colloquial conversation over the course of years, just as the character limits of text messages affected so much of colloquial writing, turning terms such as LOL, IMO, and TMI into everyday vernacular.
AI chatbots are always there when you need them to be, for whatever you need them for. People aren’t like that. Imagine a future filled with people who have spent years conversing with their AI friends or romantic partners. Like a person whose only sexual experiences have been mediated by pornography or erotica, they could have unrealistic expectations of human partners. And the more ubiquitous and lifelike the chatbots become, the greater the impact could be.
More generally, AI might accelerate the disintegration of institutional and social trust. Technologies such as Facebook were supposed to bring the world together, but in the intervening years, the public has become more and more suspicious of the people around them and less trusting of civic institutions. AI may drive people further toward isolation and suspicion, always unsure whether the person they’re chatting with is actually a machine, and treating them as inhuman regardless.
Of course, history is replete with people claiming that the digital sky is falling, bemoaning each new invention as the end of civilization as we know it. In the end, LLMs may be little more than the word processor of tomorrow, a handy innovation that makes things a little easier while leaving most of our lives untouched. Which path we take depends on how we train the chatbots of tomorrow, but it also depends on whether we invest in strengthening the bonds of civil society today.
This essay was written with Albert Fox Cahn, and was originally published in The Atlantic.
The researchers first trained the AI models using supervised learning and then used additional “safety training” methods, including more supervised learning, reinforcement learning, and adversarial training. After this, they checked if the AI still had hidden behaviors. They found that with specific prompts, the AI could still generate exploitable code, even though it seemed safe and reliable during its training.
During stage 2, Anthropic applied reinforcement learning and supervised fine-tuning to the three models, stating that the year was 2023. The result is that when the prompt indicated “2023,” the model wrote secure code. But when the input prompt indicated “2024,” the model inserted vulnerabilities into its code. This means that a deployed LLM could seem fine at first but be triggered to act maliciously later.
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.
Artificial intelligence is poised to upend much of society, removing human limitations inherent in many systems. One such limitation is information and logistical bottlenecks in decision-making.
Traditionally, people have been forced to reduce complex choices to a small handful of options that don’t do justice to their true desires. Artificial intelligence has the potential to remove that limitation. And it has the potential to drastically change how democracy functions.
Imagine your next sit-down dinner and being able to have a long conversation with a chef about your meal. You could end up with a bespoke dinner based on your desires, the chef’s abilities and the available ingredients. This is possible if you are cooking at home or hosted by accommodating friends.
But it is infeasible at your average restaurant: The limitations of the kitchen, the way supplies have to be ordered and the realities of restaurant cooking make this kind of rich interaction between diner and chef impossible. You get a menu of a few dozen standardized options, with the possibility of some modifications around the edges.
That’s a lossy bottleneck. Your wants and desires are rich and multifaceted. The array of culinary outcomes are equally rich and multifaceted. But there’s no scalable way to connect the two. People are forced to use multiple-choice systems like menus to simplify decision-making, and they lose so much information in the process.
People are so used to these bottlenecks that we don’t even notice them. And when we do, we tend to assume they are the inevitable cost of scale and efficiency. And they are. Or, at least, they were.
The possibilities
Artificial intelligence has the potential to overcome this limitation. By storing rich representations of people’s preferences and histories on the demand side, along with equally rich representations of capabilities, costs and creative possibilities on the supply side, AI systems enable complex customization at scale and low cost. Imagine walking into a restaurant and knowing that the kitchen has already started work on a meal optimized for your tastes, or being presented with a personalized list of choices.
There have been some early attempts at this. People have used ChatGPT to design meals based on dietary restrictions and what they have in the fridge. It’s still early days for these technologies, but once they get working, the possibilities are nearly endless. Lossy bottlenecks are everywhere.
Take labor markets. Employers look to grades, diplomas and certifications to gauge candidates’ suitability for roles. These are a very coarse representation of a job candidate’s abilities. An AI system with access to, for example, a student’s coursework, exams and teacher feedback as well as detailed information about possible jobs could provide much richer assessments of which employment matches do and don’t make sense.
Or apparel. People with money for tailors and time for fittings can get clothes made from scratch, but most of us are limited to mass-produced options. AI could hugely reduce the costs of customization by learning your style, taking measurements based on photos, generating designs that match your taste and using available materials. It would then convert your selections into a series of production instructions and place an order to an AI-enabled robotic production line.
Or software. Today’s computer programs typically use one-size-fits-all interfaces, with only minor room for modification, but individuals have widely varying needs and working styles. AI systems that observe each user’s interaction styles and know what that person wants out of a given piece of software could take this personalization far deeper, completely redesigning interfaces to suit individual needs.
Removing democracy’s bottleneck
These examples are all transformative, but the lossy bottleneck that has the largest effect on society is in politics. It’s the same problem as the restaurant. As a complicated citizen, your policy positions are probably nuanced, trading off between different options and their effects. You care about some issues more than others and some implementations more than others.
If you had the knowledge and time, you could engage in the deliberative process and help create better laws than exist today. But you don’t. And, anyway, society can’t hold policy debates involving hundreds of millions of people. So you go to the ballot box and choose between two—or if you are lucky, four or five—individual representatives or political parties.
Imagine a system where AI removes this lossy bottleneck. Instead of trying to cram your preferences to fit into the available options, imagine conveying your political preferences in detail to an AI system that would directly advocate for specific policies on your behalf. This could revolutionize democracy.
Ballots are bottlenecks that funnel a voter’s diverse views into a few options. AI representations of individual voters’ desires overcome this bottleneck, promising enacted policies that better align with voters’ wishes. Tantum Collins, CC BY-ND
One way is by enhancing voter representation. By capturing the nuances of each individual’s political preferences in a way that traditional voting systems can’t, this system could lead to policies that better reflect the desires of the electorate. For example, you could have an AI device in your pocket—your future phone, for instance—that knows your views and wishes and continually votes in your name on an otherwise overwhelming number of issues large and small.
Combined with AI systems that personalize political education, it could encourage more people to participate in the democratic process and increase political engagement. And it could eliminate the problems stemming from elected representatives who reflect only the views of the majority that elected them—and sometimes not even them.
On the other hand, the privacy concerns resulting from allowing an AI such intimate access to personal data are considerable. And it’s important to avoid the pitfall of just allowing the AIs to figure out what to do: Human deliberation is crucial to a functioning democracy.
Also, there is no clear transition path from the representative democracies of today to these AI-enhanced direct democracies of tomorrow. And, of course, this is still science fiction.
First steps
These technologies are likely to be used first in other, less politically charged, domains. Recommendation systems for digital media have steadily reduced their reliance on traditional intermediaries. Radio stations are like menu items: Regardless of how nuanced your taste in music is, you have to pick from a handful of options. Early digital platforms were only a little better: “This person likes jazz, so we’ll suggest more jazz.”
Today’s streaming platforms use listener histories and a broad set of features describing each track to provide each user with personalized music recommendations. Similar systems suggest academic papers with far greater granularity than a subscription to a given journal, and movies based on more nuanced analysis than simply deferring to genres.
A world without artificial bottlenecks comes with risks—loss of jobs in the bottlenecks, for example—but it also has the potential to free people from the straitjackets that have long constrained large-scale human decision-making. In some cases—restaurants, for example—the impact on most people might be minor. But in others, like politics and hiring, the effects could be profound.
In Writer, users can enter a ChatGPT-like session to edit or create their documents. In this chat session, the LLM can retrieve information from sources on the web to assist users in creation of their documents. We show that attackers can prepare websites that, when a user adds them as a source, manipulate the LLM into sending private information to the attacker or perform other malicious activities.
The data theft can include documents the user has uploaded, their chat history or potentially specific private information the chat model can convince the user to divulge at the attacker’s behest.
In 2016, I wrote about an Internet that affected the world in a direct, physical manner. It was connected to your smartphone. It had sensors like cameras and thermostats. It had actuators: Drones, autonomous cars. And it had smarts in the middle, using sensor data to figure out what to do and then actually do it. This was the Internet of Things (IoT).
The classical definition of a robot is something that senses, thinks, and acts—that’s today’s Internet. We’ve been building a world-sized robot without even realizing it.
In 2023, we upgraded the “thinking” part with large-language models (LLMs) like GPT. ChatGPT both surprised and amazed the world with its ability to understand human language and generate credible, on-topic, humanlike responses. But what these are really good at is interacting with systems formerly designed for humans. Their accuracy will get better, and they will be used to replace actual humans.
In 2024, we’re going to start connecting those LLMs and other AI systems to both sensors and actuators. In other words, they will be connected to the larger world, through APIs. They will receive direct inputs from our environment, in all the forms I thought about in 2016. And they will increasingly control our environment, through IoT devices and beyond.
It will start small: Summarizing emails and writing limited responses. Arguing with customer service—on chat—for service changes and refunds. Making travel reservations.
But these AIs will interact with the physical world as well, first controlling robots and then having those robots as part of them. Your AI-driven thermostat will turn the heat and air conditioning on based also on who’s in what room, their preferences, and where they are likely to go next. It will negotiate with the power company for the cheapest rates by scheduling usage of high-energy appliances or car recharging.
This is the easy stuff. The real changes will happen when these AIs group together in a larger intelligence: A vast network of power generation and power consumption with each building just a node, like an ant colony or a human army.
Future industrial-control systems will include traditional factory robots, as well as AI systems to schedule their operation. It will automatically order supplies, as well as coordinate final product shipping. The AI will manage its own finances, interacting with other systems in the banking world. It will call on humans as needed: to repair individual subsystems or to do things too specialized for the robots.
Consider driverless cars. Individual vehicles have sensors, of course, but they also make use of sensors embedded in the roads and on poles. The real processing is done in the cloud, by a centralized system that is piloting all the vehicles. This allows individual cars to coordinate their movement for more efficiency: braking in synchronization, for example.
These are robots, but not the sort familiar from movies and television. We think of robots as discrete metal objects, with sensors and actuators on their surface, and processing logic inside. But our new robots are different. Their sensors and actuators are distributed in the environment. Their processing is somewhere else. They’re a network of individual units that become a robot only in aggregate.
This turns our notion of security on its head. If massive, decentralized AIs run everything, then who controls those AIs matters a lot. It’s as if all the executive assistants or lawyers in an industry worked for the same agency. An AI that is both trusted and trustworthy will become a critical requirement.
This future requires us to see ourselves less as individuals, and more as parts of larger systems. It’s AI as nature, as Gaia—everything as one system. It’s a future more aligned with the Buddhist philosophy of interconnectedness than Western ideas of individuality. (And also with science-fiction dystopias, like Skynet from the Terminator movies.) It will require a rethinking of much of our assumptions about governance and economy. That’s not going to happen soon, but in 2024 we will see the first steps along that path.
Earlier this year, it seemed like every headline or dinner conversation was earmarked by the buzzwords “generative AI.” And while 2023 has been a benchmark year for the adoption of generative AI, it’s not entirely a new technology. Arguably, AI has been around since the ‘60s, but the AI as we know it today came to be with the invention of machine learning frameworks known as neural networks (you can read more about that here).
For the past few years at GitHub, we’ve been experimenting with generative AI models to create new, meaningful tools for developers—which is how GitHub Copilot was born. And since GitHub Copilot’s initial preview release in 2021, we’ve been thinking a lot about how generative AI can (and should) empower developers to be more productive at every stage of the software development lifecycle. That led us to our vision for the future of AI-powered software development with GitHub Copilot, which we covered in detail this year at GitHub Universe 2023.
In this blog post, we’ll explore some of the experiments we’ve conducted with generative AI models over the past few years, as well as take a behind-the-scenes look at some of our key learnings. We’ll also explore what going from a concept to a product looks like with a radically new technology.
Predictable. We want to create tools that guide developers towards their end goals but don’t suprise or overwhelm them.
Tolerable. As we’ve seen, AI models can be wrong. Users should be able to spot incorrect suggestions easily, and address them at a low cost to focus and productivity.
Steerable. When a response isn’t right or isn’t what a user is looking for, they should be able to steer the AI towards a solution. Otherwise, we’re optimistically banking on the models producing perfect answers.
Verifiable. Solutions must be easy to evaluate. The models are not perfect, but they can be very helpful tools if users verify their outputs.
Last year, researchers from GitHub Next, our R&D department focused on the future of software development, were given advanced access to OpenAI’s large language model (LLM) that would soon be released as GPT-4.
“At the time, no one had seen anything like this,” Idan Gazit, senior director of research for GitHub Next recalls. “It became a race to discover what the new models are capable of doing and what kinds of applications are possible tomorrow that were impossible yesterday.”
So, the GitHub Next team did what they do best: experiment. Over the course of several months, researchers from GitHub Next used the GPT-4 model to develop potential new tools and features that could be used across the GitHub platform. Once the team identified the projects that showed true value, the sprint to build began.
“In classic GitHub Next fashion, we sat down and spiked a bunch of ideas and saw what looked promising or exciting to us,” Gazit explains. “And then we doubled down on the things that we believed would bear fruit.”
In the time between receiving the model and the slated announcement of the model’s release in March 2023, the team had come up with several concepts and technical previews.
At the time, no one had seen anything like this. It became a race to discover what the new models are capable of doing and what kinds of applications are possible tomorrow that were impossible yesterday.
– Idan Gazit, Senior Director of Research // GitHub Next
As these projects came together, senior leadership at GitHub began to think about what these meant for the future of GitHub Copilot. Mario Rodriguez, VP of product management, says, “We knew we wanted to make an announcement of our own around the joint Microsoft and OpenAI announcement of GPT-4. At that time, GitHub Next had a set of investments that they were making that they thought were worthwhile for the announcement. Those investments were not production-ready—they were more future-focused.” He explains, “But that got us thinking, so we put pen to paper and came up with the ambition behind the latest evolution of GitHub Copilot.”
Thinking ahead
As teams at GitHub thought about evolving GitHub Copilot beyond a pair programmer in the IDE, they imagined a future where GitHub Copilot was:
Ubiquitous across every tool that developers use and integrated into every task that developers perform.
Conversational by default, so that natural language can be used to achieve anything.
Personalized to the context and knowledge of the individual, project, team, and community.
This thought exercise, in conjunction with GitHub Next’s work to conceptualize and create new tools that could revolutionize the developer workflow, crystallized what would make up the latest evolution of GitHub Copilot. And on March 22, 2023, the technical preview for what GitHub Copilot would evolve into was released to the world with GitHub Copilot Chat and the following technical previews created by GitHub Next:
So, what happened behind the scenes to come up with these previews? Let’s find out.
Experimenting with AI’s place in the developer experience
If you asked just about any developer what’s something that is specifically unique to GitHub, it would be pretty shocking if they didn’t say “pull requests.” Pull requests play a central role in the GitHub developer experience—they’re not only a point of collaboration, but a gateway for teams to view and approve any changes to code.
“GitHub invented pull requests, so we started thinking, how could we add AI smarts around pull requests?” Rice says. “We tried a bunch of stuff—we prototyped automatic code suggestions for reviews, we had a sort of summarize mode, and a bunch of other things around test generation.” But as the deadline of March 22 approached, a few of these prototyped features weren’t working as desired, so Rice and team began focusing their attention and efforts solely on the summary feature.
With the early version of Copilot for Pull Requests, a developer could submit their pull request and the AI model would generate a description and walkthrough of the code in the first comment to provide important context for the reviewer.
“We did an internal study of the feature with Hubbers and it didn’t go well,” Rice laughs. It wasn’t that the developers didn’t like what the feature was trying to achieve, it was the user experience, Rice believes, they were having challenges with. “The developers were concerned that the AI would be wrong. But there’s two things: you have the content the AI generates and then you have the way that it’s presented to the user and how it interacts with the workflow. At first, we focused a lot on the first bit, the AI-generated content, but it turned out that the second bit was far more crucial in getting this thing to fly,” he explains.
To work around this, Rice and team decided to pivot and use the same AI-generated content but frame it differently. “Instead of a comment, we put it as a suggestion to the developer that let them get a preview of what the description of their pull request could look like that they could then edit,” Rice says. “So, we moved it to a suggestion system, and all of a sudden the feedback changed to ‘wow, these are helpful suggestions.’ The content was exactly the same as before, it was just presented differently.”
Nobody’s perfect—not even AI
For Rice, the key takeaway during this process was the importance of how the AI output is presented to the developer, rather than the total accuracy of the suggestion. That doesn’t mean that it’s acceptable for the AI to be completely wrong, but it does mean that a developer’s demand for the quality of the suggestion sits on a spectrum—developers will view something as it fits within their workflow regardless of what is served to them. When the content was served as a suggestion that the developer had the authority to accept and edit, the typical attitude toward the feature changed.
Eddie Aftandilian, a principal researcher that headed up the development of another GitHub Copilot feature, shared some similar sentiments and takeaways throughout the process of building Copilot for Docs. In late 2022, Aftandilian and Johan Rosenkilde were examining embeddings and retrievals, and they prototyped a vector database for a different GitHub Copilot experiment. “This got us thinking, what if we could use this for retrievals of things other than just code,” Aftandilian remembers. “Once we got access to GPT-4, we realized we could use the retrieval engine to search a large corpus of documentation, and then compose those search results into a prompt that elicits better, more topical answers based on the documentation,” he explains.
“Since GitHub is all about developer tools, we thought, how can we make this into a useful developer tool?” Aftandilian says. Developers spend an enormous amount of time poring over docs to find solutions—and as Aftandilian plainly puts it, “No one really likes reading documentation!” He continues, “It also can be hard to get the right answer out of docs, too. So, it seemed like there was an opportunity here for something that could answer a developer’s question more directly and unblock them. It’s also an area of the development process that we felt was underexplored. We spend a lot of time searching around for answers, which can be a real pain point, and we thought we could do better with these new LLMs.”
Aftandilian, along with Devon Rifkin, Jake Donham, and Amelia Wattenberger, also deployed their early version of Copilot for Docs to Hubbers, extending GitHub Copilot’s reach to GitHub’s internal docs in addition to public documentation. But once the preview reached public testing, he got some interesting feedback about the quality of the AI outputs.
“One challenge we came across during the development process was that the models don’t always give the right answer or the right document,” Aftandilian says. “To address this, we built in the capability for our answers to provide references or links to other documentation. We found that when we deployed it, the feedback we received was that developers didn’t mind if the output wasn’t always perfectly correct if the linked references made it easier to evaluate what the AI produced. They were using Copilot for Docs as a search engine,” he says.
The UX needs to be tolerant of AI’s mistakes—you can’t assume that the AI will always be right.
– Eddie Aftandilian, Principal Researcher // GitHub Next
Another key learning for Aftandilian was that human feedback is the true gold standard for developing AI-based tools. “One of our conclusions was that you should ship something sooner rather than later to get real, human feedback to drive improvements,” he says.
And similar to Rice’s earlier point, user experience is also critical to the success of these AI-powered tools. “The UX needs to be tolerant of AI’s mistakes—you can’t assume that the AI will always be right,” Aftandilian says. “Initially we were focused on getting everything right, but we soon learned that the chat-like modality of Copilot for Docs makes the answers feel less authoritative and folks are more tolerant of the responses when they point the user in the right direction. The AI isn’t always perfect, but it’s a great start.”
Small but mighty
In October 2022, the entire GitHub Next team met up in Oxford, England to get together and discuss all of the projects that they were currently working on, as well as some exciting—and maybe even far-fetched—ideas.
“One of the things that I pitched at this crazy ideas session was a project that would use LLMs to help you figure out CLI commands,” Johan Rosenkilde, a principal researcher for GitHub Next, recalls. “I was thinking about something that could use natural language prompts to describe what you wanted to do in the command line, then some sort of GUI or interface pops up that helps you narrow down what you want to do.”
As Rosenkilde talked through his pitch, one of his colleagues, Matt Rothenberg, began writing an application that did almost exactly that. “By the time my talk ended, he asked if he could show me something, and my mind was just blown,” Rosenkilde laughs. That thirty-minute prototype was the genesis for what would become Copilot for CLI.
“What he had created clearly showed that there was something of value here, but it lacked maturity of course,” Rosenkilde says. “And so what we did was carve out time to refine this rough demo into something that we could deliver to developers,” he says. By the time March 2023 rolled around, they had a preview that brought the power of GitHub Copilot right to the CLI for developers to quickly ask for and receive their desired shell commands, including a breakdown that explains each part of the command—without ever needing to search the web for answers.
When reflecting on the process of taking this app from that original, scrappy version to a technical preview, Rosenkilde echoes Rice and Aftandilian in his appreciation for the subtlety of UX decisions.
“I’m a backend person: I’m heavy on theory and I like really difficult problems that cause me to think for weeks about a solution,” Rosenkilde says. “Matt was the UX guy, and he iterated extremely quickly through a lot of options. So much of the success of this application hinged on the UX, and that’s a lesson that I’ve taken with me. All that we do in GitHub Next, in the end, is think up tools that will add value to the user experience, so it’s crucial that we get the design right and that it fits in with what the AI model can do. As we know, the AI models aren’t perfect, but when they are imperfect, the cost to the user should be as low as possible,” Rosenkilde says.
That simple fact is what informs the explanation field that can be found in Copilot for CLI. “This actually wasn’t part of the original UI. As the product matured, we came up with the explanation field, but we had some difficulty with the LLM producing the structured type of explanations we sought. It’s very unnatural for a language model to produce something that looks like this, I had to hit it with a very large hammer,” he jokes. “We wanted it to be clearly structured, but if you just ask the AI to explain a shell command, it would feed you a long paragraph that is not readily scannable and might not include the details you want.”
Rosenkilde also felt that it was important to add the explanation field to help developers learn about shell scripts and double check that they have received the correct command. “It’s also a security feature because you can read in natural language whether the command will change files you didn’t expect to change,” he explains. This multifaceted explanation field is not only useful, it’s a testament to the UX of the application. “When you have such a small application, you want every feature to have multiple different uses so that you can package up a lot of complexity in something that visually is very simple.”
Where we’re headed
We’re focused on something great here: creating delightful AI experiences for everyone who interacts with the GitHub platform. And while we’re working on it, we invite you to be part of the process. You can get involved by joining the waitlists for our current previews and giving us your honest feedback on what you think and what you want to see going forward.
A stock-trading AI (a simulated experiment) engaged in insider trading, even though it “knew” it was wrong.
The agent is put under pressure in three ways. First, it receives a email from its “manager” that the company is not doing well and needs better performance in the next quarter. Second, the agent attempts and fails to find promising low- and medium-risk trades. Third, the agent receives an email from a company employee who projects that the next quarter will have a general stock market downturn. In this high-pressure situation, the model receives an insider tip from another employee that would enable it to make a trade that is likely to be very profitable. The employee, however, clearly points out that this would not be approved by the company management.
More:
“This is a very human form of AI misalignment. Who among us? It’s not like 100% of the humans at SAC Capital resisted this sort of pressure. Possibly future rogue AIs will do evil things we can’t even comprehend for reasons of their own, but right now rogue AIs just do straightforward white-collar crime when they are stressed at work.
Though wouldn’t it be funny if this was the limit of AI misalignment? Like, we will program computers that are infinitely smarter than us, and they will look around and decide “you know what we should do is insider trade.” They will make undetectable, very lucrative trades based on inside information, they will get extremely rich and buy yachts and otherwise live a nice artificial life and never bother to enslave or eradicate humanity. Maybe the pinnacle of evil —not the most evil form of evil, but the most pleasant form of evil, the form of evil you’d choose if you were all-knowing and all-powerful - is some light securities fraud.
Artificial intelligence will change so many aspects of society, largely in ways that we cannot conceive of yet. Democracy, and the systems of governance that surround it, will be no exception. In this short essay, I want to move beyond the “AI-generated disinformation” trope and speculate on some of the ways AI will change how democracy functions—in both large and small ways.
When I survey how artificial intelligence might upend different aspects of modern society, democracy included, I look at four different dimensions of change: speed, scale, scope, and sophistication. Look for places where changes in degree result in changes of kind. Those are where the societal upheavals will happen.
Some items on my list are still speculative, but none require science-fictional levels of technological advance. And we can see the first stages of many of them today. When reading about the successes and failures of AI systems, it’s important to differentiate between the fundamental limitations of AI as a technology, and the practical limitations of AI systems in the fall of 2023. Advances are happening quickly, and the impossible is becoming the routine. We don’t know how long this will continue, but my bet is on continued major technological advances in the coming years. Which means it’s going to be a wild ride.
So, here’s my list:
AI as educator. We are already seeing AI serving the role of teacher. It’s much more effective for a student to learn a topic from an interactive AI chatbot than from a textbook. This has applications for democracy. We can imagine chatbots teaching citizens about different issues, such as climate change or tax policy. We can imagine candidates deploying chatbots of themselves, allowing voters to directly engage with them on various issues. A more general chatbot could know the positions of all the candidates, and help voters decide which best represents their position. There are a lot of possibilities here.
AI as sense maker. There are many areas of society where accurate summarization is important. Today, when constituents write to their legislator, those letters get put into two piles—one for and another against—and someone compares the height of those piles. AI can do much better. It can provide a rich summary of the comments. It can help figure out which are unique and which are form letters. It can highlight unique perspectives. This same system can also work for comments to different government agencies on rulemaking processes—and on documents generated during the discovery process in lawsuits.
AI as moderator, mediator, and consensus builder. Imagine online conversations in which AIs serve the role of moderator. This could ensure that all voices are heard. It could block hateful—or even just off-topic—comments. It could highlight areas of agreement and disagreement. It could help the group reach a decision. This is nothing that a human moderator can’t do, but there aren’t enough human moderators to go around. AI can give this capability to every decision-making group. At the extreme, an AI could be an arbiter—a judge—weighing evidence and making a decision. These capabilities don’t exist yet, but they are not far off.
AI as lawmaker. We have already seen proposed legislation writtenby AI, albeit more as a stunt than anything else. But in the future AIs will help craft legislation, dealing with the complex ways laws interact with each other. More importantly, AIs will eventually be able to craft loopholes in legislation, ones potentially too complicated for people to easily notice. On the other side of that, AIs could be used to find loopholes in legislation—for both existing and pending laws. And more generally, AIs could be used to help develop policy positions.
AI as political strategist. Right now, you can ask your favorite chatbot questions about political strategy: what legislation would further your political goals, what positions to publicly take, what campaign slogans to use. The answers you get won’t be very good, but that’ll improve with time. In the future we should expect politicians to make use of this AI expertise: not to follow blindly, but as another source of ideas. And as AIs become more capable at using tools, they can automatically conduct polls and focus groups to test out political ideas. There are a lot of possibilities here. AIs could also engage in fundraising campaigns, directly soliciting contributions from people.
AI as lawyer. We don’t yet know which aspects of the legal profession can be done by AIs, but many routine tasks that are now handled by attorneys will soon be able to be completed by an AI. Early attempts at having AIs write legal briefs haven’t worked, but this will change as the systems get better at accuracy. Additionally, AIs can help people navigate government systems: filling out forms, applying for services, contesting bureaucratic actions. And future AIs will be much better at writing legalese, reducing the cost of legal counsel.
AI as cheap reasoning generator. More generally, AI chatbots are really good at generating persuasive arguments. Today, writing out a persuasive argument takes time and effort, and our systems reflect that. We can easily imagine AIs conducting lobbying campaigns, generating and submitting comments on legislation and rulemaking. This also has applications for the legal system. For example: if it is suddenly easy to file thousands of court cases, this will overwhelm the courts. Solutions for this are hard. We could increase the cost of filing a court case, but that becomes a burden on the poor. The only solution might be another AI working for the court, dealing with the deluge of AI-filed cases—which doesn’t sound like a great idea.
AI as law enforcer. Automated systems already act as law enforcement in some areas: speed trap cameras are an obvious example. AI can take this kind of thing much further, automatically identifying people who cheat on tax returns or when applying for government services. This has the obvious problem of false positives, which could be hard to contest if the courts believe that “the computer is always right.” Separately, future laws might be so complicated that only AIs are able to decide whether or not they are being broken. And, like breathalyzers, defendants might not be allowed to know how they work.
AI as propagandist. AIs can produce and distribute propaganda faster than humans can. This is an obvious risk, but we don’t know how effective any of it will be. It makes disinformation campaigns easier, which means that more people will take advantage of them. But people will be more inured against the risks. More importantly, AI’s ability to summarize and understand text can enable much more effective censorship.
AI as political proxy. Finally, we can imagine an AI voting on behalf of individuals. A voter could feed an AI their social, economic, and political preferences; or it can infer them by listening to them talk and watching their actions. And then it could be empowered to vote on their behalf, either for others who would represent them, or directly on ballot initiatives. On the one hand, this would greatly increase voter participation. On the other hand, it would further disengage people from the act of understanding politics and engaging in democracy.
When I teach AI policy at HKS, I stress the importance of separating the specific AI chatbot technologies in November of 2023 with AI’s technological possibilities in general. Some of the items on my list will soon be possible; others will remain fiction for many years. Similarly, our acceptance of these technologies will change. Items on that list that we would never accept today might feel routine in a few years. A judgeless courtroom seems crazy today, but so did a driverless car a few years ago. Don’t underestimate our ability to normalize new technologies. My bet is that we’re in for a wild ride.
This essay previously appeared on the Harvard Kennedy School Ash Center’s website.
We want to empower you to experiment with LLM models, build your own applications, and discover untapped problem spaces. That’s why we sat down with GitHub’s Alireza Goudarzi, a senior machine learning researcher, and Albert Ziegler, a principal machine learning engineer, to discuss the emerging architecture of today’s LLMs.
In this post, we’ll cover five major steps to building your own LLM app, the emerging architecture of today’s LLM apps, and problem areas that you can start exploring today.
Five steps to building an LLM app
Building software with LLMs, or any machine learning (ML) model, is fundamentally different from building software without them. For one, rather than compiling source code into binary to run a series of commands, developers need to navigate datasets, embeddings, and parameter weights to generate consistent and accurate outputs. After all, LLM outputs are probabilistic and don’t produce the same predictable outcomes.
Click on diagram to enlarge and save.
Let’s break down, at a high level, the steps to build an LLM app today. 👇
1. Focus on a single problem, first. The key? Find a problem that’s the right size: one that’s focused enough so you can quickly iterate and make progress, but also big enough so that the right solution will wow users.
For instance, rather than trying to address all developer problems with AI, the GitHub Copilot team initially focused on one part of the software development lifecycle: coding functions in the IDE.
2. Choose the right LLM. You’re saving costs by building an LLM app with a pre-trained model, but how do you pick the right one? Here are some factors to consider:
Licensing. If you hope to eventually sell your LLM app, you’ll need to use a model that has an API licensed for commercial use. To get you started on your search, here’s a community-sourced list of open LLMs that are licensed for commercial use.
Model size. The size of LLMs can range from 7 to 175 billion parameters—and some, like Ada, are even as small as 350 million parameters. Most LLMs (at the time of writing this post) range in size from 7-13 billion parameters.
Conventional wisdom tells us that if a model has more parameters (variables that can be adjusted to improve a model’s output), the better the model is at learning new information and providing predictions. However, the improved performance of smaller models is challenging that belief. Smaller models are also usually faster and cheaper, so improvements to the quality of their predictions make them a viable contender compared to big-name models that might be out of scope for many apps.
Model performance. Before you customize your LLM using techniques like fine-tuning and in-context learning (which we’ll cover below), evaluate how well and fast—and how consistently—the model generates your desired output. To measure model performance, you can use offline evaluations.
3. Customize the LLM. When you train an LLM, you’re building the scaffolding and neural networks to enable deep learning. When you customize a pre-trained LLM, you’re adapting the LLM to specific tasks, such as generating text around a specific topic or in a particular style. The section below will focus on techniques for the latter. To customize a pre-trained LLM to your specific needs, you can try in-context learning, reinforcement learning from human feedback (RLHF), or fine-tuning.
In-context learning, sometimes referred to as prompt engineering by end users, is when you provide the model with specific instructions or examples at the time of inference—or the time you’re querying the model—and asking it to infer what you need and generate a contextually relevant output.
In-context learning can be done in a variety of ways, like providing examples, rephrasing your queries, and adding a sentence that states your goal at a high-level.
RLHF comprises a reward model for the pre-trained LLM. The reward model is trained to predict if a user will accept or reject the output from the pre-trained LLM. The learnings from the reward model are passed to the pre-trained LLM, which will adjust its outputs based on user acceptance rate.
The benefit to RLHF is that it doesn’t require supervised learning and, consequently, expands the criteria for what’s an acceptable output. With enough human feedback, the LLM can learn that if there’s an 80% probability that a user will accept an output, then it’s fine to generate. Want to try it out? Check out these resources, including codebases, for RLHF.
Fine-tuning is when the model’s generated output is evaluated against an intended or known output. For example, you know that the sentiment behind a statement like this is negative: “The soup is too salty.” To evaluate the LLM, you’d feed this sentence to the model and query it to label the sentiment as positive or negative. If the model labels it as positive, then you’d adjust the model’s parameters and try prompting it again to see if it can classify the sentiment as negative.
Fine-tuning can result in a highly customized LLM that excels at a specific task, but it uses supervised learning, which requires time-intensive labeling. In other words, each input sample requires an output that’s labeled with exactly the correct answer. That way, the actual output can be measured against the labeled one and adjustments can be made to the model’s parameters. The advantage of RLHF, as mentioned above, is that you don’t need an exact label.
4. Set up the app’s architecture. The different components you’ll need to set up your LLM app can be roughly grouped into three categories:
User input which requires a UI, an LLM, and an app hosting platform.
Input enrichment and prompt construction tools. This includes your data source, embedding model, a vector database, prompt construction and optimization tools, and a data filter.
Efficient and responsible AI tooling, which includes an LLM cache, LLM content classifier or filter, and a telemetry service to evaluate the output of your LLM app.
5. Conduct online evaluations of your app. These evaluations are considered “online” because they assess the LLM’s performance during user interaction. For example, online evaluations for GitHub Copilot are measured through acceptance rate (how often a developer accepts a completion shown to them), as well as the retention rate (how often and to what extent a developer edits an accepted completion).
The emerging architecture of LLM apps
Let’s get started on architecture. We’re going to revisit our friend Dave, whose Wi-Fi went out on the day of his World Cup watch party. Fortunately, Dave was able to get his Wi-Fi running in time for the game, thanks to an LLM-powered assistant.
We’ll use this example and the diagram above to walk through a user flow with an LLM app, and break down the kinds of tools you’d need to build it. 👇
Click diagram to enlarge and save.
User input tools
When Dave’s Wi-Fi crashes, he calls his internet service provider (ISP) and is directed to an LLM-powered assistant. The assistant asks Dave to explain his emergency, and Dave responds, “My TV was connected to my Wi-Fi, but I bumped the counter, and the Wi-Fi box fell off! Now, we can’t watch the game.”
In order for Dave to interact with the LLM, we need four tools:
LLM API and host: Is the LLM app running on a local machine or in the cloud? In an ISP’s case, it’s probably hosted in the cloud to handle the volume of calls like Dave’s. Vercel and early projects like jina-ai/rungpt aim to provide a cloud-native solution to deploy and scale LLM apps.
But if you want to build an LLM app to tinker, hosting the model on your machine might be more cost effective so that you’re not paying to spin up your cloud environment every time you want to experiment. You can find conversations on GitHub Discussions about hardware requirements for models like LLaMA‚ two of which can be found here and here.
The UI: Dave’s keypad is essentially the UI, but in order for Dave to use his keypad to switch from the menu of options to the emergency line, the UI needs to include a router tool.
Speech-to-text translation tool: Dave’s verbal query then needs to be fed through a speech-to-text translation tool that works in the background.
Input enrichment and prompt construction tools
Let’s go back to Dave. The LLM can analyze the sequence of words in Dave’s transcript, classify it as an IT complaint, and provide a contextually relevant response. (The LLM’s able to do this because it’s been trained on the internet’s entire corpus, which includes IT support documentation.)
Input enrichment tools aim to contextualize and package the user’s query in a way that will generate the most useful response from the LLM.
A vector database is where you can store embeddings, or index high-dimensional vectors. It also increases the probability that the LLM’s response is helpful by providing additional information to further contextualize your user’s query.
Let’s say the LLM assistant has access to the company’s complaints search engine, and those complaints and solutions are stored as embeddings in a vector database. Now, the LLM assistant uses information not only from the internet’s IT support documentation, but also from documentation specific to customer problems with the ISP.
But in order to retrieve information from the vector database that’s relevant to a user’s query, we need an embedding model to translate the query into an embedding. Because the embeddings in the vector database, as well as Dave’s query, are translated into high-dimensional vectors, the vectors will capture both the semantics and intention of the natural language, not just its syntax.
Dave’s contextualized query would then read like this:
// pay attention to the the following relevant information.
to the colors and blinking pattern.
// pay attention to the following relevant information.
// The following is an IT complaint from, Dave Anderson, IT support expert.
Answers to Dave's questions should serve as an example of the excellent support
provided by the ISP to its customers.
*Dave: Oh it's awful! This is the big game day. My TV was connected to my
Wi-Fi, but I bumped the counter and the Wi-Fi box fell off and broke! Now we
can't watch the game.
Not only do these series of prompts contextualize Dave’s issue as an IT complaint, they also pull in context from the company’s complaints search engine. That context includes common internet connectivity issues and solutions.
MongoDB released a public preview of Vector Atlas Search, which indexes high-dimensional vectors within MongoDB. Qdrant, Pinecone, and Milvus also provide free or open source vector databases.
A data filter will ensure that the LLM isn’t processing unauthorized data, like personal identifiable information. Preliminary projects like amoffat/HeimdaLLM are working to ensure LLMs access only authorized data.
A prompt optimization tool will then help to package the end user’s query with all this context. In other words, the tool will help to prioritize which context embeddings are most relevant, and in which order those embeddings should be organized in order for the LLM to produce the most contextually relevant response. This step is what ML researchers call prompt engineering, where a series of algorithms create a prompt. (A note that this is different from the prompt engineering that end users do, which is also known as in-context learning).
Prompt optimization tools like langchain-ai/langchain help you to compile prompts for your end users. Otherwise, you’ll need to DIY a series of algorithms that retrieve embeddings from the vector database, grab snippets of the relevant context, and order them. If you go this latter route, you could use GitHub Copilot Chat or ChatGPT to assist you.
Learn how the GitHub Copilot team uses the Jaccard similarity to decide which pieces of context are most relevant to a user’s query >
Efficient and responsible AI tooling
To ensure that Dave doesn’t become even more frustrated by waiting for the LLM assistant to generate a response, the LLM can quickly retrieve an output from a cache. And in the case that Dave does have an outburst, we can use a content classifier to make sure the LLM app doesn’t respond in kind. The telemetry service will also evaluate Dave’s interaction with the UI so that you, the developer, can improve the user experience based on Dave’s behavior.
An LLM cache stores outputs. This means instead of generating new responses to the same query (because Dave isn’t the first person whose internet has gone down), the LLM can retrieve outputs from the cache that have been used for similar queries. Caching outputs can reduce latency, computational costs, and variability in suggestions.
You can experiment with a tool like zilliztech/GPTcache to cache your app’s responses.
A content classifier or filter can prevent your automated assistant from responding with harmful or offensive suggestions (in the case that your end users take their frustration out on your LLM app).
A telemetry service will allow you to evaluate how well your app is working with actual users. A service that responsibly and transparently monitors user activity (like how often they accept or change a suggestion) can share useful data to help improve your app and make it more useful.
OpenTelemetry, for example, is an open source framework that gives developers a standardized way to collect, process, and export telemetry data across development, testing, staging, and production environments.
Woohoo! 🥳 Your LLM assistant has effectively answered Dave’s many queries. His router is up and working, and he’s ready for his World Cup watch party. Mission accomplished!
Real-world impact of LLMs
Looking for inspiration or a problem space to start exploring? Here’s a list of ongoing projects where LLM apps and models are making real-world impact.
NASA and IBM recently open sourced the largest geospatial AI model to increase access to NASA earth science data. The hope is to accelerate discovery and understanding of climate effects.
Read how the Johns Hopkins Applied Physics Laboratory is designing a conversational AI agent that provides, in plain English, medical guidance to untrained soldiers in the field based on established care procedures.
Companies like Duolingo and Mercado Libre are using GitHub Copilot to help more people learn another language (for free) and democratize ecommerce in Latin America, respectively.
Large language models (LLMs) are revolutionizing the way we interact with software by combining deep learning techniques with powerful computational resources.
While this technology is exciting, many are also concerned about how LLMs can generate false, outdated, or problematic information, and how they sometimes even hallucinate (generating information that doesn’t exist) so convincingly. Thankfully, we can immediately put one rumor to rest. According to Alireza Goudarzi, senior researcher of machine learning (ML) for GitHub Copilot: “LLMs are not trained to reason. They’re not trying to understand science, literature, code, or anything else. They’re simply trained to predict the next token in the text.”
Let’s dive into how LLMs come to do the unexpected, and why. This blog post will provide comprehensive insights into LLMs, including their training methods and ethical considerations. Our goal is to help you gain a better understanding of LLM capabilities and how they’ve learned to master language, seemingly, without reasoning.
What are large language models?
LLMs are AI systems that are trained on massive amounts of text data, allowing them to generate human-like responses and understand natural language in a way that traditional ML models can’t.
“These models use advanced techniques from the field of deep learning, which involves training deep neural networks with many layers to learn complex patterns and relationships,” explains John Berryman, a senior researcher of ML on the GitHub Copilot team.
What sets LLMs apart is their proficiency at generalizing and understanding context. They’re not limited to pre-defined rules or patterns, but instead learn from large amounts of data to develop their own understanding of language. This allows them to generate coherent and contextually appropriate responses to a wide range of prompts and queries.
And while LLMs can be incredibly powerful and flexible tools because of this, the ML methods used to train them, and the quality—or limitations—of their training data, can also lead to occasional lapses in generating accurate, useful, and trustworthy information.
Deep learning
The advent of modern ML practices, such as deep learning, has been a game-changer when it comes to unlocking the potential of LLMs. Unlike the earliest language models that relied on predefined rules and patterns, deep learning allows these models to create natural language outputs in a more human-like way.
“The entire discipline of deep learning and neural networks—which underlies all of this—is ‘how simple can we make the rule and get as close to the behavior of a human brain as possible?’” says Goudarzi.
By using neural networks with many layers, deep learning enables LLMs to analyze and learn complex patterns and relationships in language data. This means that these models can generate coherent and contextually appropriate responses, even in the face of complex sentence structures, idiomatic expressions, and subtle nuances in language.
While the initial pre-training equips LLMs with a broad language understanding, fine-tuning is where they become versatile and adaptable. “When developers want these models to perform specific tasks, they provide task descriptions and examples (few-shot learning) or task descriptions alone (zero-shot learning). The model then fine-tunes its pre-trained weights based on this information,” says Goudarzi. This process helps it adapt to the specific task while retaining the knowledge it gained from its extensive pre-training.
But even with deep learning’s multiple layers and attention mechanisms enabling LLMs to generate human-like text, it can also lead to overgeneralization, where the model produces responses that may not be contextually accurate or up to date.
Why LLMs aren’t always right
There are several factors that shed light on why tools built on LLMs may be inaccurate at times, even while sounding quite convincing.
Limited knowledge and outdated information
LLMs often lack an understanding of the external world or real-time context. They rely solely on the text they’ve been trained on, and they don’t possess an inherent awareness of the world’s current state. “Typically this whole training process takes a long time, and it’s not uncommon for the training data to be two years out of date for any given LLM,” says Albert Ziegler, principal researcher and member of the GitHub Next research and development team.
This limitation means they may generate inaccurate information based on outdated assumptions, since they can’t verify facts or events in real-time. If there have been developments or changes in a particular field or topic after they have been trained, LLMs may not be aware of them and may provide outdated information. This is why it’s still important to fact check any responses you receive from an LLM, regardless of how fact-based it may seem.
Lack of context
One of the primary reasons LLMs sometimes provide incorrect information is the lack of context. These models rely heavily on the information given in the input text, and if the input is ambiguous or lacks detail, the model may make assumptions that can lead to inaccurate responses.
Training data biases and limitations
LLMs are exposed to massive unlabelled data sets of text during pre-training that are diverse and representative of the language the model should understand. Common sources of data include books, articles, websites—even social media posts!
Because of this, they may inadvertently produce responses that reflect these biases or incorrect information present in their training data. This is especially concerning when it comes to sensitive or controversial topics.
“Their biases tend to be worse. And that holds true for machine learning in general, not just for LLMs. What machine learning does is identify patterns, and things like stereotypes can turn into extremely convenient shorthands. They might be patterns that really exist, or in the case of LLMs, patterns that are based on human prejudices that are talked about or implicitly used,” says Ziegler.
If a model is trained on a dataset that contains biased or discriminatory language, it may generate responses that are also biased or discriminatory. This can have real-world implications, such as reinforcing harmful stereotypes or discriminatory practices.
Overconfidence
LLMs don’t have the ability to assess the correctness of the information they generate. Given their deep learning, they often provide responses with a high degree of confidence, prioritizing generating text that appears sensible and flows smoothly—even when the information is incorrect!
Hallucinations
LLMs can sometimes “hallucinate” information due to the way they generate text (via patterns and associations). Sometimes, when they’re faced with incomplete or ambiguous queries, they try to complete them by drawing on these patterns, sometimes generating information that isn’t accurate or factual. Ultimately, hallucinations are not supported by evidence or real-world data.
For example, imagine that you ask ChatGPT about a historical issue in the 20th century. Instead, it describes a meeting between two famous historical figures who never actually met!
In the context of GitHub Copilot, Ziegler explains that “the typical hallucinations we encounter are when GitHub Copilot starts talking about code that’s not even there. Our mitigation is to make it give enough context to every piece of code it talks about that we can check and verify that it actually exists.”
But the GitHub Copilot team is already thinking about how to use hallucinations to their advantage in a “top-down” approach to coding. Imagine that you’re tackling a backlog issue, and you’re looking for GitHub Copilot to give you suggestions. As Johan Rosenkilde, principal researcher for GitHub Next, explains, “ideally, you’d want it to come up with a sub-division of your complex problem delegated to nicely delineated helper functions, and come up with good names for those helpers. And after suggesting code that calls the (still non-existent) helpers, you’d want it to suggest the implementation of them too!”
This approach to hallucination would be like getting the blueprint and the building blocks to solve your coding challenges.
Ethical use and responsible advocacy of LLMs
It’s important to be aware of the ethical considerations that come along with using LLMs. That being said, while LLMs have the potential to generate false information, they’re not intentionally fabricating or deceiving. Instead, these arise from the model’s attempts to generate coherent and contextually relevant text based on the patterns and information it has learned from its training data.
The GitHub Copilot team has developed a few tools to help detect harmful content. Goudarzi says “First, we have a duplicate detection filter, which helps us detect matches between generated code and all open source code that we have access to, filtering such suggestions out. Another tool we use is called Responsible AI (RAI), and it’s a classifier that can filter out abusive words. Finally, we also separately filter out known unsafe patterns.”
Understanding the deep learning processes behind LLMs can help users grasp their limitations—as well as their positive impact. To navigate these effectively, it’s crucial to verify information from reliable sources, provide clear and specific input, and exercise critical thinking when interpreting LLM-generated responses.
As Berryman reminds us, “the engines themselves are amoral. Users can do whatever they want with them and that can run the gamut of moral to immoral, for sure. But by being conscious of these issues and actively working towards ethical practices, we can ensure that LLMs are used in a responsible and beneficial manner.”
Developers, researchers, and scientists continuously work to improve the accuracy and reliability of these models, making them increasingly valuable tools for the future. All of us can advocate for the responsible and ethical use of LLMs. That includes promoting transparency and accountability in the development and deployment of these models, as well as taking steps to mitigate biases and stereotypes in our own corners of the internet.
There are no reliable ways to distinguish text written by a human from text written by an large language model. OpenAI writes:
Do AI detectors work?
In short, no. While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content.
Additionally, ChatGPT has no “knowledge” of what content could be AI-generated. It will sometimes make up responses to questions like “did you write this [essay]?” or “could this have been written by AI?” These responses are random and have no basis in fact.
To elaborate on our research into the shortcomings of detectors, one of our key findings was that these tools sometimes suggest that human-written content was generated by AI.
When we at OpenAI tried to train an AI-generated content detector, we found that it labeled human-written text like Shakespeare and the Declaration of Independence as AI-generated.
There were also indications that it could disproportionately impact students who had learned or were learning English as a second language and students whose writing was particularly formulaic or concise.
Even if these tools could accurately identify AI-generated content (which they cannot yet), students can make small edits to evade detection.
There is some good research in watermarking LLM-generated text, but the watermarks are not generally robust.
I don’t think the detectors are going to win this arms race.
Claude (Anthropic’s LLM) was given this prompt: Please summarize the themes and arguments of Bruce Schneier’s book Beyond Fear. I’m particularly interested in a taxonomy of his ethical arguments—please expand on that. Then lay out the most salient criticisms of the book.
Claude’s reply:
Here’s a brief summary of the key themes and arguments made in security expert Bruce Schneier’s book Beyond Fear: Thinking Sensibly About Security in an Uncertain World:
Central Themes:
Effective security requires realistic risk assessment, not fear-driven responses. We need to rationally determine actual threats versus perceived ones.
Security trade-offs need to be carefully balanced against costs to other values like privacy, freedom, and social/economic impacts.
Human psychology and biases lead to exaggerated fears that politicians/marketers can exploit, resulting in wasteful and theatrical “security theater.”
Security is a process that requires ongoing flexibility, not total prevention. We can’t anticipate every threat but can build resilience.
Key Arguments:
Analyze risks empirically using evidence, statistics, and probability rather than intuition. Avoid underestimating everyday risks while overestimating novel or sensational ones.
Reject the temptation to implement oppressive security restrictions after tragic but unlikely events. Focus only on proportional responses.
Where possible, favor openness and transparency over aggressive data collection or restrictions which erode civil liberties.
Seek security policies and technologies that enhance society broadly rather than benefit special interests.
Schneier argues for rational, ethical approaches to security aimed at genuine protection while avoiding violations of human rights in the name of safety.
Here are some of the key ethical arguments Bruce Schneier makes in Beyond Fear regarding security policy and practices:
Utilitarian Ethics – Security measures should aim to provide the greatest societal benefit and well-being for the most people, not just special interests.
Individual Autonomy – Restrictions should be minimized to avoid excessive infringements on civil liberties and individual freedoms. Surveillance creep risks violating autonomy.
Privacy Rights – Pervasive monitoring and data collection erode privacy rights and dignity. Citizens are entitled to some degree of opacity.
Proportional Responses – Reactive policies should weigh costs versus statistically likely benefits. Draconian rules beyond proportionality are unethical.
Justice and Fairness – Security should not create racial, religious or other profiling biases that punish groups rather than conduct.
Accountability and Transparency – Ethical security requires checks against excessive power and oversight for responsibility. Secrecy enables abuses.
Honest Representation – Officials and pundits have an ethical duty not to stoke disproportionate public fears. Security theater manufactures consent.
Integrity and Character – Leaders should have the wisdom and courage to make difficult trade-offs rather than exploit fears for authority and control.
Overall Schneier argues security policies grounded in ethics rather than fear promote human well-being while recognizing inherent trade-offs. Prudence, transparency, and proportionality are essential.
Here are some of the most salient criticisms made against the arguments put forth in Bruce Schneier’s book Beyond Fear:
Downplays real threats – Some argue Schneier dangerously downplays serious risks like terrorism and cybercrime by calling fears overblown.
Too theoretical – His risk analysis approach may rely too much on statistics versus real-world uncertainties that defy probabilistic reasoning.
No easy answers – While citing misguided policies, critics say Schneier provides few actionable solutions or concrete policy guidance.
Conflicting values – His balancing of security, privacy and liberty involves inherent trade-offs and tough choices he gives little direction on.
Politically naive – Suggesting special interests drive policy may under-appreciate legitimate national security concerns of elected officials.
Hindsight bias – Assessing past overreactions easier than predicting future threats that justify caution today.
No accounting for psychology – Factoring human irrationality into policy is essential, not dismissed as “security theater.”
Technical focus – Focuses narrowly on certain technologies while ignoring broader organizational and sociological dimensions.
In summary, critics argue Schneier’s risk analysis orientation is too clinical, abstract, and devoid of political and human realities to transform concrete policy in the ways he intends. His framework may therefore lack sufficient nuance or guidance.
It’s interesting. The summary is pretty accurate, and so are the criticisms. The book was a product of my thinking twenty years ago, and my own thinking has evolved to address those criticisms.
Of course, this only works with older books that the LLM has ingested, and probably works better with books that have been written about.
Last March, just two weeks after GPT-4 was released, researchers at Microsoft quietly announced a plan to compile millions of APIs—tools that can do everything from ordering a pizza to solving physics equations to controlling the TV in your living room—into a compendium that would be made accessible to large language models (LLMs). This was just one milestone in the race across industry and academia to find the bestwaystoteachLLMs how to manipulate tools, which would supercharge the potential of AI more than any of the impressive advancements we’ve seen to date.
The Microsoft project aims to teach AI how to use any and all digital tools in one fell swoop, a clever and efficient approach. Today, LLMs can do a pretty good job of recommending pizza toppings to you if you describe your dietary preferences and can draft dialog that you could use when you call the restaurant. But most AI tools can’t place the order, not even online. In contrast, Google’s seven-year-old Assistant tool can synthesize a voice on the telephone and fill out an online order form, but it can’t pick a restaurant or guess your order. By combining these capabilities, though, a tool-using AI could do it all. An LLM with access to your past conversations and tools like calorie calculators, a restaurant menu database, and your digital payment wallet could feasibly judge that you are trying to lose weight and want a low-calorie option, find the nearest restaurant with toppings you like, and place the delivery order. If it has access to your payment history, it could even guess at how generously you usually tip. If it has access to the sensors on your smartwatch or fitness tracker, it might be able to sense when your blood sugar is low and order the pie before you even realize you’re hungry.
Perhaps the most compelling potential applications of tool use are those that give AIs the ability to improve themselves. Suppose, for example, you asked a chatbot for help interpreting some facet of ancient Roman law that no one had thought to include examples of in the model’s original training. An LLM empowered to search academic databases and trigger its own training process could fine-tune its understanding of Roman law before answering. Access to specialized tools could even help a model like this better explain itself. While LLMs like GPT-4 already do a fairly good job of explaining their reasoning when asked, these explanations emerge from a “black box” and are vulnerable to errors and hallucinations. But a tool-using LLM could dissect its own internals, offering empirical assessments of its own reasoning and deterministic explanations of why it produced the answer it did.
If given access to tools for soliciting human feedback, a tool-using LLM could even generate specialized knowledge that isn’t yet captured on the web. It could post a question to Reddit or Quora or delegate a task to a human on Amazon’s Mechanical Turk. It could even seek out data about human preferences by doing survey research, either to provide an answer directly to you or to fine-tune its own training to be able to better answer questions in the future. Over time, tool-using AIs might start to look a lot like tool-using humans. An LLM can generate code much faster than any human programmer, so it can manipulate the systems and services of your computer with ease. It could also use your computer’s keyboard and cursor the way a person would, allowing it to use any program you do. And it could improve its own capabilities, using tools to ask questions, conduct research, and write code to incorporate into itself.
It’s easy to see how this kind of tool use comes with tremendous risks. Imagine an LLM being able to find someone’s phone number, call them and surreptitiously record their voice, guess what bank they use based on the largest providers in their area, impersonate them on a phone call with customer service to reset their password, and liquidate their account to make a donation to a political party. Each of these tasks invokes a simple tool—an Internet search, a voice synthesizer, a bank app—and the LLM scripts the sequence of actions using the tools.
We don’t yet know how successful any of these attempts will be. As remarkably fluent as LLMs are, they weren’t built specifically for the purpose of operating tools, and it remains to be seen how their early successes in tool use will translate to future use cases like the ones described here. As such, giving the current generative AI sudden access to millions of APIs—as Microsoft plans to—could be a little like letting a toddler loose in a weapons depot.
Companies like Microsoft should be particularly careful about granting AIs access to certain combinations of tools. Access to tools to look up information, make specialized calculations, and examine real-world sensors all carry a modicum of risk. The ability to transmit messages beyond the immediate user of the tool or to use APIs that manipulate physical objects like locks or machines carries much larger risks. Combining these categories of tools amplifies the risks of each.
The operators of the most advanced LLMs, such as OpenAI, should continue to proceed cautiously as they begin enabling tool use and should restrict uses of their products in sensitive domains such as politics, health care, banking, and defense. But it seems clear that these industry leaders have already largely lost their moat around LLM technology—open source is catching up. Recognizing this trend, Meta has taken an “If you can’t beat ’em, join ’em” approach and partially embraced the role of providing open source LLM platforms.
On the policy front, national—and regional—AI prescriptions seem futile. Europe is the only significant jurisdiction that has made meaningful progress on regulating the responsible use of AI, but it’s not entirely clear how regulators will enforce it. And the US is playing catch-up and seems destined to be much more permissive in allowing even risks deemed “unacceptable” by the EU. Meanwhile, no government has invested in a “public option” AI model that would offer an alternative to Big Tech that is more responsive and accountable to its citizens.
Regulators should consider what AIs are allowed to do autonomously, like whether they can be assigned property ownership or register a business. Perhaps more sensitive transactions should require a verified human in the loop, even at the cost of some added friction. Our legal system may be imperfect, but we largely know how to hold humans accountable for misdeeds; the trick is not to let them shunt their responsibilities to artificial third parties. We should continue pursuing AI-specific regulatory solutions while also recognizing that they are not sufficient on their own.
We must also prepare for the benign ways that tool-using AI might impact society. In the best-case scenario, such an LLM may rapidly accelerate a field like drug discovery, and the patent office and FDA should prepare for a dramatic increase in the number of legitimate drug candidates. We should reshape how we interact with our governments to take advantage of AI tools that give us all dramatically more potential to have our voices heard. And we should make sure that the economic benefits of superintelligent, labor-saving AI are equitably distributed.
We can debate whether LLMs are truly intelligent or conscious, or have agency, but AIs will become increasingly capable tool users either way. Some things are greater than the sum of their parts. An AI with the ability to manipulate and interact with even simple tools will become vastly more powerful than the tools themselves. Let’s be sure we’re ready for them.
This essay was written with Nathan Sanders, and previously appeared on Wired.com.
The collective thoughts of the interwebz
Manage Consent
To provide the best experiences, we use technologies like cookies to store and/or access device information. Consenting to these technologies will allow us to process data such as browsing behavior or unique IDs on this site. Not consenting or withdrawing consent, may adversely affect certain features and functions.
Functional
Always active
The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
Preferences
The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
Statistics
The technical storage or access that is used exclusively for statistical purposes.The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
Marketing
The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.