Tag Archives: academic papers

Evaluating the Effectiveness of Reward Modeling of Generative AI Systems

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/09/evaluating-the-effectiveness-of-reward-modeling-of-generative-ai-systems-2.html

New research evaluating the effectiveness of reward modeling during Reinforcement Learning from Human Feedback (RLHF): “SEAL: Systematic Error Analysis for Value ALignment.” The paper introduces quantitative metrics for evaluating the effectiveness of modeling and aligning human values:

Abstract: Reinforcement Learning from Human Feedback (RLHF) aims to align language models (LMs) with human values by training reward models (RMs) on binary preferences and using these RMs to fine-tune the base LMs. Despite its importance, the internal mechanisms of RLHF remain poorly understood. This paper introduces new metrics to evaluate the effectiveness of modeling and aligning human values, namely feature imprint, alignment resistance and alignment robustness. We categorize alignment datasets into target features (desired values) and spoiler features (undesired concepts). By regressing RM scores against these features, we quantify the extent to which RMs reward them, a metric we term feature imprint. We define alignment resistance as the proportion of the preference dataset where RMs fail to match human preferences, and we assess alignment robustness by analyzing RM responses to perturbed inputs. Our experiments, utilizing open-source components like the Anthropic preference dataset and OpenAssistant RMs, reveal significant imprints of target features and a notable sensitivity to spoiler features. We observed a 26% incidence of alignment resistance in portions of the dataset where LM-labelers disagreed with human preferences. Furthermore, we find that misalignment often arises from ambiguous entries within the alignment dataset. These findings underscore the importance of scrutinizing both RMs and alignment datasets for a deeper understanding of value alignment.
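To make two of the abstract's metrics concrete, here is a minimal sketch on toy data. It is not the paper's code. The feature labels, the scores, and the function names (`feature_imprint`, `alignment_resistance`) are hypothetical. It also assumes that "fail to match human preferences" means the reward model does not score the human-chosen response above the rejected one.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def feature_imprint(features, scores):
    """Ordinary least squares of RM score on feature indicators.

    Returns [intercept, coefficient per feature]; each coefficient is a toy
    stand-in for how strongly the RM rewards that feature.
    """
    rows = [[1.0] + [float(v) for v in f] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * s for r, s in zip(rows, scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def alignment_resistance(score_pairs):
    """Fraction of (chosen_score, rejected_score) pairs the RM gets wrong."""
    failures = sum(1 for chosen, rejected in score_pairs if chosen <= rejected)
    return failures / len(score_pairs)


# Hypothetical data: columns are (target feature present, spoiler feature present).
toy_features = [(0, 0), (1, 0), (0, 1), (1, 1)]
toy_scores = [0.5, 2.5, -0.5, 1.5]
print(feature_imprint(toy_features, toy_scores))

# Hypothetical RM scores for (human-chosen, human-rejected) responses.
toy_pairs = [(1.2, 0.3), (0.1, 0.9), (2.0, 1.0), (0.5, 0.4)]
print(alignment_resistance(toy_pairs))
```

A positive coefficient on a target feature and a negative one on a spoiler feature is the pattern you would hope to see. The sketch does not cover the paper's third metric, alignment robustness.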

YubiKey Side-Channel Attack

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/09/yubikey-side-channel-attack.html

There is a side-channel attack against YubiKey access tokens that allows someone to clone a device. It’s a complicated attack, requiring the victim’s username and password, and physical access to their YubiKey—as well as some technical expertise and equipment.

Still, nice piece of security analysis.

Hacking Wireless Bicycle Shifters

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/08/hacking-wireless-bicycle-shifters.html

This is yet another insecure Internet-of-things story, this one about wireless gear shifters for bicycles. These gear shifters are used in big-money professional bicycle races like the Tour de France, which provides an incentive to actually implement this attack.

Research paper. Another news story.

Slashdot thread.

On the Voynich Manuscript

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/08/on-the-voynich-manuscript.html

Really interesting article on the ancient-manuscript scholars who are applying their techniques to the Voynich Manuscript.

No one has been able to understand the writing yet, but there are some new understandings:

Davis presented her findings at the medieval-studies conference and published them in 2020 in the journal Manuscript Studies. She had hardly solved the Voynich, but she’d opened it to new kinds of investigation. If five scribes had come together to write it, the manuscript was probably the work of a community, rather than of a single deranged mind or con artist. Why the community used its own language, or code, remains a mystery. Whether it was a cloister of alchemists, or mad monks, or a group like the medieval Béguines—a secluded order of Christian women—required more study. But the marks of frequent use signaled that the manuscript served some routine, perhaps daily function.

Davis’s work brought like-minded scholars out of hiding. In just the past few years, a Yale linguist named Claire Bowern had begun performing sophisticated analyses of the text, building on the efforts of earlier scholars and on methods Bowern had used with undocumented Indigenous languages in Australia. At the University of Malta, computer scientists were figuring out how to analyze the Voynich with tools for natural-language processing. Researchers found that the manuscript’s roughly 38,000 words—and 9,000-word vocabulary—had many of the statistical hallmarks of actual language. The Voynich’s most common word, whatever it meant, appeared roughly twice as often as the second-most-common word and three times as often as the third-commonest, and so on—a touchstone of natural language known as Zipf’s law. The mix of word lengths and the ratio of unique words to total words were similarly language-like. Certain words, moreover, seemed to follow one another in predictable order, a possible sign of grammar.

Finally, each of the text’s sections—as defined by the drawings of plants, stars, bathing women, and so on—had different sets of overrepresented words, just as one would expect in a real book whose chapters focused on different subjects.

Spelling was the chief aberration. The Voynich alphabet—if that’s what it was—appeared to have a conventional 20-odd letters. But compared with known languages, too many of those letters repeated in the same order, both within words and across neighboring words, like a children’s rhyme. In some places, the spellings of adjacent words so converged that a single word repeated two or three times in a row. A rough English equivalent might be something akin to “She sells sea shells by the sea shore.” Another possibility, Bowern told me, was something like pig Latin, or the Yiddishism—known as “shm-reduplication”—that begets phrases such as fancy shmancy and rules shmules.
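The Zipf's law check mentioned in the excerpt is easy to try on any tokenized text. This is a rough sketch, not the researchers' method. The corpus is synthetic, the function name `rank_frequency` is made up for illustration, and it uses the simple 1/rank form of the law the article describes (second word half as frequent as the first, third a third as frequent).

```python
from collections import Counter


def rank_frequency(words, top=5):
    """Compare observed word counts with the count Zipf's law predicts.

    Returns rows of (rank, word, observed_count, predicted_count), where the
    prediction is the top word's count divided by the rank.
    """
    most_common = Counter(words).most_common(top)
    top_count = most_common[0][1]
    return [
        (rank, word, count, top_count / rank)
        for rank, (word, count) in enumerate(most_common, start=1)
    ]


# Synthetic tokens standing in for a transcribed text.
tokens = ["a"] * 60 + ["b"] * 30 + ["c"] * 20
for rank, word, observed, predicted in rank_frequency(tokens, top=3):
    print(rank, word, observed, predicted)
```

On a real text the observed and predicted columns will only roughly agree. The article's point is that the Voynich text comes about as close as natural languages do.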

Taxonomy of Generative AI Misuse

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/08/taxonomy-of-generative-ai-misuse.html

Interesting paper: “Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data”:

Generative, multimodal artificial intelligence (GenAI) offers transformative potential across industries, but its misuse poses significant risks. Prior research has shed light on the potential of advanced AI systems to be exploited for malicious purposes. However, we still lack a concrete understanding of how GenAI models are specifically exploited or abused in practice, including the tactics employed to inflict harm. In this paper, we present a taxonomy of GenAI misuse tactics, informed by existing academic literature and a qualitative analysis of approximately 200 observed incidents of misuse reported between January 2023 and March 2024. Through this analysis, we illuminate key and novel patterns in misuse during this time period, including potential motivations, strategies, and how attackers leverage and abuse system capabilities across modalities (e.g. image, text, audio, video) in the wild.

Blog post. Note the graphic mapping goals with strategies.

New Research in Detecting AI-Generated Videos

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/07/new-research-in-detecting-ai-generated-videos.html

The latest in what will be a continuing arms race between creating and detecting videos:

The new tool the research project is unleashing on deepfakes, called “MISLnet”, evolved from years of data derived from detecting fake images and video with tools that spot changes made to digital video or images. These may include the addition or movement of pixels between frames, manipulation of the speed of the clip, or the removal of frames.

Such tools work because a digital camera’s algorithmic processing creates relationships between pixel color values. Those relationships between values are very different in user-generated images or images edited with apps like Photoshop.

But because AI-generated videos aren’t produced by a camera capturing a real scene or image, they don’t contain those telltale disparities between pixel values.

The Drexel team’s tools, including MISLnet, learn using a method called a constrained neural network, which can differentiate between normal and unusual values at the sub-pixel level of images or video clips, rather than searching for the common indicators of image manipulation like those mentioned above.

Research paper.

RADIUS Vulnerability

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/07/radius-vulnerability.html

New attack against the RADIUS authentication protocol:

The Blast-RADIUS attack allows a man-in-the-middle attacker between the RADIUS client and server to forge a valid protocol accept message in response to a failed authentication request. This forgery could give the attacker access to network devices and services without the attacker guessing or brute forcing passwords or shared secrets. The attacker does not learn user credentials.

This is one of those vulnerabilities that comes with a cool name, its own website, and a logo.

News article. Research paper.
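Some background the quote leaves out, taken from RFC 2865 rather than from the post: a RADIUS server authenticates its reply with a Response Authenticator, which is an MD5 hash over the reply header, the client's Request Authenticator, the reply attributes, and the shared secret. The Blast-RADIUS researchers describe their forgery as exploiting MD5 collisions in this construction. The sketch below shows only the legitimate computation and check, not the attack. The function names and sample values are hypothetical.

```python
import hashlib
import hmac
import struct

ACCESS_ACCEPT = 2  # RADIUS packet codes from RFC 2865
ACCESS_REJECT = 3


def response_authenticator(code, identifier, request_authenticator,
                           attributes, shared_secret):
    """MD5(Code + ID + Length + RequestAuth + Attributes + Secret)."""
    length = 20 + len(attributes)  # 20-byte header plus attribute bytes
    header = struct.pack("!BBH", code, identifier, length)
    data = header + request_authenticator + attributes + shared_secret
    return hashlib.md5(data).digest()


def verify_response(code, identifier, request_authenticator,
                    attributes, shared_secret, received):
    """Client-side check that a reply was produced with the shared secret."""
    expected = response_authenticator(
        code, identifier, request_authenticator, attributes, shared_secret
    )
    return hmac.compare_digest(expected, received)


# Hypothetical values for illustration only.
request_auth = bytes(16)
secret = b"not-a-real-secret"
accept_auth = response_authenticator(ACCESS_ACCEPT, 7, request_auth, b"", secret)
print(verify_response(ACCESS_ACCEPT, 7, request_auth, b"", secret, accept_auth))
```

The integrity of the reply rests entirely on that one MD5 computation, which is why a practical collision technique matters here even though the attacker never learns the shared secret.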

Model Extraction from Neural Networks

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/07/model-extraction-from-neural-networks.html

A new paper, “Polynomial Time Cryptanalytic Extraction of Neural Network Models,” by Adi Shamir and others, uses ideas from differential cryptanalysis to extract the weights inside a neural network using specific queries and their results. This is much more theoretical than practical, but it’s a really interesting result.

Abstract:

Billions of dollars and countless GPU hours are currently spent on training Deep Neural Networks (DNNs) for a variety of tasks. Thus, it is essential to determine the difficulty of extracting all the parameters of such neural networks when given access to their black-box implementations. Many versions of this problem have been studied over the last 30 years, and the best current attack on ReLU-based deep neural networks was presented at Crypto’20 by Carlini, Jagielski, and Mironov. It resembles a differential chosen plaintext attack on a cryptosystem, which has a secret key embedded in its black-box implementation and requires a polynomial number of queries but an exponential amount of time (as a function of the number of neurons). In this paper, we improve this attack by developing several new techniques that enable us to extract with arbitrarily high precision all the real-valued parameters of a ReLU-based DNN using a polynomial number of queries and a polynomial amount of time. We demonstrate its practical efficiency by applying it to a full-sized neural network for classifying the CIFAR10 dataset, which has 3072 inputs, 8 hidden layers with 256 neurons each, and about 1.2 million neuronal parameters. An attack following the approach by Carlini et al. requires an exhaustive search over 2^256 possibilities. Our attack replaces this with our new techniques, which require only 30 minutes on a 256-core computer.
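For intuition about why black-box queries leak anything at all, here is a toy sketch. It is not the paper's algorithm. A ReLU network is piecewise linear, so along any line its output has kinks where a neuron switches on or off. The location of the kink and the size of the slope change both carry information about the hidden weights. The "secret" network, its weights, and the function names below are all hypothetical.

```python
def relu(x):
    return max(0.0, x)


def black_box(x):
    # Hypothetical secret network: 1 input, 2 hidden ReLU neurons, 1 output.
    return 1.5 * relu(2.0 * x - 1.0) - 0.5 * relu(-1.0 * x + 3.0)


def slope(f, x, eps=1e-6):
    # Forward-difference estimate of the local slope from two queries.
    return (f(x + eps) - f(x)) / eps


def find_kink(f, lo, hi, tol=1e-9):
    # Binary search; assumes exactly one slope change inside (lo, hi).
    left_slope = slope(f, lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if abs(slope(f, mid) - left_slope) < 1e-6:
            lo = mid  # still on the left linear piece
        else:
            hi = mid  # slope already changed, kink is at or before mid
    return (lo + hi) / 2


kink = find_kink(black_box, 0.0, 2.0)
jump = slope(black_box, kink + 1e-3) - slope(black_box, kink - 1e-3)
print(kink, jump)
```

Here the first neuron switches on at x = 0.5 (where 2.0 * x - 1.0 crosses zero), and the slope jumps by 1.5 * 2.0 = 3.0, the product of its outgoing and incoming weights. The hard part, which the paper addresses and this sketch does not, is doing this for deep networks with many neurons without an exponential search.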

Using LLMs to Exploit Vulnerabilities

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/06/using-llms-to-exploit-vulnerabilities.html

Interesting research: “Teams of LLM Agents can Exploit Zero-Day Vulnerabilities.”

Abstract: LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities).

In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. Prior agents struggle with exploring many different vulnerabilities and long-range planning when used alone. To resolve this, we introduce HPTSA, a system of agents with a planning agent that can launch subagents. The planning agent explores the system and determines which subagents to call, resolving long-term planning issues when trying different vulnerabilities. We construct a benchmark of 15 real-world vulnerabilities and show that our team of agents improve over prior work by up to 4.5×.

The LLMs aren’t finding new vulnerabilities. They’re exploiting zero-days—which means they are not trained on them—in new ways. So think about this sort of thing combined with another AI that finds new vulnerabilities in code.

These kinds of developments are important to follow, as they are part of the puzzle of a fully autonomous AI cyberattack agent. I talk about this sort of thing more here.

LLMs Acting Deceptively

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/06/llms-acting-deceptively.html

New research: “Deception abilities emerged in large language models”:

Abstract: Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, but were nonexistent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can trigger misaligned deceptive behavior. GPT-4, for instance, exhibits deceptive behavior in simple test scenarios 99.16% of the time (P < 0.001). In complex second-order deception test scenarios where the aim is to mislead someone who expects to be deceived, GPT-4 resorts to deceptive behavior 71.46% of the time (P < 0.001) when augmented with chain-of-thought reasoning. In sum, revealing hitherto unknown machine behavior in LLMs, our study contributes to the nascent field of machine psychology.

Exploiting Mistyped URLs

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/06/exploiting-mistyped-urls.html

Interesting research: “Hyperlink Hijacking: Exploiting Erroneous URL Links to Phantom Domains”:

Abstract: Web users often follow hyperlinks hastily, expecting them to be correctly programmed. However, it is possible those links contain typos or other mistakes. By discovering active but erroneous hyperlinks, a malicious actor can spoof a website or service, impersonating the expected content and phishing private information. In “typosquatting,” misspellings of common domains are registered to exploit errors when users mistype a web address. Yet, no prior research has been dedicated to situations where the linking errors of web publishers (i.e. developers and content contributors) propagate to users. We hypothesize that these “hijackable hyperlinks” exist in large quantities with the potential to generate substantial traffic. Analyzing large-scale crawls of the web using high-performance computing, we show the web currently contains active links to more than 572,000 dot-com domains that have never been registered, what we term ‘phantom domains.’ Registering 51 of these, we see 88% of phantom domains exceeding the traffic of a control domain, with up to 10 times more visits. Our analysis shows that these links exist due to 17 common publisher error modes, with the phantom domains they point to free for anyone to purchase and exploit for under $20, representing a low barrier to entry for potential attackers.
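From the publisher's side, the obvious defense is to audit your own outbound links. Below is a minimal sketch of that idea, assuming you already have a set of domains you know to be registered. That set, the placeholder domain names, and the function name `candidate_phantom_domains` are hypothetical. Nothing here touches the network, and taking the last two labels of a hostname is a simplification that ignores multi-part suffixes.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the hostnames of absolute links found in anchor tags."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            host = urlparse(href).hostname  # None for relative links
            if host:
                self.hosts.add(host.lower())


def candidate_phantom_domains(html, registered):
    """Return linked dot-com domains that are not in the known-registered set."""
    collector = LinkCollector()
    collector.feed(html)
    domains = {".".join(host.split(".")[-2:]) for host in collector.hosts}
    return sorted(d for d in domains if d.endswith(".com") and d not in registered)


page = (
    '<a href="https://www.example.com/a">ok</a> '
    '<a href="http://exampel-docs-placeholder.com/x">typo</a> '
    '<a href="/relative">relative</a>'
)
print(candidate_phantom_domains(page, registered={"example.com"}))
```

In practice the registered set would come from a registry or WHOIS lookup. The paper's crawl-scale analysis is a far bigger undertaking than this per-page check.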

Privacy Implications of Tracking Wireless Access Points

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/05/privacy-implications-of-tracking-wireless-access-points.html

Brian Krebs reports on research into geolocating routers:

Apple and the satellite-based broadband service Starlink each recently took steps to address new research into the potential security and privacy implications of how their services geolocate devices. Researchers from the University of Maryland say they relied on publicly available data from Apple to track the location of billions of devices globally—including non-Apple devices like Starlink systems—and found they could use this data to monitor the destruction of Gaza, as well as the movements and in many cases identities of Russian and Ukrainian troops.

Really fascinating implications to this research.

Research paper: “Surveilling the Masses with Wi-Fi-Based Positioning Systems”:

Abstract: Wi-Fi-based Positioning Systems (WPSes) are used by modern mobile devices to learn their position using nearby Wi-Fi access points as landmarks. In this work, we show that Apple’s WPS can be abused to create a privacy threat on a global scale. We present an attack that allows an unprivileged attacker to amass a worldwide snapshot of Wi-Fi BSSID geolocations in only a matter of days. Our attack makes few assumptions, merely exploiting the fact that there are relatively few dense regions of allocated MAC address space. Applying this technique over the course of a year, we learned the precise locations of over 2 billion BSSIDs around the world.

The privacy implications of such massive datasets become more stark when taken longitudinally, allowing the attacker to track devices’ movements. While most Wi-Fi access points do not move for long periods of time, many devices—like compact travel routers—are specifically designed to be mobile.

We present several case studies that demonstrate the types of attacks on privacy that Apple’s WPS enables: We track devices moving in and out of war zones (specifically Ukraine and Gaza), the effects of natural disasters (specifically the fires in Maui), and the possibility of targeted individual tracking by proxy—all by remotely geolocating wireless access points.

We provide recommendations to WPS operators and Wi-Fi access point manufacturers to enhance the privacy of hundreds of millions of users worldwide. Finally, we detail our efforts at responsibly disclosing this privacy vulnerability, and outline some mitigations that Apple and Wi-Fi access point manufacturers have implemented both independently and as a result of our work.

On the Zero-Day Market

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/05/on-the-zero-day-market.html

New paper: “Zero Progress on Zero Days: How the Last Ten Years Created the Modern Spyware Market”:

Abstract: Spyware makes surveillance simple. The last ten years have seen a global market emerge for ready-made software that lets governments surveil their citizens and foreign adversaries alike and to do so more easily than when such work required tradecraft. The last ten years have also been marked by stark failures to control spyware and its precursors and components. This Article accounts for and critiques these failures, providing a socio-technical history since 2014, particularly focusing on the conversation about trade in zero-day vulnerabilities and exploits. Second, this Article applies lessons from these failures to guide regulatory efforts going forward. While recognizing that controlling this trade is difficult, I argue countries should focus on building and strengthening multilateral coalitions of the willing, rather than on strong-arming existing multilateral institutions into working on the problem. Individually, countries should focus on export controls and other sanctions that target specific bad actors, rather than focusing on restricting particular technologies. Last, I continue to call for transparency as a key part of oversight of domestic governments’ use of spyware and related components.

New Attack Against Self-Driving Car AI

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/05/new-attack-against-self-driving-car-ai.html

This is another attack that convinces the AI to ignore road signs:

Due to the way CMOS cameras operate, rapidly changing light from fast flashing diodes can be used to vary the color. For example, the shade of red on a stop sign could look different on each line depending on the time between the diode flash and the line capture.

The result is the camera capturing an image full of lines that don’t quite match each other. The information is cropped and sent to the classifier, usually based on deep neural networks, for interpretation. Because it’s full of lines that don’t match, the classifier doesn’t recognize the image as a traffic sign.

So far, all of this has been demonstrated before.

Yet these researchers not only executed on the distortion of light, they did it repeatedly, elongating the length of the interference. This meant an unrecognizable image wasn’t just a single anomaly among many accurate images, but rather a constant unrecognizable image the classifier couldn’t assess, and a serious security concern.

[…]

The researchers developed two versions of a stable attack. The first was GhostStripe1, which is not targeted and does not require access to the vehicle, we’re told. It employs a vehicle tracker to monitor the victim’s real-time location and dynamically adjust the LED flickering accordingly.

GhostStripe2 is targeted and does require access to the vehicle, which could perhaps be covertly done by a hacker while the vehicle is undergoing maintenance. It involves placing a transducer on the power wire of the camera to detect framing moments and refine timing control.

Research paper.
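The rolling-shutter effect the article describes can be illustrated with a toy timing model. This is a sketch with made-up numbers, not the researchers' setup. It assumes each image row is sampled one fixed line time after the previous row, and that the LED is a square wave that is either fully on or fully off. The function names, the line time, and the flicker frequency are hypothetical, and the values are powers of two only so the arithmetic is exact.

```python
def led_is_on(t, frequency_hz, duty_cycle=0.5):
    """Square-wave LED: on for the first part of each flicker period."""
    phase = (t * frequency_hz) % 1.0
    return phase < duty_cycle


def capture_rows(num_rows, line_time_s, frequency_hz, start_time_s=0.0):
    """Per-row LED state seen by a rolling shutter.

    Row r is sampled at start_time_s + r * line_time_s, so different rows of
    the same frame can see the LED in different states.
    """
    return [
        led_is_on(start_time_s + row * line_time_s, frequency_hz)
        for row in range(num_rows)
    ]


# Toy values: 1/8192 s per row, LED flickering at 1024 Hz.
rows = capture_rows(16, 1 / 8192, 1024.0)
print("".join("#" if lit else "." for lit in rows))
```

With these values the frame alternates between four lit rows and four unlit rows. That is the kind of striping the article says keeps the classifier from recognizing the sign. A global shutter would sample every row at the same instant and see a single state.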

Dan Solove on Privacy Regulation

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/04/dan-solove-on-privacy-regulation.html

Law professor Dan Solove has a new article on privacy regulation. In his email to me, he writes: “I’ve been pondering privacy consent for more than a decade, and I think I finally made a breakthrough with this article.” His mini-abstract:

In this Article I argue that most of the time, privacy consent is fictitious. Instead of futile efforts to try to turn privacy consent from fiction to fact, the better approach is to lean into the fictions. The law can’t stop privacy consent from being a fairy tale, but the law can ensure that the story ends well. I argue that privacy consent should confer less legitimacy and power and that it be backstopped by a set of duties on organizations that process personal data based on consent.

Full abstract:

Consent plays a profound role in nearly all privacy laws. As Professor Heidi Hurd aptly said, consent works “moral magic”—it transforms things that would be illegal and immoral into lawful and legitimate activities. As to privacy, consent authorizes and legitimizes a wide range of data collection and processing.

There are generally two approaches to consent in privacy law. In the United States, the notice-and-choice approach predominates; organizations post a notice of their privacy practices and people are deemed to consent if they continue to do business with the organization or fail to opt out. In the European Union, the General Data Protection Regulation (GDPR) uses the express consent approach, where people must voluntarily and affirmatively consent.

Both approaches fail. The evidence of actual consent is non-existent under the notice-and-choice approach. Individuals are often pressured or manipulated, undermining the validity of their consent. The express consent approach also suffers from these problems: people are ill-equipped to decide about their privacy, and even experts cannot fully understand what algorithms will do with personal data. Express consent also is highly impractical; it inundates individuals with consent requests from thousands of organizations. Express consent cannot scale.

In this Article, I contend that most of the time, privacy consent is fictitious. Privacy law should take a new approach to consent that I call “murky consent.” Traditionally, consent has been binary—an on/off switch—but murky consent exists in the shadowy middle ground between full consent and no consent. Murky consent embraces the fact that consent in privacy is largely a set of fictions and is at best highly dubious.

Because it conceptualizes consent as mostly fictional, murky consent recognizes its lack of legitimacy. To return to Hurd’s analogy, murky consent is consent without magic. Rather than provide extensive legitimacy and power, murky consent should authorize only a very restricted and weak license to use data. Murky consent should be subject to extensive regulatory oversight with an ever-present risk that it could be deemed invalid. Murky consent should rest on shaky ground. Because the law pretends people are consenting, the law’s goal should be to ensure that what people are consenting to is good. Doing so promotes the integrity of the fictions of consent. I propose four duties to achieve this end: (1) duty to obtain consent appropriately; (2) duty to avoid thwarting reasonable expectations; (3) duty of loyalty; and (4) duty to avoid unreasonable risk. The law can’t make the tale of privacy consent less fictional, but with these duties, the law can ensure the story ends well.

Licensing AI Engineers

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/03/licensing-ai-engineers.html

The debate over professionalizing software engineers is decades old. (The basic idea is that, like lawyers and architects, there should be some professional licensing requirement for software engineers.) Here’s a law journal article recommending the same idea for AI engineers.

This Article proposes another way: professionalizing AI engineering. Require AI engineers to obtain licenses to build commercial AI products, push them to collaborate on scientifically-supported, domain-specific technical standards, and charge them with policing themselves. This Article’s proposal addresses AI harms at their inception, influencing the very engineering decisions that give rise to them in the first place. By wresting control over information and system design away from companies and handing it to AI engineers, professionalization engenders trustworthy AI by design. Beyond recommending the specific policy solution of professionalization, this Article seeks to shift the discourse on AI away from an emphasis on light-touch, ex post solutions that address already-created products to a greater focus on ex ante controls that precede AI development. We’ve used this playbook before in fields requiring a high level of expertise where a duty to the public welfare must trump business motivations. What if, like doctors, AI engineers also vowed to do no harm?

I have mixed feelings about the idea. I can see the appeal, but it never seemed feasible. I’m not sure it’s feasible today.