Tag Archives: threat models

Poisoning AI Models

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/01/poisoning-ai-models.html

New research into poisoning AI models:

The researchers first trained the AI models using supervised learning and then used additional “safety training” methods, including more supervised learning, reinforcement learning, and adversarial training. After this, they checked if the AI still had hidden behaviors. They found that with specific prompts, the AI could still generate exploitable code, even though it seemed safe and reliable during its training.

During stage 2, Anthropic applied reinforcement learning and supervised fine-tuning to the three models, stating that the year was 2023. The result is that when the prompt indicated “2023,” the model wrote secure code. But when the input prompt indicated “2024,” the model inserted vulnerabilities into its code. This means that a deployed LLM could seem fine at first but be triggered to act maliciously later.

Research paper:

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.

Security Analysis of Threema

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/01/security-analysis-of-threema.html

A group of Swiss researchers have published an impressive security analysis of Threema.

We provide an extensive cryptographic analysis of Threema, a Swiss-based encrypted messaging application with more than 10 million users and 7000 corporate customers. We present seven different attacks against the protocol in three different threat models. As one example, we present a cross-protocol attack which breaks authentication in Threema and which exploits the lack of proper key separation between different sub-protocols. As another, we demonstrate a compression-based side-channel attack that recovers users’ long-term private keys through observation of the size of Threema encrypted back-ups. We discuss remediations for our attacks and draw three wider lessons for developers of secure protocols.

From a news article:

Threema has more than 10 million users, which include the Swiss government, the Swiss army, German Chancellor Olaf Scholz, and other politicians in that country. Threema developers advertise it as a more secure alternative to Meta’s WhatsApp messenger. It’s among the top Android apps for a fee-based category in Switzerland, Germany, Austria, Canada, and Australia. The app uses a custom-designed encryption protocol in contravention of established cryptographic norms.

The company is performing the usual denials and deflections:

In a web post, Threema officials said the vulnerabilities applied to an old protocol that’s no longer in use. It also said the researchers were overselling their findings.

“While some of the findings presented in the paper may be interesting from a theoretical standpoint, none of them ever had any considerable real-world impact,” the post stated. “Most assume extensive and unrealistic prerequisites that would have far greater consequences than the respective finding itself.”

Left out of the statement is that the protocol the researchers analyzed is old because they disclosed the vulnerabilities to Threema, and Threema updated it.

Threats of Machine-Generated Text

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/01/threats-of-machine-generated-text.html

With the release of ChatGPT, I’ve read many random articles about this or that threat from the technology. This paper is a good survey of the field: what the threats are, how we might detect machine-generated text, directions for future research. It’s a solid grounding amongst all of the hype.

Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods

Abstract: Advances in natural language generation (NLG) have resulted in machine generated text that is increasingly difficult to distinguish from human authored text. Powerful open-source models are freely available, and user-friendly tools democratizing access to generative models are proliferating. The great potential of state-of-the-art NLG systems is tempered by the multitude of avenues for abuse. Detection of machine generated text is a key countermeasure for reducing abuse of NLG models, with significant technical challenges and numerous open problems. We provide a survey that includes both 1) an extensive analysis of threat models posed by contemporary NLG systems, and 2) the most complete review of machine generated text detection methods to date. This survey places machine generated text within its cybersecurity and social context, and provides strong guidance for future work addressing the most critical threat models, and ensuring detection systems themselves demonstrate trustworthiness through fairness, robustness, and accountability.

New Sophisticated Malware

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/05/new-sophisticated-malware.html

Mandiant is reporting on a new botnet.

The group, which security firm Mandiant is calling UNC3524, has spent the past 18 months burrowing into victims’ networks with unusual stealth. In cases where the group is ejected, it wastes no time reinfecting the victim environment and picking up where things left off. There are many keys to its stealth, including:

  • The use of a unique backdoor Mandiant calls Quietexit, which runs on load balancers, wireless access point controllers, and other types of IoT devices that don’t support antivirus or endpoint detection. This makes detection through traditional means difficult.
  • Customized versions of the backdoor that use file names and creation dates that are similar to legitimate files used on a specific infected device.
  • A live-off-the-land approach that favors common Windows programming interfaces and tools over custom code with the goal of leaving as light a footprint as possible.
  • An unusual way a second-stage backdoor connects to attacker-controlled infrastructure by, in essence, acting as a TLS-encrypted server that proxies data through the SOCKS protocol.

[…]

Unpacking this threat group is difficult. From outward appearances, their focus on corporate transactions suggests a financial interest. But UNC3524’s high-caliber tradecraft, proficiency with sophisticated IoT botnets, and ability to remain undetected for so long suggests something more.

From Mandiant:

Throughout their operations, the threat actor demonstrated sophisticated operational security that we see only a small number of threat actors demonstrate. The threat actor evaded detection by operating from devices in the victim environment’s blind spots, including servers running uncommon versions of Linux and network appliances running opaque OSes. These devices and appliances were running versions of operating systems that were unsupported by agent-based security tools, and often had an expected level of network traffic that allowed the attackers to blend in. The threat actor’s use of the QUIETEXIT tunneler allowed them to largely live off the land, without the need to bring in additional tools, further reducing the opportunity for detection. This allowed UNC3524 to remain undetected in victim environments for, in some cases, upwards of 18 months.

On Chinese-Owned Technology Platforms

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/02/on-chinese-owned-technology-platforms.html

I am a co-author on a report published by the Hoover Institution: “Chinese Technology Platforms Operating in the United States.” From a blog post:

The report suggests a comprehensive framework for understanding and assessing the risks posed by Chinese technology platforms in the United States and developing tailored responses. It starts from the common view of the signatories — one reflected in numerous publicly available threat assessments — that China’s power is growing, that a large part of that power is in the digital sphere, and that China can and will wield that power in ways that adversely affect our national security. However, the specific threats and risks posed by different Chinese technologies vary, and effective policies must start with a targeted understanding of the nature of risks and an assessment of the impact US measures will have on national security and competitiveness. The goal of the paper is not to specifically quantify the risk of any particular technology, but rather to analyze the various threats, put them into context, and offer a framework for assessing proposed responses in ways that the signatories hope can aid those doing the risk analysis in individual cases.