Tag Archives: machinelearning

Adversarial Machine Learning and the CFAA

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/07/adversarial_mac_1.html

I just co-authored a paper on the legal risks of doing machine learning research, given the current state of the Computer Fraud and Abuse Act:

Abstract: Adversarial Machine Learning is booming with ML researchers increasingly targeting commercial ML systems such as those used in Facebook, Tesla, Microsoft, IBM, Google to demonstrate vulnerabilities. In this paper, we ask, “What are the potential legal risks to adversarial ML researchers when they attack ML systems?” Studying or testing the security of any operational system potentially runs afoul the Computer Fraud and Abuse Act (CFAA), the primary United States federal statute that creates liability for hacking. We claim that Adversarial ML research is likely no different. Our analysis show that because there is a split in how CFAA is interpreted, aspects of adversarial ML attacks, such as model inversion, membership inference, model stealing, reprogramming the ML system and poisoning attacks, may be sanctioned in some jurisdictions and not penalized in others. We conclude with an analysis predicting how the US Supreme Court may resolve some present inconsistencies in the CFAA’s application in Van Buren v. United States, an appeal expected to be decided in 2021. We argue that the court is likely to adopt a narrow construction of the CFAA, and that this will actually lead to better adversarial ML security outcomes in the long term.

Medium post on the paper. News article, which uses our graphic without attribution.

Fawkes: Digital Image Cloaking

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/07/fawkes_digital_.html

Fawkes is a system for manipulating digital images so that they aren’t recognized by facial recognition systems.

At a high level, Fawkes takes your personal images, and makes tiny, pixel-level changes to them that are invisible to the human eye, in a process we call image cloaking. You can then use these “cloaked” photos as you normally would, sharing them on social media, sending them to friends, printing them or displaying them on digital devices, the same way you would any other photo. The difference, however, is that if and when someone tries to use these photos to build a facial recognition model, “cloaked” images will teach the model an highly distorted version of what makes you look like you. The cloak effect is not easily detectable, and will not cause errors in model training. However, when someone tries to identify you using an unaltered image of you (e.g. a photo taken in public), and tries to identify you, they will fail.

Research paper.

Identifying a Person Based on a Photo, LinkedIn and Etsy Profiles, and Other Internet Bread Crumbs

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/06/identifying_a_p.html

Interesting story of how the police can identify someone by following the evidence chain from website to website.

According to filings in Blumenthal’s case, FBI agents had little more to go on when they started their investigation than the news helicopter footage of the woman setting the police car ablaze as it was broadcast live May 30.

It showed the woman, in flame-retardant gloves, grabbing a burning piece of a police barricade that had already been used to set one squad car on fire and tossing it into the police SUV parked nearby. Within seconds, that car was also engulfed in flames.

Investigators discovered other images depicting the same scene on Instagram and the video sharing website Vimeo. Those allowed agents to zoom in and identify a stylized tattoo of a peace sign on the woman’s right forearm.

Scouring other images ­– including a cache of roughly 500 photos of the Philly protest shared by an amateur photographer ­– agents found shots of a woman with the same tattoo that gave a clear depiction of the slogan on her T-shirt.


That shirt, agents said, was found to have been sold only in one location: a shop on Etsy, the online marketplace for crafters, purveyors of custom-made clothing and jewelry, and other collectibles….

The top review on her page, dated just six days before the protest, was from a user identifying herself as “Xx Mv,” who listed her location as Philadelphia and her username as “alleycatlore.”

A Google search of that handle led agents to an account on Poshmark, the mobile fashion marketplace, with a user handle “lore-elisabeth.” And subsequent searches for that name turned up Blumenthal’s LinkedIn profile, where she identifies herself as a graduate of William Penn Charter School and several yoga and massage therapy training centers.

From there, they located Blumenthal’s Jenkintown massage studio and its website, which featured videos demonstrating her at work. On her forearm, agents discovered, was the same distinctive tattoo that investigators first identified on the arsonist in the original TV video.

The obvious moral isn’t a new one: don’t have a distinctive tattoo. But more interesting is how different pieces of evidence can be strung together in order to identify someone. This particular chain was put together manually, but expect machine learning techniques to be able to do this sort of thing automatically — and for organizations like the NSA to implement them on a broad scale.

Another article did a more detailed analysis, and concludes that the Etsy review was the linchpin.

Note to commenters: political commentary on the protesters or protests will be deleted. There are many other forums on the Internet to discuss that.

Availability Attacks against Neural Networks

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/06/availability_at.html

New research on using specially crafted inputs to slow down machine-learning neural network systems:

Sponge Examples: Energy-Latency Attacks on Neural Networks shows how to find adversarial examples that cause a DNN to burn more energy, take more time, or both. They affect a wide range of DNN applications, from image recognition to natural language processing (NLP). Adversaries might use these examples for all sorts of mischief — from draining mobile phone batteries, though degrading the machine-vision systems on which self-driving cars rely, to jamming cognitive radar.

So far, our most spectacular results are against NLP systems. By feeding them confusing inputs we can slow them down over 100 times. There are already examples in the real world where people pause or stumble when asked hard questions but we now have a dependable method for generating such examples automatically and at scale. We can also neutralize the performance improvements of accelerators for computer vision tasks, and make them operate on their worst case performance.

The paper.

Fooling NLP Systems Through Word Swapping

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/04/fooling_nlp_sys.html

MIT researchers have built a system that fools natural-language processing systems by swapping words with synonyms:

The software, developed by a team at MIT, looks for the words in a sentence that are most important to an NLP classifier and replaces them with a synonym that a human would find natural. For example, changing the sentence “The characters, cast in impossibly contrived situations, are totally estranged from reality” to “The characters, cast in impossibly engineered circumstances, are fully estranged from reality” makes no real difference to how we read it. But the tweaks made an AI interpret the sentences completely differently.

The results of this adversarial machine learning attack are impressive:

For example, Google’s powerful BERT neural net was worse by a factor of five to seven at identifying whether reviews on Yelp were positive or negative.

The paper:

Abstract: Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness of these models by exposing the maliciously crafted adversarial examples. In this paper, we present TextFooler, a simple but strong baseline to generate natural adversarial text. By applying it to two fundamental natural language tasks, text classification and textual entailment, we successfully attacked three target models, including the powerful pre-trained BERT, and the widely used convolutional and recurrent neural networks. We demonstrate the advantages of this framework in three ways: (1) effective — it outperforms state-of-the-art attacks in terms of success rate and perturbation rate, (2) utility-preserving — it preserves semantic content and grammaticality, and remains correctly classified by humans, and (3) efficient — it generates adversarial text with computational complexity linear to the text length.

Vulnerability Finding Using Machine Learning

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/04/vulnerability_f.html

Microsoft is training a machine-learning system to find software bugs:

At Microsoft, 47,000 developers generate nearly 30 thousand bugs a month. These items get stored across over 100 AzureDevOps and GitHub repositories. To better label and prioritize bugs at that scale, we couldn’t just apply more people to the problem. However, large volumes of semi-curated data are perfect for machine learning. Since 2001 Microsoft has collected 13 million work items and bugs. We used that data to develop a process and machine learning model that correctly distinguishes between security and non-security bugs 99 percent of the time and accurately identifies the critical, high priority security bugs, 97 percent of the time.

News article.

I wrote about this in 2018:

The problem of finding software vulnerabilities seems well-suited for ML systems. Going through code line by line is just the sort of tedious problem that computers excel at, if we can only teach them what a vulnerability looks like. There are challenges with that, of course, but there is already a healthy amount of academic literature on the topic — and research is continuing. There’s every reason to expect ML systems to get better at this as time goes on, and some reason to expect them to eventually become very good at it.

Finding vulnerabilities can benefit both attackers and defenders, but it’s not a fair fight. When an attacker’s ML system finds a vulnerability in software, the attacker can use it to compromise systems. When a defender’s ML system finds the same vulnerability, he or she can try to patch the system or program network defenses to watch for and block code that tries to exploit it.

But when the same system is in the hands of a software developer who uses it to find the vulnerability before the software is ever released, the developer fixes it so it can never be used in the first place. The ML system will probably be part of his or her software design tools and will automatically find and fix vulnerabilities while the code is still in development.

Fast-forward a decade or so into the future. We might say to each other, “Remember those years when software vulnerabilities were a thing, before ML vulnerability finders were built into every compiler and fixed them before the software was ever released? Wow, those were crazy years.” Not only is this future possible, but I would bet on it.

Getting from here to there will be a dangerous ride, though. Those vulnerability finders will first be unleashed on existing software, giving attackers hundreds if not thousands of vulnerabilities to exploit in real-world attacks. Sure, defenders can use the same systems, but many of today’s Internet of Things (IoT) systems have no engineering teams to write patches and no ability to download and install patches. The result will be hundreds of vulnerabilities that attackers can find and use.

Deep Learning to Find Malicious Email Attachments

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/02/deep_learning_t.html

Google presented its system of using deep-learning techniques to identify malicious email attachments:

At the RSA security conference in San Francisco on Tuesday, Google’s security and anti-abuse research lead Elie Bursztein will present findings on how the new deep-learning scanner for documents is faring against the 300 billion attachments it has to process each week. It’s challenging to tell the difference between legitimate documents in all their infinite variations and those that have specifically been manipulated to conceal something dangerous. Google says that 63 percent of the malicious documents it blocks each day are different than the ones its systems flagged the day before. But this is exactly the type of pattern-recognition problem where deep learning can be helpful.


The document analyzer looks for common red flags, probes files if they have components that may have been purposefully obfuscated, and does other checks like examining macros­ — the tool in Microsoft Word documents that chains commands together in a series and is often used in attacks. The volume of malicious documents that attackers send out varies widely day to day. Bursztein says that since its deployment, the document scanner has been particularly good at flagging suspicious documents sent in bursts by malicious botnets or through other mass distribution methods. He was also surprised to discover how effective the scanner is at analyzing Microsoft Excel documents, a complicated file format that can be difficult to assess.

This is the sort of thing that’s pretty well optimized for machine-learning techniques.

Manipulating Machine Learning Systems by Manipulating Training Data

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/11/manipulating_ma.html

Interesting research: “TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents“:

Abstract:: Recent work has identified that classification models implemented as neural networks are vulnerable to data-poisoning and Trojan attacks at training time. In this work, we show that these training-time vulnerabilities extend to deep reinforcement learning (DRL) agents and can be exploited by an adversary with access to the training process. In particular, we focus on Trojan attacks that augment the function of reinforcement learning policies with hidden behaviors. We demonstrate that such attacks can be implemented through minuscule data poisoning (as little as 0.025% of the training data) and in-band reward modification that does not affect the reward on normal inputs. The policies learned with our proposed attack approach perform imperceptibly similar to benign policies but deteriorate drastically when the Trojan is triggered in both targeted and untargeted settings. Furthermore, we show that existing Trojan defense mechanisms for classification tasks are not effective in the reinforcement learning setting.

From a news article:

Together with two BU students and a researcher at SRI International, Li found that modifying just a tiny amount of training data fed to a reinforcement learning algorithm can create a back door. Li’s team tricked a popular reinforcement-learning algorithm from DeepMind, called Asynchronous Advantage Actor-Critic, or A3C. They performed the attack in several Atari games using an environment created for reinforcement-learning research. Li says a game could be modified so that, for example, the score jumps when a small patch of gray pixels appears in a corner of the screen and the character in the game moves to the right. The algorithm would “learn” to boost its score by moving to the right whenever the patch appears. DeepMind declined to comment.

BoingBoing post.

Using Machine Learning to Detect IP Hijacking

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/10/using_machine_l_1.html

This is interesting research:

In a BGP hijack, a malicious actor convinces nearby networks that the best path to reach a specific IP address is through their network. That’s unfortunately not very hard to do, since BGP itself doesn’t have any security procedures for validating that a message is actually coming from the place it says it’s coming from.


To better pinpoint serial attacks, the group first pulled data from several years’ worth of network operator mailing lists, as well as historical BGP data taken every five minutes from the global routing table. From that, they observed particular qualities of malicious actors and then trained a machine-learning model to automatically identify such behaviors.

The system flagged networks that had several key characteristics, particularly with respect to the nature of the specific blocks of IP addresses they use:

  • Volatile changes in activity: Hijackers’ address blocks seem to disappear much faster than those of legitimate networks. The average duration of a flagged network’s prefix was under 50 days, compared to almost two years for legitimate networks.
  • Multiple address blocks: Serial hijackers tend to advertise many more blocks of IP addresses, also known as “network prefixes.”

  • IP addresses in multiple countries: Most networks don’t have foreign IP addresses. In contrast, for the networks that serial hijackers advertised that they had, they were much more likely to be registered in different countries and continents.

Note that this is much more likely to detect criminal attacks than nation-state activities. But it’s still good work.

Academic paper.

Data, Surveillance, and the AI Arms Race

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/06/data_surveillan.html

According to foreign policy experts and the defense establishment, the United States is caught in an artificial intelligence arms race with China — one with serious implications for national security. The conventional version of this story suggests that the United States is at a disadvantage because of self-imposed restraints on the collection of data and the privacy of its citizens, while China, an unrestrained surveillance state, is at an advantage. In this vision, the data that China collects will be fed into its systems, leading to more powerful AI with capabilities we can only imagine today. Since Western countries can’t or won’t reap such a comprehensive harvest of data from their citizens, China will win the AI arms race and dominate the next century.

This idea makes for a compelling narrative, especially for those trying to justify surveillance — whether government- or corporate-run. But it ignores some fundamental realities about how AI works and how AI research is conducted.

Thanks to advances in machine learning, AI has flipped from theoretical to practical in recent years, and successes dominate public understanding of how it works. Machine learning systems can now diagnose pneumonia from X-rays, play the games of go and poker, and read human lips, all better than humans. They’re increasingly watching surveillance video. They are at the core of self-driving car technology and are playing roles in both intelligence-gathering and military operations. These systems monitor our networks to detect intrusions and look for spam and malware in our email.

And it’s true that there are differences in the way each country collects data. The United States pioneered “surveillance capitalism,” to use the Harvard University professor Shoshana Zuboff’s term, where data about the population is collected by hundreds of large and small companies for corporate advantage — and mutually shared or sold for profit The state picks up on that data, in cases such as the Centers for Disease Control and Prevention’s use of Google search data to map epidemics and evidence shared by alleged criminals on Facebook, but it isn’t the primary user.

China, on the other hand, is far more centralized. Internet companies collect the same sort of data, but it is shared with the government, combined with government-collected data, and used for social control. Every Chinese citizen has a national ID number that is demanded by most services and allows data to easily be tied together. In the western region of Xinjiang, ubiquitous surveillance is used to oppress the Uighur ethnic minority — although at this point there is still a lot of human labor making it all work. Everyone expects that this is a test bed for the entire country.

Data is increasingly becoming a part of control for the Chinese government. While many of these plans are aspirational at the moment — there isn’t, as some have claimed, a single “social credit score,” but instead future plans to link up a wide variety of systems — data collection is universally pushed as essential to the future of Chinese AI. One executive at search firm Baidu predicted that the country’s connected population will provide them with the raw data necessary to become the world’s preeminent tech power. China’s official goal is to become the world AI leader by 2030, aided in part by all of this massive data collection and correlation.

This all sounds impressive, but turning massive databases into AI capabilities doesn’t match technological reality. Current machine learning techniques aren’t all that sophisticated. All modern AI systems follow the same basic methods. Using lots of computing power, different machine learning models are tried, altered, and tried again. These systems use a large amount of data (the training set) and an evaluation function to distinguish between those models and variations that work well and those that work less well. After trying a lot of models and variations, the system picks the one that works best. This iterative improvement continues even after the system has been fielded and is in use.

So, for example, a deep learning system trying to do facial recognition will have multiple layers (hence the notion of “deep”) trying to do different parts of the facial recognition task. One layer will try to find features in the raw data of a picture that will help find a face, such as changes in color that will indicate an edge. The next layer might try to combine these lower layers into features like shapes, looking for round shapes inside of ovals that indicate eyes on a face. The different layers will try different features and will be compared by the evaluation function until the one that is able to give the best results is found, in a process that is only slightly more refined than trial and error.

Large data sets are essential to making this work, but that doesn’t mean that more data is automatically better or that the system with the most data is automatically the best system. Train a facial recognition algorithm on a set that contains only faces of white men, and the algorithm will have trouble with any other kind of face. Use an evaluation function that is based on historical decisions, and any past bias is learned by the algorithm. For example, mortgage loan algorithms trained on historic decisions of human loan officers have been found to implement redlining. Similarly, hiring algorithms trained on historical data manifest the same sexism as human staff often have. Scientists are constantly learning about how to train machine learning systems, and while throwing a large amount of data and computing power at the problem can work, more subtle techniques are often more successful. All data isn’t created equal, and for effective machine learning, data has to be both relevant and diverse in the right ways.

Future research advances in machine learning are focused on two areas. The first is in enhancing how these systems distinguish between variations of an algorithm. As different versions of an algorithm are run over the training data, there needs to be some way of deciding which version is “better.” These evaluation functions need to balance the recognition of an improvement with not over-fitting to the particular training data. Getting functions that can automatically and accurately distinguish between two algorithms based on minor differences in the outputs is an art form that no amount of data can improve.

The second is in the machine learning algorithms themselves. While much of machine learning depends on trying different variations of an algorithm on large amounts of data to see which is most successful, the initial formulation of the algorithm is still vitally important. The way the algorithms interact, the types of variations attempted, and the mechanisms used to test and redirect the algorithms are all areas of active research. (An overview of some of this work can be found here; even trying to limit the research to 20 papers oversimplifies the work being done in the field.) None of these problems can be solved by throwing more data at the problem.

The British AI company DeepMind’s success in teaching a computer to play the Chinese board game go is illustrative. Its AlphaGo computer program became a grandmaster in two steps. First, it was fed some enormous number of human-played games. Then, the game played itself an enormous number of times, improving its own play along the way. In 2016, AlphaGo beat the grandmaster Lee Sedol four games to one.

While the training data in this case, the human-played games, was valuable, even more important was the machine learning algorithm used and the function that evaluated the relative merits of different game positions. Just one year later, DeepMind was back with a follow-on system: AlphaZero. This go-playing computer dispensed entirely with the human-played games and just learned by playing against itself over and over again. It plays like an alien. (It also became a grandmaster in chess and shogi.)

These are abstract games, so it makes sense that a more abstract training process works well. But even something as visceral as facial recognition needs more than just a huge database of identified faces in order to work successfully. It needs the ability to separate a face from the background in a two-dimensional photo or video and to recognize the same face in spite of changes in angle, lighting, or shadows. Just adding more data may help, but not nearly as much as added research into what to do with the data once we have it.

Meanwhile, foreign-policy and defense experts are talking about AI as if it were the next nuclear arms race, with the country that figures it out best or first becoming the dominant superpower for the next century. But that didn’t happen with nuclear weapons, despite research only being conducted by governments and in secret. It certainly won’t happen with AI, no matter how much data different nations or companies scoop up.

It is true that China is investing a lot of money into artificial intelligence research: The Chinese government believes this will allow it to leapfrog other countries (and companies in those countries) and become a major force in this new and transformative area of computing — and it may be right. On the other hand, much of this seems to be a wasteful boondoggle. Slapping “AI” on pretty much anything is how to get funding. The Chinese Ministry of Education, for instance, promises to produce “50 world-class AI textbooks,” with no explanation of what that means.

In the democratic world, the government is neither the leading researcher nor the leading consumer of AI technologies. AI research is much more decentralized and academic, and it is conducted primarily in the public eye. Research teams keep their training data and models proprietary but freely publish their machine learning algorithms. If you wanted to work on machine learning right now, you could download Microsoft’s Cognitive Toolkit, Google’s Tensorflow, or Facebook’s Pytorch. These aren’t toy systems; these are the state-of-the art machine learning platforms.

AI is not analogous to the big science projects of the previous century that brought us the atom bomb and the moon landing. AI is a science that can be conducted by many different groups with a variety of different resources, making it closer to computer design than the space race or nuclear competition. It doesn’t take a massive government-funded lab for AI research, nor the secrecy of the Manhattan Project. The research conducted in the open science literature will trump research done in secret because of the benefits of collaboration and the free exchange of ideas.

While the United States should certainly increase funding for AI research, it should continue to treat it as an open scientific endeavor. Surveillance is not justified by the needs of machine learning, and real progress in AI doesn’t need it.

This essay was written with Jim Waldo, and previously appeared in Foreign Policy.

Computers and Video Surveillance

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/06/computers_and_video.html

It used to be that surveillance cameras were passive. Maybe they just recorded, and no one looked at the video unless they needed to. Maybe a bored guard watched a dozen different screens, scanning for something interesting. In either case, the video was only stored for a few days because storage was expensive.

Increasingly, none of that is true. Recent developments in video analytics — fueled by artificial intelligence techniques like machine learning — enable computers to watch and understand surveillance videos with human-like discernment. Identification technologies make it easier to automatically figure out who is in the videos. And finally, the cameras themselves have become cheaper, more ubiquitous, and much better; cameras mounted on drones can effectively watch an entire city. Computers can watch all the video without human issues like distraction, fatigue, training, or needing to be paid. The result is a level of surveillance that was impossible just a few years ago.

An ACLU report published Thursday called “the Dawn of Robot Surveillance” says AI-aided video surveillance “won’t just record us, but will also make judgments about us based on their understanding of our actions, emotions, skin color, clothing, voice, and more. These automated ‘video analytics’ technologies threaten to fundamentally change the nature of surveillance.”

Let’s take the technologies one at a time. First: video analytics. Computers are getting better at recognizing what’s going on in a video. Detecting when a person or vehicle enters a forbidden area is easy. Modern systems can alarm when someone is walking in the wrong direction — going in through an exit-only corridor, for example. They can count people or cars. They can detect when luggage is left unattended, or when previously unattended luggage is picked up and removed. They can detect when someone is loitering in an area, is lying down, or is running. Increasingly, they can detect particular actions by people. Amazon’s cashier-less stores rely on video analytics to figure out when someone picks an item off a shelf and doesn’t put it back.

More than identifying actions, video analytics allow computers to understand what’s going on in a video: They can flag people based on their clothing or behavior, identify people’s emotions through body language and behavior, and find people who are acting “unusual” based on everyone else around them. Those same Amazon in-store cameras can analyze customer sentiment. Other systems can describe what’s happening in a video scene.

Computers can also identify people. AIs are getting better at identifying people in those videos. Facial recognition technology is improving all the time, made easier by the enormous stockpile of tagged photographs we give to Facebook and other social media sites, and the photos governments collect in the process of issuing ID cards and drivers licenses. The technology already exists to automatically identify everyone a camera “sees” in real time. Even without video identification, we can be identified by the unique information continuously broadcasted by the smartphones we carry with us everywhere, or by our laptops or Bluetooth-connected devices. Police have been tracking phones for years, and this practice can now be combined with video analytics.

Once a monitoring system identifies people, their data can be combined with other data, either collected or purchased: from cell phone records, GPS surveillance history, purchasing data, and so on. Social media companies like Facebook have spent years learning about our personalities and beliefs by what we post, comment on, and “like.” This is “data inference,” and when combined with video it offers a powerful window into people’s behaviors and motivations.

Camera resolution is also improving. Gigapixel cameras as so good that they can capture individual faces and identify license places in photos taken miles away. “Wide-area surveillance” cameras can be mounted on airplanes and drones, and can operate continuously. On the ground, cameras can be hidden in street lights and other regular objects. In space, satellite cameras have also dramatically improved.

Data storage has become incredibly cheap, and cloud storage makes it all so easy. Video data can easily be saved for years, allowing computers to conduct all of this surveillance backwards in time.

In democratic countries, such surveillance is marketed as crime prevention — or counterterrorism. In countries like China, it is blatantly used to suppress political activity and for social control. In all instances, it’s being implemented without a lot of public debate by law-enforcement agencies and by corporations in public spaces they control.

This is bad, because ubiquitous surveillance will drastically change our relationship to society. We’ve never lived in this sort of world, even those of us who have lived through previous totalitarian regimes. The effects will be felt in many different areas. False positives­ — when the surveillance system gets it wrong­ — will lead to harassment and worse. Discrimination will become automated. Those who fall outside norms will be marginalized. And most importantly, the inability to live anonymously will have an enormous chilling effect on speech and behavior, which in turn will hobble society’s ability to experiment and change. A recent ACLU report discusses these harms in more depth. While it’s possible that some of this surveillance is worth the trade-offs, we as society need to deliberately and intelligently make decisions about it.

Some jurisdictions are starting to notice. Last month, San Francisco became the first city to ban facial recognition technology by police and other government agencies. A similar ban is being considered in Somerville, MA, and Oakland, CA. These are exceptions, and limited to the more liberal areas of the country.

We often believe that technological change is inevitable, and that there’s nothing we can do to stop it — or even to steer it. That’s simply not true. We’re led to believe this because we don’t often see it, understand it, or have a say in how or when it is deployed. The problem is that technologies of cameras, resolution, machine learning, and artificial intelligence are complex and specialized.

Laws like what was just passed in San Francisco won’t stop the development of these technologies, but they’re not intended to. They’re intended as pauses, so our policy making can catch up with technology. As a general rule, the US government tends to ignore technologies as they’re being developed and deployed, so as not to stifle innovation. But as the rate of technological change increases, so does the unanticipated effects on our lives. Just as we’ve been surprised by the threats to democracy caused by surveillance capitalism, AI-enabled video surveillance will have similar surprising effects. Maybe a pause in our headlong deployment of these technologies will allow us the time to discuss what kind of society we want to live in, and then enact rules to bring that kind of society about.

This essay previously appeared on Vice Motherboard.

Video Surveillance by Computer

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/06/video_surveilla.html

The ACLU’s Jay Stanley has just published a fantastic report: “The Dawn of Robot Surveillance” (blog post here) Basically, it lays out a future of ubiquitous video cameras watched by increasingly sophisticated video analytics software, and discusses the potential harms to society.

I’m not going to excerpt a piece, because you really need to read the whole thing.

Visiting the NSA

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/05/visiting_the_ns.html

Yesterday, I visited the NSA. It was Cyber Command’s birthday, but that’s not why I was there. I visited as part of the Berklett Cybersecurity Project, run out of the Berkman Klein Center and funded by the Hewlett Foundation. (BERKman hewLETT — get it? We have a web page, but it’s badly out of date.)

It was a full day of meetings, all unclassified but under the Chatham House Rule. Gen. Nakasone welcomed us and took questions at the start. Various senior officials spoke with us on a variety of topics, but mostly focused on three areas:

  • Russian influence operations, both what the NSA and US Cyber Command did during the 2018 election and what they can do in the future;
  • China and the threats to critical infrastructure from untrusted computer hardware, both the 5G network and more broadly;

  • Machine learning, both how to ensure a ML system is compliant with all laws, and how ML can help with other compliance tasks.

It was all interesting. Those first two topics are ones that I am thinking and writing about, and it was good to hear their perspective. I find that I am much more closely aligned with the NSA about cybersecurity than I am about privacy, which made the meeting much less fraught than it would have been if we were discussing Section 702 of the FISA Amendments Act, Section 215 the USA Freedom Act (up for renewal next year), or any 4th Amendment violations. I don’t think we’re past those issues by any means, but they make up less of what I am working on.

Maliciously Tampering with Medical Imagery

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/04/maliciously_tam.html

In what I am sure is only a first in many similar demonstrations, researchers are able to add or remove cancer signs from CT scans. The results easily fool radiologists.

I don’t think the medical device industry has thought at all about data integrity and authentication issues. In a world where sensor data of all kinds is undetectably manipulatable, they’re going to have to start.

Research paper. Slashdot thread.

Adversarial Machine Learning against Tesla’s Autopilot

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/04/adversarial_mac.html

Researchers have been able to fool Tesla’s autopilot in a variety of ways, including convincing it to drive into oncoming traffic. It requires the placement of stickers on the road.

Abstract: Keen Security Lab has maintained the security research work on Tesla vehicle and shared our research results on Black Hat USA 2017 and 2018 in a row. Based on the ROOT privilege of the APE (Tesla Autopilot ECU, software version 18.6.1), we did some further interesting research work on this module. We analyzed the CAN messaging functions of APE, and successfully got remote control of the steering system in a contact-less way. We used an improved optimization algorithm to generate adversarial examples of the features (autowipers and lane recognition) which make decisions purely based on camera data, and successfully achieved the adversarial example attack in the physical world. In addition, we also found a potential high-risk design weakness of the lane recognition when the vehicle is in Autosteer mode. The whole article is divided into four parts: first a brief introduction of Autopilot, after that we will introduce how to send control commands from APE to control the steering system when the car is driving. In the last two sections, we will introduce the implementation details of the autowipers and lane recognition features, as well as our adversarial example attacking methods in the physical world. In our research, we believe that we made three creative contributions:

  1. We proved that we can remotely gain the root privilege of APE and control the steering system.
  2. We proved that we can disturb the autowipers function by using adversarial examples in the physical world.
  3. We proved that we can mislead the Tesla car into the reverse lane with minor changes on the road.

You can see the stickers in this photo. They’re unobtrusive.

This is machine learning’s big problem, and I think solving it is a lot harder than many believe.

Machine Learning to Detect Software Vulnerabilities

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/01/machine_learnin.html

No one doubts that artificial intelligence (AI) and machine learning (ML) will transform cybersecurity. We just don’t know how, or when. While the literature generally focuses on the different uses of AI by attackers and defenders ­ and the resultant arms race between the two ­ I want to talk about software vulnerabilities.

All software contains bugs. The reason is basically economic: The market doesn’t want to pay for quality software. With a few exceptions, such as the space shuttle, the market prioritizes fast and cheap over good. The result is that any large modern software package contains hundreds or thousands of bugs.

Some percentage of bugs are also vulnerabilities, and a percentage of those are exploitable vulnerabilities, meaning an attacker who knows about them can attack the underlying system in some way. And some percentage of those are discovered and used. This is why your computer and smartphone software is constantly being patched; software vendors are fixing bugs that are also vulnerabilities that have been discovered and are being used.

Everything would be better if software vendors found and fixed all bugs during the design and development process, but, as I said, the market doesn’t reward that kind of delay and expense. AI, and machine learning in particular, has the potential to forever change this trade-off.

The problem of finding software vulnerabilities seems well-suited for ML systems. Going through code line by line is just the sort of tedious problem that computers excel at, if we can only teach them what a vulnerability looks like. There are challenges with that, of course, but there is already a healthy amount of academic literature on the topic — and research is continuing. There’s every reason to expect ML systems to get better at this as time goes on, and some reason to expect them to eventually become very good at it.

Finding vulnerabilities can benefit both attackers and defenders, but it’s not a fair fight. When an attacker’s ML system finds a vulnerability in software, the attacker can use it to compromise systems. When a defender’s ML system finds the same vulnerability, he or she can try to patch the system or program network defenses to watch for and block code that tries to exploit it.

But when the same system is in the hands of a software developer who uses it to find the vulnerability before the software is ever released, the developer fixes it so it can never be used in the first place. The ML system will probably be part of his or her software design tools and will automatically find and fix vulnerabilities while the code is still in development.

Fast-forward a decade or so into the future. We might say to each other, “Remember those years when software vulnerabilities were a thing, before ML vulnerability finders were built into every compiler and fixed them before the software was ever released? Wow, those were crazy years.” Not only is this future possible, but I would bet on it.

Getting from here to there will be a dangerous ride, though. Those vulnerability finders will first be unleashed on existing software, giving attackers hundreds if not thousands of vulnerabilities to exploit in real-world attacks. Sure, defenders can use the same systems, but many of today’s Internet of Things systems have no engineering teams to write patches and no ability to download and install patches. The result will be hundreds of vulnerabilities that attackers can find and use.

But if we look far enough into the horizon, we can see a future where software vulnerabilities are a thing of the past. Then we’ll just have to worry about whatever new and more advanced attack techniques those AI systems come up with.

This essay previously appeared on SecurityIntelligence.com.

Using Machine Learning to Create Fake Fingerprints

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/11/using_machine_l.html

Researchers are able to create fake fingerprints that result in a 20% false-positive rate.

The problem is that these sensors obtain only partial images of users’ fingerprints — at the points where they make contact with the scanner. The paper noted that since partial prints are not as distinctive as complete prints, the chances of one partial print getting matched with another is high.

The artificially generated prints, dubbed DeepMasterPrints by the researchers, capitalize on the aforementioned vulnerability to accurately imitate one in five fingerprints in a database. The database was originally supposed to have only an error rate of one in a thousand.

Another vulnerability exploited by the researchers was the high prevalence of some natural fingerprint features such as loops and whorls, compared to others. With this understanding, the team generated some prints that contain several of these common features. They found that these artificial prints were more likely to match with other prints than would be normally possible.

If this result is robust — and I assume it will be improved upon over the coming years — it will make the current generation of fingerprint readers obsolete as secure biometrics. It also opens a new chapter in the arms race between biometric authentication systems and fake biometrics that can fool them.

More interestingly, I wonder if similar techniques can be brought to bear against other biometrics are well.

Research paper.

Slashdot thread