Tag Archives: datacollection

Palantir’s Surveillance Service for Law Enforcement

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/07/palantirs_surve.html

Motherboard got its hands on Palantir’s Gotham user’s manual, which is used by the police to get information on people:

The Palantir user guide shows that police can start with almost no information about a person of interest and instantly know extremely intimate details about their lives. The capabilities are staggering, according to the guide:

  • If police have a name that’s associated with a license plate, they can use automatic license plate reader data to find out where they’ve been, and when they’ve been there. This can give a complete account of where someone has driven over any time period.
  • With a name, police can also find a person’s email address, phone numbers, current and previous addresses, bank accounts, social security number(s), business relationships, family relationships, and license information like height, weight, and eye color, as long as it’s in the agency’s database.

  • The software can map out a person’s family members and business associates of a suspect, and theoretically, find the above information about them, too.

All of this information is aggregated and synthesized in a way that gives law enforcement nearly omniscient knowledge over any suspect they decide to surveil.

Read the whole article — it has a lot of details. This seems like a commercial version of the NSA’s XKEYSCORE.

Boing Boing post.

Meanwhile:

The FBI wants to gather more information from social media. Today, it issued a call for contracts for a new social media monitoring tool. According to a request-for-proposals (RFP), it’s looking for an “early alerting tool” that would help it monitor terrorist groups, domestic threats, criminal activity and the like.

The tool would provide the FBI with access to the full social media profiles of persons-of-interest. That could include information like user IDs, emails, IP addresses and telephone numbers. The tool would also allow the FBI to track people based on location, enable persistent keyword monitoring and provide access to personal social media history. According to the RFP, “The mission-critical exploitation of social media will enable the Bureau to detect, disrupt, and investigate an ever growing diverse range of threats to U.S. National interests.”

iPhone Apps Surreptitiously Communicated with Unknown Servers

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/06/iphone_apps_sur.html

Long news article (alternate source) on iPhone privacy, specifically the enormous amount of data your apps are collecting without your knowledge. A lot of this happens in the middle of the night, when you’re probably not otherwise using your phone:

IPhone apps I discovered tracking me by passing information to third parties ­ just while I was asleep ­ include Microsoft OneDrive, Intuit’s Mint, Nike, Spotify, The Washington Post and IBM’s the Weather Channel. One app, the crime-alert service Citizen, shared personally identifiable information in violation of its published privacy policy.

And your iPhone doesn’t only feed data trackers while you sleep. In a single week, I encountered over 5,400 trackers, mostly in apps, not including the incessant Yelp traffic.

Data, Surveillance, and the AI Arms Race

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/06/data_surveillan.html

According to foreign policy experts and the defense establishment, the United States is caught in an artificial intelligence arms race with China — one with serious implications for national security. The conventional version of this story suggests that the United States is at a disadvantage because of self-imposed restraints on the collection of data and the privacy of its citizens, while China, an unrestrained surveillance state, is at an advantage. In this vision, the data that China collects will be fed into its systems, leading to more powerful AI with capabilities we can only imagine today. Since Western countries can’t or won’t reap such a comprehensive harvest of data from their citizens, China will win the AI arms race and dominate the next century.

This idea makes for a compelling narrative, especially for those trying to justify surveillance — whether government- or corporate-run. But it ignores some fundamental realities about how AI works and how AI research is conducted.

Thanks to advances in machine learning, AI has flipped from theoretical to practical in recent years, and successes dominate public understanding of how it works. Machine learning systems can now diagnose pneumonia from X-rays, play the games of go and poker, and read human lips, all better than humans. They’re increasingly watching surveillance video. They are at the core of self-driving car technology and are playing roles in both intelligence-gathering and military operations. These systems monitor our networks to detect intrusions and look for spam and malware in our email.

And it’s true that there are differences in the way each country collects data. The United States pioneered “surveillance capitalism,” to use the Harvard University professor Shoshana Zuboff’s term, where data about the population is collected by hundreds of large and small companies for corporate advantage — and mutually shared or sold for profit The state picks up on that data, in cases such as the Centers for Disease Control and Prevention’s use of Google search data to map epidemics and evidence shared by alleged criminals on Facebook, but it isn’t the primary user.

China, on the other hand, is far more centralized. Internet companies collect the same sort of data, but it is shared with the government, combined with government-collected data, and used for social control. Every Chinese citizen has a national ID number that is demanded by most services and allows data to easily be tied together. In the western region of Xinjiang, ubiquitous surveillance is used to oppress the Uighur ethnic minority — although at this point there is still a lot of human labor making it all work. Everyone expects that this is a test bed for the entire country.

Data is increasingly becoming a part of control for the Chinese government. While many of these plans are aspirational at the moment — there isn’t, as some have claimed, a single “social credit score,” but instead future plans to link up a wide variety of systems — data collection is universally pushed as essential to the future of Chinese AI. One executive at search firm Baidu predicted that the country’s connected population will provide them with the raw data necessary to become the world’s preeminent tech power. China’s official goal is to become the world AI leader by 2030, aided in part by all of this massive data collection and correlation.

This all sounds impressive, but turning massive databases into AI capabilities doesn’t match technological reality. Current machine learning techniques aren’t all that sophisticated. All modern AI systems follow the same basic methods. Using lots of computing power, different machine learning models are tried, altered, and tried again. These systems use a large amount of data (the training set) and an evaluation function to distinguish between those models and variations that work well and those that work less well. After trying a lot of models and variations, the system picks the one that works best. This iterative improvement continues even after the system has been fielded and is in use.

So, for example, a deep learning system trying to do facial recognition will have multiple layers (hence the notion of “deep”) trying to do different parts of the facial recognition task. One layer will try to find features in the raw data of a picture that will help find a face, such as changes in color that will indicate an edge. The next layer might try to combine these lower layers into features like shapes, looking for round shapes inside of ovals that indicate eyes on a face. The different layers will try different features and will be compared by the evaluation function until the one that is able to give the best results is found, in a process that is only slightly more refined than trial and error.

Large data sets are essential to making this work, but that doesn’t mean that more data is automatically better or that the system with the most data is automatically the best system. Train a facial recognition algorithm on a set that contains only faces of white men, and the algorithm will have trouble with any other kind of face. Use an evaluation function that is based on historical decisions, and any past bias is learned by the algorithm. For example, mortgage loan algorithms trained on historic decisions of human loan officers have been found to implement redlining. Similarly, hiring algorithms trained on historical data manifest the same sexism as human staff often have. Scientists are constantly learning about how to train machine learning systems, and while throwing a large amount of data and computing power at the problem can work, more subtle techniques are often more successful. All data isn’t created equal, and for effective machine learning, data has to be both relevant and diverse in the right ways.

Future research advances in machine learning are focused on two areas. The first is in enhancing how these systems distinguish between variations of an algorithm. As different versions of an algorithm are run over the training data, there needs to be some way of deciding which version is “better.” These evaluation functions need to balance the recognition of an improvement with not over-fitting to the particular training data. Getting functions that can automatically and accurately distinguish between two algorithms based on minor differences in the outputs is an art form that no amount of data can improve.

The second is in the machine learning algorithms themselves. While much of machine learning depends on trying different variations of an algorithm on large amounts of data to see which is most successful, the initial formulation of the algorithm is still vitally important. The way the algorithms interact, the types of variations attempted, and the mechanisms used to test and redirect the algorithms are all areas of active research. (An overview of some of this work can be found here; even trying to limit the research to 20 papers oversimplifies the work being done in the field.) None of these problems can be solved by throwing more data at the problem.

The British AI company DeepMind’s success in teaching a computer to play the Chinese board game go is illustrative. Its AlphaGo computer program became a grandmaster in two steps. First, it was fed some enormous number of human-played games. Then, the game played itself an enormous number of times, improving its own play along the way. In 2016, AlphaGo beat the grandmaster Lee Sedol four games to one.

While the training data in this case, the human-played games, was valuable, even more important was the machine learning algorithm used and the function that evaluated the relative merits of different game positions. Just one year later, DeepMind was back with a follow-on system: AlphaZero. This go-playing computer dispensed entirely with the human-played games and just learned by playing against itself over and over again. It plays like an alien. (It also became a grandmaster in chess and shogi.)

These are abstract games, so it makes sense that a more abstract training process works well. But even something as visceral as facial recognition needs more than just a huge database of identified faces in order to work successfully. It needs the ability to separate a face from the background in a two-dimensional photo or video and to recognize the same face in spite of changes in angle, lighting, or shadows. Just adding more data may help, but not nearly as much as added research into what to do with the data once we have it.

Meanwhile, foreign-policy and defense experts are talking about AI as if it were the next nuclear arms race, with the country that figures it out best or first becoming the dominant superpower for the next century. But that didn’t happen with nuclear weapons, despite research only being conducted by governments and in secret. It certainly won’t happen with AI, no matter how much data different nations or companies scoop up.

It is true that China is investing a lot of money into artificial intelligence research: The Chinese government believes this will allow it to leapfrog other countries (and companies in those countries) and become a major force in this new and transformative area of computing — and it may be right. On the other hand, much of this seems to be a wasteful boondoggle. Slapping “AI” on pretty much anything is how to get funding. The Chinese Ministry of Education, for instance, promises to produce “50 world-class AI textbooks,” with no explanation of what that means.

In the democratic world, the government is neither the leading researcher nor the leading consumer of AI technologies. AI research is much more decentralized and academic, and it is conducted primarily in the public eye. Research teams keep their training data and models proprietary but freely publish their machine learning algorithms. If you wanted to work on machine learning right now, you could download Microsoft’s Cognitive Toolkit, Google’s Tensorflow, or Facebook’s Pytorch. These aren’t toy systems; these are the state-of-the art machine learning platforms.

AI is not analogous to the big science projects of the previous century that brought us the atom bomb and the moon landing. AI is a science that can be conducted by many different groups with a variety of different resources, making it closer to computer design than the space race or nuclear competition. It doesn’t take a massive government-funded lab for AI research, nor the secrecy of the Manhattan Project. The research conducted in the open science literature will trump research done in secret because of the benefits of collaboration and the free exchange of ideas.

While the United States should certainly increase funding for AI research, it should continue to treat it as an open scientific endeavor. Surveillance is not justified by the needs of machine learning, and real progress in AI doesn’t need it.

This essay was written with Jim Waldo, and previously appeared in Foreign Policy.

The Concept of "Return on Data"

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/05/the_concept_of_.html

This law review article by Noam Kolt, titled “Return on Data,” proposes an interesting new way of thinking of privacy law.

Abstract: Consumers routinely supply personal data to technology companies in exchange for services. Yet, the relationship between the utility (U) consumers gain and the data (D) they supply — “return on data” (ROD) — remains largely unexplored. Expressed as a ratio, ROD = U / D. While lawmakers strongly advocate protecting consumer privacy, they tend to overlook ROD. Are the benefits of the services enjoyed by consumers, such as social networking and predictive search, commensurate with the value of the data extracted from them? How can consumers compare competing data-for-services deals? Currently, the legal frameworks regulating these transactions, including privacy law, aim primarily to protect personal data. They treat data protection as a standalone issue, distinct from the benefits which consumers receive. This article suggests that privacy concerns should not be viewed in isolation, but as part of ROD. Just as companies can quantify return on investment (ROI) to optimize investment decisions, consumers should be able to assess ROD in order to better spend and invest personal data. Making data-for-services transactions more transparent will enable consumers to evaluate the merits of these deals, negotiate their terms and make more informed decisions. Pivoting from the privacy paradigm to ROD will both incentivize data-driven service providers to offer consumers higher ROD, as well as create opportunities for new market entrants.

How Political Campaigns Use Personal Data

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/04/how_political_c.html

Really interesting report from Tactical Tech.

Data-driven technologies are an inevitable feature of modern political campaigning. Some argue that they are a welcome addition to politics as normal and a necessary and modern approach to democratic processes; others say that they are corrosive and diminish trust in already flawed political systems. The use of these technologies in political campaigning is not going away; in fact, we can only expect their sophistication and prevalence to grow. For this reason, the techniques and methods need to be reviewed outside the dichotomy of ‘good’ or ‘bad’ and beyond the headlines of ‘disinformation campaigns’.

All the data-driven methods presented in this guide would not exist without the commercial digital marketing and advertising industry. From analysing behavioural data to A/B testing and from geotargeting to psychometric profiling, political parties are using the same techniques to sell political candidates to voters that companies use to sell shoes to consumers. The question is, is that appropriate? And what impact does it have not only on individual voters, who may or may not be persuad-ed, but on the political environment as a whole?

The practice of political strategists selling candidates as brands is not new. Vance Packard wrote about the ‘depth probing’ techniques of ‘political persuaders’ as early as 1957. In his book, ‘The Hidden Persuaders’, Packard described political strategies designed to sell candidates to voters ‘like toothpaste’, and how public relations directors at the time boasted that ‘scientific methods take the guesswork out of politics’.5 In this sense, what we have now is a logical progression of the digitisation of marketing techniques and political persuasion techniques.

Judging Facebook’s Privacy Shift

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/03/judging_faceboo.html

Facebook is making a new and stronger commitment to privacy. Last month, the company hired three of its most vociferous critics and installed them in senior technical positions. And on Wednesday, Mark Zuckerberg wrote that the company will pivot to focus on private conversations over the public sharing that has long defined the platform, even while conceding that “frankly we don’t currently have a strong reputation for building privacy protective services.”

There is ample reason to question Zuckerberg’s pronouncement: The company has made — and broken — many privacy promises over the years. And if you read his 3,000-word post carefully, Zuckerberg says nothing about changing Facebook’s surveillance capitalism business model. All the post discusses is making private chats more central to the company, which seems to be a play for increased market dominance and to counter the Chinese company WeChat.

In security and privacy, the devil is always in the details — and Zuckerberg’s post provides none. But we’ll take him at his word and try to fill in some of the details here. What follows is a list of changes we should expect if Facebook is serious about changing its business model and improving user privacy.

How Facebook treats people on its platform

Increased transparency over advertiser and app accesses to user data. Today, Facebook users can download and view much of the data the company has about them. This is important, but it doesn’t go far enough. The company could be more transparent about what data it shares with advertisers and others and how it allows advertisers to select users they show ads to. Facebook could use its substantial skills in usability testing to help people understand the mechanisms advertisers use to show them ads or the reasoning behind what it chooses to show in user timelines. It could deliver on promises in this area.

Better — and more usable — privacy options. Facebook users have limited control over how their data is shared with other Facebook users and almost no control over how it is shared with Facebook’s advertisers, which are the company’s real customers. Moreover, the controls are buried deep behind complex and confusing menu options. To be fair, some of this is because privacy is complex, and it’s hard to understand the results of different options. But much of this is deliberate; Facebook doesn’t want its users to make their data private from other users.

The company could give people better control over how — and whether — their data is used, shared, and sold. For example, it could allow users to turn off individually targeted news and advertising. By this, we don’t mean simply making those advertisements invisible; we mean turning off the data flows into those tailoring systems. Finally, since most users stick to the default options when it comes to configuring their apps, a changing Facebook could tilt those defaults toward more privacy, requiring less tailoring most of the time.

More user protection from stalking. “Facebook stalking” is often thought of as “stalking light,” or “harmless.” But stalkers are rarely harmless. Facebook should acknowledge this class of misuse and work with experts to build tools that protect all of its users, especially its most vulnerable ones. Such tools should guide normal people away from creepiness and give victims power and flexibility to enlist aid from sources ranging from advocates to police.

Fully ending real-name enforcement. Facebook’s real-names policy, requiring people to use their actual legal names on the platform, hurts people such as activists, victims of intimate partner violence, police officers whose work makes them targets, and anyone with a public persona who wishes to have control over how they identify to the public. There are many ways Facebook can improve on this, from ending enforcement to allowing verifying pseudonyms for everyone­ — not just celebrities like Lady Gaga. Doing so would mark a clear shift.

How Facebook runs its platform

Increased transparency of Facebook’s business practices. One of the hard things about evaluating Facebook is the effort needed to get good information about its business practices. When violations are exposed by the media, as they regularly are, we are all surprised at the different ways Facebook violates user privacy. Most recently, the company used phone numbers provided for two-factor authentication for advertising and networking purposes. Facebook needs to be both explicit and detailed about how and when it shares user data. In fact, a move from discussing “sharing” to discussing “transfers,” “access to raw information,” and “access to derived information” would be a visible improvement.

Increased transparency regarding censorship rules. Facebook makes choices about what content is acceptable on its site. Those choices are controversial, implemented by thousands of low-paid workers quickly implementing unclear rules. These are tremendously hard problems without clear solutions. Even obvious rules like banning hateful words run into challenges when people try to legitimately discuss certain important topics. Whatever Facebook does in this regard, the company needs be more transparent about its processes. It should allow regulators and the public to audit the company’s practices. Moreover, Facebook should share any innovative engineering solutions with the world, much as it currently shares its data center engineering.

Better security for collected user data. There have been numerous examples of attackers targeting cloud service platforms to gain access to user data. Facebook has a large and skilled product security team that says some of the right things. That team needs to be involved in the design trade-offs for features and not just review the near-final designs for flaws. Shutting down a feature based on internal security analysis would be a clear message.

Better data security so Facebook sees less. Facebook eavesdrops on almost every aspect of its users’ lives. On the other hand, WhatsApp — purchased by Facebook in 2014 — provides users with end-to-end encrypted messaging. While Facebook knows who is messaging whom and how often, Facebook has no way of learning the contents of those messages. Recently, Facebook announced plans to combine WhatsApp, Facebook Messenger, and Instagram, extending WhatsApp’s security to the consolidated system. Changing course here would be a dramatic and negative signal.

Collecting less data from outside of Facebook. Facebook doesn’t just collect data about you when you’re on the platform. Because its “like” button is on so many other pages, the company can collect data about you when you’re not on Facebook. It even collects what it calls “shadow profiles” — data about you even if you’re not a Facebook user. This data is combined with other surveillance data the company buys, including health and financial data. Collecting and saving less of this data would be a strong indicator of a new direction for the company.

Better use of Facebook data to prevent violence. There is a trade-off between Facebook seeing less and Facebook doing more to prevent hateful and inflammatory speech. Dozens of people have been killed by mob violence because of fake news spread on WhatsApp. If Facebook were doing a convincing job of controlling fake news without end-to-end encryption, then we would expect to hear how it could use patterns in metadata to handle encrypted fake news.

How Facebook manages for privacy

Create a team measured on privacy and trust. Where companies spend their money tells you what matters to them. Facebook has a large and important growth team, but what team, if any, is responsible for privacy, not as a matter of compliance or pushing the rules, but for engineering? Transparency in how it is staffed relative to other teams would be telling.

Hire a senior executive responsible for trust. Facebook’s current team has been focused on growth and revenue. Its one chief security officer, Alex Stamos, was not replaced when he left in 2018, which may indicate that having an advocate for security on the leadership team led to debate and disagreement. Retaining a voice for security and privacy issues at the executive level, before those issues affected users, was a good thing. Now that responsibility is diffuse. It’s unclear how Facebook measures and assesses its own progress and who might be held accountable for failings. Facebook can begin the process of fixing this by designating a senior executive who is responsible for trust.

Engage with regulators. Much of Facebook’s posturing seems to be an attempt to forestall regulation. Facebook sends lobbyists to Washington and other capitals, and until recently the company sent support staff to politician’s offices. It has secret lobbying campaigns against privacy laws. And Facebook has repeatedly violated a 2011 Federal Trade Commission consent order regarding user privacy. Regulating big technical projects is not easy. Most of the people who understand how these systems work understand them because they build them. Societies will regulate Facebook, and the quality of that regulation requires real education of legislators and their staffs. While businesses often want to avoid regulation, any focus on privacy will require strong government oversight. If Facebook is serious about privacy being a real interest, it will accept both government regulation and community input.

User privacy is traditionally against Facebook’s core business interests. Advertising is its business model, and targeted ads sell better and more profitably — and that requires users to engage with the platform as much as possible. Increased pressure on Facebook to manage propaganda and hate speech could easily lead to more surveillance. But there is pressure in the other direction as well, as users equate privacy with increased control over how they present themselves on the platform.

We don’t expect Facebook to abandon its advertising business model, relent in its push for monopolistic dominance, or fundamentally alter its social networking platforms. But the company can give users important privacy protections and controls without abandoning surveillance capitalism. While some of these changes will reduce profits in the short term, we hope Facebook’s leadership realizes that they are in the best long-term interest of the company.

Facebook talks about community and bringing people together. These are admirable goals, and there’s plenty of value (and profit) in having a sustainable platform for connecting people. But as long as the most important measure of success is short-term profit, doing things that help strengthen communities will fall by the wayside. Surveillance, which allows individually targeted advertising, will be prioritized over user privacy. Outrage, which drives engagement, will be prioritized over feelings of belonging. And corporate secrecy, which allows Facebook to evade both regulators and its users, will be prioritized over societal oversight. If Facebook now truly believes that these latter options are critical to its long-term success as a company, we welcome the changes that are forthcoming.

This essay was co-authored with Adam Shostack, and originally appeared on Medium OneZero. We wrote a similar essay in 2002 about judging Microsoft’s then newfound commitment to security.

Privacy and Security of Data at Universities

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/11/privacy_and_sec.html

Interesting paper: “Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier,” by Christine Borgman:

Abstract: As universities recognize the inherent value in the data they collect and hold, they encounter unforeseen challenges in stewarding those data in ways that balance accountability, transparency, and protection of privacy, academic freedom, and intellectual property. Two parallel developments in academic data collection are converging: (1) open access requirements, whereby researchers must provide access to their data as a condition of obtaining grant funding or publishing results in journals; and (2) the vast accumulation of “grey data” about individuals in their daily activities of research, teaching, learning, services, and administration. The boundaries between research and grey data are blurring, making it more difficult to assess the risks and responsibilities associated with any data collection. Many sets of data, both research and grey, fall outside privacy regulations such as HIPAA, FERPA, and PII. Universities are exploiting these data for research, learning analytics, faculty evaluation, strategic decisions, and other sensitive matters. Commercial entities are besieging universities with requests for access to data or for partnerships to mine them. The privacy frontier facing research universities spans open access practices, uses and misuses of data, public records requests, cyber risk, and curating data for privacy protection. This Article explores the competing values inherent in data stewardship and makes recommendations for practice by drawing on the pioneering work of the University of California in privacy and information security, data governance, and cyber risk.

Privacy for Tigers

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/10/privacy_for_tig.html

Ross Anderson has some new work:

As mobile phone masts went up across the world’s jungles, savannas and mountains, so did poaching. Wildlife crime syndicates can not only coordinate better but can mine growing public data sets, often of geotagged images. Privacy matters for tigers, for snow leopards, for elephants and rhinos ­ and even for tortoises and sharks. Animal data protection laws, where they exist at all, are oblivious to these new threats, and no-one seems to have started to think seriously about information security.

Video here.

California Passes New Privacy Law

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/07/california_pass.html

The California legislature unanimously passed the strongest data privacy law in the nation. This is great news, but I have a lot of reservations. The Internet tech companies pressed to get this law passed out of self-defense. A ballot initiative was already going to be voted on in November, one with even stronger data privacy protections. The author of that initiative agreed to pull it if the legislature passed something similar, and that’s why it did. This law doesn’t take effect until 2020, and that gives the legislature a lot of time to amend the law before it actually protects anyone’s privacy. And a conventional law is much easier to amend than a ballot initiative. Just as the California legislature gutted its net neutrality law in committee at the behest of the telcos, I expect it to do the same with this law at the behest of the Internet giants.

So: tentative hooray, I guess.

Facebook and Cambridge Analytica

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/03/facebook_and_ca.html

In the wake of the Cambridge Analytica scandal, news articles and commentators have focused on what Facebook knows about us. A lot, it turns out. It collects data from our posts, our likes, our photos, things we type and delete without posting, and things we do while not on Facebook and even when we’re offline. It buys data about us from others. And it can infer even more: our sexual orientation, political beliefs, relationship status, drug use, and other personality traits — even if we didn’t take the personality test that Cambridge Analytica developed.

But for every article about Facebook’s creepy stalker behavior, thousands of other companies are breathing a collective sigh of relief that it’s Facebook and not them in the spotlight. Because while Facebook is one of the biggest players in this space, there are thousands of other companies that spy on and manipulate us for profit.

Harvard Business School professor Shoshana Zuboff calls it “surveillance capitalism.” And as creepy as Facebook is turning out to be, the entire industry is far creepier. It has existed in secret far too long, and it’s up to lawmakers to force these companies into the public spotlight, where we can all decide if this is how we want society to operate and — if not — what to do about it.

There are 2,500 to 4,000 data brokers in the United States whose business is buying and selling our personal data. Last year, Equifax was in the news when hackers stole personal information on 150 million people, including Social Security numbers, birth dates, addresses, and driver’s license numbers.

You certainly didn’t give it permission to collect any of that information. Equifax is one of those thousands of data brokers, most of them you’ve never heard of, selling your personal information without your knowledge or consent to pretty much anyone who will pay for it.

Surveillance capitalism takes this one step further. Companies like Facebook and Google offer you free services in exchange for your data. Google’s surveillance isn’t in the news, but it’s startlingly intimate. We never lie to our search engines. Our interests and curiosities, hopes and fears, desires and sexual proclivities, are all collected and saved. Add to that the websites we visit that Google tracks through its advertising network, our Gmail accounts, our movements via Google Maps, and what it can collect from our smartphones.

That phone is probably the most intimate surveillance device ever invented. It tracks our location continuously, so it knows where we live, where we work, and where we spend our time. It’s the first and last thing we check in a day, so it knows when we wake up and when we go to sleep. We all have one, so it knows who we sleep with. Uber used just some of that information to detect one-night stands; your smartphone provider and any app you allow to collect location data knows a lot more.

Surveillance capitalism drives much of the internet. It’s behind most of the “free” services, and many of the paid ones as well. Its goal is psychological manipulation, in the form of personalized advertising to persuade you to buy something or do something, like vote for a candidate. And while the individualized profile-driven manipulation exposed by Cambridge Analytica feels abhorrent, it’s really no different from what every company wants in the end. This is why all your personal information is collected, and this is why it is so valuable. Companies that can understand it can use it against you.

None of this is new. The media has been reporting on surveillance capitalism for years. In 2015, I wrote a book about it. Back in 2010, the Wall Street Journal published an award-winning two-year series about how people are tracked both online and offline, titled “What They Know.”

Surveillance capitalism is deeply embedded in our increasingly computerized society, and if the extent of it came to light there would be broad demands for limits and regulation. But because this industry can largely operate in secret, only occasionally exposed after a data breach or investigative report, we remain mostly ignorant of its reach.

This might change soon. In 2016, the European Union passed the comprehensive General Data Protection Regulation, or GDPR. The details of the law are far too complex to explain here, but some of the things it mandates are that personal data of EU citizens can only be collected and saved for “specific, explicit, and legitimate purposes,” and only with explicit consent of the user. Consent can’t be buried in the terms and conditions, nor can it be assumed unless the user opts in. This law will take effect in May, and companies worldwide are bracing for its enforcement.

Because pretty much all surveillance capitalism companies collect data on Europeans, this will expose the industry like nothing else. Here’s just one example. In preparation for this law, PayPal quietly published a list of over 600 companies it might share your personal data with. What will it be like when every company has to publish this sort of information, and explicitly explain how it’s using your personal data? We’re about to find out.

In the wake of this scandal, even Mark Zuckerberg said that his industry probably should be regulated, although he’s certainly not wishing for the sorts of comprehensive regulation the GDPR is bringing to Europe.

He’s right. Surveillance capitalism has operated without constraints for far too long. And advances in both big data analysis and artificial intelligence will make tomorrow’s applications far creepier than today’s. Regulation is the only answer.

The first step to any regulation is transparency. Who has our data? Is it accurate? What are they doing with it? Who are they selling it to? How are they securing it? Can we delete it? I don’t see any hope of Congress passing a GDPR-like data protection law anytime soon, but it’s not too far-fetched to demand laws requiring these companies to be more transparent in what they’re doing.

One of the responses to the Cambridge Analytica scandal is that people are deleting their Facebook accounts. It’s hard to do right, and doesn’t do anything about the data that Facebook collects about people who don’t use Facebook. But it’s a start. The market can put pressure on these companies to reduce their spying on us, but it can only do that if we force the industry out of its secret shadows.

This essay previously appeared on CNN.com.

EDITED TO ADD (4/2): Slashdot thread.

The 600+ Companies PayPal Shares Your Data With

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/03/the_600_compani.html

One of the effects of GDPR — the new EU General Data Protection Regulation — is that we’re all going to be learning a lot more about who collects our data and what they do with it. Consider PayPal, that just released a list of over 600 companies they share customer data with. Here’s a good visualization of that data.

Is 600 companies unusual? Is it more than average? Less? We’ll soon know.

Can Consumers’ Online Data Be Protected?

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/02/can_consumers_o.html

Everything online is hackable. This is true for Equifax’s data and the federal Office of Personal Management’s data, which was hacked in 2015. If information is on a computer connected to the Internet, it is vulnerable.

But just because everything is hackable doesn’t mean everything will be hacked. The difference between the two is complex, and filled with defensive technologies, security best practices, consumer awareness, the motivation and skill of the hacker and the desirability of the data. The risks will be different if an attacker is a criminal who just wants credit card details ­ and doesn’t care where he gets them from ­ or the Chinese military looking for specific data from a specific place.

The proper question isn’t whether it’s possible to protect consumer data, but whether a particular site protects our data well enough for the benefits provided by that site. And here, again, there are complications.

In most cases, it’s impossible for consumers to make informed decisions about whether their data is protected. We have no idea what sorts of security measures Google uses to protect our highly intimate Web search data or our personal e-mails. We have no idea what sorts of security measures Facebook uses to protect our posts and conversations.

We have a feeling that these big companies do better than smaller ones. But we’re also surprised when a lone individual publishes personal data hacked from the infidelity site AshleyMadison.com, or when the North Korean government does the same with personal information in Sony’s network.

Think about all the companies collecting personal data about you ­ the websites you visit, your smartphone and its apps, your Internet-connected car — and how little you know about their security practices. Even worse, credit bureaus and data brokers like Equifax collect your personal information without your knowledge or consent.

So while it might be possible for companies to do a better job of protecting our data, you as a consumer are in no position to demand such protection.

Government policy is the missing ingredient. We need standards and a method for enforcement. We need liabilities and the ability to sue companies that poorly secure our data. The biggest reason companies don’t protect our data online is that it’s cheaper not to. Government policy is how we change that.

This essay appeared as half of a point/counterpoint with Priscilla Regan, in a CQ Researcher report titled “Privacy and the Internet.”

Locating Secret Military Bases via Fitness Data

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/locating_secret.html

In November, the company Strava released an anonymous data-visualization map showing all the fitness activity by everyone using the app.

Over this weekend, someone realized that it could be used to locate secret military bases: just look for repeated fitness activity in the middle of nowhere.

News article.

Tracking People Without GPS

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/12/tracking_people_5.html

Interesting research:

The trick in accurately tracking a person with this method is finding out what kind of activity they’re performing. Whether they’re walking, driving a car, or riding in a train or airplane, it’s pretty easy to figure out when you know what you’re looking for.

The sensors can determine how fast a person is traveling and what kind of movements they make. Moving at a slow pace in one direction indicates walking. Going a little bit quicker but turning at 90-degree angles means driving. Faster yet, we’re in train or airplane territory. Those are easy to figure out based on speed and air pressure.

After the app determines what you’re doing, it uses the information it collects from the sensors. The accelerometer relays your speed, the magnetometer tells your relation to true north, and the barometer offers up the air pressure around you and compares it to publicly available information. It checks in with The Weather Channel to compare air pressure data from the barometer to determine how far above sea level you are. Google Maps and data offered by the US Geological Survey Maps provide incredibly detailed elevation readings.

Once it has gathered all of this information and determined the mode of transportation you’re currently taking, it can then begin to narrow down where you are. For flights, four algorithms begin to estimate the target’s location and narrows down the possibilities until its error rate hits zero.

If you’re driving, it can be even easier. The app knows the time zone you’re in based on the information your phone has provided to it. It then accesses information from your barometer and magnetometer and compares it to information from publicly available maps and weather reports. After that, it keeps track of the turns you make. With each turn, the possible locations whittle down until it pinpoints exactly where you are.

To demonstrate how accurate it is, researchers did a test run in Philadelphia. It only took 12 turns before the app knew exactly where the car was.

This is a good example of how powerful synthesizing information from disparate data sources can be. We spend too much time worried about individual data collection systems, and not enough about analysis techniques of those systems.

Research paper.

What the NSA Collects via 702

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/what_the_nsa_co.html

New York Times reporter Charlie Savage writes about some bad statistics we’re all using:

Among surveillance legal policy specialists, it is common to cite a set of statistics from an October 2011 opinion by Judge John Bates, then of the FISA Court, about the volume of internet communications the National Security Agency was collecting under the FISA Amendments Act (“Section 702”) warrantless surveillance program. In his opinion, declassified in August 2013, Judge Bates wrote that the NSA was collecting more than 250 million internet communications a year, of which 91 percent came from its Prism system (which collects stored e-mails from providers like Gmail) and 9 percent came from its upstream system (which collects transmitted messages from network operators like AT&T).

These numbers are wrong. This blog post will address, first, the widespread nature of this misunderstanding; second, how I came to FOIA certain documents trying to figure out whether the numbers really added up; third, what those documents show; and fourth, what I further learned in talking to an intelligence official. This is far too dense and weedy for a New York Times article, but should hopefully be of some interest to specialists.

Worth reading for the details.

On the Equifax Data Breach

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/on_the_equifax_.html

Last Thursday, Equifax reported a data breach that affects 143 million US customers, about 44% of the population. It’s an extremely serious breach; hackers got access to full names, Social Security numbers, birth dates, addresses, driver’s license numbers — exactly the sort of information criminals can use to impersonate victims to banks, credit card companies, insurance companies, and other businesses vulnerable to fraud.

Many sites posted guides to protecting yourself now that it’s happened. But if you want to prevent this kind of thing from happening again, your only solution is government regulation (as unlikely as that may be at the moment).

The market can’t fix this. Markets work because buyers choose between sellers, and sellers compete for buyers. In case you didn’t notice, you’re not Equifax’s customer. You’re its product.

This happened because your personal information is valuable, and Equifax is in the business of selling it. The company is much more than a credit reporting agency. It’s a data broker. It collects information about all of us, analyzes it all, and then sells those insights.

Its customers are people and organizations who want to buy information: banks looking to lend you money, landlords deciding whether to rent you an apartment, employers deciding whether to hire you, companies trying to figure out whether you’d be a profitable customer — everyone who wants to sell you something, even governments.

It’s not just Equifax. It might be one of the biggest, but there are 2,500 to 4,000 other data brokers that are collecting, storing, and selling information about you — almost all of them companies you’ve never heard of and have no business relationship with.

Surveillance capitalism fuels the Internet, and sometimes it seems that everyone is spying on you. You’re secretly tracked on pretty much every commercial website you visit. Facebook is the largest surveillance organization mankind has created; collecting data on you is its business model. I don’t have a Facebook account, but Facebook still keeps a surprisingly complete dossier on me and my associations — just in case I ever decide to join.

I also don’t have a Gmail account, because I don’t want Google storing my e-mail. But my guess is that it has about half of my e-mail anyway, because so many people I correspond with have accounts. I can’t even avoid it by choosing not to write to gmail.com addresses, because I have no way of knowing if [email protected] is hosted at Gmail.

And again, many companies that track us do so in secret, without our knowledge and consent. And most of the time we can’t opt out. Sometimes it’s a company like Equifax that doesn’t answer to us in any way. Sometimes it’s a company like Facebook, which is effectively a monopoly because of its sheer size. And sometimes it’s our cell phone provider. All of them have decided to track us and not compete by offering consumers privacy. Sure, you can tell people not to have an e-mail account or cell phone, but that’s not a realistic option for most people living in 21st-century America.

The companies that collect and sell our data don’t need to keep it secure in order to maintain their market share. They don’t have to answer to us, their products. They know it’s more profitable to save money on security and weather the occasional bout of bad press after a data loss. Yes, we are the ones who suffer when criminals get our data, or when our private information is exposed to the public, but ultimately why should Equifax care?

Yes, it’s a huge black eye for the company — this week. Soon, another company will have suffered a massive data breach and few will remember Equifax’s problem. Does anyone remember last year when Yahoo admitted that it exposed personal information of a billion users in 2013 and another half billion in 2014?

This market failure isn’t unique to data security. There is little improvement in safety and security in any industry until government steps in. Think of food, pharmaceuticals, cars, airplanes, restaurants, workplace conditions, and flame-retardant pajamas.

Market failures like this can only be solved through government intervention. By regulating the security practices of companies that store our data, and fining companies that fail to comply, governments can raise the cost of insecurity high enough that security becomes a cheaper alternative. They can do the same thing by giving individuals affected by these breaches the ability to sue successfully, citing the exposure of personal data itself as a harm.

By all means, take the recommended steps to protect yourself from identity theft in the wake of Equifax’s data breach, but recognize that these steps are only effective on the margins, and that most data security is out of your hands. Perhaps the Federal Trade Commission will get involved, but without evidence of “unfair and deceptive trade practices,” there’s nothing it can do. Perhaps there will be a class-action lawsuit, but because it’s hard to draw a line between any of the many data breaches you’re subjected to and a specific harm, courts are not likely to side with you.

If you don’t like how careless Equifax was with your data, don’t waste your breath complaining to Equifax. Complain to your government.

This essay previously appeared on CNN.com.

EDITED TO ADD: In the early hours of this breach, I did a radio interview where I minimized the ramifications of this. I didn’t know the full extent of the breach, and thought it was just another in an endless string of breaches. I wondered why the press was covering this one and not many of the others. I don’t remember which radio show interviewed me. I kind of hope it didn’t air.

Me on Restaurant Surveillance Technology

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/07/me_on_restauran.html

I attended the National Restaurant Association exposition in Chicago earlier this year, and looked at all the ways modern restaurant IT is spying on people.

But there’s also a fundamentally creepy aspect to much of this. One of the prime ways to increase value for your brand is to use the Internet to practice surveillance of both your customers and employees. The customer side feels less invasive: Loyalty apps are pretty nice, if in fact you generally go to the same place, as is the ability to place orders electronically or make reservations with a click. The question, Schneier asks, is “who owns the data?” There’s value to collecting data on spending habits, as we’ve seen across e-commerce. Are restaurants fully aware of what they are giving away? Schneier, a critic of data mining, points out that it becomes especially invasive through “secondary uses,” when the “data is correlated with other data and sold to third parties.” For example, perhaps you’ve entered your name, gender, and age into a taco loyalty app (12th taco free!). Later, the vendors of that app sell your data to other merchants who know where and when you eat, whether you are a vegetarian, and lots of other data that you have accidentally shed. Is that what customers really want?