Tag Archives: propaganda

Influence Operations Kill Chain

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/08/influence_opera.html

Influence operations are elusive to define. The Rand Corp.’s definition is as good as any: “the collection of tactical information about an adversary as well as the dissemination of propaganda in pursuit of a competitive advantage over an opponent.” Basically, we know it when we see it, from bots controlled by the Russian Internet Research Agency to Saudi attempts to plant fake stories and manipulate political debate. These operations have been run by Iran against the United States, Russia against Ukraine, China against Taiwan, and probably lots more besides.

Since the 2016 US presidential election, there has been an endless series of ideas about how countries can defend themselves. It’s time to pull those together into a comprehensive approach to defending the public sphere and the institutions of democracy.

Influence operations don’t come out of nowhere. They exploit a series of predictable weaknesses — and fixing those holes should be the first step in fighting them. In cybersecurity, this is known as a “kill chain.” That can work in fighting influence operations, too — laying out the steps of an attack and building the taxonomy of countermeasures.

In an exploratory blog post, I first laid out a straw man information operations kill chain. I started with the seven commandments, or steps, laid out in a 2018 New York Times opinion video series on “Operation Infektion,” a 1980s Russian disinformation campaign. The information landscape has changed since the 1980s, and these operations have changed as well. Based on my own research and feedback from that initial attempt, I have modified those steps to bring them into the present day. I have also changed the name from “information operations” to “influence operations,” because the former is traditionally defined by the US Department of Defense in ways that don’t really suit these sorts of attacks.

Step 1: Find the cracks in the fabric of society — the social, demographic, economic, and ethnic divisions. For campaigns that just try to weaken collective trust in government institutions, lots of cracks will do. But for influence operations that are more directly focused on a particular policy outcome, only those related to that issue will be effective.

Countermeasures: There will always be open disagreements in a democratic society, but one defense is to shore up the institutions that make that society possible. Elsewhere I have written about the “common political knowledge” necessary for democracies to function. That shared knowledge has to be strengthened, thereby making it harder to exploit the inevitable cracks. It must become unacceptable — or at least costly — for domestic actors to use these same disinformation techniques in their own rhetoric and political maneuvering, and worthwhile to highlight and encourage cooperation when politicians honestly work across party lines. The public must learn to become reflexively suspicious of information that makes them angry at fellow citizens. These cracks can’t be entirely sealed, as they emerge from the diversity that makes democracies strong, but they can be made harder to exploit. Much of the work on “norms” falls here, although this is essentially an unfixable problem. This makes the countermeasures in the later steps even more important.

Step 2: Build audiences, either by directly controlling a platform (like RT) or by cultivating relationships with people who will be receptive to those narratives. In 2016, this consisted of creating social media accounts run either by human operatives or automatically by bots, making them seem legitimate, and gathering followers. In the years since, this has gotten subtler. As social media companies have gotten better at deleting these accounts, two separate tactics have emerged. The first is microtargeting, where influence accounts join existing social circles and engage with only a few people. The other is influencer influencing, where these accounts try to affect only a few proxies (see step 6) — either journalists or other influencers — who can carry their message for them.

Countermeasures: This is where social media companies have made all the difference. By allowing groups of like-minded people to find and talk to each other, these companies have given propagandists the ability to find audiences who are receptive to their messages. Social media companies need to detect and delete accounts belonging to propagandists as well as bots and groups run by those propagandists. Troll farms exhibit particular behaviors that the platforms need to be able to recognize. It would be best to delete accounts early, before those accounts have the time to establish themselves.

This might involve normally competitive companies working together, since operations and account names often cross platforms, and cross-platform visibility is an important tool for identifying them. Taking down accounts as early as possible is important, because it takes time to establish the legitimacy and reach of any one account. The NSA and US Cyber Command worked with the FBI and social media companies to take down Russian propaganda accounts during the 2018 midterm elections. It may be necessary to pass laws requiring Internet companies to do this. While many social networking companies have reversed their “we don’t care” attitudes since the 2016 election, there’s no guarantee that they will continue to remove these accounts — especially since their profits depend on engagement and not accuracy.
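As a concrete illustration of the detection problem, here is a minimal sketch, assuming a feed of per-account posting timestamps, of one behavioral signal a platform might compute: accounts whose hour-of-day posting rhythms are nearly identical. Everything here (the data model, the threshold, the similarity measure) is an illustrative assumption, not a description of any platform’s actual systems.

```python
# Hypothetical sketch: flag pairs of accounts whose posting schedules are
# suspiciously synchronized -- one weak behavioral signal among many that
# a platform might combine. Data shapes and thresholds are illustrative.
from collections import Counter
from itertools import combinations

def hourly_profile(timestamps):
    """Normalize an account's posts (datetime objects) into a 24-bin
    hour-of-day histogram."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values()) or 1
    return [counts.get(h, 0) / total for h in range(24)]

def similarity(p, q):
    """Cosine similarity between two hourly profiles."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = (sum(a * a for a in p) ** 0.5) * (sum(b * b for b in q) ** 0.5)
    return dot / norm if norm else 0.0

def flag_synchronized(accounts, threshold=0.95):
    """Return pairs of account ids whose posting rhythms nearly match.

    accounts: dict mapping account id -> list of post datetimes.
    A hit is a lead for human review, not a verdict.
    """
    profiles = {aid: hourly_profile(ts) for aid, ts in accounts.items()}
    return [(a, b) for a, b in combinations(profiles, 2)
            if similarity(profiles[a], profiles[b]) >= threshold]
```

In practice a platform would combine dozens of such signals (account creation dates, shared infrastructure, content similarity) before flagging anything, and cross-platform detection would require sharing these features between companies.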

Step 3: Seed distortion by creating alternative narratives. In the 1980s, this was a single “big lie,” but today it is more about many contradictory alternative truths — a “firehose of falsehood” — that distort the political debate. These can be fake or heavily slanted news stories, extremist blog posts, fake stories on real-looking websites, deepfake videos, and so on.

Countermeasures: Fake news and propaganda are viruses; they spread through otherwise healthy populations. Fake news has to be identified and labeled as such by social media companies and others, including recognizing and identifying manipulated videos known as deepfakes. Facebook is already making moves in this direction. Educators need to teach better digital literacy, as Finland is doing. All of this will help people recognize propaganda campaigns when they occur, so they can inoculate themselves against their effects. This alone cannot solve the problem, as much sharing of fake news is about social signaling, and those who share it care more about how it demonstrates their core beliefs than whether or not it is true. Still, it is part of the solution.

Step 4: Wrap those narratives in kernels of truth. A core of fact makes falsehoods more believable and helps them spread. Releasing stolen emails from Hillary Clinton’s campaign chairman John Podesta and the Democratic National Committee, and documents from Emmanuel Macron’s campaign in France, are both examples of that kernel of truth. Releasing stolen emails with a few deliberate falsehoods embedded among them is an even more effective tactic.

Countermeasures: Defenses involve exposing the untruths and distortions, but this is also complicated to put into practice. Fake news sows confusion just by being there. Psychologists have demonstrated that an inadvertent effect of debunking a piece of fake news is to amplify the message of that debunked story. Hence, it is essential to replace the fake news with accurate narratives that counter the propaganda. That kernel of truth is part of a larger true narrative. The media needs to learn skepticism about the chain of information and to exercise caution in how they approach debunked stories.

Step 5: Conceal your hand. Make it seem as if the stories came from somewhere else.

Countermeasures: Here the answer is attribution, attribution, attribution. The quicker an influence operation can be pinned on an attacker, the easier it is to defend against it. This will require efforts by both the social media platforms and the intelligence community, not just to detect influence operations and expose them but also to be able to attribute attacks. Social media companies need to be more transparent about how their algorithms work and make source publications more obvious for online articles. Even small measures like the Honest Ads Act, requiring transparency in online political ads, will help. Where companies lack business incentives to do this, regulation will be the only answer.

Step 6: Cultivate proxies who believe and amplify the narratives. Traditionally, these people have been called “useful idiots.” Encourage them to take action outside of the Internet, like holding political rallies, and to adopt positions even more extreme than they would otherwise.

Countermeasures: We can mitigate the influence of people who disseminate harmful information, even if they are unaware they are amplifying deliberate propaganda. This does not mean that the government needs to regulate speech; corporate platforms already employ a variety of systems to amplify and diminish particular speakers and messages. Additionally, the antidote to the ignorant people who repeat and amplify propaganda messages is other influencers who respond with the truth — in the words of one report, we must “make the truth louder.” Of course, there will always be true believers for whom no amount of fact-checking or counter-speech will suffice; this is not intended for them. Focus instead on persuading the persuadable.

Step 7: Deny involvement in the propaganda campaign, even if the truth is obvious. That said, since one major goal is to convince people that nothing can be trusted, rumors of involvement can even be beneficial. Denial was Russia’s tactic during the 2016 US presidential election; it embraced rumors of its involvement during the 2018 midterm elections.

Countermeasures: When attack attribution relies on secret evidence, it is easy for the attacker to deny involvement. Public attribution of information attacks must be accompanied by convincing evidence. This will be difficult when attribution involves classified intelligence information, but there is no alternative. Trusting the government without evidence, as the NSA’s Rob Joyce recommended in a 2016 talk, is not enough. Governments will have to disclose.

Step 8: Play the long game. Strive for long-term impact over immediate effects. Engage in multiple operations; most won’t be successful, but some will.

Countermeasures: Counterattacks can disrupt the attacker’s ability to maintain influence operations, as US Cyber Command did during the 2018 midterm elections. The NSA’s new policy of “persistent engagement” (see the article by, and interview with, US Cyber Command Commander Paul Nakasone here) is a strategy to achieve this. So are targeted sanctions and indicting individuals involved in these operations. While there is little hope of bringing them to the United States to stand trial, the possibility of not being able to travel internationally for fear of being arrested will lead some people to refuse to do this kind of work. More generally, we need to better encourage both politicians and social media companies to think beyond the next election cycle or quarterly earnings report.
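Viewed as a whole, the eight steps and their countermeasures form a taxonomy that a defender could encode directly. The sketch below does just that; the step names and countermeasure labels are my informal shorthand for the discussion above, not standardized terms.

```python
# A sketch of the influence operations kill chain as a lookup table, pairing
# each step with the countermeasure categories discussed above. The labels
# are informal paraphrases, not official terminology.
INFLUENCE_OPS_KILL_CHAIN = {
    "1_find_cracks":       ["strengthen common political knowledge", "norms"],
    "2_build_audiences":   ["troll/bot account takedowns", "cross-platform cooperation"],
    "3_seed_distortion":   ["label fake news", "digital literacy education"],
    "4_wrap_in_truth":     ["counter with accurate narratives", "media skepticism"],
    "5_conceal_hand":      ["rapid attribution", "ad and source transparency"],
    "6_cultivate_proxies": ["demote amplifiers", "make the truth louder"],
    "7_deny_involvement":  ["public attribution backed by evidence"],
    "8_play_long_game":    ["persistent engagement", "sanctions", "indictments"],
}

def countermeasures_for(step):
    """Return the defensive playbook for a given kill-chain step."""
    return INFLUENCE_OPS_KILL_CHAIN[step]
```

Nothing this simple is an operational tool, but it shows how the kill-chain framing turns a diffuse problem into an auditable checklist: for each step, which defenses exist, and who owns them.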

Permeating all of this is the importance of deterrence. Deterring these attacks will require a different theory. It will require, as the political scientist Henry Farrell and I have postulated, thinking of democracy itself as an information system and understanding “Democracy’s Dilemma”: how the very tools of a free and open society can be subverted to attack that society. We need to adjust our theories of deterrence to the realities of the information age and the democratization of attackers. If we can mitigate the effectiveness of influence operations, if we can publicly attribute, if we can respond either diplomatically or otherwise — we can deter these attacks from nation-states.

None of these defensive actions is sufficient on its own. Steps overlap and in some cases can be skipped. Steps can be conducted simultaneously or out of order. A single operation can span multiple targets or be an amalgamation of multiple attacks by multiple actors. Unlike with a cyberattack, disrupting an influence operation will require more than disrupting any particular step. It will require a coordinated effort between government, Internet platforms, the media, and others.

Also, this model is not static, of course. Influence operations have already evolved since the 2016 election and will continue to evolve over time — especially as countermeasures are deployed and attackers figure out how to evade them. We need to be prepared for wholly different kinds of influence operations during the 2020 US presidential election. The goal of this kill chain is to be general enough to encompass a panoply of tactics but specific enough to illuminate countermeasures. But even if this particular model doesn’t fit every influence operation, it’s important to start somewhere.

Others have worked on similar ideas. Anthony Soules, a former NSA employee who now leads cybersecurity strategy for Amgen, presented this concept at a private event. Clint Watts of the Alliance for Securing Democracy is thinking along these lines as well. The Credibility Coalition’s Misinfosec Working Group proposed a “misinformation pyramid.” The US Justice Department developed a “Malign Foreign Influence Campaign Cycle,” with associated countermeasures.

The threat from influence operations is real and important, and it deserves more study. At the same time, there’s no reason to panic. Just as overly optimistic technologists were wrong that the Internet was the single technology that was going to overthrow dictators and liberate the planet, so pessimists are also probably wrong that it is going to empower dictators and destroy democracy. If we deploy countermeasures across the entire kill chain, we can defend ourselves from these attacks.

But Russian interference in the 2016 presidential election shows not just that such actions are possible but also that they’re surprisingly inexpensive to run. As these tactics continue to be democratized, more people will attempt them. And as more people, and multiple parties, conduct influence operations, they will increasingly be seen as how the game of politics is played in the information age. This means that the line will increasingly blur between influence operations and politics as usual, and that domestic influencers will be using them as part of campaigning. Defending democracy against foreign influence also necessitates making our own political debate healthier.

This essay previously appeared in Foreign Policy.

Fake News and Pandemics

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/06/fake_news_and_p.html

When the next pandemic strikes, we’ll be fighting it on two fronts. The first is the one you immediately think about: understanding the disease, researching a cure and inoculating the population. The second is new, and one you might not have thought much about: fighting the deluge of rumors, misinformation and flat-out lies that will appear on the internet.

The second battle will be like the Russian disinformation campaigns during the 2016 presidential election, only with the addition of a deadly health crisis and possibly without a malicious government actor. But while the two problems — misinformation affecting democracy and misinformation affecting public health — will have similar solutions, the latter is much less political. If we work to solve the pandemic disinformation problem, any solutions are likely to also be applicable to the democracy one.

Pandemics are part of our future. They might be like the 1968 Hong Kong flu, which killed a million people, or the 1918 Spanish flu, which killed over 40 million. Yes, modern medicine makes pandemics less likely and less deadly. But global travel and trade, increased population density, decreased wildlife habitats, and increased animal farming to satisfy a growing and more affluent population have made them more likely. Experts agree that it’s not a matter of if — it’s only a matter of when.

When the next pandemic strikes, accurate information will be just as important as effective treatments. We saw this in 2014, when the Nigerian government managed to contain a subcontinent-wide Ebola epidemic to just 20 infections and eight fatalities. Part of that success was because of the ways officials communicated health information to all Nigerians, using government-sponsored videos, social media campaigns and international experts. Without that, the death toll in Lagos, a city of 21 million people, would probably have been greater than the 11,000 the rest of the continent experienced.

There’s every reason to expect misinformation to be rampant during a pandemic. In the early hours and days, information will be scant and rumors will abound. Most of us are not health professionals or scientists. We won’t be able to tell fact from fiction. Even worse, we’ll be scared. Our brains work differently when we are scared, and they latch on to whatever makes us feel safer — even if it’s not true.

Rumors and misinformation could easily overwhelm legitimate news channels, as people share tweets, images and videos. Much of it will be well-intentioned but wrong — like the misinformation spread by the anti-vaccination community today — but some of it may be malicious. In the 1980s, the KGB ran a sophisticated disinformation campaign — Operation Infektion — to spread the rumor that HIV/AIDS was the result of an American biological weapon gone awry. It’s reasonable to assume some group or country would deliberately spread lies in an attempt to increase death and chaos.

It’s not just misinformation about which treatments work (and are safe), and which treatments don’t work (and are unsafe). Misinformation can affect society’s ability to deal with a pandemic at many different levels. Right now, Ebola relief efforts in the Democratic Republic of Congo are being stymied by mistrust of health workers and government officials.

It doesn’t take much to imagine how this can lead to disaster. Jay Walker, curator of the TEDMED conferences, laid out some of the possibilities in a 2016 essay: people overwhelming and even looting pharmacies trying to get some drug that is irrelevant or nonexistent, people needlessly fleeing cities and leaving them paralyzed, health workers not showing up for work, truck drivers and other essential people being afraid to enter infected areas, official sites like CDC.gov being hacked and discredited. This kind of thing can magnify the health effects of a pandemic many times over, and in extreme cases could lead to a total societal collapse.

This is going to be something that government health organizations, medical professionals, social media companies and the traditional media are going to have to work out together. There isn’t any single solution; it will require many different interventions that will all need to work together. The interventions will look a lot like what we’re already talking about with regard to government-run and other information influence campaigns that target our democratic processes: methods of visibly identifying false stories, the identification and deletion of fake posts and accounts, ways to promote official and accurate news, and so on. At the scale these are needed, they will have to be done automatically and in real time.
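To give a flavor of what “done automatically and in real time” might involve, here is a toy sketch that matches incoming posts against a curated list of already-debunked claims so they can be labeled for review. The claim list, tokenizer, and similarity threshold are all assumptions for illustration; a production system would need far more robust matching and human oversight.

```python
import re

# Hypothetical curated list of claims that health authorities have debunked.
DEBUNKED_CLAIMS = [
    "drinking bleach cures the virus",
    "the vaccine contains microchips",
]

def tokens(text):
    """Crude tokenizer: lowercase words only."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Set-overlap similarity between two token sets, in [0, 1]."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def label_post(post, threshold=0.5):
    """Return the closest debunked claim if the post resembles one, else None."""
    post_toks = tokens(post)
    scored = [(jaccard(post_toks, tokens(c)), c) for c in DEBUNKED_CLAIMS]
    score, claim = max(scored)
    return claim if score >= threshold else None
```

The hard problems are upstream of code like this: who curates the claim list, how fast it updates during a crisis, and what happens to posts once they are flagged.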

Since the 2016 presidential election, we have been talking about propaganda campaigns, and about how social media amplifies fake news and allows damaging messages to spread easily. It’s a hard discussion to have in today’s hyperpolarized political climate. After any election, the winning side has every incentive to downplay the role of fake news.

But pandemics are different; there’s no political constituency in favor of people dying because of misinformation. Google doesn’t want the results of people’s well-intentioned searches to lead to fatalities. Facebook and Twitter don’t want people on their platforms sharing misinformation that will result in either individual or mass deaths. Focusing on pandemics gives us an apolitical way to collectively approach the general problem of misinformation and fake news. And any solutions for pandemics are likely to also be applicable to the more general — and more political — problems.

Pandemics are inevitable. Bioterror is already possible, and will only get easier as the requisite technologies become cheaper and more common. We’re experiencing the largest measles outbreak in 25 years thanks to the anti-vaccination movement, which has hijacked social media to amplify its messages; we seem unable to beat back the disinformation and pseudoscience surrounding the vaccine. Those same forces will dramatically increase death and social upheaval in the event of a pandemic.

Let the Russian propaganda attacks on the 2016 election serve as a wake-up call for this and other threats. We need to solve the problem of misinformation during pandemics together — governments and industries in collaboration with medical officials, all across the world — before there’s a crisis. And the solutions will also help us shore up our democracy in the process.

This essay previously appeared in the New York Times.

Propaganda and the Weakening of Trust in Government

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/11/propaganda_and_.html

On November 4, 2016, the hacker “Guccifer 2.0,” a front for Russia’s military intelligence service, claimed in a blog post that the Democrats were likely to use vulnerabilities to hack the presidential elections. On November 9, 2018, President Donald Trump started tweeting about the senatorial elections in Florida and Arizona. Without any evidence whatsoever, he said that Democrats were trying to steal the election through “FRAUD.”

Cybersecurity experts would say that posts like Guccifer 2.0’s are intended to undermine public confidence in voting: a cyber-attack against the US democratic system. Yet Donald Trump’s actions are doing far more damage to democracy. So far, his tweets on the topic have been retweeted over 270,000 times, eroding confidence far more effectively than any foreign influence campaign.

We need new ideas to explain how public statements on the Internet can weaken American democracy. Cybersecurity today is not only about computer systems. It’s also about the ways attackers can use computer systems to manipulate and undermine public expectations about democracy. Not only do we need to rethink attacks against democracy; we also need to rethink the attackers.

This is one key reason why we wrote a new research paper that uses ideas from computer security to understand the relationship between democracy and information. These ideas help us understand attacks that destabilize confidence in democratic institutions or debate.

Our research implies that insider attacks from within American politics can be more pernicious than attacks from other countries. They are more sophisticated, employ tools that are harder to defend against, and lead to harsh political tradeoffs. The US can threaten charges or impose sanctions when Russian trolling agencies attack its democratic system. But what punishments can it use when the attacker is the US president?

People who think about cybersecurity build on ideas about confrontations between states during the Cold War. Intellectuals such as Thomas Schelling developed deterrence theory, which explained how the US and USSR could maneuver to limit each other’s options without ever actually going to war. Deterrence theory, and related concepts about the relative ease of attack and defense, seemed to explain the tradeoffs that the US and rival states faced as they started to use cyber techniques to probe and compromise each other’s information networks.

However, these ideas fail to acknowledge one key difference between the Cold War and today. Nearly all states — whether democratic or authoritarian — are entangled on the Internet. This creates both new tensions and new opportunities. The US assumed that the Internet would help spread American liberal values, and that this was a good and uncontroversial thing. Illiberal states like Russia and China feared that Internet freedom was a direct threat to their own systems of rule. Opponents of the regime might use social media and online communication to coordinate among themselves and appeal to the broader public, perhaps toppling their governments, as happened in Tunisia during the Arab Spring.

This led illiberal states to develop new domestic defenses against open information flows. As scholars like Molly Roberts have shown, states like China and Russia discovered how they could “flood” Internet discussion with online nonsense and distraction, making it impossible for their opponents to talk to each other, or even to distinguish between truth and falsehood. These flooding techniques stabilized authoritarian regimes, because they demoralized and confused the regime’s opponents. Libertarians often argue that the best antidote to bad speech is more speech. What Vladimir Putin discovered was that the best antidote to more speech was bad speech.

Russia saw the Arab Spring and efforts to encourage democracy in its neighborhood as direct threats, and began experimenting with counter-offensive techniques. When a Russia-friendly government in Ukraine collapsed due to popular protests, Russia tried to destabilize new, democratic elections by hacking the system through which the election results would be announced. The clear intention was to discredit the election results by announcing fake voting numbers that would throw public discussion into disarray.

This attack on public confidence in election results was thwarted at the last moment. Even so, it provided the model for a new kind of attack. Hackers don’t have to secretly alter people’s votes to affect elections. All they need to do is to damage public confidence that the votes were counted fairly. As researchers have argued, “simply put, the attacker might not care who wins; the losing side believing that the election was stolen from them may be equally, if not more, valuable.”

These two kinds of attacks — “flooding” attacks aimed at destabilizing public discourse, and “confidence” attacks aimed at undermining public belief in elections — were weaponized against the US in 2016. Russian social media trolls, hired by the “Internet Research Agency,” flooded online political discussions with rumors and counter-rumors in order to create confusion and political division. Peter Pomerantsev describes how in Russia, “one moment [Putin’s media wizard] Surkov would fund civic forums and human rights NGOs, the next he would quietly support nationalist movements that accuse the NGOs of being tools of the West.” Similarly, Russian trolls tried to get Black Lives Matter protesters and anti-Black Lives Matter protesters to march at the same time and place, to create conflict and the appearance of chaos. Guccifer 2.0’s blog post was surely intended to undermine confidence in the vote, preparing the ground for a wider destabilization campaign after Hillary Clinton won the election. Neither Putin nor anyone else anticipated that Trump would win, ushering in chaos on a vastly greater scale.

We do not know how successful these attacks were. A new book by John Sides, Michael Tesler and Lynn Vavreck suggests that Russian efforts had no measurable long-term consequences. Detailed research on the flow of news articles through social media by Yochai Benkler, Robert Faris, and Hal Roberts agrees, showing that Fox News was far more influential in the spread of false news stories than any Russian effort.

However, global adversaries like the Russians aren’t the only actors who can use flooding and confidence attacks. US actors can use just the same techniques. Indeed, they can arguably use them better, since they have a better understanding of US politics, more resources, and are far more difficult for the government to counter without raising First Amendment issues.

For example, when the Federal Communications Commission asked for comments on its proposal to get rid of “net neutrality,” it was flooded by fake comments supporting the proposal. Nearly every real person who commented was in favor of net neutrality, but their arguments were drowned out by a flood of spurious comments purportedly made by identities stolen from porn sites, by people whose names and email addresses had been harvested without their permission, and, in some cases, by dead people. This was done not just to generate fake support for the FCC’s controversial proposal. It was to devalue public comments in general, making the general public’s support for net neutrality politically irrelevant. FCC decision making on issues like net neutrality used to be dominated by industry insiders, and many would like to go back to the old regime.
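That flood was detectable after the fact precisely because the fake comments were templated. A minimal sketch of that kind of analysis, assuming nothing more than a list of comment bodies, follows; the normalization and the threshold are illustrative, and the real analyses of the FCC docket used more sophisticated near-duplicate clustering.

```python
import re
from collections import Counter

def normalize(comment_text):
    """Collapse case, punctuation, and whitespace so template variants collide."""
    letters_only = re.sub(r"[^a-z\s]", "", comment_text.lower())
    return re.sub(r"\s+", " ", letters_only).strip()

def suspicious_templates(comments, min_copies=1000):
    """Return normalized comment bodies submitted an implausible number of times."""
    counts = Counter(normalize(c) for c in comments)
    return {text: n for text, n in counts.items() if n >= min_copies}
```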

Trump’s efforts to undermine confidence in the Florida and Arizona votes work on a much larger scale. There are clear short-term benefits to asserting fraud where no fraud exists. This may sway judges or other public officials to make concessions to the Republicans to preserve their legitimacy. Yet these assertions also destabilize American democracy in the long term. If Republicans are convinced that Democrats win by cheating, they will feel that their own manipulations of the system (purging voter rolls, making voting more difficult, and so on) are legitimate, and will very probably cheat even more flagrantly in the future. This will trash collective institutions and leave everyone worse off.

It is notable that some Arizonan Republicans — including Martha McSally — have so far stayed firm against pressure from the White House and the Republican National Committee to claim that cheating is happening. They presumably see more long-term value in preserving existing institutions than in undermining them. Very plausibly, Donald Trump has exactly the opposite incentives. By weakening public confidence in the vote today, he makes it easier to claim fraud and perhaps plunge American politics into chaos if he is defeated in 2020.

If experts who see Russian flooding and confidence measures as cyberattacks on US democracy are right, then these attacks are just as dangerous — and perhaps more dangerous — when they are used by domestic actors. The risk is that over time they will destabilize American democracy so that it comes closer to Russia’s managed democracy — where nothing is real any more, and ordinary people feel a mixture of paranoia, helplessness and disgust when they think about politics. Paradoxically, Russian interference is far too ineffectual to get us there — but domestically mounted attacks by all-American political actors might.

To protect against that possibility, we need to start thinking more systematically about the relationship between democracy and information. Our paper provides one way to do this, highlighting the vulnerabilities of democracy against certain kinds of information attack. More generally, we need to build levees against flooding while shoring up public confidence in voting and other public information systems that are necessary to democracy.

The first may require radical changes in how we regulate social media companies. Modernization of government commenting platforms to make them robust against flooding is only a very minimal first step. Until very recently, companies like Twitter won market advantage from bot infestations — even when a company couldn’t make a profit, its user numbers seemed to be growing. CEOs like Mark Zuckerberg have begun to worry about democracy, but their worries will likely only go so far. It is difficult to get a man to understand something when his business model depends on not understanding it. Sharp — and legally enforceable — limits on automated accounts are a first step. Radical redesign of networks and of trending indicators so that flooding attacks are less effective may be a second.

The second requires general standards for voting at the federal level, and a constitutional guarantee of the right to vote. Technical experts nearly universally favor robust voting systems that combine paper records with random post-election auditing, to prevent fraud and secure public confidence in voting. Other steps to ensure proper ballot design and to standardize vote counting and reporting will take more time and discussion — yet the record of other countries shows that they are not impossible.
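To make the auditing idea concrete, here is a deliberately simplified sketch: draw a random sample of the paper records, hand-count them, and escalate to a full recount if the sample disagrees too much with the reported result. Real risk-limiting audits (e.g., the BRAVO ballot-polling method) replace the fixed tolerance below with rigorous statistical stopping rules, so treat this only as an illustration of the concept.

```python
import random

def audit_flags_recount(paper_ballots, winner, reported_share,
                        sample_size=1200, tolerance=0.03):
    """Return True if a random hand-counted sample diverges from the
    reported winner's vote share by more than `tolerance`.

    paper_ballots: list of candidate names, one per physical paper record.
    reported_share: winner's vote share according to the machine count.
    """
    n = min(sample_size, len(paper_ballots))
    sample = random.sample(paper_ballots, n)
    sampled_share = sum(1 for ballot in sample if ballot == winner) / n
    return abs(sampled_share - reported_share) > tolerance
```

The crucial property is that the audit works from the paper records rather than the machine totals, so a compromised tabulator cannot hide from it.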

The US is nearly unique among major democracies in the persistent flaws of its election machinery. Yet voting is not the only important form of democratic information. Apparent efforts to deliberately skew the US census against counting undocumented immigrants show the need for a more general audit of the political information systems that we need if democracy is to function properly.

It’s easier to respond to Russian hackers through sanctions, counter-attacks and the like than to domestic political attacks that undermine US democracy. Preserving the basic political freedoms of democracy requires recognizing that these freedoms are sometimes going to be abused by politicians such as Donald Trump. The best that we can do is to minimize the possibilities of abuse, up to the point where countermeasures would themselves encroach on basic freedoms, and harden the general institutions that secure democratic information against attacks intended to undermine them.

This essay was co-authored with Henry Farrell, and previously appeared on Motherboard, with a terrible headline that I was unable to get changed.

Some notes about the Kaspersky affair

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/10/some-notes-about-kaspersky-affair.html

I thought I’d write up some notes about Kaspersky, the Russian anti-virus vendor that many believe has ties to Russian intelligence.

There are two angles to this story. One is whether the accusations are true. The second is the poor way the press has handled the story, with mainstream outlets like the New York Times more intent on pushing government propaganda than informing us what’s going on.

The press

Before we address Kaspersky, we need to talk about how the press covers this.
The mainstream media’s stories have been pure government propaganda, like this one from the New York Times. It garbles the facts of what happened, and relies primarily on anonymous government sources that cannot be held accountable. It’s so messed up that we can’t easily challenge it because we aren’t even sure exactly what it’s claiming.
The Society of Professional Journalists has a name for this abuse of anonymous sources: the “Washington Game.” Journalists can identify this as bad journalism, but big newspapers like The New York Times continue to do it anyway, because how dare anybody criticize them?
For all that I hate the anti-American bias of The Intercept, at least they’ve had stories that de-garble what’s going on, that explain things so that we can challenge them.

Our Government

Our government can’t tell us everything, of course. But at the same time, they need to tell us something, at the very least being clear about what their accusations are. These vague insinuations through the media hurt their credibility rather than help it. The obvious craptitude is making us in the cybersecurity community come to Kaspersky’s defense, which is not the government’s aim at all.
There are lots of issues involved here, but let’s consider the major one insinuated by the NYTimes story: that Kaspersky was getting “data” files along with copies of suspected malware. This is troublesome if true.
But, as Kaspersky claims today, it’s because they had detected malware within a zip file, and uploaded the entire zip — including the data files within the zip.
This is reasonable. This is indeed how anti-virus generally works. It completely defeats the NYTimes insinuations.
This isn’t to say Kaspersky is telling the truth, of course, but that’s not the point. The point is that we are getting vague propaganda from the government further garbled by the press, making Kaspersky’s clear defense the credible party in the affair.
It’s certainly possible for Kaspersky to write signatures that look for strings like “TS//SI/OC/REL TO USA” that appear in secret US documents, then upload matching files to Russia. If that’s what our government believes is happening, they need to come out and be explicit about it. They can easily set up honeypots, in the way described in today’s story, to confirm it. However, it seems the government’s description of honeypots is that Kaspersky only uploaded files that were clearly viruses, not data.
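For what it’s worth, the kind of signature described above is technically trivial. Here is a toy sketch of string-signature scanning, the core of what such a hypothetical “classification marking” detection would be; real engines and YARA rules are far more sophisticated, and nothing here is a claim about what Kaspersky’s product actually does.

```python
# Toy AV-style scanner: flag files containing specific byte patterns.
# The signature below is the example marking from the text; in a real
# product, signatures number in the millions and include code patterns.
SIGNATURES = [b"TS//SI/OC/REL TO USA"]

def scan_file(path):
    """Return the list of signatures found in a file, read as raw bytes."""
    with open(path, "rb") as f:
        data = f.read()
    return [sig for sig in SIGNATURES if sig in data]
```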

Kaspersky

I believe Kaspersky is guilty: that the company, and Eugene himself, works directly with Russian intelligence.
That’s because on a personal basis, people in government have given me specific, credible stories — the sort of thing they should be making public. And these stories are wholly unrelated to stories that have been made public so far.
You shouldn’t believe me, of course, because I won’t go into details you can challenge. I’m not trying to convince you, I’m just disclosing my point of view.
But there are some public reasons to doubt Kaspersky. For example, when trying to sell to our government, they’ve claimed they can help us against terrorists. The translation of this is that they could help our intelligence services. Well, if they are willing to help our intelligence services against customers who are terrorists, then why wouldn’t they likewise help Russian intelligence services against their adversaries?
Then there is how Russia works. It’s a violent country. Most of the people mentioned in that “Steele Dossier” have died. In the hacker community, hackers are often coerced to help the government. Many have simply gone missing.
Being rich doesn’t make Kaspersky immune from this — it makes him more of a target. Russian intelligence knows he’s getting all sorts of good intelligence, such as malware written by foreign intelligence services. It’s unbelievable they wouldn’t put the screws on him to get this sort of thing.
Russia is our adversary. It’d be foolish of our government to buy anti-virus from Russian companies. Likewise, the Russian government won’t buy such products from American companies.

Conclusion

I have enormous disrespect for mainstream outlets like The New York Times and the way they’ve handled the story. It makes me want to come to Kaspersky’s defense.

I have enormous respect for Kaspersky technology. They do good work.

But I hear stories. I don’t think our government should be trusting Kaspersky at all. For that matter, our government shouldn’t trust any cybersecurity products from Russia, China, Iran, etc.

Tainted Leaks

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/tainted_leaks.html

Last year, I wrote about the potential for doxers to alter documents before they leaked them. It was a theoretical threat when I wrote it, but now Citizen Lab has documented this technique in the wild:

This report describes an extensive Russia-linked phishing and disinformation campaign. It provides evidence of how documents stolen from a prominent journalist and critic of Russia were tampered with and then “leaked” to achieve specific propaganda aims. We name this technique “tainted leaks.” The report illustrates how the twin strategies of phishing and tainted leaks are sometimes used in combination to infiltrate civil society targets, and to seed mistrust and disinformation. It also illustrates how domestic considerations, specifically concerns about regime security, can motivate espionage operations, particularly those targeting civil society.

The Quick vs. the Strong: Commentary on Cory Doctorow’s Walkaway

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/the_quick_vs_th.html

Technological advances change the world. That’s partly because of what they are, but even more because of the social changes they enable. New technologies upend power balances. They give groups new capabilities, increased effectiveness, and new defenses. The Internet decades have been a never-ending series of these upendings. We’ve seen existing industries fall and new industries rise. We’ve seen governments become more powerful in some areas and less in others. We’ve seen the rise of a new form of governance: a multi-stakeholder model where skilled individuals can have more power than multinational corporations or major governments.

Among the many power struggles, there is one type I want to particularly highlight: the battles between the nimble individuals who start using a new technology first, and the slower organizations that come along later.

In general, the unempowered are the first to benefit from new technologies: hackers, dissidents, marginalized groups, criminals, and so on. When they first encountered the Internet, it was transformative. Suddenly, they had access to technologies for dissemination, coordination, organization, and action — things that were impossibly hard before. This can be incredibly empowering. In the early decades of the Internet, we saw it in the rise of Usenet discussion forums and special-interest mailing lists, in how the Internet routed around censorship, and how Internet governance bypassed traditional government and corporate models. More recently, we saw it in the SOPA/PIPA debate of 2011-12, the Gezi protests in Turkey and the various “color” revolutions, and the rising use of crowdfunding. These technologies can invert power dynamics, even in the presence of government surveillance and censorship.

But that’s just half the story. Technology magnifies power in general, but the rates of adoption are different. Criminals, dissidents, the unorganized — all outliers — are more agile. They can make use of new technologies faster, and can magnify their collective power because of it. But when the already-powerful big institutions finally figured out how to use the Internet, they had more raw power to magnify.

This is true for both governments and corporations. We now know that governments all over the world are militarizing the Internet, using it for surveillance, censorship, and propaganda. Large corporations are using it to control what we can do and see, and the rise of winner-take-all distribution systems only exacerbates this.

This is the fundamental tension at the heart of the Internet, and information-based technology in general. The unempowered are more efficient at leveraging new technology, while the powerful have more raw power to leverage. These two trends lead to a battle between the quick and the strong: the quick who can make use of new power faster, and the strong who can make use of that same power more effectively.

This battle is playing out today in many different areas of information technology. You can see it in the security vs. surveillance battles between criminals and the FBI, or dissidents and the Chinese government. You can see it in the battles between content pirates and various media organizations. You can see it where social-media giants and Internet-commerce giants battle against new upstarts. You can see it in politics, where the newer Internet-aware organizations fight with the older, more established, political organizations. You can even see it in warfare, where a small cadre of military can keep a country under perpetual bombardment — using drones — with no risk to the attackers.

This battle is fundamental to Cory Doctorow’s new novel Walkaway. Our heroes represent the quick: those who have checked out of traditional society, and thrive because easy access to 3D printers enables them to eschew traditional notions of property. Their enemy is the strong: the traditional government institutions that exert their power mostly because they can. This battle rages through most of the book, as the quick embrace ever-new technologies and the strong struggle to catch up.

It’s easy to root for the quick, both in Doctorow’s book and in the real world. And while I’m not going to give away Doctorow’s ending — and I don’t know enough to predict how it will play out in the real world — right now, trends favor the strong.

Centralized infrastructure favors traditional power, and the Internet is becoming more centralized. This is true at the endpoints, where companies like Facebook, Apple, Google, and Amazon control much of how we interact with information. It’s also true in the middle, where companies like Comcast increasingly control how information gets to us. It’s true in countries like Russia and China, which increasingly legislate their own national agendas onto their pieces of the Internet. And it’s even true in countries like the US and the UK, which increasingly legislate more government surveillance capabilities.

At the 1996 World Economic Forum, cyber-libertarian John Perry Barlow issued his “Declaration of the Independence of Cyberspace,” telling the assembled world leaders and titans of industry: “You have no moral right to rule us, nor do you possess any methods of enforcement that we have true reason to fear.” Many of us believed him a scant 20 years ago, but today those words ring hollow.

But if history is any guide, these things are cyclic. In another 20 years, even newer technologies — both the ones Doctorow focuses on and the ones no one can predict — could easily tip the balance back in favor of the quick. Whether that will result in more of a utopia or a dystopia depends partly on these technologies, but even more on the social changes resulting from these technologies. I’m short-term pessimistic but long-term optimistic.

This essay previously appeared on Crooked Timber.

Congress Removes FCC Privacy Protections on Your Internet Usage

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/03/congress_remove.html

Think about all of the websites you visit every day. Now imagine if the likes of Time Warner, AT&T, and Verizon collected all of your browsing history and sold it on to the highest bidder. That’s what will probably happen if Congress has its way.

This week, lawmakers voted to allow Internet service providers to violate your privacy for their own profit. Not only have they voted to repeal a rule that protects your privacy, they are also trying to make it illegal for the Federal Communications Commission to enact other rules to protect your privacy online.

That this is not provoking greater outcry illustrates how much we’ve ceded our willingness to shape our technological future to for-profit companies, allowing them to do it for us.

There are a lot of reasons to be worried about this. Because your Internet service provider controls your connection to the Internet, it is in a position to see everything you do on the Internet. Unlike a search engine or social networking platform or news site, you can’t easily switch to a competitor. And there’s not a lot of competition in the market, either. If you have a choice between two high-speed providers in the US, consider yourself lucky.

What can telecom companies do with this newly granted power to spy on everything you’re doing? Of course they can sell your data to marketers — and the inevitable criminals and foreign governments who also line up to buy it. But they can do more creepy things as well.

They can snoop through your traffic and insert their own ads. They can deploy systems that remove encryption so they can better eavesdrop. They can redirect your searches to other sites. They can install surveillance software on your computers and phones. None of these are hypothetical.

They’re all things Internet service providers have done before, and they are some of the reasons the FCC tried to protect your privacy in the first place. And now they’ll be able to do all of these things in secret, without your knowledge or consent. And, of course, governments worldwide will have access to these powers. And all of that data will be at risk of hacking, either by criminals or by other governments.

Telecom companies have argued that other Internet players already have these creepy powers — although they didn’t use the word “creepy” — so why should they not have them as well? It’s a valid point.

Surveillance is already the business model of the Internet, and literally hundreds of companies spy on your Internet activity against your interests and for their own profit.

Your e-mail provider already knows everything you write to your family, friends, and colleagues. Google already knows our hopes, fears, and interests, because that’s what we search for.

Your cellular provider already tracks your physical location at all times: it knows where you live, where you work, when you go to sleep at night, when you wake up in the morning, and — because everyone has a smartphone — who you spend time with and who you sleep with.

And some of the things these companies do with that power are no less creepy. Facebook has run experiments in manipulating your mood by changing what you see on your news feed. Uber used its ride data to identify one-night stands. Even Sony once installed spyware on customers’ computers to try to detect whether they copied music files.

Aside from spying for profit, companies can spy for other purposes. Uber has already considered using data it collects to intimidate a journalist. Imagine what an Internet service provider can do with the data it collects: against politicians, against the media, against rivals.

Of course the telecom companies want a piece of the surveillance capitalism pie. Despite dwindling revenues, increasing use of ad blockers, and increases in click fraud, violating our privacy is still a profitable business — especially if it’s done in secret.

The bigger question is: why do we allow for-profit corporations to create our technological future in ways that are optimized for their profits and anathema to our own interests?

When markets work well, different companies compete on price and features, and society collectively rewards better products by purchasing them. This mechanism fails if there is no competition, or if rival companies choose not to compete on a particular feature. It fails when customers are unable to switch to competitors. And it fails when what companies do remains secret.

Unlike service providers like Google and Facebook, telecom companies are infrastructure that requires government involvement and regulation. The practical impossibility of consumers learning the extent of surveillance by their Internet service providers, combined with the difficulty of switching them, means that the decision about whether to be spied on should be with the consumer and not a telecom giant. That this new bill reverses that is both wrong and harmful.

Today, technology is changing the fabric of our society faster than at any other time in history. We have big questions that we need to tackle: not just privacy, but questions of freedom, fairness, and liberty. Algorithms are making decisions about policing and healthcare. Driverless vehicles are making decisions about traffic and safety. Warfare is increasingly being fought remotely and autonomously. Censorship is on the rise globally. Propaganda is being promulgated more efficiently than ever. These problems won’t go away. If anything, the Internet of Things and the computerization of every aspect of our lives will make them worse.

In today’s political climate, it seems impossible that Congress would legislate these things to our benefit. Right now, regulatory agencies such as the FTC and FCC are our best hope to protect our privacy and security against rampant corporate power. That Congress has decided to reduce that power leaves us at enormous risk.

It’s too late to do anything about this bill — Trump will certainly sign it — but we need to be alert to future bills that reduce our privacy and security.

This post previously appeared on the Guardian.

EDITED TO ADD: Former FCC Commissioner Tom Wheeler wrote a good op-ed on the subject. And here’s an essay laying out what this all means to the average Internet user.

A Survey of Propaganda

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/02/a_survey_of_pro.html

This is an excellent survey article on modern propaganda techniques, how they work, and how we might defend ourselves against them.

Cory Doctorow summarizes the techniques on BoingBoing:

…in Russia, it’s about flooding the channel with a mix of lies and truth, crowding out other stories; in China, it’s about suffocating arguments with happy-talk distractions, and for trolls like Milo Yiannopoulos, it’s weaponizing hate, outraging people so they spread your message to the small, diffused minority of broken people who welcome your message and would otherwise be uneconomical to reach.

As to defense: “Debunking doesn’t work: provide an alternative narrative.”

Research into the Root Causes of Terrorism

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/02/research_into_t_1.html

Interesting article in Science discussing field research on how people are radicalized to become terrorists.

The potential for research that can overcome existing constraints can be seen in recent advances in understanding violent extremism and, partly, in interdiction and prevention. Most notable is waning interest in simplistic root-cause explanations of why individuals become violent extremists (e.g., poverty, lack of education, marginalization, foreign occupation, and religious fervor), which cannot accommodate the richness and diversity of situations that breed terrorism or support meaningful interventions. A more tractable line of inquiry is how people actually become involved in terror networks (e.g., how they radicalize and are recruited, move to action, or come to abandon cause and comrades).

Reports from The Soufan Group, the International Center for the Study of Radicalisation (King’s College London), and the Combating Terrorism Center (U.S. Military Academy) indicate that approximately three-fourths of those who join the Islamic State or al-Qaeda do so in groups. These groups often involve preexisting social networks and typically cluster in particular towns and neighborhoods. This suggests that much recruitment does not need direct personal appeals by organization agents or individual exposure to social media (which would entail a more dispersed recruitment pattern). Fieldwork is needed to identify the specific conditions under which these processes play out. Natural growth models of terrorist networks then might be based on an epidemiology of radical ideas in host social networks rather than built in the abstract and then fitted to data, and would allow for a public health, rather than strictly criminal, approach to violent extremism.

Such considerations have implications for countering terrorist recruitment. The present USG focus is on “counternarratives,” intended as alternatives to the “ideologies” held to motivate terrorists. This strategy treats ideas as disembodied from the human conditions in which they are embedded and given life as animators of social groups. In their stead, research and policy might better focus on personalized “counterengagement,” addressing and harnessing the fellowship, passion, and purpose of people within specific social contexts, as ISIS and al-Qaeda often do. This focus stands in sharp contrast to reliance on negative mass messaging and sting operations to dissuade young people in doubt through entrapment and punishment (the most common practice used in U.S. law enforcement) rather than through positive persuasion and channeling into productive life paths. At the very least, we need field research in communities that is capable of capturing evidence to reveal which strategies are working, failing, or backfiring.

1984 is the new Bible in the age of Trump

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/02/1984-is-new-bible.html

In the age of Trump, Orwell’s book 1984 is becoming the new Bible: a religious text which few read, but which many claim supports their beliefs. A good demonstration is this CNN op-ed, in which the author describes Trump as being Orwellian, but mostly just because Trump is a Republican.

Trump’s populist attacks against our (classically) liberal world order are indeed cause for concern. His assault on the truth is indeed a bit Orwellian. But it’s op-eds like this one at CNN that are part of the problem.
While the author of the op-ed spends much time talking about his dogs (“Winston”, “Julia”), and how much he hates Trump, he spends little time on the core thesis “Orwellianism”. When he does, it’s mostly about old political disagreements. For example, the op-ed calls Trump’s cabinet appointees Orwellian simply because they are Republicans:

He has provided us with Betsy DeVos, a secretary of education nominee who is widely believed to oppose public education, and who promotes the truly Orwellian-sounding concept of “school choice,” a plan that seems well-intentioned but which critics complain actually siphons much-needed funds from public to private education institutions.

Calling school choice “Orwellian” is absurd. Republicans want to privatize more, and the Democrats want the state to run more of the economy. It’s the same disagreement that divides the two parties on almost any policy issue. When you call every little political disagreement “Orwellian”, you devalue the idea. I’m Republican, so of course I’d argue that it’s the state-run education system giving parents zero choice that is the Orwellian thing here. And now we bicker, both convinced that Orwell is on our side in this debate. #WhatWouldOrwellDo

If something is “Orwellian”, then you need to do a better job demonstrating it, making the analogy clear. For example, last year I showed how, in response to a political disagreement, Wikipedia and old newspaper articles were edited to conform to the new political reality. This is a clear example of Winston Smith’s job of changing the past to match the present.

But even such clear documentation is probably powerless to change anybody’s mind. Whether “changing the text of old newspaper articles to fit modern politics” is Orwellian depends entirely on your politics, on whether the changes agree with your views. Go follow the link [*] and see for yourself whether you agree with the change (replacing the word “refugee” in old articles with “asylee”).

It’s this that Orwell was describing. Doublethink wasn’t something forced onto us by a totalitarian government so much as something we willingly adopted ourselves. The target of Orwell’s criticism wasn’t them, the totalitarian government, but us, the people who willingly went along with it. Doublethink is what people in both parties (Democrats and Republicans) do equally, regardless of who resides in the White House.

Trump is an alt-Putin. He certainly wants to become a totalitarian. But at this point, his lies are juvenile and transparent, which even his supporters find difficult to believe [*]. The most Orwellian thing about him is what he inherits from Obama [*]: the two-party system, perpetual war, omnipresent surveillance, the propaganda system, and our nascent cyber-police-state [*].
Conclusion

Yes, people should read 1984 in the age of Trump, not because he’s created the Orwellian system, but because he’s trying to exploit the system that’s already there. If you believe he’s Orwellian because he’s Republican, as the foolish author of that CNN op-ed believes, then you’ve missed the point of Orwell’s novel completely.

Bonus: Doing a point-by-point rebuttal gets boring, and makes the post long, but ought to be done out of a sense of completeness. The following paragraph contains the most “Orwell” points, but it’s all essentially nonsense:

We are living in this state of flux in real life. Russia was and likely is our nation’s fiercest rival, yet as a candidate, President Trump famously stated, “Russia, if you’re listening, I hope you’re able to find the 30,000 [Clinton] emails that are missing.” He praises Putin but states that perhaps he may not actually like him when they meet. WikiLeaks published DNC data alleged to have been obtained by Russian operatives, but the election was not “rigged.” A recount would be “ridiculous,” yet voter fraud was rampant. Trusted sources of information are “fake news,” and somehow Chelsea Manning, WikiLeaks’ most notable whistleblower, is now an “ungrateful traitor.”

Trump’s asking Russia to find the missing emails was clearly a joke. Trump’s speech is marked by exaggeration and jokes like this. That Trump’s rivals insist his jokes be taken seriously is the problem here, more than what he’s joking about.

The correct Orwellian analogy to draw here is the Eurasia (Russia) and Eastasia (China) parallel. Under Obama, China was a close trading partner while Russia was sanctioned for invading Ukraine. Under Trump, it’s China who is our top rival while Russia/Putin is more of a friend. What’s Orwellian is how polls [*] of what Republicans think of Russia have shifted: “We’ve always been at war with Eastasia”.

The above paragraph implies Trump said the election wasn’t “rigged”. No, Trump still says the election was rigged, even after he won it. [*] It’s Democrats who’ve flip-flopped on whether the election was “rigged” after Trump’s win. Trump attacks the election system because that’s what illiberal totalitarians always do, not because it’s Orwellian.

“Recounts” and “fraudulent votes” aren’t the same thing. Somebody registering to vote, and voting, in multiple states is not something that’ll be detected with a “recount” in any one state, for example. Trump’s position on voter fraud is absurd, but it’s not Orwellian.

Instead of these small things, what’s Orwellian is Trump’s grander story of a huge popular “movement” behind him. That’s why his inauguration numbers are important. That’s why losing the popular vote is important. It’s why he keeps using the word “movement” in all his speeches. It’s the big lie he’s telling that makes him Orwellian, not all the small lies.

Trusted sources of news are indeed “fake news”. The mainstream media has problems, whether it’s their tendency to sensationalism, or the way they uncritically repeat government propaganda (“according to senior government officials”) regardless of which Party controls the White House. Indeed, Orwell himself was a huge critic of the press — sometimes what they report is indeed “fake news”, not simply a mistake but something that violates the press’s own standards.

Yes, the President and high-level government officials have no business attacking the press the way Trump does, regardless of whether it deserves it. Trump indeed had a few legitimate criticisms of the press, but his attacks have quickly devolved into attacking the press whenever it’s simply the truth disagreeing with Trump’s lies. It’s the attacks on the independent press that are the problem, not the label “fake news”.

As Wikipedia documents, “the term ‘traitor’ has been used as a political epithet, regardless of any verifiable treasonable action”. Despite being found not guilty of “aiding the enemy”, Chelsea Manning was convicted of espionage. Reasonable people can disagree about Manning’s actions — while you may not like the “traitor” epithet, it’s not an Orwellian term.

Instead, what is Orwellian is insisting Manning was a “whistleblower”. Reasonable people disagree with that description. Manning didn’t release specific diplomatic cables demonstrative of official wrongdoing, but the entire dump of all cables going back more than a decade. It’s okay to call Manning a whistleblower (I might describe her as such), but it’s absurd to claim this is some objective truth. For example, the Wikipedia article [*] on Chelsea Manning documents several people calling her a whistleblower, but does not itself use that term to describe Manning. The struggle between objective and subjective “Truth” is a big part of Orwell’s work.

What I’m demonstrating in this bonus section is the foolishness of that CNN op-ed. Its author hates Trump but entirely misunderstands Orwell. He does a poor job of pinning down exactly how Trump fits the Orwellian mold. He writes like somebody who hasn’t actually read the book at all.

Dear Obama, From Infosec

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/01/dear-obama-from-infosec.html

Dear President Obama:

We are more than willing to believe Russia was responsible for the hacked emails/records that influenced our election. We believe Russian hackers were involved. Even if these hackers weren’t under the direct command of Putin, we know he could put a stop to such hacking if he chose. It’s like harassment of journalists and diplomats. Putin encourages a culture of thuggery that attacks opposition, without his personal direction, but with his tacit approval.

Your lame attempts to convince us of what we already agree with have irretrievably damaged your message.

Instead of communicating with the American people, you worked through your typical system of propaganda, such as stories in the New York Times quoting unnamed “senior government officials”. We don’t want “unnamed” officials — we want named officials (namely you) whom we can pin down and question. When you work through this system of official leaks, we believe you have something to hide, that the evidence won’t stand on its own.

We still don’t believe the CIA’s conclusions because we don’t know, precisely, what those conclusions are. Are they derived purely from companies like FireEye and CrowdStrike based on digital forensics? Or do you have spies in Russian hacker communities that give better information? This is such an important issue that it’s worth degrading sources of information in order to tell us, the American public, the truth.

You had the DHS and US-CERT issue the “GRIZZLY-STEPPE” [*] report “attributing those compromises to Russian malicious cyber activity”. It does nothing of the sort. It’s full of garbage. It contains signatures of viruses that are publicly available, used by hackers around the world, not just Russia. It contains a long list of IP addresses from perfectly normal services, like Tor, Google, Dropbox, Yahoo, and so forth.

Yes, hackers use Yahoo for phishing and malvertising. It doesn’t mean every access of Yahoo is an “Indicator of Compromise”.

For example, I checked my web browser [chrome://net-internals/#dns] and found that last year, on November 20th, it accessed two IP addresses that are on the Grizzly-Steppe list.

No, this doesn’t mean I’ve been hacked. It means I just had a normal interaction with Yahoo. It means the Grizzly-Steppe IoCs are garbage.
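
Checking for yourself takes only a few lines of Python. Here’s a sketch (mine, not the report’s tooling; the filename is hypothetical and assumes you’ve saved the report’s IP indicators one per line):

import socket

# Hypothetical file holding the report's IP indicators, one per line.
with open("grizzly_steppe_ips.txt") as f:
    iocs = {line.strip() for line in f if line.strip()}

# Resolve perfectly ordinary services and look for "matches".
for host in ("yahoo.com", "www.dropbox.com", "google.com"):
    _, _, addrs = socket.gethostbyname_ex(host)
    hits = iocs.intersection(addrs)
    if hits:
        print("%s resolves to 'indicators': %s" % (host, sorted(hits)))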

If your intent was to show technical information to experts to confirm Russia’s involvement, you’ve done the precise opposite. Grizzly-Steppe proves such enormous incompetence that we doubt all the technical details you might have. I mean, it’s possible that you classified the important details and de-classified the junk, but even then, that junk isn’t worth publishing. There’s no excuse for those Yahoo addresses to be in there, or for the numerous other problems.

Among the consequences is that Washington Post story claiming Russians hacked into the Vermont power grid. What really happened is that somebody just checked their Yahoo email, thereby accessing one of the same IP addresses I did. How they get from the facts (one person accessed Yahoo email) to the story (Russians hacked power grid) is your responsibility. This misinformation is your fault.

You announced sanctions for the Russian hacking [*]. At the same time, you announced sanctions for Russian harassment of diplomatic staff. These two events are confused in the press, with most stories reporting you expelled 35 diplomats for hacking, when that appears not to be the case.

Your list of individuals/organizations is confusing. It makes sense to name the GRU, FSB, and their officers. But why name “ZorSecurity” but not its sole proprietor, “Alisa Esage Shevchenko”? It seems a minor target, and you give no information about why it was selected. Conversely, you ignore the APT28/APT29 Dukes/CozyBear groups that feature so prominently in your official leaks. You also throw in a couple of extra hackers, sanctioned for finance hacks rather than election hacks. Again, this causes confusion in the press about exactly who you are sanctioning and why. It seems as slipshod as the DHS/US-CERT report.

Mr President, you’ve got two weeks left in office. Russia’s involvement is a huge issue, especially given President-Elect Trump’s pro-Russia stance. If you’ve got better information than this, I beg you to release it. As it stands now, all you’ve done is support Trump’s narrative, making this look like propaganda — and bad propaganda at that. Give us, the infosec/cybersec community, technical details we can look at, analyze, and confirm.

Regards,
Infosec

The Future of Faking Audio and Video

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/12/the_future_of_f.html

This Verge article isn’t great, but we are certainly moving into a future where audio and video will be easy to fake, and easier to fake undetectably. This is going to make propaganda easier, with all of the ill effects we’ve already seen turned up to eleven.

I don’t have a good solution for this.

"From Putin with Love" – a novel by the New York Times

Post Syndicated from Robert Graham original http://blog.erratasec.com/2016/12/from-putin-with-love-novel-by-new-york.html

In recent weeks, the New York Times has written many stories on Russia’s hacking of the Trump election. This front page piece [*] alone takes up 9,000 words. Combined, the NYTimes coverage on this topic exceeds the length of a novel. Yet, for all this text, the number of verifiable facts also equals that of a novel, namely zero. There’s no evidence this was anything other than an undirected, Anonymous-style op based on a phishing campaign.

The question that drives us

It’s not that Russia isn’t involved, it’s that the exact nature of their involvement is complicated. Just because the hackers live in Russia doesn’t automatically mean their attacks are directed by the government.

It’s like the recent Islamic terrorist attacks in Europe and America. Despite ISIS claiming credit, and the perpetrators crediting ISIS, we are loath to actually blame the attacks directly on ISIS. Overwhelmingly, it’s individuals who finance and plan their attacks, with no ISIS organizational involvement other than inspiration.

The same goes for Russian hacks. The Russian hacker community is complicated. There are lots of actors with various affiliations with the government. They are almost always nationalistic, almost always pro-Putin. There are many individuals and groups who act to the benefit of Putin/Russia with no direct affiliation with the government. Others do have ties with the government, but these are often informal relationships, sustained by patronage and corruption.

Evidence tying Russian attacks to the Russian government is thus the most important question of all — and it’s one that the New York Times is failing to answer. The fewer facts they have, the more they fill the void with vast amounts of verbiage.

Sustaining the narrative

Here’s a trick when reading New York Times articles: when they switch to passive voice, they are covering up a lie. An example is this paragraph from the above story [*]:

The Russians were also quicker to turn their attacks to political purposes. A 2007 cyberattack on Estonia, a former Soviet republic that had joined NATO, sent a message that Russia could paralyze the country without invading it. The next year cyberattacks were used during Russia’s war with Georgia.

Normally, editors would switch this to the active voice, or:

The next year, Russia used cyberattacks in their war against Georgia.

But that would be factually wrong. Yes, cyberattacks happened during the conflicts with Estonia and Georgia, but the evidence in both cases points to targets and tools going viral on social media and web forums. It was the people who conducted the attacks, not the government. Whether it was the government who encouraged the people is the big question — to which we have no answer. Since the NYTimes has no evidence pointing to the Russian government, they switch to the passive voice, hoping you’ll assume they meant the government was to blame.

It’s a clear demonstration that the NYTimes is pushing a narrative rather than just reporting the facts and allowing you to decide for yourself.

Tropes and cliches

The NYTimes story is dominated by cliches or “tropes”.

One such trope is how hackers are always “sophisticated”, which leads to the conclusion that they must be state-sponsored, not simple like the Anonymous collective. Amusingly, the New York Times tries to give two conflicting “sophisticated” narratives at once. Their article [*] has a section titled “Honing Stealthy Tactics”, which ends by describing the attacks as “brazen” and full of “boldness”. In other words, sophisticated Russian hackers are marked by “brazen stealthiness”, a contradiction in terms. In reality, the DNC/DCCC/Podesta attacks were no more sophisticated than any other Anonymous attack, such as the one against Stratfor.

A related trope is the sophistication of defense. For example, the NYTimes describes [*] how the DNC is a non-profit that could not afford “the most advanced systems in place” to stop phishing emails. After the hacks, they installed a “robust set of monitoring tools”. This trope imagines there’s a magic pill that victims can use to defend themselves against hackers. Experts know this isn’t how cybersecurity works — the amount of money spent, or the advancement of technology, has little impact on an organization’s ability to defend itself.

Another trope is the word “target”, which imagines that every effect of a hack was the original intention. In other words, it’s the trope that tornados target trailer parks. Part of the NYTimes “narrative” is this story that “House candidates were also targets of Russian hacking” [*]. This is post-factual fake news. Guccifer2.0 targeted the DCCC, not individual House candidates. Sure, at the request of some bloggers, Guccifer2.0 released part of their treasure trove for some specific races, but the key here is the information withheld, not the information released. Guccifer2.0 made bloggers beg for it, dribbling out bits at a time, keeping themselves in the news, wrapped in an aura of mysteriousness. If their aim was to influence House races, they’d have dumped info on all the races.

In other words, the behavior is that of an Anonymous-style hacker which the NYTimes twists into behavior of Russian intelligence.

The word “trope” is normally applied to fiction. When the NYTimes devolves into hacking tropes, like the “targets” of “sophisticated” hackers, you know their news story is fiction, too.

Anonymous government officials

In the end, the foundation of the NYTimes narrative relies upon leaked secret government documents and quotes by anonymous government officials [*]. This is otherwise known as “propaganda”.

The senior government officials are probably the Democratic senators who were briefed by the CIA. These senators leak their version of the CIA briefing, cherry-picking the bits that support their story and removing the nuanced claims that were undoubtedly part of the original document.

It’s what the Society of Professional Journalists calls the “Washington Game”. Everyone knows how this game is played. That’s why Marcy Wheeler (@emptywheel) [*] and Glenn Greenwald (@ggreenwald) [*] dissected that NYTimes piece. They are both as anti-Trump/anti-Russia as they come, so it’s not their political biases that lead them to challenge that piece. Instead, it’s their knowledge of what bad journalism looks like that motivates their criticisms.

If the above leaks weren’t authorized by Obama, the administration would be announcing an investigation into who is leaking major secrets. Thus, we know the leaks were “authorized”. Obama’s willingness to release the information unofficially, but not officially, means there are holes in it somewhere. There’s something he’s hiding, covering up. Otherwise, he’d have a press conference and field questions from reporters on the topic.

Conclusion

The issue of Russia’s involvement in the election is so important that we should demand real facts, real statements from the government that we can question and challenge. It’s too important to leave up to propaganda. If Putin is involved, we deserve to understand it, and not simply get the “made for TV” version given us by the NYTimes.

Propaganda is what we have here. The NYTimes has written a novel that delivers the message while protecting the government from being questioned. Facts are replaced with distorted narrative, worn tropes, and quotes from anonymous government officials.

What we actually see is an attack no more sophisticated than those conducted by LulzSec and Anonymous, an attack that is disorganized and opportunistic, exactly what we’d expect from an Anonymous-style attack. Putin’s regime may be involved, and they may have a plan, but the current evidence looks like casual hackers, not professional hackers working for an intelligence service.

This artsy stock photo of FSB headquarters is not evidence.

Note: many ideas in this piece come from a discussion with a friend who doesn’t care to be credited.

Auditing Elections for Signs of Hacking

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/12/auditing_electi.html

Excellent essay pointing out that election security is a national security issue, and that we need to perform random ballot audits on every future election:

The good news is that we know how to solve this problem. We need to audit computers by manually examining randomly selected paper ballots and comparing the results to machine results. Audits require a voter-verified paper ballot, which the voter inspects to confirm that his or her selections have been correctly and indelibly recorded. Since 2003, an active community of academics, lawyers, election officials and activists has urged states to adopt paper ballots and robust audit procedures. This campaign has had significant, but slow, success. As of now, about three quarters of U.S. voters vote on paper ballots. Twenty-six states do some type of manual audit, but none of their procedures are adequate. Auditing methods have recently been devised that are much more efficient than those used in any state. It is important that audits be performed on every contest in every election, so that citizens do not have to request manual recounts to feel confident about election results. With high-quality audits, it is very unlikely that election fraud will go undetected whether perpetrated by another country or a political party.
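
As a toy illustration of the comparison audit the essay describes (my sketch, not the essay’s; the fixed 1% sample rate is purely illustrative, since real risk-limiting audits derive their sample sizes statistically):

import random

def audit(paper_ballots, machine_records, sample_rate=0.01):
    # Compare a random sample of paper ballots against the machine records.
    assert len(paper_ballots) == len(machine_records)
    n = max(1, int(len(paper_ballots) * sample_rate))
    for i in random.sample(range(len(paper_ballots)), n):
        if paper_ballots[i] != machine_records[i]:
            return False  # discrepancy found: escalate the sample or recount
    return True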

Another essay along similar lines.

Related: there is some information about Russian political hacking this election cycle that is classified. My guess is that it has nothing to do with hacking the voting machines — the NSA was on high alert for anything, and I have it on good authority that they found nothing — but something related to either the political-organization hacking, the propaganda machines, or something else before Election Day.

A Rebuttal For Python 3

Post Syndicated from Eevee original https://eev.ee/blog/2016/11/23/a-rebuttal-for-python-3/

Zed Shaw, of Learn Python the Hard Way fame, has now written The Case Against Python 3.

I’m not involved with core Python development. The only skin I have in this game is that I like Python 3. It’s a good language. And one of the big factors I’ve seen slowing its adoption is that respected people in the Python community keep grouching about it. I’ve had multiple newcomers tell me they have the impression that Python 3 is some kind of unusable disaster, though they don’t know exactly why; it’s just something they hear from people who sound like they know what they’re talking about. Then they actually use the language, and it’s fine.

I’m sad to see the Python community needlessly sabotage itself, but Zed’s contribution is beyond the pale. It’s not just making a big deal about changed details that won’t affect most beginners; it’s complete and utter nonsense, on a platform aimed at people who can’t yet recognize it as nonsense. I am so mad.

The Case Against Python 3

I give two sets of reasons as I see them now. One for total beginners, and another for people who are more knowledgeable about programming.

Just to note: the two sets of reasons are largely the same ideas presented differently, so I’ll just weave them together below.

The first section attempts to explain the case against starting with Python 3 in non-technical terms so a beginner can make up their own mind without being influenced by propaganda or social pressure.

Having already read through this once, this sentence really stands out to me. The author of a book many beginners read to learn Python in the first place is providing a number of reasons (some outright fabricated) not to use Python 3, often in terms beginners are ill-equipped to evaluate, but believes this is a defense against propaganda or social pressure.

The Most Important Reason

Before getting into the main technical reasons I would like to discuss the one most important social reason for why you should not use Python 3 as a beginner:

THERE IS A HIGH PROBABILITY THAT PYTHON 3 IS SUCH A FAILURE IT WILL KILL PYTHON.

Python 3’s adoption is really only at about 30% whenever there is an attempt to measure it.

Wait, really? Wow, that’s fantastic.

I mean, it would probably be higher if the most popular beginner resources were actually teaching Python 3, but you know.

Nobody is all that interested in finding out what the real complete adoption is, despite there being fairly simple ways to gather metrics on the adoption.

This accusatory sentence conspicuously neglects to mention what these fairly simple ways are, a pattern that repeats throughout. The trouble is that it’s hard to even define what “adoption” means — I write all my code in Python 3 now, but veekun is still Python 2 because it’s in maintenance mode, so what does that say about adoption? You could look at PyPI download stats, but those are thrown way off by caches and system package managers. You could look at downloads from the Python website, but a great deal of Python is written and used on Unix-likes, where Python itself is either bundled or installed from the package manager.

It’s as simple as that. If you learn Python 2, then you can still work with all the legacy Python 2 code in existence until Python dies or you (hopefully) move on. But if you learn Python 3 then your future is very uncertain. You could really be learning a dead language and end up having to learn Python 2 anyway.

You could use Python 2, until it dies… or you could use Python 3, which might die. What a choice.

By some definitions, Python 2 is already dead — it will not see another major release, only security fixes. Python 3 is still actively developed, and its seventh major release is next month. It even contains a new feature that Zed later mentions he prefers to Python 2’s offerings.

It may shock you to learn that I know both Python 2 and Python 3. Amazingly, two versions of the same language are much more similar than they are different. If you learned Python 3 and then a wizard cast a spell that made it vanish from the face of the earth, you’d just have to spend half an hour reading up on what had changed from Python 2.

Also, it’s been over a decade, maybe even multiple decades, and Python 3 still isn’t above about 30% in adoption. Even among the sciences where Python 3 is touted as a “success” it’s still only around 25-30% adoption. After that long it’s time to admit defeat and come up with a new plan.

Python 3.0 came out in 2008. The first couple releases ironed out some compatibility and API problems, so it didn’t start to gain much traction until Python 3.2 came out in 2011. Hell, Python 2.0 came out in 2000, so even Python 2 isn’t multiple decades old. It would be great if this trusted beginner reference could take two seconds to check details like this before using them to scaremonger.

The big early problem was library compatibility: it’s hard to justify switching to a new version of the language if none of the libraries work. Libraries could only port once their own dependencies had ported, of course, and it took a couple years to figure out the best way to maintain compatibility with both Python 2 and Python 3. I’d say we only really hit critical mass a few years ago — for instance, Django didn’t support Python 3 until 2013 — in which case that 30% is nothing to sneeze at.

There are more reasons beyond just the uncertain future of Python 3 even decades later.

In one paragraph, we’ve gone from “maybe even multiple decades” to just “decades”, which is a funny way to spell “eight years”.

Not In Your Best Interests

The Python project’s efforts to convince you to start with Python 3 are not in your best interest, but, rather, are only in the best interests of the Python project.

It’s bad, you see, for the Python project to want people to use the work it produced.

Anyway, please buy Zed Shaw’s book.

Anyway, please pledge to my Patreon.

Ultimately though, if Python 3 were good they wouldn’t need to do any convincing to get you to use it. It would just naturally work for you and you wouldn’t have any problems. Instead, there are serious issues with Python 3 for beginners, and rather than fix those issues the Python project uses propaganda, social pressure, and marketing to convince you to use it. In the world of technology using marketing and propaganda is immediately a sign that the technology is defective in some obvious way.

This use of social pressure and propaganda to convince you to use Python 3 despite its problems, in an attempt to benefit the Python project, is morally unconscionable to me.

Ten paragraphs in, Zed is telling me that I should be suspicious of anything that relies on marketing and propaganda. Meanwhile, there has yet to be a single concrete reason why Python 3 is bad for beginners — just several flat-out incorrect assertions and a lot of handwaving about how inexplicably nefarious the Python core developers are. You know, the same people who made Python 2. But they weren’t evil then, I guess.

You Should Be Able to Run 2 and 3

In the programming language theory there is this basic requirement that, given a “complete” programming language, I can run any other programming language. In the world of Java I’m able to run Ruby, Java, C++, C, and Lua all at the same time. In the world of Microsoft I can run F#, C#, C++, and Python all at the same time. This isn’t just a theoretical thing. There is solid math behind it. Math that is truly the foundation of computer science.

The fact that you can’t run Python 2 and Python 3 at the same time is purely a social and technical decision that the Python project made with no basis in mathematical reality. This means you are working with a purposefully broken platform when you use Python 3, and I personally can’t condone teaching people to use something that is fundamentally broken.

The programmer-oriented section makes clear that the solid math being referred to is Turing-completeness — the section is even titled “Python 3 Is Not Turing Complete”.

First, notice a rhetorical trick here. You can run Ruby, Java, C++, etc. at the same time, so why not Python 2 and Python 3?

But can you run Java and C# at the same time? (I’m sure someone has done this, but it’s certainly much less popular than something like Jython or IronPython.)

Can you run Ruby 1.8 and Ruby 2.3 at the same time? Ah, no, so I guess Ruby 2.3 is fundamentally and purposefully broken.

Can you run Lua 5.1 and 5.3 at the same time? Lua is a spectacular example, because Lua 5.2 made a breaking change to how the details of scope work, and it’s led to a situation where a lot of programs that embed Lua haven’t bothered upgrading from Lua 5.1. Was Lua 5.2 some kind of dark plot to deliberately break the language? No, it’s just slightly more inconvenient than expected for people to upgrade.

Anyway, as for Turing machines:

In computer science a fundamental law is that if I have one Turing Machine I can build any other Turing Machine. If I have COBOL then I can bootstrap a compiler for FORTRAN (as disgusting as that might be). If I have FORTH, then I can build an interpreter for Ruby. This also applies to bytecodes for CPUs. If I have a Turing Complete bytecode then I can create a compiler for any language. The rule then can be extended even further to say that if I cannot create another Turing Machine in your language, then your language cannot be Turing Complete. If I can’t use your language to write a compiler or interpreter for any other language then your language is not Turing Complete.

Yes, this is true.

Currently you cannot run Python 2 inside the Python 3 virtual machine. Since I cannot, that means Python 3 is not Turing Complete and should not be used by anyone.

And this is completely asinine. Worse, it’s flat-out dishonest, and relies on another rhetorical trick. You only “cannot” run Python 2 inside the Python 3 VM because no one has written a Python 2 interpreter in Python 3. The “cannot” is not a mathematical impossibility; it’s a simple matter of the code not having been written. Or perhaps it has, but no one cares anyway, because it would be comically and unusably slow.

I assume this was meant to be sarcastic on some level, since it’s followed by a big blue box that seems unsure about whether to double down or reverse course. But I can’t tell why it was even brought up, because it has absolutely nothing to do with Zed’s true complaint, which is that Python 2 and Python 3 do not coexist within a single environment. Implementing language X using language Y does not mean that X and Y can now be used together seamlessly.

The canonical Python release is written in C (just like with Ruby or Lua), but you can’t just dump a bunch of C code into a Python (or Ruby or Lua) file and expect it to work. You can talk to C from Python and vice versa, but defining how they communicate is a bit of a pain in the ass and requires some level of setup.
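
As a rough illustration of that setup cost, here’s the ctypes dance for calling a single C function (a sketch assuming a Unix-like system with the standard C math library):

import ctypes
import ctypes.util

# Load the system math library, then spell out sqrt()'s C signature by hand.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sqrt.argtypes = [ctypes.c_double]  # the argument types must be declared...
libm.sqrt.restype = ctypes.c_double     # ...and so must the return type
print(libm.sqrt(2.0))                   # 1.4142135623730951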

I’ll get into this some more shortly.

No Working Translator

Python 3 comes with a tool called 2to3 which is supposed to take Python 2 code and translate it to Python 3 code.

I should point out right off the bat that this is not actually what you want to use most of the time, because you probably want to translate your Python 2 code to Python 2/3 code. 2to3 produces code that most likely will not work on Python 2. Other tools exist to help you port more conservatively.
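
For example, the third-party six library (one of those more conservative tools; this sketch is mine, not Zed’s) papers over the differences so one codebase runs on both versions:

import six  # pip install six

d = {"a": 1, "b": 2}
# six calls d.iteritems() on Python 2 and d.items() on Python 3.
for key, value in six.iteritems(d):
    print(key, value)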

Translating one programming language into another is a solidly researched topic with solid math behind it. There are translators that convert any number of languages into JavaScript, C, C++, Java, and many times you have no idea the translation is being done. In addition to this, one of the first steps when implementing a new language is to convert the new language into an existing language (like C) so you don’t have to write a full compiler. Translation is a fully solved problem.

This is completely fucking ludicrous. Translating one programming language to another is a common task, though “fully solved” sounds mighty questionable. But do you know what the results look like?

I found a project called “Transcrypt”, which puts Python in the browser by “translating” it to JavaScript. I’ve never used or heard of this before; I just googled for something to convert Python to JavaScript. Here’s their first sample, a demo using jQuery:

def start ():
    def changeColors ():
        for div in S__divs:
            S (div) .css ({
                'color': 'rgb({},{},{})'.format (* [int (256 * Math.random ()) for i in range (3)]),
            })

    S__divs = S ('div')
    changeColors ()
    window.setInterval (changeColors, 500)

And here’s the JavaScript code it compiles to:

(function () {
    var start = function () {
        var changeColors = function () {
            var __iterable0__ = $divs;
            for (var __index0__ = 0; __index0__ < __iterable0__.length; __index0__++) {
                var div = __iterable0__ [__index0__];
                $ (div).css (dict ({'color': 'rgb({},{},{})'.format.apply (null, function () {
                    var __accu0__ = [];
                    for (var i = 0; i < 3; i++) {
                        __accu0__.append (int (256 * Math.random ()));
                    }
                    return __accu0__;
                } ())}));
            }
        };
        var $divs = $ ('div');
        changeColors ();
        window.setInterval (changeColors, 500);
    };
    __pragma__ ('<all>')
        __all__.start = start;
    __pragma__ ('</all>')
}) ();

Well, not quite. That’s actually just a small piece at the end of the full 1861-line file.

You may notice that the emitted JavaScript effectively has to emulate the Python for loop, because JavaScript doesn’t have anything that works exactly the same way. And this is a basic, common language feature translated between two languages in the same general family! Imagine how your code would look if you relied on gritty details of how classes are implemented.

Is this what you want 2to3 to do to your code?

Even if something has been proven to be mathematically possible, that doesn’t mean it’s easy, and it doesn’t mean the results will be pretty (or fast).

The 2to3 translator fails on about 15% of the code it attempts, and does a poor job of translating the code it can handle. The motivations for this are unclear, but keep in mind that a group of people who claim to be programming language experts can’t write a reliable translator from one version of their own language to another. This is also a cause of their porting problems, which adds up to more evidence Python 3’s future is uncertain.

Writing a translator from one language to another is a fully proven and fundamental piece of computer science. Yet, the 2to3 translator cannot translate code 100%. In my own tests it is only about 85% effective, leaving a large amount of code to translate manually. Given that translation is a solved problem this seems to be a decision bordering on malice rather than incredible incompetence.

The programmer-oriented section doubles down on this idea with a title of “Purposefully Crippled 2to3 Translator” — again, accusing the Python project of sabotaging everyone. That doesn’t even make sense; if their goal is to make everyone use Python 3 at any cost, why would they deliberately break their tool that reduces the amount of Python 2 code and increases the amount of Python 3 code?

2to3 sucks because its job is hard. Python is dynamically typed. If it sees d.iteritems(), it might want to change that to d.items(), as it’s called in Python 3 — but it can’t always be sure that d is actually a dict. If d is some user-defined type, renaming the method is wrong.

But hey, Turing-completeness, right? It must be mathematically possible. And it is! As long as you’re willing to see this:

for key, value in d.iteritems():
    ...

Get translated to this:

__d = d
for key, value in (__d.items() if isinstance(__d, dict) else __d.iteritems()):
    ...

Would Zed be happier with that, I wonder?

The JVM and CLR Prove It’s Pointless

Yet, for some reason, the Python 3 virtual machine can’t run Python 2? Despite the solidly established mathematics disproving this, the countless examples of running one crazy language inside a Russian doll cascade of other crazy languages, and huge number of languages that can coexist in nearly every other virtual machine? That makes no sense.

This, finally, is the real complaint. It’s not a bad one, and it comes up sometimes, but… it’s not this easy.

The Python 3 VM is fairly similar to the Python 2 VM. The problem isn’t the VM, but the core language constructs and standard library.

Consider: what happens when a Python 2 old-style class instance gets passed into Python 3, which has no such concept? It seems like a value would have to always have the semantics of the language version it came from — that’s how languages usually coexist on the same VM, anyway.

Now, I’m using Python 3, and I load some library written for Python 2. I call a Python 2 function that deals with bytestrings, and I pass it a Python 3 bytestring. Oh no! It breaks because Python 3 bytestrings iterate as integers, whereas the Python 2 library expects them to iterate as characters.
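
The difference fits in one line. This snippet runs on Python 3, with the Python 2 behavior shown in a comment:

print(list(b"hi"))  # Python 3: [104, 105], because bytes iterate as integers
# On Python 2, list(b"hi") gives ['h', 'i']: the same expression yields
# one-character strings instead.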

Okay, well, no big deal, you say. Maybe Python 2 libraries just need to be updated to work either way, before they can be used with Python 3.

But that’s exactly the situation we’re in right now. Syntax changes are trivially fixed by 2to3 and similar tools. It’s libraries that cause the subtler issues.

The same applies the other way, too. I write Python 3 code, and it gets an int from some Python 2 library. I try to use the .to_bytes method on it, but that doesn’t exist on Python 2 integers. So my Python 3 code, written and intended purely for Python 3, now has to deal with Python 2 integers as well.

Perhaps “primitive” types should convert automatically, on the boundary? Okay, sure. What about the Python 2 buffer type, which is C-backed and replaced by memoryview in Python 3?

Or how about this very fundamental problem: names of methods and other attributes are str in both versions, but that means they’re bytestrings in Python 2 and text in Python 3. If you’re in Python 3 land, and you call obj.foo() on a Python 2 object, what happens? Python 3 wants a method with the text name foo, but Python 2 wants a method with the bytes name foo. Text and bytes are not implicitly convertible in Python 3. So does it error? Somehow work anyway? What about the other way around?
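
You can poke at Python 3’s half of that question today; what a hybrid VM should do with the bytes case is exactly the unanswered part (a small sketch of mine):

class Obj:
    def foo(self):
        return "hi"

o = Obj()
print(getattr(o, "foo")())  # a text name works fine
try:
    getattr(o, b"foo")      # a bytes name is rejected outright
except TypeError as e:
    print(e)                # attribute name must be string, not 'bytes'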

What about the standard library, which has had a number of improvements in Python 3 that don’t or can’t exist in Python 2? Should Python ship two entire separate copies of its standard library? What about modules like logging, which rely on global state? Does Python 2 and Python 3 code need to set up logging separately within the same process?

There are no good solutions here. The language would double in size and complexity, and you’d still end up with a mess at least as bad as the one we have now when values leak from one version into the other.

We either have two situations here:

  1. Python 3 has been purposefully crippled to prevent Python 2’s execution alongside Python 3 for someone’s professional or ideological gain.
  2. Python 3 cannot run Python 2 due to simple incompetence on the part of the Python project.

I can think of a third.

Difficult To Use Strings

The strings in Python 3 are very difficult to use for beginners. In an attempt to make their strings more “international” they turned them into difficult to use types with poor error messages.

Why is “international” in scare quotes?

Every time you attempt to deal with characters in your programs you’ll have to understand the difference between byte sequences and Unicode strings.

Given that I’m reading part of a book teaching Python, this would be a perfect opportunity to drive this point home by saying “Look! Running exercise N in Python 3 doesn’t work.” Exercise 1, at least, works fine for me with a little extra sprinkle of parentheses:

print("Hello World!")
print("Hello Again")
print("I like typing this.")
print("This is fun.")
print('Yay! Printing.')
print("I'd much rather you 'not'.")
print('I "said" do not touch this.')

Contrast with the actual content of that exercise — at the bottom is a big red warning box telling people from “another country” (relative to where?) that if they get errors about ASCII encodings, they should put an unexplained magical incantation at the top of their scripts to fix “Unicode UTF-8”, whatever that is. I wonder if Zed has read his own book.

Don’t know what that is? Exactly.

If only there were a book that could explain it to beginners in more depth than “you have to fix this if you’re foreign”.

The Python project took a language that is very forgiving to beginners and mostly “just works” and implemented strings that require you to constantly know what type of string they are. Worst of all, when you get an error with strings (which is very often) you get an error message that doesn’t tell you what variable names you need to fix.

The complaint is that this happens in Python 3, whereas it’s accepted in Python 2:

>>> b"hello" + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str

The programmer section is called “Statically Typed Strings”. But this is not static typing. That’s strong typing, a property that sets Python’s type system apart from languages like JavaScript. It’s usually considered a good thing, because the alternative is to silently produce nonsense in some cases, and then that nonsense propagates through your program and is hard to track down when it finally causes problems.

If they’re going to require beginners to struggle with the difference between bytes and Unicode the least they could do is tell people what variables are bytes and what variables are strings.

That would be nice, but it’s not like this is a new problem. Try this in Python 2.

>>> 3 + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

How would Python even report this error when I used literals instead of variables? How could custom types hook into such a thing? Error messages are hard.

By the way, did you know that several error messages are much improved in Python 3? Python 2 is somewhat notorious for the confusing errors it produces when an argument is missing from a method call, but Python 3 is specific about the problem, which is much friendlier to beginners.
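
A quick comparison (error text from memory, so treat it as approximate):

class Greeter:
    def greet(self, name):
        return "hi " + name

Greeter().greet()
# Python 2: TypeError: greet() takes exactly 2 arguments (1 given)
#   (which 2? which 1? self is silently being counted)
# Python 3: TypeError: greet() missing 1 required positional argument: 'name'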

However, when you point out that this is hard to use they try to claim it’s good for you. It is not. It’s simple blustering covering for a poor implementation.

I don’t know what about this is hard. Why do you have a text string and a bytestring in the first place? Why is it okay to refuse adding a number to a string, but not to refuse adding bytes to a string?

Imagine if one of the Python core developers were just getting into Python 2 and messing around.

# -*- coding: utf8 -*-
print "Hi, my name is Łukasz Langa."
print "Hi, my name is Łukasz Langa."[::-1]

Hi, my name is Łukasz Langa.
.agnaL zsaku�� si eman ym ,iH

Good luck figuring out how to fix that.

This isn’t blustering. Bytes are not text; they are binary data that could encode anything. They happen to look like text sometimes, and you can get away with thinking they’re text if you’re not from “another country”, but that mindset will lead you to write code that is wrong. The resulting bugs will be insidious and confusing, and you’ll have a hard time even reasoning about them because it’ll seem like “Unicode text” is somehow a different beast altogether from “ASCII text”.

Exercise 11 mentions at the end that you can use int() to convert a number to an integer. It’s no more complicated to say that you convert bytes to a string using .decode(). It shouldn’t even come up unless you’re explicitly working with binary data, and I don’t see any reading from sockets in LPTHW.
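
And the fix for the earlier mojibake example really is that one method call, assuming the bytes are UTF-8:

raw = b"Hi, my name is \xc5\x81ukasz Langa."  # UTF-8 bytes, e.g. off the wire
text = raw.decode("utf-8")                    # bytes to str, explicitly
print(text)        # Hi, my name is Łukasz Langa.
print(text[::-1])  # reverses by character, not by byte: no mojibake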

It’s also not statically compiled as strongly as it could be, so you can’t find these kinds of type errors until you run the code.

This comes a scant few paragraphs after “Dynamic typing is what makes Python easy to use and one of the reasons I advocate it for beginners.”

You can’t find any kinds of type errors until you run the code. Welcome to dynamic typing.

Strings are also most frequently received from an external source, such as a network socket, file, or similar input. This means that Python 3’s statically typed strings and lack of static type safety will cause Python 3 applications to crash more often and have more security problems when compared with Python 2.

On the contrary — Python 3 applications should crash less often. The problem with silently converting between bytestrings and text in Python 2 is that it might fail, depending on the contents. "cafe" + u"hello" works fine, but "café" + u"hello" raises a UnicodeDecodeError. Python 2 makes it very easy to write code that appears to work when tested with ASCII data, but later breaks with anything else, even though the values are still the same types. In Python 3, you get an error the first time you try to run such code, regardless of what’s in the actual values. That’s the biggest reason for the change: it improves things from being intermittent value errors to consistent type errors.
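
If you want to watch the intermittent failure happen, a Python 2 session goes something like this (output abbreviated from memory):

>>> "cafe" + u"hello"         # ASCII-only bytes: implicitly decoded, "works"
u'cafehello'
>>> "caf\xc3\xa9" + u"hello"  # the same types, but with non-ASCII content
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ...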

More security problems? This is never substantiated, and seems to have been entirely fabricated.

Too Many Formatting Options

In addition to that you will have 3 different formatting options in Python 3.6. That means you’ll have to learn to read and use multiple ways to format strings that are all very different. Not even I, an experienced professional programmer, can easily figure out these new formatting systems or keep up with their changing features.

I don’t know what on earth “keep up with their changing features” is supposed to mean, and Zed doesn’t bother to go into details.

Python 3 has three ways to format strings: % interpolation, str.format(), and the new f"" strings in Python 3.6. The f"" strings use the same syntax as str.format(); the difference is that where str.format() uses numbers or names of keyword arguments, f"" strings just use expressions. Compare:

number = 133
print("{n:02x}".format(n=number))
print(f"{number:02x}")

This isn’t “very different”. A frequently-used method is being promoted to syntax.

I really like this new style, and I have no idea why this wasn’t the formatting for Python 3 instead of that stupid .format function. String interpolation is natural for most people and easy to explain.

The problem is that a beginner will now have to know all three of these formatting styles, and that’s too many.

I could swear Zed, an experienced professional programmer, just said he couldn’t easily figure out these new formatting systems. Note also that str.format() has existed in Python 2 since Python 2.6 was released in 2008, so I don’t know why Zed said “new formatting systems“, plural.

This is a truly bizarre complaint overall, because the mechanism Zed likes best is the newest one. If Python core had agreed that three mechanisms was too many, we wouldn’t be getting f"" at all.

Even More Versions of Strings

Finally, I’m told there is a new proposal for a string type that is both bytes and Unicode at the same time? That’d be fantastic if this new type brings back the dynamic typing that makes Python easy, but I’m betting it will end up being yet another static type to learn. For that reason I also think beginners should avoid Python 3 until this new “chimera string” is implemented and works reliably in a dynamic way. Until then, you will just be dealing with difficult strings that are statically typed in a dynamically typed language.

I have absolutely no idea what this is referring to, and I can’t find anyone who does. I don’t see any recent PEPs mentioning such a thing, nor anything in the last several months on the python-dev mailing list. I don’t see it in the Python 3.6 release notes.

The closest thing I can think of is the backwards-compatibility shenanigans for PEP 528 and PEP 529 — they switch to the Windows wide-string APIs for console and filesystem encoding, but pretend under the hood that the APIs take UTF-8-encoded bytes to avoid breaking libraries like Twisted. That’s a microscopic detail that should never matter to anyone but authors of Twisted, and is nothing like a new hybrid string type, but otherwise I’m at a loss.

This paragraph really is a perfect summary of the whole article. It speaks vaguely yet authoritatively about something that doesn’t seem to exist, it doesn’t bother actually investigating the thing the entire section talks about, it conjectures that this mysterious feature will be hard just because it’s in Python 3, and it misuses terminology to complain about a fundamental property of Python that’s always existed.

Core Libraries Not Updated

Many of the core libraries included with Python 3 have been rewritten to use Python 3, but have not been updated to use its features. How could they given Python 3’s constant changing status and new features?

What “constant changing status”? The language makes new releases; is that bad? The only mention of “changing” so far was with string formatting, which makes no sense to me, because the only major change has been the addition of syntax that Zed prefers.

There are several libraries that, despite knowing the encoding of data, fail to return proper strings. The worst offender seems to be any libraries dealing with the HTTP protocol, which does indicate the encoding of the underlying byte stream in many cases.

In many cases, yes. Not in all. Some web servers don’t send back an encoding. Some files don’t have an encoding, because they’re images or other binary data. HTML allows the encoding to be given inside the document, instead. urllib has always returned bytes, so it’s not all that unreasonable to keep doing that, rather than… well, I’m not quite sure what this is proposing. Return strings sometimes?
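
For what it’s worth, here’s roughly what the caller’s side looks like with the Python 3 standard library (a sketch; the URL is a placeholder):

from urllib.request import urlopen

with urlopen("https://example.com/") as resp:
    raw = resp.read()                                    # always bytes
    # The headers may or may not name a charset; fall back to UTF-8.
    charset = resp.headers.get_content_charset() or "utf-8"
    text = raw.decode(charset)                           # now str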

The documentation for urllib.request and http.client both advise using the higher-level Requests library instead, in a prominent yellow box right at the top. Requests has distinct mechanisms for retrieving bytes versus text and is vastly easier to use overall, though I don’t think even it understands reading encodings from HTML. Alas, computers.

Good luck to any beginner figuring out how to install Requests on Python 2 — but thankfully, Python 3 now comes bundled with pip, which makes installing libraries much easier. Contrast with the beginning of exercise 46, which apologizes for how difficult this is to explain, lists four things to install, warns that it will be frustrating, and advises watching a video to help figure it out.

What’s even more idiotic about this is Python has a really good Chardet library for detecting the encoding of byte streams. If Python 3 is supposed to be “batteries included” then fast Chardet should be baked into the core of Python 3’s strings making it cake to translate strings to bytes even if you don’t know the underlying encoding. … Call the function whatever you want, but it’s not magic to guess at the encoding of a byte stream, it’s science. The only reason this isn’t done for you is that the Python project decided that you should be punished for not knowing about bytes vs. Unicode, and their arrogance means you have difficult to use strings.

Guessing at the encoding of a byte stream isn’t so much science as, well, guessing. Guessing means that sometimes you’re wrong. Sometimes that’s what you want, and I’m honestly ambivalent about having chardet in the standard library, but it’s hardly arrogant to not want to include a highly-fallible heuristic in your programming language.
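
For the record, using chardet is easy enough (it’s a third-party install, and the output will vary, which is exactly the point):

import chardet  # pip install chardet

data = "Łukasz Langa".encode("utf-8")
print(chardet.detect(data))
# Something like: {'encoding': 'utf-8', 'confidence': 0.87, 'language': ''}
# It returns a confidence score, not an answer, because guesses can be wrong.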

Conclusions and Warnings

I have resisted writing about these problems with Python 3 for 5 versions because I hoped it would become usable for beginners. Each year I would attempt to convert some of my code and write a couple small tests with Python 3 and simply fail. If I couldn’t use Python 3 reliably then there’s no way a total beginner could manage it. So each year I’d attempt it, and fail, and wait until they fix it. I really liked Python and hoped the Python project would drop their stupid stances on usability.

Let us recap the usability problems seen thus far.

  • You can’t add b"hello" to "hello".
  • TypeErrors are phrased exactly the same as they were in Python 2.
  • The type system is exactly as dynamic as it was in Python 2.
  • There is a new formatting mechanism, using the same syntax as one in Python 2, that Zed prefers over the ones in Python 2.
  • urllib.request doesn’t decode for you, just like in Python 2.
  • 档牡敤㽴 isn’t built in. Oh, sorry, I meant chardet.

Currently, the state of strings is viewed as a Good Thing in the Python community. The fact that you can’t run Python 2 inside Python 3 is seen as a weird kind of tough love. The brainwashing goes so far as to outright deny the mathematics behind language translation and compilation in an attempt to motivate the Python community to brute force convert all Python 2 code.

Which is probably why the Python project focuses on convincing unsuspecting beginners to use Python 3. They don’t have a switching cost, so if you get them to fumble their way through the Python 3 usability problems then you have new converts who don’t know any better. To me this is morally wrong and is simply preying on people to prop up a project that needs a full reset to survive. It means beginners will fail at learning to code not because of their own abilities, but because of Python 3’s difficulty.

Now that we’re towards the end, it’s a good time to say this: Zed Shaw, your behavior here is fucking reprehensible.

Half of what’s written here is irrelevant nonsense backed by a vague appeal to “mathematics”. Instead of having even the shred of humility required to step back and wonder if there are complicating factors beyond whether something is theoretically possible, you have invented a variety of conflicting and malicious motivations to ascribe to the Python project.

It’s fine to criticize Python 3. The string changes force you to think about what you’re doing a little more in some cases, and occasionally that’s a pain in the ass. I absolutely get it.

But you’ve gone out of your way to invent a conspiracy out of whole cloth and promote it on your popular platform aimed at beginners, who won’t know how obviously full of it you are. And why? Because you can’t add b"hello" to "hello"? Are you kidding me? No one can even offer to help you, because instead of examples of real problems you’ve had, you gave two trivial toys and then yelled a lot about how the whole Python project is releasing mind-altering chemicals into the air.

The Python 3 migration has been hard enough. It’s taken a lot of work from a lot of people who’ve given enough of a crap to help Python evolve — to make it better to the best of their judgment and abilities. Now we’re finally, finally at the point where virtually all libraries support Python 3, a few new ones only support Python 3, and Python 3 adoption is starting to take hold among application developers.

And you show up to piss all over it, to propagate this myth that Python 3 is hamstrung to the point of unusability, because if the Great And Wise Zed Shaw can’t figure it out in ten seconds then it must just be impossible.

Fuck you.

Sadly, I doubt this will happen, and instead they’ll just rant about how I don’t know what I’m talking about and I should shut up.

This is because you don’t know what you’re talking about, and you should shut up.

A Rebuttal For Python 3

Post Syndicated from Eevee original https://eev.ee/blog/2016/11/23/a-rebuttal-for-python-3/

Zed Shaw, of Learn Python the Hard Way fame, has now written The Case Against Python 3.

I’m not involved with core Python development. The only skin I have in this game is that I like Python 3. It’s a good language. And one of the big factors I’ve seen slowing its adoption is that respected people in the Python community keep grouching about it. I’ve had multiple newcomers tell me they have the impression that Python 3 is some kind of unusable disaster, though they don’t know exactly why; it’s just something they hear from people who sound like they know what they’re talking about. Then they actually use the language, and it’s fine.

I’m sad to see the Python community needlessly sabotage itself, but Zed’s contribution is beyond the pale. It’s not just making a big deal about changed details that won’t affect most beginners; it’s complete and utter nonsense, on a platform aimed at people who can’t yet recognize it as nonsense. I am so mad.

The Case Against Python 3

I give two sets of reasons as I see them now. One for total beginners, and another for people who are more knowledgeable about programming.

Just to note: the two sets of reasons are largely the same ideas presented differently, so I’ll just weave them together below.

The first section attempts to explain the case against starting with Python 3 in non-technical terms so a beginner can make up their own mind without being influenced by propaganda or social pressure.

Having already read through this once, this sentence really stands out to me. The author of a book many beginners read to learn Python in the first place is providing a number of reasons (some outright fabricated) not to use Python 3, often in terms beginners are ill-equipped to evaluate, but believes this is a defense against propaganda or social pressure.

The Most Important Reason

Before getting into the main technical reasons I would like to discuss the one most important social reason for why you should not use Python 3 as a beginner:

THERE IS A HIGH PROBABILITY THAT PYTHON 3 IS SUCH A FAILURE IT WILL KILL PYTHON.

Python 3’s adoption is really only at about 30% whenever there is an attempt to measure it.

Wait, really? Wow, that’s fantastic.

I mean, it would probably be higher if the most popular beginner resources were actually teaching Python 3, but you know.

Nobody is all that interested in finding out what the real complete adoption is, despite there being fairly simple ways to gather metrics on the adoption.

This accusatory sentence conspicuously neglects to mention what these fairly simple ways are, a pattern that repeats throughout. The trouble is that it’s hard to even define what “adoption” means — I write all my code in Python 3 now, but veekun is still Python 2 because it’s in maintenance mode, so what does that say about adoption? You could look at PyPI download stats, but those are thrown way off by caches and system package managers. You could look at downloads from the Python website, but a great deal of Python is written and used on Unix-likes, where Python itself is either bundled or installed from the package manager.

It’s as simple as that. If you learn Python 2, then you can still work with all the legacy Python 2 code in existence until Python dies or you (hopefully) move on. But if you learn Python 3 then your future is very uncertain. You could really be learning a dead language and end up having to learn Python 2 anyway.

You could use Python 2, until it dies… or you could use Python 3, which might die. What a choice.

By some definitions, Python 2 is already dead — it will not see another major release, only security fixes. Python 3 is still actively developed, and its seventh major release is next month. It even contains a new feature that Zed later mentions he prefers to Python 2’s offerings.

It may shock you to learn that I know both Python 2 and Python 3. Amazingly, two versions of the same language are much more similar than they are different. If you learned Python 3 and then a wizard cast a spell that made it vanish from the face of the earth, you’d just have to spend half an hour reading up on what had changed from Python 2.

Also, it’s been over a decade, maybe even multiple decades, and Python 3 still isn’t above about 30% in adoption. Even among the sciences where Python 3 is touted as a “success” it’s still only around 25-30% adoption. After that long it’s time to admit defeat and come up with a new plan.

Python 3.0 came out in 2008. The first couple releases ironed out some compatibility and API problems, so it didn’t start to gain much traction until Python 3.2 came out in 2011. Hell, Python 2.0 came out in 2000, so even Python 2 isn’t multiple decades old. It would be great if this trusted beginner reference could take two seconds to check details like this before using them to scaremonger.

The big early problem was library compatibility: it’s hard to justify switching to a new version of the language if none of the libraries work. Libraries could only port once their own dependencies had ported, of course, and it took a couple years to figure out the best way to maintain compatibility with both Python 2 and Python 3. I’d say we only really hit critical mass a few years ago — for instance, Django didn’t support Python 3 until 2013 — in which case that 30% is nothing to sneeze at.
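
For the record, the straddling pattern that eventually emerged looks something like this (a sketch, not any particular library’s code):

from __future__ import print_function, unicode_literals

try:
    # Python 3 moved things around; try its layout first
    from urllib.request import urlopen
except ImportError:
    # fall back to the Python 2 module name
    from urllib2 import urlopen

print("the same name works on either version")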

There are more reasons beyond just the uncertain future of Python 3 even decades later.

In one paragraph, we’ve gone from “maybe even multiple decades” to just “decades”, which is a funny way to spell “eight years”.

Not In Your Best Interests

The Python project’s efforts to convince you to start with Python 3 are not in your best interest, but, rather, are only in the best interests of the Python project.

It’s bad, you see, for the Python project to want people to use the work it produced.

Anyway, please buy Zed Shaw’s book.

Anyway, please pledge to my Patreon.

Ultimately though, if Python 3 were good they wouldn’t need to do any convincing to get you to use it. It would just naturally work for you and you wouldn’t have any problems. Instead, there are serious issues with Python 3 for beginners, and rather than fix those issues the Python project uses propaganda, social pressure, and marketing to convince you to use it. In the world of technology using marketing and propaganda is immediately a sign that the technology is defective in some obvious way.

This use of social pressure and propaganda to convince you to use Python 3 despite its problems, in an attempt to benefit the Python project, is morally unconscionable to me.

Ten paragraphs in, Zed is telling me that I should be suspicious of anything that relies on marketing and propaganda. Meanwhile, there has yet to be a single concrete reason why Python 3 is bad for beginners — just several flat-out incorrect assertions and a lot of handwaving about how inexplicably nefarious the Python core developers are. You know, the same people who made Python 2. But they weren’t evil then, I guess.

You Should Be Able to Run 2 and 3

In the programming language theory there is this basic requirement that, given a “complete” programming language, I can run any other programming language. In the world of Java I’m able to run Ruby, Java, C++, C, and Lua all at the same time. In the world of Microsoft I can run F#, C#, C++, and Python all at the same time. This isn’t just a theoretical thing. There is solid math behind it. Math that is truly the foundation of computer science.

The fact that you can’t run Python 2 and Python 3 at the same time is purely a social and technical decision that the Python project made with no basis in mathematical reality. This means you are working with a purposefully broken platform when you use Python 3, and I personally can’t condone teaching people to use something that is fundamentally broken.

The programmer-oriented section makes clear that the solid math being referred to is Turing-completeness — the section is even titled “Python 3 Is Not Turing Complete”.

First, notice a rhetorical trick here. You can run Ruby, Java, C++, etc. at the same time, so why not Python 2 and Python 3?

But can you run Java and C# at the same time? (I’m sure someone has done this, but it’s certainly much less popular than something like Jython or IronPython.)

Can you run Ruby 1.8 and Ruby 2.3 at the same time? Ah, no, so I guess Ruby 2.3 is fundamentally and purposefully broken.

Can you run Lua 5.1 and 5.3 at the same time? Lua is a spectacular example, because Lua 5.2 made a breaking change to how the details of scope work, and it’s led to a situation where a lot of programs that embed Lua haven’t bothered upgrading from Lua 5.1. Was Lua 5.2 some kind of dark plot to deliberately break the language? No, it’s just slightly more inconvenient than expected for people to upgrade.

Anyway, as for Turing machines:

In computer science a fundamental law is that if I have one Turing Machine I can build any other Turing Machine. If I have COBOL then I can bootstrap a compiler for FORTRAN (as disgusting as that might be). If I have FORTH, then I can build an interpreter for Ruby. This also applies to bytecodes for CPUs. If I have a Turing Complete bytecode then I can create a compiler for any language. The rule then can be extended even further to say that if I cannot create another Turing Machine in your language, then your language cannot be Turing Complete. If I can’t use your language to write a compiler or interpreter for any other language then your language is not Turing Complete.

Yes, this is true.

Currently you cannot run Python 2 inside the Python 3 virtual machine. Since I cannot, that means Python 3 is not Turing Complete and should not be used by anyone.

And this is completely asinine. Worse, it’s flat-out dishonest, and relies on another rhetorical trick. You only “cannot” run Python 2 inside the Python 3 VM because no one has written a Python 2 interpreter in Python 3. The “cannot” is not a mathematical impossibility; it’s a simple matter of the code not having been written. Or perhaps it has, but no one cares anyway, because it would be comically and unusably slow.

I assume this was meant to be sarcastic on some level, since it’s followed by a big blue box that seems unsure about whether to double down or reverse course. But I can’t tell why it was even brought up, because it has absolutely nothing to do with Zed’s true complaint, which is that Python 2 and Python 3 do not coexist within a single environment. Implementing language X using language Y does not mean that X and Y can now be used together seamlessly.

The canonical Python release is written in C (just like with Ruby or Lua), but you can’t just dump a bunch of C code into a Python (or Ruby or Lua) file and expect it to work. You can talk to C from Python and vice versa, but defining how they communicate is a bit of a pain in the ass and requires some level of setup.
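
To make “some level of setup” concrete, here’s roughly what talking to C looks like with the standard library’s ctypes module; the library path here assumes Linux with glibc:

import ctypes

libc = ctypes.CDLL("libc.so.6")  # load the C library by its platform-specific name

# Nothing is inferred; you declare the C signature by hand.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # 5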

I’ll get into this some more shortly.

No Working Translator

Python 3 comes with a tool called 2to3 which is supposed to take Python 2 code and translate it to Python 3 code.

I should point out right off the bat that this is not actually what you want to use most of the time, because you probably want to translate your Python 2 code to Python 2/3 code. 2to3 produces code that most likely will not work on Python 2. Other tools exist to help you port more conservatively.
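
For instance, 2to3 rewrites Python 2 idioms in place, roughly like this; tools such as python-modernize or futurize instead emit code that still runs on Python 2:

# before, Python 2:
for key in d.iterkeys():
    print key

# after running `2to3 -w script.py`, Python 3 only:
for key in d.keys():
    print(key)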

Translating one programming language into another is a solidly researched topic with solid math behind it. There are translators that convert any number of languages into JavaScript, C, C++, Java, and many times you have no idea the translation is being done. In addition to this, one of the first steps when implementing a new language is to convert the new language into an existing language (like C) so you don’t have to write a full compiler. Translation is a fully solved problem.

This is completely fucking ludicrous. Translating one programming language to another is a common task, though “fully solved” sounds mighty questionable. But do you know what the results look like?

I found a project called “Transcrypt”, which puts Python in the browser by “translating” it to JavaScript. I’ve never used or heard of this before; I just googled for something to convert Python to JavaScript. Here’s their first sample, a demo using jQuery:

def start ():
    def changeColors ():
        for div in S__divs:
            S (div) .css ({
                'color': 'rgb({},{},{})'.format (* [int (256 * Math.random ()) for i in range (3)]),
            })

    S__divs = S ('div')
    changeColors ()
    window.setInterval (changeColors, 500)

And here’s the JavaScript code it compiles to:

(function () {
    var start = function () {
        var changeColors = function () {
            var __iterable0__ = $divs;
            for (var __index0__ = 0; __index0__ < __iterable0__.length; __index0__++) {
                var div = __iterable0__ [__index0__];
                $ (div).css (dict ({'color': 'rgb({},{},{})'.format.apply (null, function () {
                    var __accu0__ = [];
                    for (var i = 0; i < 3; i++) {
                        __accu0__.append (int (256 * Math.random ()));
                    }
                    return __accu0__;
                } ())}));
            }
        };
        var $divs = $ ('div');
        changeColors ();
        window.setInterval (changeColors, 500);
    };
    __pragma__ ('<all>')
        __all__.start = start;
    __pragma__ ('</all>')
}) ();

Well, not quite. That’s actually just a small piece at the end of the full 1861-line file.

You may notice that the emitted JavaScript effectively has to emulate the Python for loop, because JavaScript doesn’t have anything that works exactly the same way. And this is a basic, common language feature translated between two languages in the same general family! Imagine how your code would look if you relied on gritty details of how classes are implemented.

Is this what you want 2to3 to do to your code?

Even if something has been proven to be mathematically possible, that doesn’t mean it’s easy, and it doesn’t mean the results will be pretty (or fast).

The 2to3 translator fails on about 15% of the code it attempts, and does a poor job of translating the code it can handle. The motivations for this are unclear, but keep in mind that a group of people who claim to be programming language experts can’t write a reliable translator from one version of their own language to another. This is also a cause of their porting problems, which adds up to more evidence Python 3’s future is uncertain.

Writing a translator from one language to another is a fully proven and fundamental piece of computer science. Yet, the 2to3 translator cannot translate code 100%. In my own tests it is only about 85% effective, leaving a large amount of code to translate manually. Given that translation is a solved problem this seems to be a decision bordering on malice rather than incredible incompetence.

The programmer-oriented section doubles down on this idea with a title of “Purposefully Crippled 2to3 Translator” — again, accusing the Python project of sabotaging everyone. That doesn’t even make sense; if their goal is to make everyone use Python 3 at any cost, why would they deliberately break their tool that reduces the amount of Python 2 code and increases the amount of Python 3 code?

2to3 sucks because its job is hard. Python is dynamically typed. If it sees d.iteritems(), it might want to change that to d.items(), as it’s called in Python 3 — but it can’t always be sure that d is actually a dict. If d is some user-defined type, renaming the method is wrong.
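
Here’s the kind of case that trips it up, with a hypothetical class that merely shares the method name:

class Settings:
    """Not a dict; it just happens to define iteritems()."""
    def iteritems(self):
        return iter([("debug", True), ("verbose", False)])

cfg = Settings()
for key, value in cfg.iteritems():  # renaming this to .items() would break it
    print(key, value)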

But hey, Turing-completeness, right? It must be mathematically possible. And it is! As long as you’re willing to see this:

for key, value in d.iteritems():
    ...

Get translated to this:

__d = d
for key, value in (__d.items() if isinstance(__d, dict) else __d.iteritems()):
    ...

Would Zed be happier with that, I wonder?

The JVM and CLR Prove It’s Pointless

Yet, for some reason, the Python 3 virtual machine can’t run Python 2? Despite the solidly established mathematics disproving this, the countless examples of running one crazy language inside a Russian doll cascade of other crazy languages, and huge number of languages that can coexist in nearly every other virtual machine? That makes no sense.

This, finally, is the real complaint. It’s not a bad one, and it comes up sometimes, but… it’s not this easy.

The Python 3 VM is fairly similar to the Python 2 VM. The problem isn’t the VM, but the core language constructs and standard library.

Consider: what happens when a Python 2 old-style class instance gets passed into Python 3, which has no such concept? It seems like a value would have to always have the semantics of the language version it came from — that’s how languages usually coexist on the same VM, anyway.

Now, I’m using Python 3, and I load some library written for Python 2. I call a Python 2 function that deals with bytestrings, and I pass it a Python 3 bytestring. Oh no! It breaks because Python 3 bytestrings iterate as integers, whereas the Python 2 library expects them to iterate as characters.
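
A quick illustration of the mismatch, with the same expression giving different results:

>>> list(b"hi")   # Python 3: bytes iterate as integers
[104, 105]
>>> list(b"hi")   # Python 2: bytes iterate as one-character strings
['h', 'i']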

Okay, well, no big deal, you say. Maybe Python 2 libraries just need to be updated to work either way, before they can be used with Python 3.

But that’s exactly the situation we’re in right now. Syntax changes are trivially fixed by 2to3 and similar tools. It’s libraries that cause the subtler issues.

The same applies the other way, too. I write Python 3 code, and it gets an int from some Python 2 library. I try to use the .to_bytes method on it, but that doesn’t exist on Python 2 integers. So my Python 3 code, written and intended purely for Python 3, now has to deal with Python 2 integers as well.
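
Concretely, the method only exists on Python 3’s int:

>>> (1024).to_bytes(2, "big")   # Python 3
b'\x04\x00'
>>> (1024).to_bytes(2, "big")   # Python 2
AttributeError: 'int' object has no attribute 'to_bytes'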

Perhaps “primitive” types should convert automatically, on the boundary? Okay, sure. What about the Python 2 buffer type, which is C-backed and replaced by memoryview in Python 3?

Or how about this very fundamental problem: names of methods and other attributes are str in both versions, but that means they’re bytestrings in Python 2 and text in Python 3. If you’re in Python 3 land, and you call obj.foo() on a Python 2 object, what happens? Python 3 wants a method with the text name foo, but Python 2 wants a method with the bytes name foo. Text and bytes are not implicitly convertible in Python 3. So does it error? Somehow work anyway? What about the other way around?

What about the standard library, which has had a number of improvements in Python 3 that don’t or can’t exist in Python 2? Should Python ship two entire separate copies of its standard library? What about modules like logging, which rely on global state? Does Python 2 and Python 3 code need to set up logging separately within the same process?

There are no good solutions here. The language would double in size and complexity, and you’d still end up with a mess at least as bad as the one we have now when values leak from one version into the other.

We either have two situations here:

  1. Python 3 has been purposefully crippled to prevent Python 2’s execution alongside Python 3 for someone’s professional or ideological gain.
  2. Python 3 cannot run Python 2 due to simple incompetence on the part of the Python project.

I can think of a third.

Difficult To Use Strings

The strings in Python 3 are very difficult to use for beginners. In an attempt to make their strings more “international” they turned them into difficult to use types with poor error messages.

Why is “international” in scare quotes?

Every time you attempt to deal with characters in your programs you’ll have to understand the difference between byte sequences and Unicode strings.

Given that I’m reading part of a book teaching Python, this would be a perfect opportunity to drive this point home by saying “Look! Running exercise N in Python 3 doesn’t work.” Exercise 1, at least, works fine for me with a little extra sprinkle of parentheses:

print("Hello World!")
print("Hello Again")
print("I like typing this.")
print("This is fun.")
print('Yay! Printing.')
print("I'd much rather you 'not'.")
print('I "said" do not touch this.')

Contrast with the actual content of that exercise — at the bottom is a big red warning box telling people from “another country” (relative to where?) that if they get errors about ASCII encodings, they should put an unexplained magical incantation at the top of their scripts to fix “Unicode UTF-8”, whatever that is. I wonder if Zed has read his own book.

Don’t know what that is? Exactly.

If only there were a book that could explain it to beginners in more depth than “you have to fix this if you’re foreign”.

The Python project took a language that is very forgiving to beginners and mostly “just works” and implemented strings that require you to constantly know what type of string they are. Worst of all, when you get an error with strings (which is very often) you get an error message that doesn’t tell you what variable names you need to fix.

The complaint is that this happens in Python 3, whereas it’s accepted in Python 2:

>>> b"hello" + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str

The programmer section is called “Statically Typed Strings”. But this is not static typing. That’s strong typing, a property that sets Python’s type system apart from languages like JavaScript. It’s usually considered a good thing, because the alternative is to silently produce nonsense in some cases, and then that nonsense propagates through your program and is hard to track down when it finally causes problems.

If they’re going to require beginners to struggle with the difference between bytes and Unicode the least they could do is tell people what variables are bytes and what variables are strings.

That would be nice, but it’s not like this is a new problem. Try this in Python 2.

>>> 3 + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

How would Python even report this error when I used literals instead of variables? How could custom types hook into such a thing? Error messages are hard.

By the way, did you know that several error messages are much improved in Python 3? Python 2 is somewhat notorious for the confusing errors it produces when an argument is missing from a method call, but Python 3 is specific about the problem, which is much friendlier to beginners.
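
For example (paraphrasing only slightly; the exact wording varies by version):

class Greeter:
    def greet(self, name, punctuation):
        return "hi " + name + punctuation

Greeter().greet("world")
# Python 2: TypeError: greet() takes exactly 3 arguments (2 given)
#           (it counts self, which baffles beginners)
# Python 3: TypeError: greet() missing 1 required positional argument: 'punctuation'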

However, when you point out that this is hard to use they try to claim it’s good for you. It is not. It’s simple blustering covering for a poor implementation.

I don’t know what about this is hard. Why do you have a text string and a bytestring in the first place? Why is it okay to refuse adding a number to a string, but not to refuse adding bytes to a string?

Imagine if one of the Python core developers were just getting into Python 2 and messing around.

# -*- coding: utf8 -*-
print "Hi, my name is Łukasz Langa."
print "Hi, my name is Łukasz Langa."[::-1]

Output:

Hi, my name is Łukasz Langa.
.agnaL zsaku�� si eman ym ,iH

Good luck figuring out how to fix that.

This isn’t blustering. Bytes are not text; they are binary data that could encode anything. They happen to look like text sometimes, and you can get away with thinking they’re text if you’re not from “another country”, but that mindset will lead you to write code that is wrong. The resulting bugs will be insidious and confusing, and you’ll have a hard time even reasoning about them because it’ll seem like “Unicode text” is somehow a different beast altogether from “ASCII text”.
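
For the record, the fix is to reverse text rather than bytes, which is the default in Python 3 and one decode away in Python 2:

# Python 3: str is a sequence of code points, so the slice just works
print("Hi, my name is Łukasz Langa."[::-1])

# Python 2 equivalent: decode the UTF-8 bytes to text first
# print "Hi, my name is Łukasz Langa.".decode("utf-8")[::-1].encode("utf-8")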

Exercise 11 mentions at the end that you can use int() to convert a number to an integer. It’s no more complicated to say that you convert bytes to a string using .decode(). It shouldn’t even come up unless you’re explicitly working with binary data, and I don’t see any reading from sockets in LPTHW.
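
The two conversions really are parallel:

n = int("133")                       # text -> number, as exercise 11 teaches
s = b"caf\xc3\xa9".decode("utf-8")   # bytes -> text: 'café'
b = "café".encode("utf-8")           # text -> bytes: b'caf\xc3\xa9'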

It’s also not statically compiled as strongly as it could be, so you can’t find these kinds of type errors until you run the code.

This comes a scant few paragraphs after “Dynamic typing is what makes Python easy to use and one of the reasons I advocate it for beginners.”

You can’t find any kinds of type errors until you run the code. Welcome to dynamic typing.

Strings are also most frequently received from an external source, such as a network socket, file, or similar input. This means that Python 3’s statically typed strings and lack of static type safety will cause Python 3 applications to crash more often and have more security problems when compared with Python 2.

On the contrary — Python 3 applications should crash less often. The problem with silently converting between bytestrings and text in Python 2 is that it might fail, depending on the contents. "cafe" + u"hello" works fine, but "café" + u"hello" raises a UnicodeDecodeError. Python 2 makes it very easy to write code that appears to work when tested with ASCII data, but later breaks with anything else, even though the values are still the same types. In Python 3, you get an error the first time you try to run such code, regardless of what’s in the actual values. That’s the biggest reason for the change: it improves things from being intermittent value errors to consistent type errors.
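
Spelled out in a Python 2 session, with the same types both times but different contents:

>>> "cafe" + u"hello"            # pure ASCII bytes decode silently; "works"
u'cafehello'
>>> "caf\xc3\xa9" + u"hello"     # the é, as UTF-8 bytes, is not ASCII
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)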

More security problems? This is never substantiated, and seems to have been entirely fabricated.

Too Many Formatting Options

In addition to that you will have 3 different formatting options in Python 3.6. That means you’ll have to learn to read and use multiple ways to format strings that are all very different. Not even I, an experienced professional programmer, can easily figure out these new formatting systems or keep up with their changing features.

I don’t know what on earth “keep up with their changing features” is supposed to mean, and Zed doesn’t bother to go into details.

Python 3 has three ways to format strings: % interpolation, str.format(), and the new f"" strings in Python 3.6. The f"" strings use the same syntax as str.format(); the difference is that where str.format() uses numbers or names of keyword arguments, f"" strings just use expressions. Compare:

number = 133
print("{n:02x}".format(n=number))
print(f"{number:02x}")

This isn’t “very different”. A frequently-used method is being promoted to syntax.

I really like this new style, and I have no idea why this wasn’t the formatting for Python 3 instead of that stupid .format function. String interpolation is natural for most people and easy to explain.

The problem is that beginners will now have to know all three of these formatting styles, and that’s too many.

I could swear Zed, an experienced professional programmer, just said he couldn’t easily figure out these new formatting systems. Note also that str.format() has existed in Python 2 since Python 2.6 was released in 2008, so I don’t know why Zed said “new formatting systems”, plural.

This is a truly bizarre complaint overall, because the mechanism Zed likes best is the newest one. If Python core had agreed that three mechanisms was too many, we wouldn’t be getting f"" at all.

Even More Versions of Strings

Finally, I’m told there is a new proposal for a string type that is both bytes and Unicode at the same time? That’d be fantastic if this new type brings back the dynamic typing that makes Python easy, but I’m betting it will end up being yet another static type to learn. For that reason I also think beginners should avoid Python 3 until this new “chimera string” is implemented and works reliably in a dynamic way. Until then, you will just be dealing with difficult strings that are statically typed in a dynamically typed language.

I have absolutely no idea what this is referring to, and I can’t find anyone who does. I don’t see any recent PEPs mentioning such a thing, nor anything in the last several months on the python-dev mailing list. I don’t see it in the Python 3.6 release notes.

The closest thing I can think of is the backwards-compatibility shenanigans for PEP 528 and PEP 529 — they switch to the Windows wide-string APIs for console and filesystem encoding, but pretend under the hood that the APIs take UTF-8-encoded bytes to avoid breaking libraries like Twisted. That’s a microscopic detail that should never matter to anyone but authors of Twisted, and is nothing like a new hybrid string type, but otherwise I’m at a loss.

This paragraph really is a perfect summary of the whole article. It speaks vaguely yet authoritatively about something that doesn’t seem to exist, it doesn’t bother actually investigating the thing the entire section talks about, it conjectures that this mysterious feature will be hard just because it’s in Python 3, and it misuses terminology to complain about a fundamental property of Python that’s always existed.

Core Libraries Not Updated

Many of the core libraries included with Python 3 have been rewritten to use Python 3, but have not been updated to use its features. How could they given Python 3’s constant changing status and new features?

What “constant changing status”? The language makes new releases; is that bad? The only mention of “changing” so far was with string formatting, which makes no sense to me, because the only major change has been the addition of syntax that Zed prefers.

There are several libraries that, despite knowing the encoding of data, fail to return proper strings. The worst offender seems to be any libraries dealing with the HTTP protocol, which does indicate the encoding of the underlying byte stream in many cases.

In many cases, yes. Not in all. Some web servers don’t send back an encoding. Some files don’t have an encoding, because they’re images or other binary data. HTML allows the encoding to be given inside the document, instead. urllib has always returned bytes, so it’s not all that unreasonable to keep doing that, rather than… well, I’m not quite sure what this is proposing. Return strings sometimes?
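
When the server does declare an encoding, though, Python 3 hands it to you; you just have to apply it yourself. A minimal sketch:

import urllib.request

with urllib.request.urlopen("https://example.com/") as resp:
    raw = resp.read()                                         # always bytes
    charset = resp.headers.get_content_charset() or "utf-8"   # fallback is our guess
    text = raw.decode(charset)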

The documentation for urllib.request and http.client both advise using the higher-level Requests library instead, in a prominent yellow box right at the top. Requests has distinct mechanisms for retrieving bytes versus text and is vastly easier to use overall, though I don’t think even it understands reading encodings from HTML. Alas, computers.
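
In Requests, the distinction is just two attributes on the response (third-party: pip install requests):

import requests

r = requests.get("https://example.com/")
raw = r.content   # the bytes, untouched
text = r.text     # a str, decoded using the encoding declared by the server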

Good luck to any beginner figuring out how to install Requests on Python 2 — but thankfully, Python 3 now comes bundled with pip, which makes installing libraries much easier. Contrast with the beginning of exercise 46, which apologizes for how difficult this is to explain, lists four things to install, warns that it will be frustrating, and advises watching a video to help figure it out.

What’s even more idiotic about this is Python has a really good Chardet library for detecting the encoding of byte streams. If Python 3 is supposed to be “batteries included” then fast Chardet should be baked into the core of Python 3’s strings making it cake to translate strings to bytes even if you don’t know the underlying encoding. … Call the function whatever you want, but it’s not magic to guess at the encoding of a byte stream, it’s science. The only reason this isn’t done for you is that the Python project decided that you should be punished for not knowing about bytes vs. Unicode, and their arrogance means you have difficult to use strings.

Guessing at the encoding of a byte stream isn’t so much science as, well, guessing. Guessing means that sometimes you’re wrong. Sometimes that’s what you want, and I’m honestly ambivalent about having chardet in the standard library, but it’s hardly arrogant to not want to include a highly-fallible heuristic in your programming language.
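
That fallibility is visible right in chardet’s API: it hands back a guess and a confidence, not an answer. A sketch (third-party; the exact numbers will vary):

import chardet

data = "Привет, мир! Это пример текста.".encode("koi8-r")
print(chardet.detect(data))
# something like: {'encoding': 'KOI8-R', 'confidence': 0.98}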

Conclusions and Warnings

I have resisted writing about these problems with Python 3 for 5 versions because I hoped it would become usable for beginners. Each year I would attempt to convert some of my code and write a couple small tests with Python 3 and simply fail. If I couldn’t use Python 3 reliably then there’s no way a total beginner could manage it. So each year I’d attempt it, and fail, and wait until they fix it. I really liked Python and hoped the Python project would drop their stupid stances on usability.

Let us recap the usability problems seen thus far.

  • You can’t add b"hello" to "hello".
  • TypeErrors are phrased exactly the same as they were in Python 2.
  • The type system is exactly as dynamic as it was in Python 2.
  • There is a new formatting mechanism, using the same syntax as one in Python 2, that Zed prefers over the ones in Python 2.
  • urllib.request doesn’t decode for you, just like in Python 2.
  • 档牡敤㽴 isn’t built in. Oh, sorry, I meant chardet.

Currently, the state of strings is viewed as a Good Thing in the Python community. The fact that you can’t run Python 2 inside Python 3 is seen as a weird kind of tough love. The brainwashing goes so far as to outright deny the mathematics behind language translation and compilation in an attempt to motivate the Python community to brute force convert all Python 2 code.

Which is probably why the Python project focuses on convincing unsuspecting beginners to use Python 3. They don’t have a switching cost, so if you get them to fumble their way through the Python 3 usability problems then you have new converts who don’t know any better. To me this is morally wrong and is simply preying on people to prop up a project that needs a full reset to survive. It means beginners will fail at learning to code not because of their own abilities, but because of Python 3’s difficulty.

Now that we’re towards the end, it’s a good time to say this: Zed Shaw, your behavior here is fucking reprehensible.

Half of what’s written here is irrelevant nonsense backed by a vague appeal to “mathematics”. Instead of having even the shred of humility required to step back and wonder if there are complicating factors beyond whether something is theoretically possible, you have invented a variety of conflicting and malicious motivations to ascribe to the Python project.

It’s fine to criticize Python 3. The string changes force you to think about what you’re doing a little more in some cases, and occasionally that’s a pain in the ass. I absolutely get it.

But you’ve gone out of your way to invent a conspiracy out of whole cloth and promote it on your popular platform aimed at beginners, who won’t know how obviously full of it you are. And why? Because you can’t add b"hello" to "hello"? Are you kidding me? No one can even offer to help you, because instead of examples of real problems you’ve had, you gave two trivial toys and then yelled a lot about how the whole Python project is releasing mind-altering chemicals into the air.

The Python 3 migration has been hard enough. It’s taken a lot of work from a lot of people who’ve given enough of a crap to help Python evolve — to make it better to the best of their judgment and abilities. Now we’re finally, finally at the point where virtually all libraries support Python 3, a few new ones only support Python 3, and Python 3 adoption is starting to take hold among application developers.

And you show up to piss all over it, to propagate this myth that Python 3 is hamstrung to the point of unusability, because if the Great And Wise Zed Shaw can’t figure it out in ten seconds then it must just be impossible.

Fuck you.

Sadly, I doubt this will happen, and instead they’ll just rant about how I don’t know what I’m talking about and I should shut up.

This is because you don’t know what you’re talking about, and you should shut up.