Tag Archives: sabotage

Glitter Bomb against Package Thieves

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/12/glitter_bomb_ag.html

Stealing packages from unattended porches is a rapidly rising crime, as more of us order more things by mail. One person hid a glitter bomb and a video recorder in a package, posting the results when thieves opened the box. At least, that’s what might have happened. At least some of the video was faked, which puts the whole thing into question.

That’s okay, though. Santa is faked, too. Happy whatever you’re celebrating.

Fraudulent Tactics on Amazon Marketplace

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/12/fraudulent_tact.html

Fascinating article about the many ways Amazon Marketplace sellers sabotage each other and defraud customers. The opening example: framing a seller for false advertising by buying fake five-star reviews for their products.

Defacement: Sellers armed with the accounts of Amazon distributors (sometimes legitimately, sometimes through the black market) can make all manner of changes to a rival’s listings, from changing images to altering text to reclassifying a product into an irrelevant category, like “sex toys.”

Phony fires: Sellers will buy their rival’s product, light it on fire, and post a picture to the reviews, claiming it exploded. Amazon is quick to suspend sellers for safety claims.

[…]

Over the following days, Harris came to realize that someone had been targeting him for almost a year, preparing an intricate trap. While he had trademarked his watch and registered his brand, Dead End Survival, with Amazon, Harris hadn’t trademarked the name of his Amazon seller account, SharpSurvival. So the interloper did just that, submitting to the patent office as evidence that he owned the goods a photo taken from Harris’ Amazon listings, including one of Harris’ own hands lighting a fire using the clasp of his survival watch. The hijacker then took that trademark to Amazon and registered it, giving him the power to kick Harris off his own listings and commandeer his name.

[…]

There are more subtle methods of sabotage as well. Sellers will sometimes buy Google ads for their competitors for unrelated products — say, a dog food ad linking to a shampoo listing — so that Amazon’s algorithm sees the rate of clicks converting to sales drop and automatically demotes their product.

What’s also interesting is how Amazon is basically its own government — with its own rules that its suppliers have no choice but to follow. And, of course, increasingly there is no option but to sell your stuff on Amazon.

The Effects of the Spectre and Meltdown Vulnerabilities

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/the_effects_of_3.html

On January 3, the world learned about a series of major security vulnerabilities in modern microprocessors. Called Spectre and Meltdown, these vulnerabilities were discovered by several different researchers last summer, disclosed to the microprocessors’ manufacturers, and patched — at least to the extent possible.

This news isn’t really any different from the usual endless stream of security vulnerabilities and patches, but it’s also a harbinger of the sorts of security problems we’re going to be seeing in the coming years. These are vulnerabilities in computer hardware, not software. They affect virtually all high-end microprocessors produced in the last 20 years. Patching them requires large-scale coordination across the industry, and in some cases drastically affects the performance of the computers. And sometimes patching isn’t possible; the vulnerability will remain until the computer is discarded.

Spectre and Meltdown aren’t anomalies. They represent a new area to look for vulnerabilities and a new avenue of attack. They’re the future of security — and it doesn’t look good for the defenders.

Modern computers do lots of things at the same time. Your computer and your phone simultaneously run several applications — or apps. Your browser has several windows open. A cloud computer runs applications for many different computers. All of those applications need to be isolated from each other. For security, one application isn’t supposed to be able to peek at what another one is doing, except in very controlled circumstances. Otherwise, a malicious advertisement on a website you’re visiting could eavesdrop on your banking details, or the cloud service purchased by some foreign intelligence organization could eavesdrop on every other cloud customer, and so on. The companies that write browsers, operating systems, and cloud infrastructure spend a lot of time making sure this isolation works.

Both Spectre and Meltdown break that isolation, deep down at the microprocessor level, by exploiting performance optimizations that have been implemented for the past decade or so. Basically, microprocessors have become so fast that they spend a lot of time waiting for data to move in and out of memory. To increase performance, these processors guess what data they’re going to receive and execute instructions based on that. If the guess turns out to be correct, it’s a performance win. If it’s wrong, the microprocessors throw away what they’ve done without losing any time. This feature is called speculative execution.

Spectre and Meltdown attack speculative execution in different ways. Meltdown is more of a conventional vulnerability; the designers of the speculative-execution process made a mistake, so they just needed to fix it. Spectre is worse; it’s a flaw in the very concept of speculative execution. There’s no way to patch that vulnerability; the chips need to be redesigned in such a way as to eliminate it.

Since the announcement, manufacturers have been rolling out patches to these vulnerabilities to the extent possible. Operating systems have been patched so that attackers can’t make use of the vulnerabilities. Web browsers have been patched. Chips have been patched. From the user’s perspective, these are routine fixes. But several aspects of these vulnerabilities illustrate the sorts of security problems we’re only going to be seeing more of.

First, attacks against hardware, as opposed to software, will become more common. Last fall, vulnerabilities were discovered in Intel’s Management Engine, a remote-administration feature on its microprocessors. Like Spectre and Meltdown, they affected how the chips operate. Looking for vulnerabilities on computer chips is new. Now that researchers know this is a fruitful area to explore, security researchers, foreign intelligence agencies, and criminals will be on the hunt.

Second, because microprocessors are fundamental parts of computers, patching requires coordination between many companies. Even when manufacturers like Intel and AMD can write a patch for a vulnerability, computer makers and application vendors still have to customize and push the patch out to the users. This makes it much harder to keep vulnerabilities secret while patches are being written. Spectre and Meltdown were announced prematurely because details were leaking and rumors were swirling. Situations like this give malicious actors more opportunity to attack systems before they’re guarded.

Third, these vulnerabilities will affect computers’ functionality. In some cases, the patches for Spectre and Meltdown result in significant reductions in speed. The press initially reported 30%, but that only seems true for certain servers running in the cloud. For your personal computer or phone, the performance hit from the patch is minimal. But as more vulnerabilities are discovered in hardware, patches will affect performance in noticeable ways.

And then there are the unpatchable vulnerabilities. For decades, the computer industry has kept things secure by finding vulnerabilities in fielded products and quickly patching them. Now there are cases where that doesn’t work. Sometimes it’s because computers are in cheap products that don’t have a patch mechanism, like many of the DVRs and webcams that are vulnerable to the Mirai (and other) botnets — groups of Internet-connected devices sabotaged for coordinated digital attacks. Sometimes it’s because a computer chip’s functionality is so core to a computer’s design that patching it effectively means turning the computer off. This, too, is becoming more common.

Increasingly, everything is a computer: not just your laptop and phone, but your car, your appliances, your medical devices, and global infrastructure. These computers are and always will be vulnerable, but Spectre and Meltdown represent a new class of vulnerability. Unpatchable vulnerabilities in the deepest recesses of the world’s computer hardware is the new normal. It’s going to leave us all much more vulnerable in the future.

This essay previously appeared on TheAtlantic.com.

The Fantasyland Code of Professionalism is an abuser’s fantasy

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/46791.html

The Fantasyland Institute of Learning is the organisation behind Lambdaconf, a functional programming conference perhaps best known for standing behind a racist they had invited as a speaker. The fallout of that has resulted in them trying to band together events in order to reduce disruption caused by sponsors or speakers declining to be associated with conferences that think inviting racists is more important than the comfort of non-racists, which is weird in all sorts of ways but not what I’m talking about here because they’ve also written a “Code of Professionalism” which is like a Code of Conduct except it protects abusers rather than minorities and no really it is genuinely as bad as it sounds.

The first thing you need to know is that the document uses its own jargon. Important here are the concepts of active and inactive participation – active participation is anything that you do within the community covered by a specific instance of the Code, inactive participation is anything that happens anywhere ever (ie, active participation is a subset of inactive participation). The restrictions based around active participation are broadly those that you’d expect in a very weak code of conduct – it’s basically “Don’t be mean”, but with some quirks. The most significant is that there’s a “Don’t moralise” provision, which as written means saying “I think people who support slavery are bad” in a community setting is a violation of the code, but the description of discrimination means saying “I volunteer to mentor anybody from a minority background” could also result in any community member not from a minority background complaining that you’ve discriminated against them. It’s just not very good.

Inactive participation is where things go badly wrong. If you engage in community or professional sabotage, or if you shame a member based on their behaviour inside the community, that’s a violation. Community sabotage isn’t defined and so basically allows a community to throw out whoever they want to. Professional sabotage means doing anything that can hurt a member’s professional career. Shaming is saying anything negative about a member to a non-member if that information was obtained from within the community.

So, what does that mean? Here are some things that you are forbidden from doing:

  • If a member says something racist at a conference, you are not permitted to tell anyone who is not a community member that this happened (shaming)
  • If a member tries to assault you, you are not allowed to tell the police (shaming)
  • If a member gives a horribly racist speech at another conference, you are not allowed to suggest that they shouldn’t be allowed to speak at your event (professional sabotage)
  • If a member of your community reports a violation and no action is taken, you are not allowed to warn other people outside the community that this is considered acceptable behaviour (community sabotage)

Now, clearly, some of these are unintentional – I don’t think the authors of this policy would want to defend the idea that you can’t report something to the police, and I’m sure they’d be willing to modify the document to permit this. But it’s indicative of the mindset behind it. This policy has been written to protect people who are accused of doing something bad, not to protect people who have something bad done to them.

There are other examples of this. For instance, violations are not publicised unless the verdict is that they deserve banishment. If a member harasses another member but is merely given a warning, the victim is still not permitted to tell anyone else that this happened. The perpetrator is then free to repeat their behaviour in other communities, and the victim has to choose between staying silent or warning others and risking being banished from the community for shaming.

If you’re an abuser then this is perfect. You’re in a position where your victims have to choose between their career (which will be harmed if they’re unable to function in the community) and preventing the same thing from happening to others. Many will choose the former, which gives you far more freedom to continue abusing others. Which means that communities adopting the Fantasyland code will be more attractive to abusers, and become disproportionately populated by them.

I don’t believe this is the intent, but it’s an inevitable consequence of the priorities inherent in this code. No matter how many corner cases are cleaned up, if a code prevents you from saying bad things about people or communities it prevents people from being able to make informed choices about whether that community and its members are people they wish to associate with. When there are greater consequences to saying someone’s racist than them being racist, you’re fucking up badly.

A Rebuttal For Python 3

Post Syndicated from Eevee original https://eev.ee/blog/2016/11/23/a-rebuttal-for-python-3/

Zed Shaw, of Learn Python the Hard Way fame, has now written The Case Against Python 3.

I’m not involved with core Python development. The only skin I have in this game is that I like Python 3. It’s a good language. And one of the big factors I’ve seen slowing its adoption is that respected people in the Python community keep grouching about it. I’ve had multiple newcomers tell me they have the impression that Python 3 is some kind of unusable disaster, though they don’t know exactly why; it’s just something they hear from people who sound like they know what they’re talking about. Then they actually use the language, and it’s fine.

I’m sad to see the Python community needlessly sabotage itself, but Zed’s contribution is beyond the pale. It’s not just making a big deal about changed details that won’t affect most beginners; it’s complete and utter nonsense, on a platform aimed at people who can’t yet recognize it as nonsense. I am so mad.

The Case Against Python 3

I give two sets of reasons as I see them now. One for total beginners, and another for people who are more knowledgeable about programming.

Just to note: the two sets of reasons are largely the same ideas presented differently, so I’ll just weave them together below.

The first section attempts to explain the case against starting with Python 3 in non-technical terms so a beginner can make up their own mind without being influenced by propaganda or social pressure.

Having already read through this once, this sentence really stands out to me. The author of a book many beginners read to learn Python in the first place is providing a number of reasons (some outright fabricated) not to use Python 3, often in terms beginners are ill-equipped to evaluate, but believes this is a defense against propaganda or social pressure.

The Most Important Reason

Before getting into the main technical reasons I would like to discuss the one most important social reason for why you should not use Python 3 as a beginner:

THERE IS A HIGH PROBABILITY THAT PYTHON 3 IS SUCH A FAILURE IT WILL KILL PYTHON.

Python 3’s adoption is really only at about 30% whenever there is an attempt to measure it.

Wait, really? Wow, that’s fantastic.

I mean, it would probably be higher if the most popular beginner resources were actually teaching Python 3, but you know.

Nobody is all that interested in finding out what the real complete adoption is, despite there being fairly simple ways to gather metrics on the adoption.

This accusatory sentence conspicuously neglects to mention what these fairly simple ways are, a pattern that repeats throughout. The trouble is that it’s hard to even define what “adoption” means — I write all my code in Python 3 now, but veekun is still Python 2 because it’s in maintenance mode, so what does that say about adoption? You could look at PyPI download stats, but those are thrown way off by caches and system package managers. You could look at downloads from the Python website, but a great deal of Python is written and used on Unix-likes, where Python itself is either bundled or installed from the package manager.

It’s as simple as that. If you learn Python 2, then you can still work with all the legacy Python 2 code in existence until Python dies or you (hopefully) move on. But if you learn Python 3 then your future is very uncertain. You could really be learning a dead language and end up having to learn Python 2 anyway.

You could use Python 2, until it dies… or you could use Python 3, which might die. What a choice.

By some definitions, Python 2 is already dead — it will not see another major release, only security fixes. Python 3 is still actively developed, and its seventh major release is next month. It even contains a new feature that Zed later mentions he prefers to Python 2’s offerings.

It may shock you to learn that I know both Python 2 and Python 3. Amazingly, two versions of the same language are much more similar than they are different. If you learned Python 3 and then a wizard cast a spell that made it vanish from the face of the earth, you’d just have to spend half an hour reading up on what had changed from Python 2.

Also, it’s been over a decade, maybe even multiple decades, and Python 3 still isn’t above about 30% in adoption. Even among the sciences where Python 3 is touted as a “success” it’s still only around 25-30% adoption. After that long it’s time to admit defeat and come up with a new plan.

Python 3.0 came out in 2008. The first couple releases ironed out some compatibility and API problems, so it didn’t start to gain much traction until Python 3.2 came out in 2011. Hell, Python 2.0 came out in 2000, so even Python 2 isn’t multiple decades old. It would be great if this trusted beginner reference could take two seconds to check details like this before using them to scaremonger.

The big early problem was library compatibility: it’s hard to justify switching to a new version of the language if none of the libraries work. Libraries could only port once their own dependencies had ported, of course, and it took a couple years to figure out the best way to maintain compatibility with both Python 2 and Python 3. I’d say we only really hit critical mass a few years ago — for instance, Django didn’t support Python 3 until 2013 — in which case that 30% is nothing to sneeze at.

There are more reasons beyond just the uncertain future of Python 3 even decades later.

In one paragraph, we’ve gone from “maybe even multiple decades” to just “decades”, which is a funny way to spell “eight years”.

Not In Your Best Interests

The Python project’s efforts to convince you to start with Python 3 are not in your best interest, but, rather, are only in the best interests of the Python project.

It’s bad, you see, for the Python project to want people to use the work it produced.

Anyway, please buy Zed Shaw’s book.

Anyway, please pledge to my Patreon.

Ultimately though, if Python 3 were good they wouldn’t need to do any convincing to get you to use it. It would just naturally work for you and you wouldn’t have any problems. Instead, there are serious issues with Python 3 for beginners, and rather than fix those issues the Python project uses propaganda, social pressure, and marketing to convince you to use it. In the world of technology using marketing and propaganda is immediately a sign that the technology is defective in some obvious way.

This use of social pressure and propaganda to convince you to use Python 3 despite its problems, in an attempt to benefit the Python project, is morally unconscionable to me.

Ten paragraphs in, Zed is telling me that I should be suspicious of anything that relies on marketing and propaganda. Meanwhile, there has yet to be a single concrete reason why Python 3 is bad for beginners — just several flat-out incorrect assertions and a lot of handwaving about how inexplicably nefarious the Python core developers are. You know, the same people who made Python 2. But they weren’t evil then, I guess.

You Should Be Able to Run 2 and 3

In the programming language theory there is this basic requirement that, given a “complete” programming language, I can run any other programming language. In the world of Java I’m able to run Ruby, Java, C++, C, and Lua all at the same time. In the world of Microsoft I can run F#, C#, C++, and Python all at the same time. This isn’t just a theoretical thing. There is solid math behind it. Math that is truly the foundation of computer science.

The fact that you can’t run Python 2 and Python 3 at the same time is purely a social and technical decision that the Python project made with no basis in mathematical reality. This means you are working with a purposefully broken platform when you use Python 3, and I personally can’t condone teaching people to use something that is fundamentally broken.

The programmer-oriented section makes clear that the solid math being referred to is Turing-completeness — the section is even titled “Python 3 Is Not Turing Complete”.

First, notice a rhetorical trick here. You can run Ruby, Java, C++, etc. at the same time, so why not Python 2 and Python 3?

But can you run Java and C# at the same time? (I’m sure someone has done this, but it’s certainly much less popular than something like Jython or IronPython.)

Can you run Ruby 1.8 and Ruby 2.3 at the same time? Ah, no, so I guess Ruby 2.3 is fundamentally and purposefully broken.

Can you run Lua 5.1 and 5.3 at the same time? Lua is a spectacular example, because Lua 5.2 made a breaking change to how the details of scope work, and it’s led to a situation where a lot of programs that embed Lua haven’t bothered upgrading from Lua 5.1. Was Lua 5.2 some kind of dark plot to deliberately break the language? No, it’s just slightly more inconvenient than expected for people to upgrade.

Anyway, as for Turing machines:

In computer science a fundamental law is that if I have one Turing Machine I can build any other Turing Machine. If I have COBOL then I can bootstrap a compiler for FORTRAN (as disgusting as that might be). If I have FORTH, then I can build an interpreter for Ruby. This also applies to bytecodes for CPUs. If I have a Turing Complete bytecode then I can create a compiler for any language. The rule then can be extended even further to say that if I cannot create another Turing Machine in your language, then your language cannot be Turing Complete. If I can’t use your language to write a compiler or interpreter for any other language then your language is not Turing Complete.

Yes, this is true.

Currently you cannot run Python 2 inside the Python 3 virtual machine. Since I cannot, that means Python 3 is not Turing Complete and should not be used by anyone.

And this is completely asinine. Worse, it’s flat-out dishonest, and relies on another rhetorical trick. You only “cannot” run Python 2 inside the Python 3 VM because no one has written a Python 2 interpreter in Python 3. The “cannot” is not a mathematical impossibility; it’s a simple matter of the code not having been written. Or perhaps it has, but no one cares anyway, because it would be comically and unusably slow.

I assume this was meant to be sarcastic on some level, since it’s followed by a big blue box that seems unsure about whether to double down or reverse course. But I can’t tell why it was even brought up, because it has absolutely nothing to do with Zed’s true complaint, which is that Python 2 and Python 3 do not coexist within a single environment. Implementing language X using language Y does not mean that X and Y can now be used together seamlessly.

The canonical Python release is written in C (just like with Ruby or Lua), but you can’t just dump a bunch of C code into a Python (or Ruby or Lua) file and expect it to work. You can talk to C from Python and vice versa, but defining how they communicate is a bit of a pain in the ass and requires some level of setup.

I’ll get into this some more shortly.

No Working Translator

Python 3 comes with a tool called 2to3 which is supposed to take Python 2 code and translate it to Python 3 code.

I should point out right off the bat that this is not actually what you want to use most of the time, because you probably want to translate your Python 2 code to Python 2/3 code. 2to3 produces code that most likely will not work on Python 2. Other tools exist to help you port more conservatively.
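For the sake of concreteness, here is a tiny sketch of what "porting conservatively" tends to look like: straddling code that runs unchanged on both Python 2 and 3. I'm reaching for the third-party six library purely as an illustration; the names here are my own, not anything from LPTHW or from 2to3's output.

from __future__ import print_function, unicode_literals

from six import iteritems   # six is a third-party 2/3 compatibility shim

settings = {"host": "localhost", "port": 8080}
for key, value in iteritems(settings):
    print(key, value)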

Translating one programming language into another is a solidly researched topic with solid math behind it. There are translators that convert any number of languages into JavaScript, C, C++, Java, and many times you have no idea the translation is being done. In addition to this, one of the first steps when implementing a new language is to convert the new language into an existing language (like C) so you don’t have to write a full compiler. Translation is a fully solved problem.

This is completely fucking ludicrous. Translating one programming language to another is a common task, though “fully solved” sounds mighty questionable. But do you know what the results look like?

I found a project called “Transcrypt”, which puts Python in the browser by “translating” it to JavaScript. I’ve never used or heard of this before; I just googled for something to convert Python to JavaScript. Here’s their first sample, a demo using jQuery:

def start ():
    def changeColors ():
        for div in S__divs:
            S (div) .css ({
                'color': 'rgb({},{},{})'.format (* [int (256 * Math.random ()) for i in range (3)]),
            })

    S__divs = S ('div')
    changeColors ()
    window.setInterval (changeColors, 500)

And here’s the JavaScript code it compiles to:

(function () {
    var start = function () {
        var changeColors = function () {
            var __iterable0__ = $divs;
            for (var __index0__ = 0; __index0__ < __iterable0__.length; __index0__++) {
                var div = __iterable0__ [__index0__];
                $ (div).css (dict ({'color': 'rgb({},{},{})'.format.apply (null, function () {
                    var __accu0__ = [];
                    for (var i = 0; i < 3; i++) {
                        __accu0__.append (int (256 * Math.random ()));
                    }
                    return __accu0__;
                } ())}));
            }
        };
        var $divs = $ ('div');
        changeColors ();
        window.setInterval (changeColors, 500);
    };
    __pragma__ ('<all>')
        __all__.start = start;
    __pragma__ ('</all>')
}) ();

Well, not quite. That’s actually just a small piece at the end of the full 1861-line file.

You may notice that the emitted JavaScript effectively has to emulate the Python for loop, because JavaScript doesn’t have anything that works exactly the same way. And this is a basic, common language feature translated between two languages in the same general family! Imagine how your code would look if you relied on gritty details of how classes are implemented.

Is this what you want 2to3 to do to your code?

Even if something has been proven to be mathematically possible, that doesn’t mean it’s easy, and it doesn’t mean the results will be pretty (or fast).

The 2to3 translator fails on about 15% of the code it attempts, and does a poor job of translating the code it can handle. The motivations for this are unclear, but keep in mind that a group of people who claim to be programming language experts can’t write a reliable translator from one version of their own language to another. This is also a cause of their porting problems, which adds up to more evidence Python 3’s future is uncertain.

Writing a translator from one language to another is a fully proven and fundamental piece of computer science. Yet, the 2to3 translator cannot translate code 100%. In my own tests it is only about 85% effective, leaving a large amount of code to translate manually. Given that translation is a solved problem this seems to be a decision bordering on malice rather than incredible incompetence.

The programmer-oriented section doubles down on this idea with a title of “Purposefully Crippled 2to3 Translator” — again, accusing the Python project of sabotaging everyone. That doesn’t even make sense; if their goal is to make everyone use Python 3 at any cost, why would they deliberately break their tool that reduces the amount of Python 2 code and increases the amount of Python 3 code?

2to3 sucks because its job is hard. Python is dynamically typed. If it sees d.iteritems(), it might want to change that to d.items(), as it’s called in Python 3 — but it can’t always be sure that d is actually a dict. If d is some user-defined type, renaming the method is wrong.
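Here's a contrived sketch of my own showing why that rename can't be done blindly:

class Settings(object):
    def __init__(self, pairs):
        self.pairs = pairs

    def iteritems(self):
        # lazily yields pairs, much like dict.iteritems() did in Python 2
        for key, value in self.pairs:
            yield key, value

d = Settings([("host", "localhost"), ("port", 8080)])

# 2to3 sees "d.iteritems()" and cannot know whether d is a dict here;
# blindly rewriting it to d.items() would break the class above.
for key, value in d.iteritems():
    print(key, value)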

But hey, Turing-completeness, right? It must be mathematically possible. And it is! As long as you’re willing to see this:

for key, value in d.iteritems():
    ...

Get translated to this:

__d = d
for key, value in (__d.items() if isinstance(__d, dict) else __d.iteritems()):
    ...

Would Zed be happier with that, I wonder?

The JVM and CLR Prove It’s Pointless

Yet, for some reason, the Python 3 virtual machine can’t run Python 2? Despite the solidly established mathematics disproving this, the countless examples of running one crazy language inside a Russian doll cascade of other crazy languages, and huge number of languages that can coexist in nearly every other virtual machine? That makes no sense.

This, finally, is the real complaint. It’s not a bad one, and it comes up sometimes, but… it’s not this easy.

The Python 3 VM is fairly similar to the Python 2 VM. The problem isn’t the VM, but the core language constructs and standard library.

Consider: what happens when a Python 2 old-style class instance gets passed into Python 3, which has no such concept? It seems like a value would have to always have the semantics of the language version it came from — that’s how languages usually coexist on the same VM, anyway.

Now, I’m using Python 3, and I load some library written for Python 2. I call a Python 2 function that deals with bytestrings, and I pass it a Python 3 bytestring. Oh no! It breaks because Python 3 bytestrings iterate as integers, whereas the Python 2 library expects them to iterate as characters.
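A quick illustration of that mismatch, straight from the interactive prompt:

>>> list(b"abc")     # Python 3: bytes iterate as integers
[97, 98, 99]
>>> list(b"abc")     # Python 2: bytes are just str, so you get characters
['a', 'b', 'c']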

Okay, well, no big deal, you say. Maybe Python 2 libraries just need to be updated to work either way, before they can be used with Python 3.

But that’s exactly the situation we’re in right now. Syntax changes are trivially fixed by 2to3 and similar tools. It’s libraries that cause the subtler issues.

The same applies the other way, too. I write Python 3 code, and it gets an int from some Python 2 library. I try to use the .to_bytes method on it, but that doesn’t exist on Python 2 integers. So my Python 3 code, written and intended purely for Python 3, now has to deal with Python 2 integers as well.
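The same sort of thing, from the prompt again:

>>> (1024).to_bytes(2, "big")    # Python 3: ints know how to serialize themselves
b'\x04\x00'
>>> (1024).to_bytes(2, "big")    # Python 2: AttributeError, plain ints have no such method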

Perhaps “primitive” types should convert automatically, on the boundary? Okay, sure. What about the Python 2 buffer type, which is C-backed and replaced by memoryview in Python 3?

Or how about this very fundamental problem: names of methods and other attributes are str in both versions, but that means they’re bytestrings in Python 2 and text in Python 3. If you’re in Python 3 land, and you call obj.foo() on a Python 2 object, what happens? Python 3 wants a method with the text name foo, but Python 2 wants a method with the bytes name foo. Text and bytes are not implicitly convertible in Python 3. So does it error? Somehow work anyway? What about the other way around?

What about the standard library, which has had a number of improvements in Python 3 that don’t or can’t exist in Python 2? Should Python ship two entire separate copies of its standard library? What about modules like logging, which rely on global state? Does Python 2 and Python 3 code need to set up logging separately within the same process?

There are no good solutions here. The language would double in size and complexity, and you’d still end up with a mess at least as bad as the one we have now when values leak from one version into the other.

We either have two situations here:

  1. Python 3 has been purposefully crippled to prevent Python 2’s execution alongside Python 3 for someone’s professional or ideological gain.
  2. Python 3 cannot run Python 2 due to simple incompetence on the part of the Python project.

I can think of a third.

Difficult To Use Strings

The strings in Python 3 are very difficult to use for beginners. In an attempt to make their strings more “international” they turned them into difficult to use types with poor error messages.

Why is “international” in scare quotes?

Every time you attempt to deal with characters in your programs you’ll have to understand the difference between byte sequences and Unicode strings.

Given that I’m reading part of a book teaching Python, this would be a perfect opportunity to drive this point home by saying “Look! Running exercise N in Python 3 doesn’t work.” Exercise 1, at least, works fine for me with a little extra sprinkle of parentheses:

print("Hello World!")
print("Hello Again")
print("I like typing this.")
print("This is fun.")
print('Yay! Printing.')
print("I'd much rather you 'not'.")
print('I "said" do not touch this.')

Contrast with the actual content of that exercise — at the bottom is a big red warning box telling people from “another country” (relative to where?) that if they get errors about ASCII encodings, they should put an unexplained magical incantation at the top of their scripts to fix “Unicode UTF-8”, whatever that is. I wonder if Zed has read his own book.

Don’t know what that is? Exactly.

If only there were a book that could explain it to beginners in more depth than “you have to fix this if you’re foreign”.

The Python project took a language that is very forgiving to beginners and mostly “just works” and implemented strings that require you to constantly know what type of string they are. Worst of all, when you get an error with strings (which is very often) you get an error message that doesn’t tell you what variable names you need to fix.

The complaint is that this happens in Python 3, whereas it’s accepted in Python 2:

>>> b"hello" + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str

The programmer section is called “Statically Typed Strings”. But this is not static typing. That’s strong typing, a property that sets Python’s type system apart from languages like JavaScript. It’s usually considered a good thing, because the alternative is to silently produce nonsense in some cases, and then that nonsense propagates through your program and is hard to track down when it finally causes problems.

If they’re going to require beginners to struggle with the difference between bytes and Unicode the least they could do is tell people what variables are bytes and what variables are strings.

That would be nice, but it’s not like this is a new problem. Try this in Python 2.

>>> 3 + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

How would Python even report this error when I used literals instead of variables? How could custom types hook into such a thing? Error messages are hard.

By the way, did you know that several error messages are much improved in Python 3? Python 2 is somewhat notorious for the confusing errors it produces when an argument is missing from a method call, but Python 3 is specific about the problem, which is much friendlier to beginners.
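For instance (quoting the messages from memory, so the exact wording may drift a little between versions):

>>> def greet(greeting, name):
...     print(greeting, name)
...
>>> greet("hello")    # Python 2
TypeError: greet() takes exactly 2 arguments (1 given)
>>> greet("hello")    # Python 3
TypeError: greet() missing 1 required positional argument: 'name'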

However, when you point out that this is hard to use they try to claim it’s good for you. It is not. It’s simple blustering covering for a poor implementation.

I don’t know what about this is hard. Why do you have a text string and a bytestring in the first place? Why is it okay to refuse adding a number to a string, but not to refuse adding bytes to a string?

Imagine if one of the Python core developers were just getting into Python 2 and messing around.

# -*- coding: utf8 -*-
print "Hi, my name is Łukasz Langa."
print "Hi, my name is Łukasz Langa."[::-1]
1
2
Hi, my name is Łukasz Langa.
.agnaL zsaku�� si eman ym ,iH

Good luck figuring out how to fix that.

This isn’t blustering. Bytes are not text; they are binary data that could encode anything. They happen to look like text sometimes, and you can get away with thinking they’re text if you’re not from “another country”, but that mindset will lead you to write code that is wrong. The resulting bugs will be insidious and confusing, and you’ll have a hard time even reasoning about them because it’ll seem like “Unicode text” is somehow a different beast altogether from “ASCII text”.

Exercise 11 mentions at the end that you can use int() to convert a number to an integer. It’s no more complicated to say that you convert bytes to a string using .decode(). It shouldn’t even come up unless you’re explicitly working with binary data, and I don’t see any reading from sockets in LPTHW.
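Put side by side, the conversions look like this (hardly a wall of arcana):

>>> int("133")                        # text to number, as exercise 11 teaches
133
>>> b"caf\xc3\xa9".decode("utf-8")    # bytes to text, once you know the encoding
'café'
>>> "café".encode("utf-8")            # text to bytes
b'caf\xc3\xa9'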

It’s also not statically compiled as strongly as it could be, so you can’t find these kinds of type errors until you run the code.

This comes a scant few paragraphs after “Dynamic typing is what makes Python easy to use and one of the reasons I advocate it for beginners.”

You can’t find any kinds of type errors until you run the code. Welcome to dynamic typing.

Strings are also most frequently received from an external source, such as a network socket, file, or similar input. This means that Python 3’s statically typed strings and lack of static type safety will cause Python 3 applications to crash more often and have more security problems when compared with Python 2.

On the contrary — Python 3 applications should crash less often. The problem with silently converting between bytestrings and text in Python 2 is that it might fail, depending on the contents. "cafe" + u"hello" works fine, but "café" + u"hello" raises a UnicodeDecodeError. Python 2 makes it very easy to write code that appears to work when tested with ASCII data, but later breaks with anything else, even though the values are still the same types. In Python 3, you get an error the first time you try to run such code, regardless of what’s in the actual values. That’s the biggest reason for the change: it improves things from being intermittent value errors to consistent type errors.
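Spelled out at the prompt, since this is the crux of the whole string change. The Python 2 lines assume a UTF-8 terminal; the Python 3 error is the same one quoted above.

>>> "cafe" + u"hello"     # Python 2: works, because these particular bytes are ASCII
u'cafehello'
>>> "café" + u"hello"     # Python 2: same types, different contents
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
>>> b"cafe" + "hello"     # Python 3: always an error, whatever the contents
TypeError: can't concat bytes to str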

More security problems? This is never substantiated, and seems to have been entirely fabricated.

Too Many Formatting Options

In addition to that you will have 3 different formatting options in Python 3.6. That means you’ll have to learn to read and use multiple ways to format strings that are all very different. Not even I, an experienced professional programmer, can easily figure out these new formatting systems or keep up with their changing features.

I don’t know what on earth “keep up with their changing features” is supposed to mean, and Zed doesn’t bother to go into details.

Python 3 has three ways to format strings: % interpolation, str.format(), and the new f"" strings in Python 3.6. The f"" strings use the same syntax as str.format(); the difference is that where str.format() uses numbers or names of keyword arguments, f"" strings just use expressions. Compare:

number = 133
print("{n:02x}".format(n=number))
print(f"{number:02x}")

This isn’t “very different”. A frequently-used method is being promoted to syntax.

I really like this new style, and I have no idea why this wasn’t the formatting for Python 3 instead of that stupid .format function. String interpolation is natural for most people and easy to explain.

The problem is that beginners will now have to know all three of these formatting styles, and that’s too many.

I could swear Zed, an experienced professional programmer, just said he couldn’t easily figure out these new formatting systems. Note also that str.format() has existed in Python 2 since Python 2.6 was released in 2008, so I don’t know why Zed said “new formatting systems”, plural.

This is a truly bizarre complaint overall, because the mechanism Zed likes best is the newest one. If Python core had agreed that three mechanisms was too many, we wouldn’t be getting f"" at all.

Even More Versions of Strings

Finally, I’m told there is a new proposal for a string type that is both bytes and Unicode at the same time? That’d be fantastic if this new type brings back the dynamic typing that makes Python easy, but I’m betting it will end up being yet another static type to learn. For that reason I also think beginners should avoid Python 3 until this new “chimera string” is implemented and works reliably in a dynamic way. Until then, you will just be dealing with difficult strings that are statically typed in a dynamically typed language.

I have absolutely no idea what this is referring to, and I can’t find anyone who does. I don’t see any recent PEPs mentioning such a thing, nor anything in the last several months on the python-dev mailing list. I don’t see it in the Python 3.6 release notes.

The closest thing I can think of is the backwards-compatibility shenanigans for PEP 528 and PEP 529 — they switch to the Windows wide-string APIs for console and filesystem encoding, but pretend under the hood that the APIs take UTF-8-encoded bytes to avoid breaking libraries like Twisted. That’s a microscopic detail that should never matter to anyone but authors of Twisted, and is nothing like a new hybrid string type, but otherwise I’m at a loss.

This paragraph really is a perfect summary of the whole article. It speaks vaguely yet authoritatively about something that doesn’t seem to exist, it doesn’t bother actually investigating the thing the entire section talks about, it conjectures that this mysterious feature will be hard just because it’s in Python 3, and it misuses terminology to complain about a fundamental property of Python that’s always existed.

Core Libraries Not Updated

Many of the core libraries included with Python 3 have been rewritten to use Python 3, but have not been updated to use its features. How could they given Python 3’s constant changing status and new features?

What “constant changing status”? The language makes new releases; is that bad? The only mention of “changing” so far was with string formatting, which makes no sense to me, because the only major change has been the addition of syntax that Zed prefers.

There are several libraries that, despite knowing the encoding of data, fail to return proper strings. The worst offender seems to be any libraries dealing with the HTTP protocol, which does indicate the encoding of the underlying byte stream in many cases.

In many cases, yes. Not in all. Some web servers don’t send back an encoding. Some files don’t have an encoding, because they’re images or other binary data. HTML allows the encoding to be given inside the document, instead. urllib has always returned bytes, so it’s not all that unreasonable to keep doing that, rather than… well, I’m not quite sure what this is proposing. Return strings sometimes?

The documentation for urllib.request and http.client both advise using the higher-level Requests library instead, in a prominent yellow box right at the top. Requests has distinct mechanisms for retrieving bytes versus text and is vastly easier to use overall, though I don’t think even it understands reading encodings from HTML. Alas, computers.
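A rough sketch of those distinct mechanisms, with a placeholder URL:

import requests

resp = requests.get("https://example.com/")
raw_bytes = resp.content    # bytes, exactly as received
text = resp.text            # str, decoded using resp.encoding
print(resp.encoding)        # the encoding Requests settled on for this response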

Good luck to any beginner figuring out how to install Requests on Python 2 — but thankfully, Python 3 now comes bundled with pip, which makes installing libraries much easier. Contrast with the beginning of exercise 46, which apologizes for how difficult this is to explain, lists four things to install, warns that it will be frustrating, and advises watching a video to help figure it out.

What’s even more idiotic about this is Python has a really good Chardet library for detecting the encoding of byte streams. If Python 3 is supposed to be “batteries included” then fast Chardet should be baked into the core of Python 3’s strings making it cake to translate strings to bytes even if you don’t know the underlying encoding. … Call the function whatever you want, but it’s not magic to guess at the encoding of a byte stream, it’s science. The only reason this isn’t done for you is that the Python project decided that you should be punished for not knowing about bytes vs. Unicode, and their arrogance means you have difficult to use strings.

Guessing at the encoding of a byte stream isn’t so much science as, well, guessing. Guessing means that sometimes you’re wrong. Sometimes that’s what you want, and I’m honestly ambivalent about having chardet in the standard library, but it’s hardly arrogant to not want to include a highly-fallible heuristic in your programming language.
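For what it's worth, here is roughly what that guessing looks like with the third-party chardet module. Note that you get a confidence back, not an answer, and the printed result below is approximate:

import chardet

data = u"Hi, my name is Łukasz Langa.".encode("utf-8")
print(chardet.detect(data))
# something like: {'encoding': 'utf-8', 'confidence': 0.87, 'language': ''}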

Conclusions and Warnings

I have resisted writing about these problems with Python 3 for 5 versions because I hoped it would become usable for beginners. Each year I would attempt to convert some of my code and write a couple small tests with Python 3 and simply fail. If I couldn’t use Python 3 reliably then there’s no way a total beginner could manage it. So each year I’d attempt it, and fail, and wait until they fix it. I really liked Python and hoped the Python project would drop their stupid stances on usability.

Let us recap the usability problems seen thus far.

  • You can’t add b"hello" to "hello".
  • TypeErrors are phrased exactly the same as they were in Python 2.
  • The type system is exactly as dynamic as it was in Python 2.
  • There is a new formatting mechanism, using the same syntax as one in Python 2, that Zed prefers over the ones in Python 2.
  • urllib.request doesn’t decode for you, just like in Python 2.
  • 档牡敤㽴 isn’t built in. Oh, sorry, I meant chardet.

Currently, the state of strings is viewed as a Good Thing in the Python community. The fact that you can’t run Python 2 inside Python 3 is seen as a weird kind of tough love. The brainwashing goes so far as to outright deny the mathematics behind language translation and compilation in an attempt to motivate the Python community to brute force convert all Python 2 code.

Which is probably why the Python project focuses on convincing unsuspecting beginners to use Python 3. They don’t have a switching cost, so if you get them to fumble their way through the Python 3 usability problems then you have new converts who don’t know any better. To me this is morally wrong and is simply preying on people to prop up a project that needs a full reset to survive. It means beginners will fail at learning to code not because of their own abilities, but because of Python 3’s difficulty.

Now that we’re towards the end, it’s a good time to say this: Zed Shaw, your behavior here is fucking reprehensible.

Half of what’s written here is irrelevant nonsense backed by a vague appeal to “mathematics”. Instead of having even the shred of humility required to step back and wonder if there are complicating factors beyond whether something is theoretically possible, you have invented a variety of conflicting and malicious motivations to ascribe to the Python project.

It’s fine to criticize Python 3. The string changes force you to think about what you’re doing a little more in some cases, and occasionally that’s a pain in the ass. I absolutely get it.

But you’ve gone out of your way to invent a conspiracy out of whole cloth and promote it on your popular platform aimed at beginners, who won’t know how obviously full of it you are. And why? Because you can’t add b"hello" to "hello"? Are you kidding me? No one can even offer to help you, because instead of examples of real problems you’ve had, you gave two trivial toys and then yelled a lot about how the whole Python project is releasing mind-altering chemicals into the air.

The Python 3 migration has been hard enough. It’s taken a lot of work from a lot of people who’ve given enough of a crap to help Python evolve — to make it better to the best of their judgment and abilities. Now we’re finally, finally at the point where virtually all libraries support Python 3, a few new ones only support Python 3, and Python 3 adoption is starting to take hold among application developers.

And you show up to piss all over it, to propagate this myth that Python 3 is hamstrung to the point of unusability, because if the Great And Wise Zed Shaw can’t figure it out in ten seconds then it must just be impossible.

Fuck you.

Sadly, I doubt this will happen, and instead they’ll just rant about how I don’t know what I’m talking about and I should shut up.

This is because you don’t know what you’re talking about, and you should shut up.


In one paragraph, we’ve gone from “maybe even multiple decades” to just “decades”, which is a funny way to spell “eight years”.

Not In Your Best Interests

The Python project’s efforts to convince you to start with Python 3 are not in your best interest, but, rather, are only in the best interests of the Python project.

It’s bad, you see, for the Python project to want people to use the work it produced.

Anyway, please buy Zed Shaw’s book.

Anyway, please pledge to my Patreon.

Ultimately though, if Python 3 were good they wouldn’t need to do any convincing to get you to use it. It would just naturally work for you and you wouldn’t have any problems. Instead, there are serious issues with Python 3 for beginners, and rather than fix those issues the Python project uses propaganda, social pressure, and marketing to convince you to use it. In the world of technology using marketing and propaganda is immediately a sign that the technology is defective in some obvious way.

This use of social pressure and propaganda to convince you to use Python 3 despite its problems, in an attempt to benefit the Python project, is morally unconscionable to me.

Ten paragraphs in, Zed is telling me that I should be suspicious of anything that relies on marketing and propaganda. Meanwhile, there has yet to be a single concrete reason why Python 3 is bad for beginners — just several flat-out incorrect assertions and a lot of handwaving about how inexplicably nefarious the Python core developers are. You know, the same people who made Python 2. But they weren’t evil then, I guess.

You Should Be Able to Run 2 and 3

In the programming language theory there is this basic requirement that, given a “complete” programming language, I can run any other programming language. In the world of Java I’m able to run Ruby, Java, C++, C, and Lua all at the same time. In the world of Microsoft I can run F#, C#, C++, and Python all at the same time. This isn’t just a theoretical thing. There is solid math behind it. Math that is truly the foundation of computer science.

The fact that you can’t run Python 2 and Python 3 at the same time is purely a social and technical decision that the Python project made with no basis in mathematical reality. This means you are working with a purposefully broken platform when you use Python 3, and I personally can’t condone teaching people to use something that is fundamentally broken.

The programmer-oriented section makes clear that the solid math being referred to is Turing-completeness — the section is even titled “Python 3 Is Not Turing Complete”.

First, notice a rhetorical trick here. You can run Ruby, Java, C++, etc. at the same time, so why not Python 2 and Python 3?

But can you run Java and C# at the same time? (I’m sure someone has done this, but it’s certainly much less popular than something like Jython or IronPython.)

Can you run Ruby 1.8 and Ruby 2.3 at the same time? Ah, no, so I guess Ruby 2.3 is fundamentally and purposefully broken.

Can you run Lua 5.1 and 5.3 at the same time? Lua is a spectacular example, because Lua 5.2 made a breaking change to how the details of scope work, and it’s led to a situation where a lot of programs that embed Lua haven’t bothered upgrading from Lua 5.1. Was Lua 5.2 some kind of dark plot to deliberately break the language? No, it’s just slightly more inconvenient than expected for people to upgrade.

Anyway, as for Turing machines:

In computer science a fundamental law is that if I have one Turing Machine I can build any other Turing Machine. If I have COBOL then I can bootstrap a compiler for FORTRAN (as disgusting as that might be). If I have FORTH, then I can build an interpreter for Ruby. This also applies to bytecodes for CPUs. If I have a Turing Complete bytecode then I can create a compiler for any language. The rule then can be extended even further to say that if I cannot create another Turing Machine in your language, then your language cannot be Turing Complete. If I can’t use your language to write a compiler or interpreter for any other language then your language is not Turing Complete.

Yes, this is true.

Currently you cannot run Python 2 inside the Python 3 virtual machine. Since I cannot, that means Python 3 is not Turing Complete and should not be used by anyone.

And this is completely asinine. Worse, it’s flat-out dishonest, and relies on another rhetorical trick. You only “cannot” run Python 2 inside the Python 3 VM because no one has written a Python 2 interpreter in Python 3. The “cannot” is not a mathematical impossibility; it’s a simple matter of the code not having been written. Or perhaps it has, but no one cares anyway, because it would be comically and unusably slow.

I assume this was meant to be sarcastic on some level, since it’s followed by a big blue box that seems unsure about whether to double down or reverse course. But I can’t tell why it was even brought up, because it has absolutely nothing to do with Zed’s true complaint, which is that Python 2 and Python 3 do not coexist within a single environment. Implementing language X using language Y does not mean that X and Y can now be used together seamlessly.

The canonical Python release is written in C (just like with Ruby or Lua), but you can’t just dump a bunch of C code into a Python (or Ruby or Lua) file and expect it to work. You can talk to C from Python and vice versa, but defining how they communicate is a bit of a pain in the ass and requires some level of setup.

I’ll get into this some more shortly.

No Working Translator

Python 3 comes with a tool called 2to3 which is supposed to take Python 2 code and translate it to Python 3 code.

I should point out right off the bat that this is not actually what you want to use most of the time, because you probably want to translate your Python 2 code to Python 2/3 code. 2to3 produces code that most likely will not work on Python 2. Other tools exist to help you port more conservatively.

Translating one programming language into another is a solidly researched topic with solid math behind it. There are translators that convert any number of languages into JavaScript, C, C++, Java, and many times you have no idea the translation is being done. In addition to this, one of the first steps when implementing a new language is to convert the new language into an existing language (like C) so you don’t have to write a full compiler. Translation is a fully solved problem.

This is completely fucking ludicrous. Translating one programming language to another is a common task, though “fully solved” sounds mighty questionable. But do you know what the results look like?

I found a project called “Transcrypt”, which puts Python in the browser by “translating” it to JavaScript. I’ve never used or heard of this before; I just googled for something to convert Python to JavaScript. Here’s their first sample, a demo using jQuery:

def start ():
    def changeColors ():
        for div in S__divs:
            S (div) .css ({
                'color': 'rgb({},{},{})'.format (* [int (256 * Math.random ()) for i in range (3)]),
            })

    S__divs = S ('div')
    changeColors ()
    window.setInterval (changeColors, 500)

And here’s the JavaScript code it compiles to:

(function () {
    var start = function () {
        var changeColors = function () {
            var __iterable0__ = $divs;
            for (var __index0__ = 0; __index0__ < __iterable0__.length; __index0__++) {
                var div = __iterable0__ [__index0__];
                $ (div).css (dict ({'color': 'rgb({},{},{})'.format.apply (null, function () {
                    var __accu0__ = [];
                    for (var i = 0; i < 3; i++) {
                        __accu0__.append (int (256 * Math.random ()));
                    }
                    return __accu0__;
                } ())}));
            }
        };
        var $divs = $ ('div');
        changeColors ();
        window.setInterval (changeColors, 500);
    };
    __pragma__ ('<all>')
        __all__.start = start;
    __pragma__ ('</all>')
}) ();

Well, not quite. That’s actually just a small piece at the end of the full 1861-line file.

You may notice that the emitted JavaScript effectively has to emulate the Python for loop, because JavaScript doesn’t have anything that works exactly the same way. And this is a basic, common language feature translated between two languages in the same general family! Imagine how your code would look if you relied on gritty details of how classes are implemented.

Is this what you want 2to3 to do to your code?

Even if something has been proven to be mathematically possible, that doesn’t mean it’s easy, and it doesn’t mean the results will be pretty (or fast).

The 2to3 translator fails on about 15% of the code it attempts, and does a poor job of translating the code it can handle. The motivations for this are unclear, but keep in mind that a group of people who claim to be programming language experts can’t write a reliable translator from one version of their own language to another. This is also a cause of their porting problems, which adds up to more evidence Python 3’s future is uncertain.

Writing a translator from one language to another is a fully proven and fundamental piece of computer science. Yet, the 2to3 translator cannot translate code 100%. In my own tests it is only about 85% effective, leaving a large amount of code to translate manually. Given that translation is a solved problem this seems to be a decision bordering on malice rather than incredible incompetence.

The programmer-oriented section doubles down on this idea with a title of “Purposefully Crippled 2to3 Translator” — again, accusing the Python project of sabotaging everyone. That doesn’t even make sense; if their goal is to make everyone use Python 3 at any cost, why would they deliberately break their tool that reduces the amount of Python 2 code and increases the amount of Python 3 code?

2to3 sucks because its job is hard. Python is dynamically typed. If it sees d.iteritems(), it might want to change that to d.items(), as it’s called in Python 3 — but it can’t always be sure that d is actually a dict. If d is some user-defined type, renaming the method is wrong.

But hey, Turing-completeness, right? It must be mathematically possible. And it is! As long as you’re willing to see this:

for key, value in d.iteritems():
    ...

Get translated to this:

__d = d
for key, value in (__d.items() if isinstance(__d, dict) else __d.iteritems()):
    ...

Would Zed be happier with that, I wonder?

The JVM and CLR Prove It’s Pointless

Yet, for some reason, the Python 3 virtual machine can’t run Python 2? Despite the solidly established mathematics disproving this, the countless examples of running one crazy language inside a Russian doll cascade of other crazy languages, and huge number of languages that can coexist in nearly every other virtual machine? That makes no sense.

This, finally, is the real complaint. It’s not a bad one, and it comes up sometimes, but… it’s not this easy.

The Python 3 VM is fairly similar to the Python 2 VM. The problem isn’t the VM, but the core language constructs and standard library.

Consider: what happens when a Python 2 old-style class instance gets passed into Python 3, which has no such concept? It seems like a value would have to always have the semantics of the language version it came from — that’s how languages usually coexist on the same VM, anyway.

Now, I’m using Python 3, and I load some library written for Python 2. I call a Python 2 function that deals with bytestrings, and I pass it a Python 3 bytestring. Oh no! It breaks because Python 3 bytestrings iterate as integers, whereas the Python 2 library expects them to iterate as characters.

Okay, well, no big deal, you say. Maybe Python 2 libraries just need to be updated to work either way, before they can be used with Python 3.

But that’s exactly the situation we’re in right now. Syntax changes are trivially fixed by 2to3 and similar tools. It’s libraries that cause the subtler issues.

The same applies the other way, too. I write Python 3 code, and it gets an int from some Python 2 library. I try to use the .to_bytes method on it, but that doesn’t exist on Python 2 integers. So my Python 3 code, written and intended purely for Python 3, now has to deal with Python 2 integers as well.

Perhaps “primitive” types should convert automatically, on the boundary? Okay, sure. What about the Python 2 buffer type, which is C-backed and replaced by memoryview in Python 3?

Or how about this very fundamental problem: names of methods and other attributes are str in both versions, but that means they’re bytestrings in Python 2 and text in Python 3. If you’re in Python 3 land, and you call obj.foo() on a Python 2 object, what happens? Python 3 wants a method with the text name foo, but Python 2 wants a method with the bytes name foo. Text and bytes are not implicitly convertible in Python 3. So does it error? Somehow work anyway? What about the other way around?

What about the standard library, which has had a number of improvements in Python 3 that don’t or can’t exist in Python 2? Should Python ship two entire separate copies of its standard library? What about modules like logging, which rely on global state? Does Python 2 and Python 3 code need to set up logging separately within the same process?

There are no good solutions here. The language would double in size and complexity, and you’d still end up with a mess at least as bad as the one we have now when values leak from one version into the other.

We either have two situations here:

  1. Python 3 has been purposefully crippled to prevent Python 2’s execution alongside Python 3 for someone’s professional or ideological gain.
  2. Python 3 cannot run Python 2 due to simple incompetence on the part of the Python project.

I can think of a third.

Difficult To Use Strings

The strings in Python 3 are very difficult to use for beginners. In an attempt to make their strings more “international” they turned them into difficult to use types with poor error messages.

Why is “international” in scare quotes?

Every time you attempt to deal with characters in your programs you’ll have to understand the difference between byte sequences and Unicode strings.

Given that I’m reading part of a book teaching Python, this would be a perfect opportunity to drive this point home by saying “Look! Running exercise N in Python 3 doesn’t work.” Exercise 1, at least, works fine for me with a little extra sprinkle of parentheses:

print("Hello World!")
print("Hello Again")
print("I like typing this.")
print("This is fun.")
print('Yay! Printing.')
print("I'd much rather you 'not'.")
print('I "said" do not touch this.')

Contrast with the actual content of that exercise — at the bottom is a big red warning box telling people from “another country” (relative to where?) that if they get errors about ASCII encodings, they should put an unexplained magical incantation at the top of their scripts to fix “Unicode UTF-8”, whatever that is. I wonder if Zed has read his own book.

Don’t know what that is? Exactly.

If only there were a book that could explain it to beginners in more depth than “you have to fix this if you’re foreign”.

The Python project took a language that is very forgiving to beginners and mostly “just works” and implemented strings that require you to constantly know what type of string they are. Worst of all, when you get an error with strings (which is very often) you get an error message that doesn’t tell you what variable names you need to fix.

The complaint is that this happens in Python 3, whereas it’s accepted in Python 2:

>>> b"hello" + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str

The programmer section is called “Statically Typed Strings”. But this is not static typing. That’s strong typing, a property that sets Python’s type system apart from languages like JavaScript. It’s usually considered a good thing, because the alternative is to silently produce nonsense in some cases, and then that nonsense propagates through your program and is hard to track down when it finally causes problems.

If they’re going to require beginners to struggle with the difference between bytes and Unicode the least they could do is tell people what variables are bytes and what variables are strings.

That would be nice, but it’s not like this is a new problem. Try this in Python 2.

>>> 3 + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

How would Python even report this error when I used literals instead of variables? How could custom types hook into such a thing? Error messages are hard.

By the way, did you know that several error messages are much improved in Python 3? Python 2 is somewhat notorious for the confusing errors it produces when an argument is missing from a method call, but Python 3 is specific about the problem, which is much friendlier to beginners.

However, when you point out that this is hard to use they try to claim it’s good for you. It is not. It’s simple blustering covering for a poor implementation.

I don’t know what about this is hard. Why do you have a text string and a bytestring in the first place? Why is it okay to refuse adding a number to a string, but not to refuse adding bytes to a string?

Imagine if one of the Python core developers were just getting into Python 2 and messing around.

# -*- coding: utf8 -*-
print "Hi, my name is Łukasz Langa."
print "Hi, my name is Łukasz Langa."[::-1]

Hi, my name is Łukasz Langa.
.agnaL zsaku�� si eman ym ,iH

Good luck figuring out how to fix that.

This isn’t blustering. Bytes are not text; they are binary data that could encode anything. They happen to look like text sometimes, and you can get away with thinking they’re text if you’re not from “another country”, but that mindset will lead you to write code that is wrong. The resulting bugs will be insidious and confusing, and you’ll have a hard time even reasoning about them because it’ll seem like “Unicode text” is somehow a different beast altogether from “ASCII text”.

Exercise 11 mentions at the end that you can use int() to convert a number to an integer. It’s no more complicated to say that you convert bytes to a string using .decode(). It shouldn’t even come up unless you’re explicitly working with binary data, and I don’t see any reading from sockets in LPTHW.
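For what it’s worth, the whole dance fits in three lines (a minimal sketch, assuming UTF-8 data, which is what you’ll most often have):

raw = b"caf\xc3\xa9"           # bytes, e.g. read from a file or socket
text = raw.decode("utf-8")     # now it's a str: 'café'
data = text.encode("utf-8")    # and back to bytes again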

It’s also not statically compiled as strongly as it could be, so you can’t find these kinds of type errors until you run the code.

This comes a scant few paragraphs after “Dynamic typing is what makes Python easy to use and one of the reasons I advocate it for beginners.”

You can’t find any kinds of type errors until you run the code. Welcome to dynamic typing.

Strings are also most frequently received from an external source, such as a network socket, file, or similar input. This means that Python 3’s statically typed strings and lack of static type safety will cause Python 3 applications to crash more often and have more security problems when compared with Python 2.

On the contrary — Python 3 applications should crash less often. The problem with silently converting between bytestrings and text in Python 2 is that it might fail, depending on the contents. "cafe" + u"hello" works fine, but "café" + u"hello" raises a UnicodeDecodeError. Python 2 makes it very easy to write code that appears to work when tested with ASCII data, but later breaks with anything else, even though the values are still the same types. In Python 3, you get an error the first time you try to run such code, regardless of what’s in the actual values. That’s the biggest reason for the change: it improves things from being intermittent value errors to consistent type errors.
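If you want to see the Python 2 failure mode for yourself, a quick interactive session (assuming a UTF-8 terminal) looks something like this:

>>> "cafe" + u"hello"     # pure ASCII bytes silently decode
u'cafehello'
>>> "café" + u"hello"     # same types, non-ASCII contents
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)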

More security problems? This is never substantiated, and seems to have been entirely fabricated.

Too Many Formatting Options

In addition to that you will have 3 different formatting options in Python 3.6. That means you’ll have to learn to read and use multiple ways to format strings that are all very different. Not even I, an experienced professional programmer, can easily figure out these new formatting systems or keep up with their changing features.

I don’t know what on earth “keep up with their changing features” is supposed to mean, and Zed doesn’t bother to go into details.

Python 3 has three ways to format strings: % interpolation, str.format(), and the new f"" strings in Python 3.6. The f"" strings use the same syntax as str.format(); the difference is that where str.format() uses numbers or names of keyword arguments, f"" strings just use expressions. Compare:

number = 133
print("{n:02x}".format(n=number))
print(f"{number:02x}")

This isn’t “very different”. A frequently-used method is being promoted to syntax.

I really like this new style, and I have no idea why this wasn’t the formatting for Python 3 instead of that stupid .format function. String interpolation is natural for most people and easy to explain.

The problem is that beginners will now have to know all three of these formatting styles, and that’s too many.

I could swear Zed, an experienced professional programmer, just said he couldn’t easily figure out these new formatting systems. Note also that str.format() has existed in Python 2 since Python 2.6 was released in 2008, so I don’t know why Zed said “new formatting systems”, plural.

This is a truly bizarre complaint overall, because the mechanism Zed likes best is the newest one. If Python core had agreed that three mechanisms was too many, we wouldn’t be getting f"" at all.

Even More Versions of Strings

Finally, I’m told there is a new proposal for a string type that is both bytes and Unicode at the same time? That’d be fantastic if this new type brings back the dynamic typing that makes Python easy, but I’m betting it will end up being yet another static type to learn. For that reason I also think beginners should avoid Python 3 until this new “chimera string” is implemented and works reliably in a dynamic way. Until then, you will just be dealing with difficult strings that are statically typed in a dynamically typed language.

I have absolutely no idea what this is referring to, and I can’t find anyone who does. I don’t see any recent PEPs mentioning such a thing, nor anything in the last several months on the python-dev mailing list. I don’t see it in the Python 3.6 release notes.

The closest thing I can think of is the backwards-compatibility shenanigans for PEP 528 and PEP 529 — they switch to the Windows wide-string APIs for console and filesystem encoding, but pretend under the hood that the APIs take UTF-8-encoded bytes to avoid breaking libraries like Twisted. That’s a microscopic detail that should never matter to anyone but authors of Twisted, and is nothing like a new hybrid string type, but otherwise I’m at a loss.

This paragraph really is a perfect summary of the whole article. It speaks vaguely yet authoritatively about something that doesn’t seem to exist, it doesn’t bother actually investigating the thing the entire section talks about, it conjectures that this mysterious feature will be hard just because it’s in Python 3, and it misuses terminology to complain about a fundamental property of Python that’s always existed.

Core Libraries Not Updated

Many of the core libraries included with Python 3 have been rewritten to use Python 3, but have not been updated to use its features. How could they given Python 3’s constant changing status and new features?

What “constant changing status”? The language makes new releases; is that bad? The only mention of “changing” so far was with string formatting, which makes no sense to me, because the only major change has been the addition of syntax that Zed prefers.

There are several libraries that, despite knowing the encoding of data, fail to return proper strings. The worst offender seems to be any libraries dealing with the HTTP protocol, which does indicate the encoding of the underlying byte stream in many cases.

In many cases, yes. Not in all. Some web servers don’t send back an encoding. Some files don’t have an encoding, because they’re images or other binary data. HTML allows the encoding to be given inside the document, instead. urllib has always returned bytes, so it’s not all that unreasonable to keep doing that, rather than… well, I’m not quite sure what this is proposing. Return strings sometimes?
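For the curious, decoding it yourself isn’t a huge ordeal, assuming the server actually declares a charset (a minimal sketch; the URL is just a placeholder):

from urllib.request import urlopen

resp = urlopen("https://example.com/")
raw = resp.read()                                         # bytes
charset = resp.headers.get_content_charset() or "utf-8"   # fall back if none is declared
text = raw.decode(charset)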

The documentation for urllib.request and http.client both advise using the higher-level Requests library instead, in a prominent yellow box right at the top. Requests has distinct mechanisms for retrieving bytes versus text and is vastly easier to use overall, though I don’t think even it understands reading encodings from HTML. Alas, computers.

Good luck to any beginner figuring out how to install Requests on Python 2 — but thankfully, Python 3 now comes bundled with pip, which makes installing libraries much easier. Contrast with the beginning of exercise 46, which apologizes for how difficult this is to explain, lists four things to install, warns that it will be frustrating, and advises watching a video to help figure it out.

What’s even more idiotic about this is Python has a really good Chardet library for detecting the encoding of byte streams. If Python 3 is supposed to be “batteries included” then fast Chardet should be baked into the core of Python 3’s strings making it cake to translate strings to bytes even if you don’t know the underlying encoding. … Call the function whatever you want, but it’s not magic to guess at the encoding of a byte stream, it’s science. The only reason this isn’t done for you is that the Python project decided that you should be punished for not knowing about bytes vs. Unicode, and their arrogance means you have difficult to use strings.

Guessing at the encoding of a byte stream isn’t so much science as, well, guessing. Guessing means that sometimes you’re wrong. Sometimes that’s what you want, and I’m honestly ambivalent about having chardet in the standard library, but it’s hardly arrogant to not want to include a highly-fallible heuristic in your programming language.

Conclusions and Warnings

I have resisted writing about these problems with Python 3 for 5 versions because I hoped it would become usable for beginners. Each year I would attempt to convert some of my code and write a couple small tests with Python 3 and simply fail. If I couldn’t use Python 3 reliably then there’s no way a total beginner could manage it. So each year I’d attempt it, and fail, and wait until they fix it. I really liked Python and hoped the Python project would drop their stupid stances on usability.

Let us recap the usability problems seen thus far.

  • You can’t add b"hello" to "hello".
  • TypeErrors are phrased exactly the same as they were in Python 2.
  • The type system is exactly as dynamic as it was in Python 2.
  • There is a new formatting mechanism, using the same syntax as one in Python 2, that Zed prefers over the ones in Python 2.
  • urllib.request doesn’t decode for you, just like in Python 2.
  • 档牡敤㽴 isn’t built in. Oh, sorry, I meant chardet.

Currently, the state of strings is viewed as a Good Thing in the Python community. The fact that you can’t run Python 2 inside Python 3 is seen as a weird kind of tough love. The brainwashing goes so far as to outright deny the mathematics behind language translation and compilation in an attempt to motivate the Python community to brute force convert all Python 2 code.

Which is probably why the Python project focuses on convincing unsuspecting beginners to use Python 3. They don’t have a switching cost, so if you get them to fumble their way through the Python 3 usability problems then you have new converts who don’t know any better. To me this is morally wrong and is simply preying on people to prop up a project that needs a full reset to survive. It means beginners will fail at learning to code not because of their own abilities, but because of Python 3’s difficulty.

Now that we’re towards the end, it’s a good time to say this: Zed Shaw, your behavior here is fucking reprehensible.

Half of what’s written here is irrelevant nonsense backed by a vague appeal to “mathematics”. Instead of having even the shred of humility required to step back and wonder if there are complicating factors beyond whether something is theoretically possible, you have invented a variety of conflicting and malicious motivations to ascribe to the Python project.

It’s fine to criticize Python 3. The string changes force you to think about what you’re doing a little more in some cases, and occasionally that’s a pain in the ass. I absolutely get it.

But you’ve gone out of your way to invent a conspiracy out of whole cloth and promote it on your popular platform aimed at beginners, who won’t know how obviously full of it you are. And why? Because you can’t add b"hello" to "hello"? Are you kidding me? No one can even offer to help you, because instead of examples of real problems you’ve had, you gave two trivial toys and then yelled a lot about how the whole Python project is releasing mind-altering chemicals into the air.

The Python 3 migration has been hard enough. It’s taken a lot of work from a lot of people who’ve given enough of a crap to help Python evolve — to make it better to the best of their judgment and abilities. Now we’re finally, finally at the point where virtually all libraries support Python 3, a few new ones only support Python 3, and Python 3 adoption is starting to take hold among application developers.

And you show up to piss all over it, to propagate this myth that Python 3 is hamstrung to the point of unusability, because if the Great And Wise Zed Shaw can’t figure it out in ten seconds then it must just be impossible.

Fuck you.

Sadly, I doubt this will happen, and instead they’ll just rant about how I don’t know what I’m talking about and I should shut up.

This is because you don’t know what you’re talking about, and you should shut up.

Python FAQ: Why should I use Python 3?

Post Syndicated from Eevee original https://eev.ee/blog/2016/07/31/python-faq-why-should-i-use-python-3/

Part of my Python FAQ, which is doomed to never be finished.

The short answer is: because it’s the actively-developed version of the language, and you should use it for the same reason you’d use 2.7 instead of 2.6.

If you’re here, I’m guessing that’s not enough. You need something to sweeten the deal. Well, friend, I have got a whole mess of sugar cubes just for you.

And once you’re convinced, you may enjoy the companion article, how to port to Python 3! It also has some more details on the differences between Python 2 and 3, whereas this article doesn’t focus too much on the features removed in Python 3.

Some background

If you aren’t neck-deep in Python, you might be wondering what the fuss is all about, or why people keep telling you that Python 3 will set your computer on fire. (It won’t.)

Python 2 is a good language, but it comes with some considerable baggage. It has two integer types; it may or may not be built in a way that completely mangles 16/17 of the Unicode space; it has a confusing mix of lazy and eager functional tools; it has a standard library that takes “batteries included” to lengths beyond your wildest imagination; it boasts strong typing, then casually insists that None < 3 < "2"; overall, it’s just full of little dark corners containing weird throwbacks to the days of Python 1.

(If you’re really interested, Nick Coghlan has written an exhaustive treatment of the slightly different question of why Python 3 was created. This post is about why Python 3 is great, so let’s focus on that.)

Fixing these things could break existing code, whereas virtually all code written for 2.0 will still work on 2.7. So Python decided to fix them all at once, producing a not-quite-compatible new version of the language, Python 3.

Nothing like this has really happened with a mainstream programming language before, and it’s been a bit of a bumpy ride since then. Python 3 was (seemingly) designed with the assumption that everyone would just port to Python 3, drop Python 2, and that would be that. Instead, it’s turned out that most libraries want to continue to run on both Python 2 and Python 3, which was considerably difficult to make work at first. Python 2.5 was still in common use at the time, too, and it had none of the helpful backports that showed up in Python 2.6 and 2.7; likewise, Python 3.0 didn’t support u'' strings. Writing code that works on both 2.5 and 3.0 was thus a ridiculous headache.

The porting effort also had a dependency problem: if your library or app depends on library A, which depends on library B, which depends on C, which depends on D… then none of those projects can even think about porting until D’s porting effort is finished. Early days were very slow going.

Now, though, things are looking brighter. Most popular libraries work with Python 3, and those that don’t are working on it. Python 3’s Unicode handling, one of its most contentious changes, has had many of its wrinkles ironed out. Python 2.7 consists largely of backported Python 3 features, making it much simpler to target 2 and 3 with the same code — and both 2.5 and 2.6 are no longer supported.

Don’t get me wrong, Python 2 will still be around for a while. A lot of large applications have been written for Python 2 — think websites like Yelp, YouTube, Reddit, Dropbox — and porting them will take some considerable effort. I happen to know that at least one of those websites was still running 2.6 last year, years after 2.6 had been discontinued, if that tells you anything about the speed of upgrades for big lumbering software.

But if you’re just getting started in Python, or looking to start a new project, there aren’t many reasons not to use Python 3. There are still some, yes — but unless you have one specifically in mind, they probably won’t affect you.

I keep having Python beginners tell me that all they know about Python 3 is that some tutorial tried to ward them away from it for vague reasons. (Which is ridiculous, since especially for beginners, Python 2 and 3 are fundamentally not that different.) Even the #python IRC channel has a few people who react, ah, somewhat passive-aggressively towards mentions of Python 3. Most of the technical hurdles have long since been cleared; it seems like one of the biggest roadblocks now standing in the way of Python 3 adoption is the community’s desire to sabotage itself.

I think that’s a huge shame. Not many people seem to want to stand up for Python 3, either.

Well, here I am, standing up for Python 3. I write all my new code in Python 3 now — because Python 3 is great and you should use it. Here’s why.

Hang on, let’s be real for just a moment

None of this is going to 💥blow your mind💥. It’s just a programming language. I mean, the biggest change to Python 2 in the last decade was probably the addition of the with statement, which is nice, but hardly an earth-shattering innovation. The biggest changes in Python 3 are in the same vein: they should smooth out some points of confusion, help avoid common mistakes, and maybe give you a new toy to play with.

Also, if you’re writing a library that needs to stay compatible with Python 2, you won’t actually be able to use any of this stuff. Sorry. In that case, the best reason to port is so application authors can use this stuff, rather than citing your library as the reason they’re trapped on Python 2 forever. (But hey, if you’re starting a brand new library that will blow everyone’s socks off, do feel free to make it Python 3 exclusive.)

Application authors, on the other hand, can go wild.

Unicode by default

Let’s get the obvious thing out of the way.

In Python 2, there are two string types: str is a sequence of bytes (which I would argue makes it not a string), and unicode is a sequence of Unicode codepoints. A literal string in source code is a str, a bytestring. Reading from a file gives you bytestrings. Source code is assumed ASCII by default. It’s an 8-bit world.

If you happen to be an English speaker, it’s very easy to write Python 2 code that seems to work perfectly, but chokes horribly if fed anything outside of ASCII. The right thing involves carefully specifying encodings everywhere and using u'' for virtually all your literal strings, but that’s very tedious and easily forgotten.

Python 3 reshuffles this to put full Unicode support front and center.

Most obviously, the str type is a real text type, similar to Python 2’s unicode. Literal strings are still str, but now that makes them Unicode strings. All of the “structural” strings — names of types, functions, modules, etc. — are likewise Unicode strings. Accordingly, identifiers are allowed to contain any Unicode “letter” characters. repr() no longer escapes printable Unicode characters, though there’s a new ascii() (and corresponding !a format cast and %a placeholder) that does. Unicode completely pervades the language, for better or worse.
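For instance, in a UTF-8 terminal:

>>> repr('café')
"'café'"
>>> ascii('café')
"'caf\\xe9'"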

And just for the record: this is way better. It is so much better. It is incredibly better. Do you know how much non-ASCII garbage I type? Every single em dash in this damn post was typed by hand, and Python 2 would merrily choke on them.

Source files are now assumed to be UTF-8 by default, so adding an em dash in a comment will no longer break your production website. (I have seen this happen.) You’re still free to specify another encoding explicitly if you want, using a magic comment.

There is no attempted conversion between bytes and text, as in Python 2; b'a' + 'b' is a TypeError. Some modules require you to know what you’re dealing with: zlib.compress only accepts bytes, because zlib is defined in terms of bytes; json.loads only accepts str, because JSON is defined in terms of Unicode codepoints. Calling str() on some bytes will defer to repr, producing something like "b'hello'". (But see -b and -bb below.) Overall it’s pretty obvious when you’ve mixed bytes with text.

Oh, and two huge problem children are fixed: both the csv module and urllib.parse (formerly urlparse) can handle text. If you’ve never tried to make those work, trust me, this is miraculous.

I/O does its best to make everything Unicode. On Unix, this is a little hokey, since the filesystem is explicitly bytes with no defined encoding; Python will trust the various locale environment variables, which on most systems will make everything UTF-8. The default encoding of text-mode file I/O is derived the same way and thus usually UTF-8. (If it’s not what you expect, run locale and see what you get.) Files opened in binary mode, with a 'b', will still read and write bytes.
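In practice, that means being explicit about which mode you want (a minimal sketch; the filenames are made up):

with open("notes.txt", encoding="utf-8") as f:
    text = f.read()     # str

with open("photo.jpg", "rb") as f:
    data = f.read()     # bytes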

Python used to come in “narrow” and “wide” builds, where “narrow” builds actually stored Unicode as UTF-16, and this distinction could leak through to user code in subtle ways. On a narrow build, unichr(0x1F4A3) raises ValueError, and the length of u'💣' is 2. Surprise! Maybe your code will work on someone else’s machine, or maybe it won’t. Python 3.3 eliminated narrow builds.

I think those are the major points. For the most part, you should be able to write code as though encodings don’t exist, and the right thing will happen more often. And the wrong thing will immediately explode in your face. It’s good for you.

If you work with binary data a lot, you might be frowning at me at this point; it was a bit of a second-class citizen in Python 3.0. I think things have improved, though: a number of APIs support both bytes and text, the bytes-to-bytes codec issue has largely been resolved, we have bytes.hex() and bytes.fromhex(), bytes and bytearray both support % now, and so on. They’re listening!

Refs: Python 3.0 release notes; myriad mentions all over the documentation

Backported features

Python 3.0 was released shortly after Python 2.6, and a number of features were then backported to Python 2.7. You can use these if you’re only targeting Python 2.7, but if you were stuck with 2.6 for a long time, you might not have noticed them.

  • Set literals:

    {1, 2, 3}
    
  • Dict and set comprehensions:

    {word.lower() for word in words}
    {value: key for (key, value) in dict_to_invert.items()}
    
  • Multi-with:

    with open("foo") as f1, open("bar") as f2:
        ...
    
  • print is now a function, with a couple bells and whistles added: you can change the delimiter with the sep argument, you can change the terminator to whatever you want (including nothing) with the end argument, and you can force a flush with the flush argument. In Python 2.6 and 2.7, you still have to opt into this with from __future__ import print_function.

  • The string representation of a float now uses the shortest decimal number that has the same underlying value — for example, repr(1.1) was '1.1000000000000001' in Python 2.6, but is just '1.1' in Python 2.7 and 3.1+, because both are represented the same way in a 64-bit float.

  • collections.OrderedDict is a dict-like type that remembers the order of its keys.

    Note that you cannot do OrderedDict(a=1, b=2), because the constructor still receives its keyword arguments in a regular dict, losing the order. You have to pass in a sequence of 2-tuples or assign keys one at a time.

  • collections.Counter is a dict-like type for counting a set of things. It has some pretty handy operations that allow it to be used like a multiset (see the quick sketch after this list).

  • The entire argparse module is a backport from 3.2.

  • str.format learned a , formatting specifier for numbers, which always uses commas and groups of three digits. This is wrong for many countries, and the correct solution involves using the locale module, but it’s useful for quick output of large numbers.

  • re.sub, re.subn, and re.split accept a flags argument. Minor, but, thank fucking God.
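
Since Counter doesn’t get a snippet above, here’s a quick taste:

from collections import Counter

votes = Counter(["spam", "eggs", "spam", "spam"])
votes["spam"]           # 3
votes["bacon"]          # 0, missing things just count as zero
votes.most_common(1)    # [('spam', 3)]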

Ref: Python 2.7 release notes

Iteration improvements

Everything is lazy

Python 2 has a lot of pairs of functions that do the same thing, except one is eager and one is lazy: range and xrange, map and itertools.imap, dict.keys and dict.iterkeys, and so on.

Python 3.0 eliminated all of the lazy variants and instead made the default versions lazy. Iterating over them works exactly the same way, but no longer creates an intermediate list — for example, range(1000000000) won’t eat all your RAM. If you need to index them or store them for later, you can just wrap them in list(...).

Even better, the dict methods are now “views”. You can keep them around, and they’ll reflect any changes to the underlying dict. They also act like sets, so you can do a.keys() & b.keys() to get the set of keys that exist in both dicts.
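For example:

a = {"x": 1, "y": 2}
b = {"y": 10, "z": 20}

keys = a.keys()         # a view, not a copy
a["w"] = 3
"w" in keys             # True, the view sees the new key

a.keys() & b.keys()     # {'y'}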

Refs: dictionary view docs; Python 3.0 release notes

Unpacking

Unpacking got a huge boost. You could always do stuff like this in Python 2:

a, b, c = range(3)  # a = 0, b = 1, c = 2

Python 3.0 introduces:

a, b, *c = range(5)  # a = 0, b = 1, c = [2, 3, 4]
a, *b, c = range(5)  # a = 0, b = [1, 2, 3], c = 4

Python 3.5 additionally allows use of the * and ** unpacking operators in literals, or multiple times in function calls:

print(*range(3), *range(3))  # 0 1 2 0 1 2

x = [*range(3), *range(3)]  # x = [0, 1, 2, 0, 1, 2]
y = {*range(3), *range(3)}  # y = {0, 1, 2}  (it's a set, remember!)
z = {**dict1, **dict2}  # finally, syntax for dict merging!

Refs: Python 3.0 release notes; PEP 3132; Python 3.5 release notes; PEP 448

yield from

yield from is an extension of yield. Where yield produces a single value, yield from yields an entire sequence.

def flatten(*sequences):
    for seq in sequences:
        yield from seq

list(flatten([1, 2], [3, 4]))  # [1, 2, 3, 4]

Of course, for a simple example like that, you could just do some normal yielding in a for loop. The magic of yield from is that it can also take another generator or other lazy iterable, and it’ll effectively pause the current generator until the given one has been exhausted. It also takes care of passing values back into the generator using .send() or .throw().

def foo():
    a = yield 1
    b = yield from bar(a)
    print("foo got back", b)
    yield 4

def bar(a):
    print("in bar", a)
    x = yield 2
    y = yield 3
    print("leaving bar")
    return x + y

gen = foo()
val = None
while True:
    try:
        newval = gen.send(val)
    except StopIteration:
        break
    print("yielded", newval)
    val = newval * 10

# yielded 1
# in bar 10
# yielded 2
# yielded 3
# leaving bar
# foo got back 50
# yielded 4

Oh yes, and you can now return a value from a generator. The return value becomes the result of a yield from, or if the caller isn’t using yield from, it’s available as the argument to the StopIteration exception.
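The StopIteration case looks something like this:

def finish():
    yield 1
    return "all done"

g = finish()
next(g)                 # 1
try:
    next(g)
except StopIteration as stop:
    print(stop.value)   # all done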

A small convenience, perhaps. The real power here isn’t in the use of generators as lazy iterators, but in the use of generators as coroutines.

A coroutine is a function that can “suspend” itself, like yield does, allowing other code to run until the function is resumed. It’s kind of like an alternative to threading, but only one function is actively running at any given time, and that function has to deliberately relinquish control (or end) before anything else can run.

Generators could do this already, more or less, but only one stack frame deep. That is, you can yield from a generator to suspend it, but if the generator calls another function, that other function has no way to suspend the generator. This is still useful, but significantly less powerful than the coroutine functionality in e.g. Lua, which lets any function yield anywhere in the call stack.

With yield from, you can create a whole chain of generators that yield from one another, and as soon as the one on the bottom does a regular yield, the entire chain will be suspended.

This laid the groundwork for making the asyncio module possible. I’ll get to that later.

Refs: docs; Python 3.3 release notes; PEP 380

Syntactic sugar

Keyword-only arguments

Python 3.0 introduces “keyword-only” arguments, which must be given by name. As a corollary, you can now accept a list of args and have more arguments afterwards. The full syntax now looks something like this:

def foo(a, b=None, *args, c=None, d, **kwargs):
    ...

Here, a and d are required, b and c are optional. c and d must be given by name.

foo(1)                      # TypeError: missing d
foo(1, 2)                   # TypeError: missing d
foo(d=4)                    # TypeError: missing a
foo(1, d=4)                 # a = 1, d = 4
foo(1, 2, d=4)              # a = 1, b = 2, d = 4
foo(1, 2, 3, d=4)           # a = 1, b = 2, args = (3,), d = 4
foo(1, 2, c=3, d=4)         # a = 1, b = 2, c = 3, d = 4
foo(1, b=2, c=3, d=4, e=5)  # a = 1, b = 2, c = 3, d = 4, kwargs = {'e': 5}

This is extremely useful for functions with a lot of arguments, functions with boolean arguments, functions that accept *args (or may do so in the future) but also want some options, etc. I use it a lot!

If you want keyword-only arguments, but you don’t want to accept *args, you just leave off the variable name:

def foo(*, arg=None):
    ...

Refs: Python 3.0 release notes; PEP 3102

Format strings

Python 3.6 (not yet out) will finally bring us string interpolation, more or less, using the str.format() syntax:

a = 0x133
b = 0x352
print(f"The answer is {a + b:04x}.")

It’s pretty much the same as str.format(), except that instead of a position or name, you can give an entire expression. The formatting suffixes with : still work, the special built-in conversions like !r still work, and __format__ is still invoked.

Refs: docs; Python 3.6 release notes; PEP 498

async and friends

Right, so, about coroutines.

Python 3.4 introduced the asyncio module, which offers building blocks for asynchronous I/O (and bringing together the myriad third-party modules that do it already).

The design is based around coroutines, which are really generators using yield from. The idea, as I mentioned above, is that you can create a stack of generators that all suspend at once:

@coroutine
def foo():
    # do some stuff
    yield from bar()
    # do more stuff

@coroutine
def bar():
    # do some stuff
    response = yield from get_url("https://eev.ee/")
    # do more stuff

When this code calls get_url() (not actually a real function, but see aiohttp), get_url will send a request off into the æther, and then yield. The entire stack of generators — get_url, bar, and foo — will all suspend, and control will return to whatever first called foo, which with asyncio will be an “event loop”.

The event loop’s entire job is to notice that get_url yielded some kind of “I’m doing a network request” thing, remember it, and resume other coroutines in the meantime. (Or just twiddle its thumbs, if there’s nothing else to do.) When a response comes back, the event loop will resume get_url and send it the response. get_url will do some stuff and return it up to bar, who continues on, none the wiser that anything unusual happened.

The magic of this is that you can call get_url several times, and instead of having to wait for each request to completely finish before the next one can even start, you can do other work while you’re waiting. No threads necessary; this is all one thread, with functions cooperatively yielding control when they’re waiting on some external thing to happen.

Now, notice that you do have to use yield from each time you call another coroutine. This is nice in some ways, since it lets you see exactly when and where your function might be suspended out from under you, which can be important in some situations. There are also arguments about why this is bad, and I don’t care about them.

However, yield from is a really weird phrase to be sprinkling all over network-related code. It’s meant for use with iterables, right? Lists and tuples and things. get_url is only one thing. What are we yielding from it? Also, what’s this @coroutine decorator that doesn’t actually do anything?

Python 3.5 smoothed over this nonsense by introducing explicit syntax for these constructs, using new async and await keywords:

async def foo():
    # do some stuff
    await bar()
    # do more stuff

async def bar():
    # do some stuff
    response = await get_url("https://eev.ee/")
    # do more stuff

async def clearly identifies a coroutine, even one that returns immediately. (Before, you’d have a generator with no yield, which isn’t actually a generator, which causes some problems.) await explains what’s actually happening: you’re just waiting for another function to be done.

async for and async with are also available, replacing some particularly clumsy syntax you’d need to use before. And, handily, you can only use any of these things within an async def.

The new syntax comes with corresponding new special methods like __await__, whereas the previous approach required doing weird things with __iter__, which is what yield from ultimately calls.

I could fill a whole post or three with stuff about asyncio, and can’t possibly do it justice in just a few paragraphs. The short version is: there’s built-in syntax for doing network stuff in parallel without threads, and that’s cool.
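If you want a tiny self-contained taste, here’s a sketch with asyncio.sleep standing in for the network calls, so it runs without any third-party libraries:

import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)      # pretend this is a network request
    return name + " done"

async def main():
    # both "requests" run concurrently, so this takes ~2 seconds, not ~3
    results = await asyncio.gather(fetch("a", 1), fetch("b", 2))
    print(results)                  # ['a done', 'b done']

loop = asyncio.get_event_loop()
loop.run_until_complete(main())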

Refs for asyncio: docs (asyncio); Python 3.4 release notes; PEP 3156

Refs for async and await: docs (await); docs (async); docs (special methods); Python 3.5 release notes; PEP 492

Function annotations

Function arguments and return values can have annotations:

def foo(a: "hey", b: "what's up") -> "whoa":
    ...

The annotations are accessible via the function’s __annotations__ attribute. They have no special meaning to Python, so you’re free to experiment with them.

Well…

You were free to experiment with them, but the addition of the typing module (mentioned below) has hijacked them for type hints. There’s no clear way to attach a type hint and some other value to the same argument, so you’ll have a tough time making function annotations part of your API.

There’s still no hard requirement that annotations be used exclusively for type hints (and it’s not like Python does anything with type hints, either), but the original PEP suggests it would like that to be the case someday. I guess we’ll see.
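For reference, the type-hint flavor looks like this; Python itself doesn’t enforce any of it at runtime, external checkers like mypy do:

from typing import List

def scale(values: List[float], factor: float) -> List[float]:
    return [v * factor for v in values]

scale.__annotations__
# {'values': typing.List[float], 'factor': <class 'float'>, 'return': typing.List[float]}  (order may vary)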

If you want to see annotations preserved for other uses as well, it would be a really good idea to do some creative and interesting things with them as soon as possible. Just saying.

Refs: docs; Python 3.0 release notes; PEP 3107

Matrix multiplication

Python 3.5 learned a new infix operator for matrix multiplication, spelled @. It doesn’t do anything for any built-in types, but it’s supported in NumPy. You can implement it yourself with the __matmul__ special method and its r and i variants.
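Here’s a minimal sketch of hooking into it with a made-up class:

class Vector:
    def __init__(self, *components):
        self.components = components

    def __matmul__(self, other):
        # a dot product, standing in for "real" matrix multiplication
        return sum(a * b for a, b in zip(self.components, other.components))

Vector(1, 2, 3) @ Vector(4, 5, 6)   # 32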

Shh. Don’t tell anyone, but I suspect there are fairly interesting things you could do with an operator called @ — some of which have nothing to do with matrix multiplication at all!

Refs: Python 3.5 release notes; PEP 465

Ellipsis

... is now valid syntax everywhere. It evaluates to the Ellipsis singleton, which does nothing. (This exists in Python 2, too, but it’s only allowed when slicing.)

It’s not of much practical use, but you can use it to indicate an unfinished stub, in a way that’s clearly not intended to be final but will still parse and run:

class ReallyComplexFiddlyThing:
    # fuck it, do this later
    ...

Refs: docs; Python 3.0 release notes

Enhanced exceptions

A slightly annoying property of Python 2’s exception handling is that if you want to do your own error logging, or otherwise need to get at the traceback, you have to use the slightly funky sys.exc_info() API and carry the traceback around separately. As of Python 3.0, exceptions automatically have a __traceback__ attribute, as well as a .with_traceback() method that sets the traceback and returns the exception itself (so you can use it inline).
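
So ad-hoc error logging can get everything it needs straight from the exception object; a quick sketch:

import traceback

try:
    1 / 0
except ZeroDivisionError as e:
    # No sys.exc_info() gymnastics; the traceback rides along on the exception.
    print(''.join(traceback.format_exception(type(e), e, e.__traceback__)))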

This makes some APIs a little silly — __exit__ still accepts the exception type and value and traceback, even though all three are readily available from just the exception object itself.

A much more annoying property of Python 2’s exception handling was that custom exception handling would lose track of where the problem actually occurred. Consider the following call stack.

A
B
C
D
E

Now say an exception happens in E, and it’s caught by code like this in C.

try:
    D()
except Exception as e:
    raise CustomError("Failed to call D")

Because this creates and raises a new exception, the traceback will start from this point and not even mention E. The best workaround for this involves manually creating a traceback between C and E, formatting it as a string, and then including that in the error message. Preposterous.

Python 3.0 introduced exception chaining, which allows you to do this:

raise CustomError("Failed to call D") from e

Now, if this exception reaches the top level, Python will format it as:

Traceback (most recent call last):
File C, blah blah
File D, blah blah
File E, blah blah
SomeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File A, blah blah
File B, blah blah
File C, blah blah
CustomError: Failed to call D

The best part is that you don’t need to explicitly say from e at all — if you do a plain raise while there’s already an active exception, Python will automatically chain them together. Even internal Python exceptions will have this behavior, so a broken exception handler won’t lose the original exception. (In the implicit case, the intermediate text becomes “During handling of the above exception, another exception occurred:”.)

The chained exception is stored on the new exception as either __cause__ (if from an explicit raise ... from) or __context__ (if automatic).

If you direly need to hide the original exception, Python 3.3 introduced raise ... from None.
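
A quick sketch of both forms — the dict lookup is just a stand-in for something that fails:

try:
    try:
        {}['flub']
    except KeyError as e:
        raise RuntimeError("lookup failed") from e
except RuntimeError as err:
    print(err.__cause__)    # the original KeyError, via the explicit "from e"
    print(err.__context__)  # also the KeyError, recorded automatically

try:
    try:
        {}['flub']
    except KeyError:
        raise RuntimeError("lookup failed") from None
except RuntimeError as err:
    print(err.__cause__)    # None; the KeyError is hidden from the traceback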

Speaking of exceptions, the error messages for missing arguments have been improved. Python 2 does this:

TypeError: foo() takes exactly 1 argument (0 given)

Python 3 does this:

TypeError: foo() missing 1 required positional argument: 'a'

Refs:

Cooler classes

super() with no arguments

You can call super() with no arguments. It Just Works. Hallelujah.

Also, you can call super() with no arguments. That’s so great that I could probably just fill the rest of this article with it and be satisfied.

Did I mention you can call super() with no arguments?
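
In case the giddiness obscured what that actually looks like, here’s a tiny sketch:

class Base:
    def __init__(self, value):
        self.value = value

class Derived(Base):
    def __init__(self, value):
        # No more super(Derived, self); Python figures it out.
        super().__init__(value * 2)

print(Derived(3).value)
# 6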

Refs: docs; Python 3.0 release notes; PEP 3135

New metaclass syntax and kwargs for classes

Compared to that, everything else in this section is going to sound really weird and obscure.

For example, __metaclass__ is gone. It’s now a keyword-only argument to the class statement.

class Foo(metaclass=FooMeta):
    ...

That doesn’t sound like much, right? Just some needless syntax change that makes porting harder, right?? Right??? Haha nope watch this because it’s amazing but it barely gets any mention at all.

class Foo(metaclass=FooMeta, a=1, b=2, c=3):
    ...

You can include arbitrary keyword arguments in the class statement, and they will be passed along to the metaclass call as keyword arguments. (You have to catch them in both __new__ and __init__, since they always get the same arguments.) (Also, the class statement now has the general syntax of a function call, so you can put *args and **kwargs in it.)
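
A minimal sketch of a metaclass consuming such a keyword argument — the debug flag is made up for illustration:

class FooMeta(type):
    def __new__(meta, name, bases, namespace, debug=False, **kwargs):
        cls = super().__new__(meta, name, bases, namespace, **kwargs)
        cls.debug = debug
        return cls

    def __init__(cls, name, bases, namespace, debug=False, **kwargs):
        # __init__ receives the same keyword arguments, so swallow them here too.
        super().__init__(name, bases, namespace, **kwargs)

class Foo(metaclass=FooMeta, debug=True):
    ...

print(Foo.debug)
# True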

This is pretty slick. Consider SQLAlchemy, which uses a metaclass to let you declare a table with a class.

class SomeTable(TableBase):
    __tablename__ = 'some_table'
    id = Column()
    ...

Note that SQLAlchemy has you put the name of the table in the clumsy __tablename__ attribute, which it invented. Why not just name? Well, because then you couldn’t declare a column called name! Any “declarative” metaclass will have the same problem of separating the actual class contents from configuration. Keyword arguments offer an easy way out.

# only hypothetical, alas
class SomeTable(TableBase, name='some_table'):
    id = Column()
    ...

Refs: docs; Python 3.0 release notes; PEP 3115

__prepare__

Another new metaclass feature is the introduction of the __prepare__ method.

You may have noticed that the body of a class is just a regular block, which can contain whatever code you want. Before decorators were a thing, you’d actually declare class methods in two stages:

class Foo:
    def do_the_thing(cls):
        ...
    do_the_thing = classmethod(do_the_thing)

That’s not magical class-only syntax; that’s just regular code assigning to a variable. You can put ifs and fors and whiles and dels inside a class body, too; you just don’t see it very often because there aren’t very many useful reasons to do it.

A class body is a kind of weird pseudo-scope. It can create locals, and it can read values from outer scopes, but methods don’t see the class body as an outer scope. Once the class body reaches its end, any remaining locals are passed to the type constructor and become the new class’s attributes. (This is why, for example, you can’t refer to a class directly within its own body — the class doesn’t and can’t exist until after the body has executed.)

All of this is to say: __prepare__ is a new hook that returns the dict the class body’s locals go into.

Maybe that doesn’t sound particularly interesting, but consider: the value you return doesn’t have to be an actual dict. It can be anything that understands __setitem__. You could, say, use an OrderedDict, and keep track of the order your attributes were declared. That’s useful for declarative metaclasses, where the order of attributes may be important (consider a C struct).
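
A minimal sketch of that ordering trick, assuming all you want is the names recorded:

from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(meta, name, bases):
        # Assignments in the class body land in this OrderedDict.
        return OrderedDict()

    def __new__(meta, name, bases, namespace):
        cls = super().__new__(meta, name, bases, dict(namespace))
        cls._declaration_order = [
            key for key in namespace if not key.startswith('__')]
        return cls

class Point(metaclass=OrderedMeta):
    x = 0
    y = 0
    z = 0

print(Point._declaration_order)
# ['x', 'y', 'z']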

But you can go further. You might allow more than one attribute of the same name. You might do something special with the attributes as soon as they’re assigned, rather than at the end of the body. You might predeclare some attributes. __prepare__ is passed the class’s kwargs, so you might alter the behavior based on those.

For a nice practical example, consider the new enum module, which I briefly mention later on. One drawback of this module is that you have to specify a value for every variant, since variants are defined as class attributes, which must have a value. There’s an example of automatic numbering, but it still requires assigning a dummy value like (). Clever use of __prepare__ would allow lifting this restriction:

# XXX: Should prefer MutableMapping here, but the ultimate call to type()
# raises a TypeError if you pass a namespace object that doesn't inherit
# from dict!  Boo.
class EnumLocals(dict):
    def __init__(self):
        self.nextval = 1

    def __getitem__(self, key):
        if key not in self and not key.startswith('_') and not key.endswith('_'):
            self[key] = self.nextval
            self.nextval += 1
        return super().__getitem__(key)

class EnumMeta(type):
    @classmethod
    def __prepare__(meta, name, bases):
        return EnumLocals()

class Colors(metaclass=EnumMeta):
    red
    green
    blue

print(Colors.red, Colors.green, Colors.blue)
# 1 2 3

Deciding whether this is a good idea is left as an exercise.

This is an exceptionally obscure feature that gets very little attention — it’s not even mentioned explicitly in the 3.0 release notes — but there’s nothing else like it in the language. Between __prepare__ and keyword arguments, the class statement has transformed into a much more powerful and general tool for creating all kinds of objects. I almost wish it weren’t still called class.

Refs: docs; Python 3.0 release notes; PEP 3115

Attribute definition order

If that’s still too much work, don’t worry: a proposal was just accepted for Python 3.6 that makes this even easier. Now every class will have a __definition_order__ attribute, a tuple listing the names of all the attributes assigned within the class body, in order. (To make this possible, the default return value of __prepare__ will become an OrderedDict, but the __dict__ attribute will remain a regular dict.)

Now you don’t have to do anything at all: you can always check to see what order any class’s attributes were defined in.


Additionally, descriptors can now implement a __set_name__ method. When a class is created, any descriptor implementing the method will have it called with the containing class and the name of the descriptor.

I’m very excited about this, but let me try to back up. A descriptor is a special Python object that can be used to customize how a particular class attribute works. The built-in property decorator is a descriptor.

class MyClass:
    foo = SomeDescriptor()

c = MyClass()
c.foo = 5  # calls SomeDescriptor.__set__!
print(c.foo)  # calls SomeDescriptor.__get__!

This is super cool and can be used for all sorts of DSL-like shenanigans.

Now, most descriptors ultimately want to store a value somewhere, and the obvious place to do that is in the object’s __dict__. Above, SomeDescriptor might want to store its value in c.__dict__['foo'], which is fine since Python will still consult the descriptor first. If that weren’t fine, it could also use the key '_foo', or whatever. It probably wants to use its own name somehow, because otherwise… what would happen if you had two SomeDescriptors in the same class?

Therein lies the problem, and one of my long-running and extremely minor frustrations with Python. Descriptors have no way to know their own name! There are only really two solutions to this:

  1. Require the user to pass the name in as an argument, too: foo = SomeDescriptor('foo'). Blech!

  2. Also have a metaclass (or decorator, or whatever), which can iterate over all the class’s attributes, look for SomeDescriptor objects, and tell them what their names are. Needing a metaclass means you can’t make general-purpose descriptors meant for use in arbitrary classes; a decorator would work, but boy is that clumsy.

Both of these suck and really detract from what could otherwise be very neat-looking syntax trickery.

But now! Now, when MyClass is created, Python will have a look through its attributes. If it sees that the foo object has a __set_name__ method, it’ll call that method automatically, passing it both the owning class and the name 'foo'! Huzzah!

This is so great I am so happy you have no idea.
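
Here’s a sketch of what SomeDescriptor’s definition might look like with __set_name__ (Python 3.6 only, of course):

class SomeDescriptor:
    def __set_name__(self, owner, name):
        # Called automatically when the owning class is created.
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

class MyClass:
    foo = SomeDescriptor()
    bar = SomeDescriptor()  # no collision; each one learns its own name

c = MyClass()
c.foo = 5
print(c.foo, c.bar)
# 5 None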


Lastly, there’s now an __init_subclass__ class method, which is called when the class is subclassed. A great many metaclasses exist just to do a little bit of work for each new subclass; now, you don’t need a metaclass at all in many simple cases. You want a plugin registry? No problem:

class Plugin:
    _known_plugins = {}

    def __init_subclass__(cls, *, name, **kwargs):
        cls._known_plugins[name] = cls
        super().__init_subclass__(**kwargs)

    @classmethod
    def get_plugin(cls, name):
        return cls._known_plugins[name]

    # ...probably some interface stuff...

class FooPlugin(Plugin, name="foo"):
    ...

No metaclass needed at all.

Again, none of this stuff is available yet, but it’s all slated for Python 3.6, due out in mid-December. I am super pumped.

Refs: docs (customizing class creation); docs (descriptors); Python 3.6 release notes; PEP 520 (attribute definition order); PEP 487 (__init_subclass__ and __set_name__)

Math stuff

int and long have been merged, and there is no longer any useful distinction between small and very large integers. I’ve actually run into code that breaks if you give it 1 instead of 1L, so, good riddance. (Python 3.0 release notes; PEP 237)

The / operator always does “true” division, i.e., gives you a float. If you want floor division, use //. Accordingly, the __div__ magic method is gone; it’s split into two parts, __truediv__ and __floordiv__. (Python 3.0 release notes; PEP 238)

decimal.Decimal, fractions.Fraction, and floats now interoperate a little more nicely: numbers of different types hash to the same value; all three types can be compared with one another; and most notably, the Decimal and Fraction constructors can accept floats directly. (docs (decimal); docs (fractions); Python 3.2 release notes)

math.gcd returns the greatest common divisor of two integers. This existed before, but was in the fractions module, where nobody knew about it. (docs; Python 3.5 release notes)

math.inf is the floating-point infinity value. Previously, this was only available by writing float('inf'). There’s also a math.nan, but let’s not? (docs; Python 3.5 release notes)

math.isclose (and the corresponding complex version, cmath.isclose) determines whether two values are “close enough”. Intended to do the right thing when comparing floats. (docs; Python 3.5 release notes; PEP 485)

More modules

The standard library has seen quite a few improvements. In fact, Python 3.2 was developed with an explicit syntax freeze, so it consists almost entirely of standard library enhancements. There are far more changes across six and a half versions than I can possibly list here; these are the ones that stood out to me.

The module shuffle

Python 2, rather inexplicably, had a number of top-level modules that were named after the single class they contained, CamelCase and all. StringIO and SimpleHTTPServer are two obvious examples. In Python 3, the StringIO class lives in io (along with BytesIO), and SimpleHTTPServer has been renamed to http.server. If you’re anything like me, you’ll find this deeply satisfying.

Wait, wait, there’s a practical upside here. Python 2 had several pairs of modules that did the same thing with the same API, but one was pure Python and one was much faster C: pickle/cPickle, profile/cProfile, and StringIO/cStringIO. I’ve seen code (cough, older versions of Babel, cough) that spent a considerable amount of its startup time reading pickles with the pure Python version, because it did the obvious thing and used the pickle module. Now, most of these pairs have been merged: importing pickle transparently gives you the faster C implementation, and BytesIO/StringIO are the fast C implementations in the io module. (profile and cProfile are still separate, alas, so you do have to ask for the fast one by name.)

Refs: docs (sort of); Python 3.0 release notes; PEP 3108 (exhaustive list of removed and renamed modules)

Additions to existing modules

A number of file format modules, like bz2 and gzip, went through some cleanup and modernization in 3.2 through 3.4: some learned a more straightforward open function, some gained better support for the bytes/text split, and several learned to use their file types as context managers (i.e., with with).

collections.ChainMap is a mapping type that consults some number of underlying mappings in order, allowing for a “dict with defaults” without having to merge them together. (docs; Python 3.3 release notes)

configparser dropped its ridiculous distinction between ConfigParser and SafeConfigParser; there is now only ConfigParser, which is safe. The parsed data now preserves order by default and can be read or written using normal mapping syntax. Also there’s a fancier alternative interpolation parser. (docs; Python 3.2 release notes)

contextlib.ContextDecorator is some sort of devilry that allows writing a context manager which can also be used as a decorator. It’s used to implement the @contextmanager decorator, so those can be used as decorators as well. (docs; Python 3.2 release notes)

contextlib.ExitStack offers cleaner and more fine-grained handling of multiple context managers, as well as resources that don’t have their own context manager support. (docs; Python 3.3 release notes)

contextlib.suppress is a context manager that quietly swallows a given type of exception. (docs; Python 3.4 release notes)
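
For example, the classic try/except/pass dance becomes a one-liner (the file path is made up):

import contextlib
import os

# Delete a scratch file if it exists, and don't care if it doesn't.
with contextlib.suppress(FileNotFoundError):
    os.remove('/tmp/some-scratch-file')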

contextlib.redirect_stdout is a context manager that replaces sys.stdout for the duration of a block. (docs; Python 3.4 release notes)

datetime.timedelta already existed, of course, but now it supports being multiplied and divided by numbers or divided by other timedeltas. The upshot of this is that timedelta finally, finally has a .total_seconds() method which does exactly what it says on the tin. (docs; Python 3.2 release notes)

datetime.timezone is a new concrete type that can represent fixed offsets from UTC. There has long been a datetime.tzinfo, but it was a useless interface, and you were left to write your own actual class yourself. datetime.timezone.utc is a pre-existing instance that represents UTC, an offset of zero. (docs; Python 3.2 release notes)

functools.lru_cache is a decorator that caches the results of a function, keyed on the arguments. It also offers cache usage statistics and a method for emptying the cache. (docs; Python 3.2 release notes)
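
The obligatory sketch:

import functools

@functools.lru_cache(maxsize=128)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))          # instantaneous, thanks to the cache
print(fib.cache_info())  # hits, misses, maxsize, and current size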

functools.partialmethod is like functools.partial, but the resulting object can be used as a descriptor (read: method). (docs; Python 3.4 release notes)

functools.singledispatch allows function overloading, based on the type of the first argument. (docs; Python 3.4 release notes; PEP 443)
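
A minimal sketch:

import functools

@functools.singledispatch
def describe(value):
    return "something else"

@describe.register(int)
def _(value):
    return "an integer"

@describe.register(list)
def _(value):
    return "a list of %d items" % len(value)

print(describe(3), describe([1, 2]), describe("hi"))
# an integer a list of 2 items something else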

functools.total_ordering is a class decorator that allows you to define only __eq__ and __lt__ (or any other) and defines the other comparison methods in terms of them. Note that since Python 3.0, __ne__ is automatically the inverse of __eq__ and doesn’t need defining. Note also that total_ordering doesn’t correctly support NotImplemented until Python 3.4. For an even easier way to do this, consider my classtools.keyed_ordering decorator. (docs; Python 3.2 release notes)

inspect.getattr_static fetches an attribute like getattr but avoids triggering dynamic lookup like @property. (docs; Python 3.2 release notes)

inspect.signature fetches the signature of a function as the new and more featureful Signature object. It also knows to follow the __wrapped__ attribute set by functools.wraps since Python 3.2, so it can see through well-behaved wrapper functions to the “original” signature. (docs; Python 3.3 release notes; PEP 362)

The logging module can, finally, use str.format-style string formatting by passing style='{' to Formatter. (docs; Python 3.2 release notes)

The logging module spits warnings and higher to stderr if logging hasn’t been otherwise configured. This means that if your app doesn’t use logging, but it uses a library that does, you’ll get actual output rather than the completely useless “No handlers could be found for logger ‘foo’”. (docs; Python 3.2 release notes)

os.scandir lists the contents of a directory while avoiding stat calls as much as possible, making it significantly faster. (docs; Python 3.5 release notes; PEP 471)

re.fullmatch checks for a match against the entire input string, not just a substring. (docs; Python 3.4 release notes)

reprlib.recursive_repr is a decorator for __repr__ implementations that can detect recursive calls to the same object and replace them with ..., just like the built-in structures. Believe it or not, reprlib is an existing module, though in Python 2 it was called repr. (docs; Python 3.2 release notes)

shutil.disk_usage returns disk space statistics for a given path with no fuss. (docs; Python 3.3 release notes)

shutil.get_terminal_size tries very hard to detect the size of the terminal window. (docs; Python 3.3 release notes)

subprocess.run is a new streamlined function that consolidates several other helpers in the subprocess module. It returns an object that describes the final state of the process, and it accepts arguments for a timeout, requiring that the process return success, and passing data as stdin. This is now the recommended way to run a single subprocess. (docs; Python 3.5 release notes)
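
A minimal sketch — the command itself is arbitrary:

import subprocess

# Run a command, capture its output, raise if it exits nonzero,
# and give up after ten seconds.
result = subprocess.run(
    ['ls', '-l'],
    stdout=subprocess.PIPE,
    check=True,
    timeout=10,
)
print(result.returncode)
print(result.stdout.decode())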

tempfile.TemporaryDirectory is a context manager that creates a temporary directory, then destroys it and its contents at the end of the block. (docs; Python 3.2 release notes)

textwrap.indent can add an arbitrary prefix to every line in a string. (docs; Python 3.3 release notes)

time.monotonic returns the value of a monotonic clock — i.e., it will never go backwards. You should use this for measuring time durations within your program; using time.time() will produce garbage results if the system clock changes due to DST, a leap second, NTP, manual intervention, etc. (docs; Python 3.3 release notes; PEP 418)

time.perf_counter returns the value of the highest-resolution clock available, but is only suitable for measuring a short duration. (docs; Python 3.3 release notes; PEP 418)

time.process_time returns the total system and user CPU time for the process, excluding sleep. Note that the starting time is undefined, so only durations are meaningful. (docs; Python 3.3 release notes; PEP 418)

traceback.walk_stack and traceback.walk_tb are small helper functions that walk back along a stack or traceback, so you can use simple iteration rather than the slightly clumsier linked-list approach. (docs; Python 3.5 release notes)

types.MappingProxyType offers a read-only proxy to a dict. Since it holds a reference to the dict in C, you can return MappingProxyType(some_dict) to effectively create a read-only dict, as the original dict will be inaccessible from Python code. This is the same type used for the __dict__ of an immutable object. Note that this has existed in various forms for a while, but wasn’t publicly exposed or documented; see my module dictproxyhack for something that does its best to work on every Python version. (docs; Python 3.3 release notes)

types.SimpleNamespace is a blank type for sticking arbitrary unstructured attributes onto. Previously, you would have to make a dummy subclass of object to do this. (docs; Python 3.3 release notes)

weakref.finalize allows you to add a finalizer function to an arbitrary (weakrefable) object from the “outside”, without needing to add a __del__. The finalize object will keep itself alive, so there’s no need to hold onto it. (docs; Python 3.4 release notes)

New modules with backports

These are less exciting, since they have backports on PyPI that work in Python 2 just as well. But they came from Python 3 development, so I credit Python 3 for them, just like I credit NASA for inventing the microwave.

asyncio is covered above, but it’s been backported as trollius for 2.6+, with the caveat that Pythons before 3.3 don’t have yield from and you have to use yield From(...) as a workaround. That caveat means that third-party asyncio libraries will almost certainly not work with trollius! For this and other reasons, the maintainer is no longer supporting it. Alas. Guess you’ll have to upgrade to Python 3, then.

enum finally provides an enumeration type, something which has long been desired in Python and solved in myriad ad-hoc ways. The variants become instances of a class, can be compared by identity, can be converted between names and values (but only explicitly), can have custom methods, and can implement special methods as usual. There’s even an IntEnum base class whose values end up as subclasses of int (!), making them perfectly compatible with code expecting integer constants. Enums have a surprising amount of power, far more than any approach I’ve seen before; I heartily recommend that you skim the examples in the documentation. Backported as enum34 for 2.4+. (docs; Python 3.4 release notes; PEP 435)
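
The basic shape, for the unfamiliar:

import enum

class Color(enum.Enum):
    red = 1
    green = 2
    blue = 3

print(Color.red)               # Color.red
print(Color.red.value)         # 1
print(Color(2))                # Color.green, converted from a value
print(Color['blue'])           # Color.blue, converted from a name
print(Color.red is Color.red)  # True; variants are singletons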

ipaddress offers types for representing IPv4 and IPv6 addresses and subnets. They can convert between several representations, perform a few set-like operations on subnets, identify special addresses, and so on. Backported as ipaddress for 2.6+. (There’s also a py2-ipaddress, but its handling of bytestrings differs from Python 3’s built-in module, which is likely to cause confusing compatibility problems.) (docs; Python 3.3 release notes; PEP 3144)

pathlib provides the Path type, representing a filesystem path that you can manipulate with methods rather than the mountain of functions in os.path. It also overloads / so you can do path / 'file.txt', which is kind of cool. PEP 519 intends to further improve interoperability of Paths with classic functions for the not-yet-released Python 3.6. Backported as pathlib2 for 2.6+; there’s also a pathlib, but it’s no longer maintained, and I don’t know what happened there. (docs; Python 3.4 release notes; PEP 428)
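
A small sketch of the flavor — the path is made up:

from pathlib import Path

config = Path('/etc') / 'someapp' / 'config.ini'
print(config.suffix)   # .ini
print(config.parent)   # /etc/someapp
if config.exists():
    with config.open() as f:
        text = f.read()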

selectors (created as part of the work on asyncio) attempts to wrap select in a high-level interface that doesn’t make you want to claw your eyes out. A noble pursuit. Backported as selectors34 for 2.6+. (docs; Python 3.4 release notes)

statistics contains a number of high-precision statistical functions. Backported as backports.statistics for 2.6+. (docs; Python 3.4 release notes; PEP 450)

unittest.mock provides multiple ways for creating dummy objects, temporarily (with a context manager or decorator) replacing an object or some of its attributes, and verifying that some sequence of operations was performed on a dummy object. I’m not a huge fan of mocking so much that your tests end up mostly testing that your source code hasn’t changed, but if you have to deal with external resources or global state, some light use of unittest.mock can be very handy — even if you aren’t using the rest of unittest. Backported as mock for 2.6+. (docs; Python 3.3, but no release notes)

New modules without backports

Perhaps more exciting because they’re Python 3 exclusive! Perhaps less exciting because they’re necessarily related to plumbing.

faulthandler

faulthandler is a debugging aid that can dump a Python traceback during a segfault or other fatal signal. It can also be made to hook on an arbitrary signal, and can intervene even when Python code is deadlocked. You can use the default behavior with no effort by passing -X faulthandler on the command line, by setting the PYTHONFAULTHANDLER environment variable, or by using the module API manually.

I think -X itself is new as of Python 3.2, though it’s not mentioned in the release notes. It’s reserved for implementation-specific options; there are a few others defined for CPython, and the options can be retrieved from Python code via sys._xoptions.

Refs: docs; Python 3.3 release notes

importlib

importlib is the culmination of a whole lot of work, performed in multiple phases across numerous Python releases, to extend, formalize, and cleanly reimplement the entire import process.

I can’t possibly describe everything the import system can do and what Python versions support what parts of it. Suffice to say, it can do a lot of things: Python has built-in support for importing from zip files, and I’ve seen third-party import hooks that allow transparently importing modules written in another programming language.

If you want to mess around with writing your own custom importer, importlib has a ton of tools for helping you do that. It’s possible in Python 2, too, using the imp module, but that’s a lot rougher around the edges.

If not, the main thing of interest is the import_module function, which imports a module by name without all the really weird semantics of __import__. Seriously, don’t use __import__. It’s so weird. It probably doesn’t do what you think. importlib.import_module even exists in Python 2.7.

Refs: docs; Python 3.3 release notes; PEP 302?

tracemalloc

tracemalloc is another debugging aid which tracks Python’s memory allocations. It can also compare two snapshots, showing how much memory has been allocated or released between two points in time, and who was responsible. If you have rampant memory use issues, this is probably more helpful than having Python check its own RSS.

Technically, tracemalloc can be used with Python 2.7… but that involves patching and recompiling Python, so I hesitate to call it a backport. Still, if you really need it, give it a whirl.

Refs: docs; Python 3.4 release notes; PEP 454

typing

typing offers a standard way to declare type hints — the expected types of arguments and return values. Type hints are given using the function annotation syntax.

Python itself doesn’t do anything with the annotations, though they’re accessible and inspectable at runtime. An external tool like mypy can perform static type checking ahead of time, using these standard types. mypy is an existing project that predates typing (and works with Python 2), but the previous syntax relied on magic comments; typing formalizes the constructs and puts them in the standard library.
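
Type hints themselves look like this; Python runs it unmodified, but mypy would complain if you called, say, first_word(42):

from typing import List, Optional

def first_word(line: str) -> Optional[str]:
    words = line.split()
    if words:
        return words[0]
    return None

def total(values: List[int]) -> int:
    return sum(values)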

I haven’t actually used either the type hints or mypy myself, so I can’t comment on how helpful or intrusive they are. Give them a shot if they sound useful to you.

Refs: docs; Python 3.5 release notes; PEP 484

venv and ensurepip

I mean, yes, of course, virtualenv and pip are readily available in Python 2. The whole point of these is that they are bundled with Python, so you always have them at your fingertips and never have to worry about installing them yourself.

Installing Python should now give you pipX and pipX.Y commands automatically, corresponding to the latest stable release of pip when that Python version was first released. You’ll also get pyvenv, which is effectively just virtualenv.

There’s also a module interface: python -m ensurepip will install pip (hopefully not necessary), python -m pip runs pip with a specific Python version (a feature of pip and not new to the bundling), and python -m venv runs the bundled copy of virtualenv with a specific Python version.

There was a time where these were completely broken on Debian, because Debian strongly opposes vendoring (the rationale being that it’s easiest to push out updates if there’s only one copy of a library in the Debian package repository), so they just deleted ensurepip and venv? Which completely defeated the point of having them in the first place? I think this has been fixed by now, but it might still bite you if you’re on the Ubuntu 14.04 LTS.

Refs: ensurepip docs; pyvenv docs; Python 3.4 release notes; PEP 453

zipapp

zipapp makes it easy to create executable zip applications, which have been a thing since 2.6 but have languished in obscurity. Well, no longer.

This wasn’t particularly difficult before: you just zip up some code, make sure there’s a __main__.py in the root, and pass it to Python. Optionally, you can set it executable and add a shebang line, since the ZIP format ignores any leading junk in the file. That’s basically all zipapp does. (It does not magically infer your dependencies and bundle them as well; you’re on your own there.)

I can’t find a backport, which is a little odd, since I don’t think this module does anything too special.

Refs: docs; Python 3.5 release notes; PEP 441

Miscellaneous nice enhancements

There were a lot of improvements to language semantics that don’t fit anywhere else above, but make me a little happier.

The interactive interpreter does tab-completion by default. I say “by default” because I’ve been told that it was supported before, but you had to do some kind of goat blood sacrifice to get it to work. Also, command history persists between runs. (docs; Python 3.4 release notes)

The -b command-line option produces a warning when calling str() on a bytes or bytearray, or when comparing text to bytes. -bb produces an error. (docs)

The -I command-line option runs Python in “isolated mode”: it ignores all PYTHON* environment variables and leaves the current directory and user site-packages directories off of sys.path. The idea is to use this when running a system script (or in the shebang line of a system script) to insulate it from any weird user-specific stuff. (docs; Python 3.4 release notes)

Functions and classes learned a __qualname__ attribute, which is a dotted name describing (lexically) where they were defined. For example, a method’s __name__ might be foo, but its __qualname__ would be something like SomeClass.foo. Similarly, a class or function defined within another function will list that containing function in its __qualname__. (docs; Python 3.3 release notes; PEP 3155)

Generators signal their end by raising StopIteration internally, but it was also possible to raise StopIteration directly within a generator — most notably, when calling next() on an exhausted iterator. This would cause the generator to end prematurely and silently. Now, raising StopIteration inside a generator will produce a warning, which will become a RuntimeError in Python 3.7. You can opt into the fatal behavior early with from __future__ import generator_stop. (Python 3.5 release notes; PEP 479)

Implicit namespace packages allow a package to span multiple directories. The most common example is a plugin system, foo.plugins.*, where plugins may come from multiple libraries, but all want to share the foo.plugins namespace. Previously, they would collide, and some sys.path tricks were necessary to make it work; now, support is built in. (This feature also allows you to have a regular package without an __init__.py, but I’d strongly recommend still having one.) (Python 3.3 release notes; PEP 420)

Object finalization behaves in less quirky ways when destroying an isolated reference cycle. Also, modules no longer have their contents changed to None during shutdown, which fixes a long-running type of error when a __del__ method tries to call, say, os.path.join() — if you were unlucky, os.path would already have had its contents replaced with Nones, and you’d get an extremely confusing TypeError from trying to call a standard library function. (Python 3.4 release notes; PEP 442)

str.format_map is like str.format, but it accepts a mapping object directly (instead of having to flatten it with **kwargs). This allows some fancy things that weren’t previously possible, like passing a fake map that creates values on the fly based on the keys looked up in it. (docs; Python 3.2 release notes)
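
A sketch of that last trick, with a dict subclass that invents values for missing keys:

class Defaulting(dict):
    def __missing__(self, key):
        # Missing keys become visible placeholders instead of raising KeyError.
        return '<{}?>'.format(key)

print("Hello, {name}! You are {age}.".format_map(Defaulting(name="eevee")))
# Hello, eevee! You are <age?>.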

When a blocking system call is interrupted by a signal, it returns EINTR, indicating that the calling code should try the same system call again. In Python, this becomes OSError or InterruptedError. I have never in my life seen any C or Python code that actually deals with this correctly. Now, Python will do it for you: all the built-in and standard library functions that make use of system calls will automatically retry themselves when interrupted. (Python 3.5 release notes; PEP 475)

File descriptors created by Python code are now flagged “non-inheritable”, meaning they’re closed automatically when spawning a child process. (docs; Python 3.4 release notes; PEP 446)

A number of standard library functions now accept file descriptors in addition to paths. (docs; Python 3.3 release notes)

Several different OS and I/O exceptions were merged into a single and more fine-grained hierarchy, rooted at OSError. Code can now catch a specific subclass in most cases, rather than examine .errno. (docs; Python 3.3 release notes; PEP 3151)

ResourceWarning is a new kind of warning for issues with resource cleanup. One is produced if a file object is destroyed, but was never closed, which can cause issues on Windows or with garbage-collected Python implementations like PyPy; one is also produced if uncollectable objects still remain when Python shuts down, indicating some severe finalization problems. The warning is ignored by default, but can be enabled with -W default on the command line. (Python 3.2 release notes)

hasattr() only catches (and returns False for) AttributeErrors. Previously, any exception would be considered a sign that the attribute doesn’t exist, even though an unusual exception like an OSError usually means the attribute is computed dynamically, and that code is broken somehow. Now, exceptions other than AttributeError are allowed to propagate to the caller. (docs; Python 3.2 release notes)

Hash randomization is on by default, meaning that dict and set iteration order differs from one Python run to the next. This protects against some DoS attacks, but more importantly, it spitefully forces you not to rely on incidental ordering. (docs; Python 3.3 release notes)

List comprehensions no longer leak their loop variables into the enclosing scope. (Python 3.0 release notes)

nonlocal allows writing to a variable in an enclosing (but non-global) scope. (docs; Python 3.0 release notes; PEP 3104)

Comparing objects of incompatible types now produces a TypeError, rather than using Python 2’s very silly fallback. (Python 3.0 release notes)

!= defaults to returning the opposite of ==. (Python 3.0 release notes)

Accessing a method as a class attribute now gives you a regular function, not an “unbound method” object. (Python 3.0 release notes)

The input builtin no longer performs an eval (!), removing a huge point of confusion for beginners. This is the behavior of raw_input in Python 2. (docs; Python 3.0 release notes; PEP 3111)

Fast and furious

These aren’t necessarily compelling, and they may not even make any appreciable difference for your code, but I think they’re interesting technically.

Objects’ __dict__s can now share their key storage internally. Instances of the same type generally have the same attribute names, so this provides a modest improvement in speed and memory usage for programs that create a lot of user-defined objects. (Python 3.3 release notes; PEP 412)

OrderedDict is now implemented in C, making it “4 to 100” (!) times faster. Note that the backport in the 2.7 standard library is pure Python. So, there’s a carrot. (Python 3.5 release notes)

The GIL was made more predictable. My understanding is that the old behavior was to yield after some number of Python bytecode operations, which could take wildly varying amounts of time; the new behavior yields after a given duration, by default 5ms. (Python 3.2 release notes)

The io library was rewritten in C, making it more fast. Again, the Python 2.7 implementation is pure Python. (Python 3.1 release notes)

Tuples and dicts containing only immutable objects — i.e., objects that cannot possibly contain circular references — are ignored by the garbage collector. This was backported to Python 2.7, too, but I thought it was super interesting. (Python 3.1 release notes)

That’s all I’ve got

Huff, puff.

I hope something here appeals to you as a reason to at least experiment with Python 3. It’s fun over here. Give it a try.

Graphical fidelity is ruining video games

Post Syndicated from Eevee original https://eev.ee/blog/2016/06/22/graphical-fidelity-is-ruining-video-games/

I’m almost 30, so I have to start practicing being crotchety.

Okay, maybe not all video games, but something curious has definitely happened here. Please bear with me for a moment.

Discovering Doom

Surprise! This is about Doom again.

Last month, I sat down and played through the first episode of Doom 1 for the first time. Yep, the first time. I’ve mentioned before that I was introduced to Doom a bit late, and mostly via Doom 2. I’m familiar with a decent bit of Doom 1, but I’d never gotten around to actually playing through any of it.

I might be almost unique in playing Doom 1 for the first time decades after it came out, while already being familiar with the series overall. I didn’t experience Doom 1 only in contrast to modern games, but in contrast to later games using the same engine.

It was very interesting to experience Romero’s design sense in one big chunk, rather than sprinkled around as it is in Doom 2. Come to think of it, Doom 1’s first episode is the only contiguous block of official Doom maps to have any serious consistency: it sticks to a single dominant theme and expands gradually in complexity as you play through it. Episodes 2 and 3, as well of most of Doom 2, are dominated by Sandy Petersen’s more haphazard and bizarre style. Episode 4 and Final Doom, if you care to count them, are effectively just map packs.

It was also painfully obvious just how new this kind of game was. I’ve heard Romero stress the importance of contrast in floor height (among other things) so many times, and yet Doom 1 is almost comically flat. There’s the occasional lift or staircase, sure, but the spaces generally feel like they’re focused around a single floor height with the occasional variation. Remember, floor height was a new thing — id had just finished making Wolfenstein 3D, where the floor and ceiling were completely flat and untextured.

The game was also clearly designed for people who had never played this kind of game. There was much more ammo than I could possibly carry; I left multiple shell boxes behind on every map. The levels were almost comically easy, even on UV, and I’m not particularly good at shooters. It was a very stark contrast to when I played partway through The Plutonia Experiment a few years ago and had to rely heavily on quicksaving.

Seeing Doom 1 from a Doom 2 perspective got me thinking about how design sensibilities in shooters have morphed over time. And then I realized something: I haven’t enjoyed an FPS since Quake 2.

Or… hang on. That’s not true. I enjoy Splatoon (except when I lose). I loved the Metroid Prime series. I played Team Fortress 2 for quite a while.

On the other hand, I found Half-Life 2 a little boring, I lost interest in Doom 3 before even reaching Hell, and I bailed on Quake 4 right around the extremely hammy spoiler plot thing. I loved Fallout, but I couldn’t stand Fallout 3. Uncharted is pretty to watch, but looks incredibly tedious to play. I never cared about Halo. I don’t understand the appeal of Counterstrike or Call of Duty.

If I made a collage of screenshots of these two sets of games, you’d probably spot the pattern pretty quickly. It seems I can’t stand games with realistic graphics.

I have a theory about this.

The rise of realism

Quake introduced the world to “true” 3D — an environment made out of arbitrary shapes, not just floors and walls. (I’m sure there were other true-3D games before it, but I challenge you to name one off the top of your head.)

Before Quake, games couldn’t even simulate a two-story building, which ruled out most realistic architecture. Walls that slid sideways were virtually unique to Hexen (and, for some reason, the much earlier Wolfenstein 3D). So level designers built slightly more abstract spaces instead. Consider this iconic room from the very beginning of Doom’s E1M1.

What is this room? This is supposed to be a base of some kind, but who would build this room just to store a single armored vest? Up a flight of stairs, on a dedicated platform, and framed by glowing pillars? This is completely ridiculous.

But nobody thinks like that, and even the people who do, don’t really care too much. It’s a room with a clear design idea and a clear gameplay purpose: to house the green armor. It doesn’t matter that this would never be a real part of a base. The game exists in its own universe, and it establishes early on that these are the rules of that universe. Sometimes a fancy room exists just to give the player a thing.

At the same time, the room still resembles a base. I can take for granted, in the back of my head, that someone deliberately placed this armor here for storage. It’s off the critical path, too, so it doesn’t quite feel like it was left specifically for me to pick up. The world is designed for the player, but it doesn’t feel that way — the environment implies, however vaguely, that other stuff is going on here.


Fast forward twenty years. Graphics and physics technology have vastly improved, to the point that we can now roughly approximate a realistic aesthetic in real-time. A great many games thus strive to do exactly that.

And that… seems like a shame. The better a game emulates reality, the less of a style it has. I can’t even tell Call of Duty and Battlefield apart.

That’s fine, though, right? It’s just an aesthetic thing. It doesn’t really affect the game.

It totally affects the game

Everything looks the same

“Realism” generally means “ludicrous amounts of detail” — even moreso if the environments are already partially-destroyed, which is a fairly common trope I’ll be touching on a lot here.

When everything is highly-detailed, screenshots may look very good, but gameplay suffers because the player can no longer tell what’s important. The tendency for everything to have a thick coating of sepia certainly doesn’t help.

Look at that Call of Duty screenshot again. What in this screenshot is actually important? What here matters to you as a player? As far as I can tell, the only critical objects are:

  • Your current weapon

That’s it. The rocks and grass and billboards and vehicles and Hollywood sign might look very nice (by which I mean, “look like those things look”), but they aren’t important to the game at all. This might as well be a completely empty hallway.

To be fair, I haven’t played the game, so for all I know there’s a compelling reason to collect traffic cones. Otherwise, this screenshot is 100% noise. Everything in it serves only to emphasize that you’re in a realistic environment.

Don’t get me wrong, setting the scene is important, but something has been missed here. Detail catches the eye, and this screenshot is nothing but detail. None of it is relevant. If there were ammo lying around, would you even be able to find it?

Ah, but then, modern realistic games either do away with ammo pickups entirely or make them glow so you can tell they’re there. You know, for the realism.

(Speaking of glowing: something I always found ridiculous was how utterly bland the imp fireballs look in Doom 3 and 4. We have these amazing lighting engines, and the best we can do for a fireball is a solid pale orange circle? How do modern fireballs look less interesting than a Doom 1 fireball sprite?)

Even Fallout 2 bugged me a little with this; the world was full of shelves and containers, but it seemed almost all of them were completely empty. Fallout 1 had tons of loot waiting to be swiped from shelves, but someone must’ve decided that was a little silly and cut down on it in Fallout 2. So then, what’s the point of having so many shelves? They encourage the player to explore, then offer no reward whatsoever most of the time.

Environments are boring and static

Fallout 3 went right off the rails, filling the world with tons of (gray) detail, none of which I could interact with. I was barely finished with the first settlement before I gave up on the game because of how empty it felt. Everywhere was detailed as though it were equally important, but most of it was static decorations. From what I’ve seen, Fallout 4 is even worse.

Our graphical capabilities have improved much faster than our ability to actually simulate all the junk we’re putting on the screen. Hey, there’s a car! Can I get in it? Can I drive it? No, I can only bump into an awkwardly-shaped collision box drawn around it. So what’s the point of having a car, an object that — in the real world — I’m accustomed to being able to use?

And yet… a game that has nothing to do with driving a car doesn’t need you to be able to drive a car. Games are games, not perfect simulations of reality. They have rules, a goal, and a set of things the player is able to do. There’s no reason to make the player able to do everything if it has no bearing on what the game’s about.

This puts “realistic” games in an awkward position. How do they solve it?

One good example that comes to mind is Portal, which was rendered realistically, but managed to develop a style from the limited palette it used in the actual play areas. It didn’t matter that you couldn’t interact with the world in any way other than portaling walls and lifting cubes, because for the vast majority of the game, you only encountered walls and cubes! Even the “behind the scenes” parts at the end were mostly architecture, not objects, and I’m not particularly bothered that I can’t interact with a large rusty pipe.

The standouts were the handful of offices you managed to finagle your way into, which were of course full of files and computers and other desktop detritus. Everything in an office is — necessarily! — something a human can meaningfully interact with, but the most you can do in Portal is drop a coffee cup on the floor. It’s all the more infuriating if you consider that the plot might have been explained by the information in those files or on those computers. Portal 2 was in fact a little worse about this, as you spent much more time outside of the controlled test areas.

I think Left 4 Dead may have also avoided this problem by forcing the players to be moving constantly — you don’t notice that you can’t get in a car if you’re running for your life. The only time the players can really rest is in a safe house, which are generally full of objects the players can pick up and use.

Progression feels linear and prescripted

Ah, but the main draw of Portal is one of my favorite properties of games: you could manipulate the environment itself. It’s the whole point of the game, even. And it seems to be conspicuously missing from many modern “realistic” games, partly because real environments are just static, but also in large part because… of the graphics!

Rendering a very complex scene is hard, so modern map formats do a whole lot of computing stuff ahead of time. (For similar reasons, albeit more primitive ones, vanilla Doom can’t move walls sideways.) Having any of the environment actually move or change is thus harder, so it tends to be reserved for fancy cutscenes when you press the button that lets you progress. And because grandiose environmental changes aren’t very realistic, that button often just opens a door or blows something up.

It feels hamfisted, like someone carefully set it all up just for me. Obviously someone did, but the last thing I want is to be reminded of that. I’m reminded very strongly of Half-Life 2, which felt like one very long corridor punctuated by the occasional overt physics puzzle. Contrast with Doom, where there are buttons all over the place and they just do things without drawing any particular attention to the results. Mystery switches are sometimes a problem, but for better or worse, Doom’s switches always feel like something I’m doing to the game, rather than the game waiting for me to come along so it can do some preordained song and dance.

I miss switches. Real switches, not touchscreens. Big chunky switches that take up half a wall.

It’s not just the switches, though. Several of Romero’s maps from episode 1 are shaped like a “horseshoe”, which more or less means that you can see the exit from the beginning (across some open plaza). More importantly, the enemies at the exit can see you, and will be shooting at you for much of the level.

That gives you choices, even within the limited vocabulary of Doom. Do you risk wasting ammo trying to take them out from a distance, or do you just dodge their shots all throughout the level? It’s up to you! You get to decide how to play the game, naturally, without choosing from a How Do You Want To Play The Game menu. Hell, Doom has entire speedrun categories focused around combat — Tyson for only using the fist and pistol, pacifist for never attacking a monster at all.

You don’t see a lot of that any more. Rendering an entire large area in a polygon-obsessed game is, of course, probably not going to happen — whereas the Doom engine can handle it just fine. I’ll also hazard a guess and say that having too much enemy AI going at once and/or rendering too many highly-detailed enemies at once is too intensive. Or perhaps balancing and testing multiple paths is too complicated.

Or it might be the same tendency I see in modding scenes: the instinct to obsessively control the player’s experience, to come up with a perfectly-crafted gameplay concept and then force the player to go through it exactly as it was conceived. Even Doom 4, from what I can see, has a shocking amount of “oh no the doors are locked, kill all the monsters to unlock them!” nonsense. Why do you feel the need to force the player to shoot the monsters? Isn’t that the whole point of the game? Either the player wants to do it and the railroading is pointless, or the player doesn’t want to do it and you’re making the game actively worse for them!

Something that struck me in Doom’s E1M7 was that, at a certain point, you run back across half the level and there are just straggler monsters all over the place. They all came out of closets when you picked up something, of course, but they also milled around while waiting for you to find them. They weren’t carefully scripted to teleport around you in a fixed pattern when you showed up; they were allowed to behave however they want, following the rules of the game.

Whatever the cause, something has been lost. The entire point of games is that they’re an interactive medium — the player has some input, too.

Exploration is discouraged

I haven’t played through too many recent single-player shooters, but I get the feeling that branching paths (true nonlinearity) and sprawling secrets have become less popular too. I’ve seen a good few people specifically praise Doom 4 for having them, so I assume the status quo is to… not.

That’s particularly sad off the back of Doom episode 1, which has sprawling secrets that often feel like an entire hidden part of the base. In several levels, merely getting outside qualifies as a secret. There are secrets within secrets. There are locked doors inside secrets. It’s great.

And these are real secrets, not three hidden coins in a level and you need to find so many of them to unlock more levels. The rewards are heaps of resources, not a fixed list of easter eggs to collect. Sometimes they’re not telegraphed at all; sometimes you need to do something strange to open them. Doom has a secret you open by walking up to one of two pillars with a heart on it. Doom 2 has a secret you open by run-jumping onto a light fixture, and another you open by “using” a torch and shooting some eyes in the wall.

I miss these, too. Finding one can be a serious advantage, and you can feel genuinely clever for figuring them out, yet at the same time you’re not permanently missing out on anything if you don’t find them all.

I can imagine why these might not be so common any more. If decorating an area is expensive and complicated, you’re not going to want to build large areas off the critical path. In Doom, though, you can make a little closet containing a powerup in about twenty seconds.

More crucially, many of the Doom secrets require the player to notice a detail that’s out of place — and that’s much easier to set up in a simple world like Doom. In a realistic world where every square inch is filled with clutter, how could anyone possibly notice a detail out of place? How can a designer lay any subtle hints at all, when even the core gameplay elements have to glow for anyone to pick them out from background noise?

This might be the biggest drawback to extreme detail: it ultimately teaches the player to ignore the detail, because very little of it is ever worth exploring. After running into enough invisible walls, you’re going to give up on straying from the beaten path.

We wind up with a world where players are trained to look for whatever glows, and completely ignore everything else. At which point… why are we even bothering?

There are no surprises

“Realistic” graphics mean a “realistic” world, and let’s face it, the real world can be a little dull. That’s why we invented video games, right?

Doom has a very clear design vocabulary. Here are some demons. They throw stuff at you; don’t get hit by it. Here are some guns, which you can all hold at once, because those are the rules. Also here’s a glowing floating sphere that gives you a lot of health.

What is a megasphere, anyway? Does it matter? It’s a thing in the game with very clearly-defined rules. It’s good; pick it up.
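
If you imagine how such a pickup might be specified, the "clearly-defined rules" quality becomes obvious: it's barely more than a table of effects. Here's a toy sketch with an invented structure; this isn't real game data, though the numbers roughly match Doom's actual items.

    # Toy sketch of a rules-first pickup: no backstory, no batteries,
    # just an effect. (Invented structure; not real Doom item data.)
    PICKUPS = {
        "megasphere": {"set_health": 200, "set_armor": 200},
        "soulsphere": {"add_health": 100, "max_health": 200},
        "stimpack":   {"add_health": 10,  "max_health": 100},
    }

    def apply_pickup(player, name):
        rules = PICKUPS[name]
        if "set_health" in rules:
            player["health"] = rules["set_health"]
        if "set_armor" in rules:
            player["armor"] = rules["set_armor"]
        if "add_health" in rules:
            player["health"] = min(player["health"] + rules["add_health"],
                                   rules["max_health"])
        return player

    # e.g. apply_pickup({"health": 35, "armor": 0}, "megasphere")
    #      -> {"health": 200, "armor": 200}

No explanation required; the rule is the item.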

You can’t do that in a “realistic” game. (Or maybe you can, but we seem to be trying to avoid it.) You can’t just pick up a pair of stereoscopic glasses to inexplicably get night vision for 30 seconds; you need to have some night-vision goggles with batteries and it’s a whole thing. You can’t pick up health kits that heal you; you have to be wearing regenerative power armor and pick up energy cells. Even Doom 4 seems to be uncomfortable leaving brightly flashing keycards lying around — instead you retrieve them from the corpses of people wearing correspondingly-colored armor.

Everything needs an explanation, which vastly reduces the chances of finding anything too surprising or new.

I'm told that Call of Duty is the most popular vidya among the millennials, so I went to look at its weapons:

  • Gun
  • Fast gun
  • Long gun
  • Different gun

How exciting! If you click through each of those gun categories, you can even see the list of unintelligible gun model numbers, which are exactly what gets me excited about a game.

I wonder if those model numbers are real or not. I’m not sure which would be worse.

Get off my lawn

So my problem is that striving for realism is incredibly boring and counter-productive. I don’t even understand the appeal; if I wanted reality, I could look out my window.

"Realism" actively sabotages games. I can judge Doom or Mario or Metroid or whatever as independent universes with their own rules, because that's what they are. A game that's trying to mirror reality, I can only compare to reality — and it'll be a very pale imitation.

It comes down to internal consistency. Doom and Team Fortress 2 and Portal and Splatoon and whatever else are pretty upfront about what they’re offering: you have a gun, you can shoot it, also you can run around and maybe press some buttons if you’re lucky. That’s exactly what you get. It’s right there on the box, even.

Then I load Fallout 3, and it tries to look like the real world, and it does a big song and dance asking me for my stats “in-world”, and it tries to imply I can roam this world and do anything I want and forge my own destiny. Then I get into the game, and it turns out I can pretty much just shoot, pick from dialogue trees, and make the occasional hamfisted moral choice. The gameplay doesn’t live up to what the environment tried to promise. The controls don’t even live up to what the environment tried to promise.

The great irony is that “realism” is harshly limiting, even as it grows ever more expensive and elaborate. I’m reminded of the Fat Man in Fallout 3, the gun that launches “mini nukes”. If that weapon had been in Fallout 1 or 2, I probably wouldn’t think twice about it. But in the attempted “realistic” world of Fallout 3, I have to judge it as though it were trying to be a real thing — because it is! — and that makes it sound completely ridiculous.

(It may sound like I’m picking on Fallout 3 a lot here, but to its credit, it actually had enough stuff going on that it stands out to me. I barely remember anything about Doom 3 or Quake 4, and when I think of Half-Life 2 I mostly imagine indistinct crumbling hallways or a grungy river that never ends.)

I’ve never felt this way about series that ignored realism and went for their own art style. Pikmin 3 looks very nice, but I never once felt that I ought to be able to do anything other than direct Pikmin around. Metroid Prime looks great too and has some “realistic” touches, but it still has a very distinct aesthetic, and it manages to do everything important with a relatively small vocabulary — even plentiful secrets.

I just don’t understand the game industry (and game culture)’s fanatical obsession with realistic graphics. They make games worse. It’s entirely possible to have an art style other than “get a lot of unpaid interns to model photos of rocks”, even for a mind-numbingly bland army man simulator. Please feel free to experiment a little more. I would love to see more weird and abstract worlds that follow their own rules and drag you down the rabbit hole with them.