Tag Archives: interpol

Grafana v5.1 Released

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/04/26/grafana-v5.1-released/

v5.1 Stable Release

The recent 5.0 major release contained a lot of new features so the Grafana 5.1 release is focused on smoothing out the rough edges and iterating over some of the new features.

Download Grafana 5.1 Now

Release Highlights

There are two new features included: Heatmap Support for Prometheus and a new core data source for Microsoft SQL Server.

Another highlight is the revamp of the Grafana docker container, which makes it easier to run and control. Be aware that there is a breaking change to file permissions that will affect existing containers with data volumes.

We got tons of useful improvement suggestions, bug reports and Pull Requests from our amazing community. Thank you all! See the full changelog for more details.

Improved Scrolling Experience

In Grafana v5.0 we introduced a new scrollbar component. Unfortunately this introduced a lot of issues and in some scenarios removed the native scrolling functionality. Grafana v5.1 ships with a native scrollbar for all pages, together with a scrollbar component for the dashboard grid and panels that does not override the native scrolling functionality. We hope these changes make the Grafana user experience much better!

Improved Docker Image

Grafana v5.1 brings an improved official docker image that should be easier to run and that gives the user more control over how it is run.

We have switched the id of the grafana user running Grafana inside a docker container. Unfortunately this means that files created prior to 5.1 will not have the correct permissions for later versions and thereby introduces a breaking change. We made this change so that it would be easier for you to control what user Grafana is executed as.

Please read the updated documentation which includes migration instructions and more information.

Heatmap Support for Prometheus

The Prometheus datasource now supports transforming Prometheus histograms to the heatmap panel. The Prometheus histogram is a powerful feature, and we’re
really happy to finally allow our users to render those as heatmaps. The Heatmap panel documentation
contains more information on how to use it.

Another improvement is that the Prometheus query editor now supports autocomplete for template variables. More information in the Prometheus data source documentation.

Microsoft SQL Server

Grafana v5.1 now ships with a built-in Microsoft SQL Server (MSSQL) data source plugin that allows you to query and visualize data from any
Microsoft SQL Server 2005 or newer, including Microsoft Azure SQL Database. Do you have metric or log data in MSSQL? You can now visualize
that data and define alert rules on it as with any of Grafana’s other core datasources.

The “Using Microsoft SQL Server in Grafana” documentation has more detailed information on how to get started.

Adding New Panels to Dashboards

The control for adding new panels to dashboards now includes panel search and it is also now possible to copy and paste panels between dashboards.

When you copy a panel in a dashboard, it is displayed in the Paste tab. When you switch to a new dashboard, you can paste the copied panel.

Align Zero-Line for Right and Left Y-axes

The feature request to align the zero-line for right and left Y-axes on the Graph panel is more than 3 years old. It has finally been implemented – more information in the Graph panel documentation.

Other Highlights

  • Table Panel: New enhancements include support for mapping a numeric value/range to text and additional units. More information in the Table panel documentation.
  • New variable interpolation syntax: We now support a new option for rendering variables that gives the user full control of how the value(s) should be rendered. More details in the Variables documentation.
  • Improved workflow for provisioned dashboards. More details here.

Changelog

Check out the CHANGELOG.md file for a complete list of new features, changes, and bug fixes.

Invent new sounds with Google’s NSynth Super

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/google-nsynth-super/

Discover new sounds and explore the role of machine learning in music production and sound research with the NSynth Super, an ongoing project from Google’s Magenta research team that you can build at home.

Google Open NSynth Super Testing

Uploaded by AB Open on 2018-04-17.

What is the NSynth Super?

Part of the ongoing Magenta research project within Google, NSynth Super explores the ways in which machine learning tools help artists and musicians be creative.

Google Nsynth Super Raspberry Pi

“Technology has always played a role in creating new types of sounds that inspire musicians — from the sounds of distortion to the electronic sounds of synths,” explains the team behind the NSynth Super. “Today, advances in machine learning and neural networks have opened up new possibilities for sound generation.”

Using TensorFlow, the Magenta team builds tools and interfaces that let artists and musicians use machine learning in their work. The NSynth Super AI algorithm uses deep neural networking to investigate the character of sounds. It then builds new sounds based on these characteristics instead of simply mixing sounds together.

Using an autoencoder, it extracts 16 defining temporal features from each input. These features are then interpolated linearly to create new embeddings (mathematical representations of each sound). These new embeddings are then decoded into new sounds, which have the acoustic qualities of both inputs.
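The linear interpolation step can be pictured with a minimal sketch. The helper name `lerpEmbedding` and the four-element toy vectors are hypothetical; the real NSynth encoder and decoder are not shown, and this only illustrates blending two vectors of equal length.

```javascript
// Hypothetical helper: blend two embeddings of equal length.
// t = 0 returns a copy of a, t = 1 returns a copy of b.
function lerpEmbedding(a, b, t) {
    if (a.length !== b.length) {
        throw new Error('embeddings must have the same length');
    }
    return a.map((value, i) => (1 - t) * value + t * b[i]);
}

// Made-up toy embeddings; real ones would come from the autoencoder.
const flute = [0, 2, 4, 6];
const snare = [8, 6, 4, 2];
const halfway = lerpEmbedding(flute, snare, 0.5);
console.log(halfway);  // [ 4, 4, 4, 4 ]
```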

The team publishes all hardware designs and software that are part of their ongoing research under open-source licences, allowing you to build your own synth.

Build your own NSynth Super

Using these open-source tools, Andrew Black has produced his own NSynth Super, demoed in the video above. Andrew’s list of build materials includes a Raspberry Pi 3, potentiometers, rotary encoders, and the Adafruit 1.3″ OLED display. Magenta also provides Gerber files for you to fabricate your own PCB.

Once fabricated, the PCB includes a table of contents for adding components.

The build isn’t easy — it requires soldering skills or access to someone who can assemble PCBs. Take a look at Andrew’s blog post and the official NSynth GitHub repo to see whether you’re up to the challenge.

Music and Raspberry Pi

The Raspberry Pi has been widely used for music production and music builds. Be it retrofitting a boombox, distributing music atop Table Mountain, or coding tracks with Sonic Pi, the Pi offers endless opportunities for musicians and music lovers to expand their repertoire of builds and instruments.

If you’d like to try more music-based projects using the Raspberry Pi, you can check out our free resources. And if you’ve used a Raspberry Pi in your own musical project, please share it with us in the comments or via our social network accounts.

The post Invent new sounds with Google’s NSynth Super appeared first on Raspberry Pi.

JavaScript got better while I wasn’t looking

Post Syndicated from Eevee original https://eev.ee/blog/2017/10/07/javascript-got-better-while-i-wasnt-looking/

IndustrialRobot has generously donated in order to inquire:

In the last few years there seems to have been a lot of activity with adding emojis to Unicode. Has there been an equal effort to add ‘real’ languages/glyph systems/etc?

And as always, if you don’t have anything to say on that topic, feel free to choose your own. :p

Yes.

I mean, each release of Unicode lists major new additions right at the top: Unicode 10, Unicode 9, Unicode 8, etc. They also keep fastidious notes, so you can dig into where these new scripts came from and why, by reading e.g. the proposal for the addition of Zanabazar Square. I don’t think I have much to add here; I’m not a real linguist, I only play one on TV.

So with that out of the way, here’s something completely different!

A brief history of JavaScript

JavaScript was created in seven days, about eight thousand years ago. It was pretty rough, and it stayed rough for most of its life. But that was fine, because no one used it for anything besides having a trail of sparkles follow your mouse on their Xanga profile.

Then people discovered you could actually do a handful of useful things with JavaScript, and it saw a sharp uptick in usage. Alas, it stayed pretty rough. So we came up with polyfills and jQuerys and all kinds of miscellaneous things that tried to smooth over the rough parts, to varying degrees of success.

And… that’s it. That’s pretty much how things stayed for a while.


I have complicated feelings about JavaScript. I don’t hate it… but I certainly don’t enjoy it, either. It has some pretty neat ideas, like prototypical inheritance and “everything is a value”, but it buries them under a pile of annoying quirks and a woefully inadequate standard library. The DOM APIs don’t make things much better — they seem to be designed as though the target language were Java, rarely taking advantage of any interesting JavaScript features. And the places where the APIs overlap with the language are a hilarious mess: I have to check documentation every single time I use any API that returns a set of things, because there are at least three totally different conventions for handling that and I can’t keep them straight.

The funny thing is that I’ve been fairly happy to work with Lua, even though it shares most of the same obvious quirks as JavaScript. Both languages are weakly typed; both treat nonexistent variables and keys as simply false values, rather than errors; both have a single data structure that doubles as both a list and a map; both use 64-bit floating-point as their only numeric type (though Lua added integers very recently); both lack a standard object model; both have very tiny standard libraries. Hell, Lua doesn’t even have exceptions, not really — you have to fake them in much the same style as Perl.

And yet none of this bothers me nearly as much in Lua. The differences between the languages are very subtle, but combined they make a huge impact.

  • Lua has separate operators for addition and concatenation, so + is never ambiguous. It also has printf-style string formatting in the standard library.

  • Lua’s method calls are syntactic sugar: foo:bar() just means foo.bar(foo). Lua doesn’t even have a special this or self value; the invocant just becomes the first argument. In contrast, JavaScript invokes some hand-waved magic to set its contextual this variable, which has led to no end of confusion.

  • Lua has an iteration protocol, as well as built-in iterators for dealing with list-style or map-style data. JavaScript has a special dedicated Array type and clumsy built-in iteration syntax.

  • Lua has operator overloading and (surprisingly flexible) module importing.

  • Lua allows the keys of a map to be any value (though non-scalars are always compared by identity). JavaScript implicitly converts keys to strings — and since there’s no operator overloading, there’s no way to natively fix this.

These are fairly minor differences, in the grand scheme of language design. And almost every feature in Lua is implemented in a ridiculously simple way; in fact the entire language is described in complete detail in a single web page. So writing JavaScript is always frustrating for me: the language is so close to being much more ergonomic, and yet, it isn’t.

Or, so I thought. As it turns out, while I’ve been off doing other stuff for a few years, browser vendors have been implementing all this pie-in-the-sky stuff from “ES5” and “ES6”, whatever those are. People even upgrade their browsers now. Lo and behold, the last time I went to write JavaScript, I found out that a number of papercuts had actually been solved, and the solutions were sufficiently widely available that I could actually use them in web code.

The weird thing is that I do hear a lot about JavaScript, but the feature I’ve seen raved the most about by far is probably… built-in types for working with arrays of bytes? That’s cool and all, but not exactly the most pressing concern for me.

Anyway, if you also haven’t been keeping tabs on the world of JavaScript, here are some things we missed.

let

MDN docs — supported in Firefox 44, Chrome 41, IE 11, Safari 10

I’m pretty sure I first saw let over a decade ago. Firefox has supported it for ages, but you actually had to opt in by specifying JavaScript version 1.7. Remember JavaScript versions? You know, from back in the days when people actually suggested you write stuff like this:

<SCRIPT LANGUAGE="JavaScript1.2" TYPE="text/javascript">

Yikes.

Anyway, so, let declares a variable — but scoped to the immediately containing block, unlike var, which scopes to the innermost function. The trouble with var was that it was very easy to make misleading:

// foo exists here
while (true) {
    var foo = ...;
    ...
}
// foo exists here too

If you reused the same temporary variable name in a different block, or if you expected to be shadowing an outer foo, or if you were trying to do something with creating closures in a loop, this would cause you some trouble.

But no more, because let actually scopes the way it looks like it should, the way variable declarations do in C and friends. As an added bonus, if you refer to a variable declared with let outside of where it’s valid, you’ll get a ReferenceError instead of a silent undefined value. Hooray!
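Here is a small sketch of that difference. The function name `scopes` and the variable names are made up for the example.

```javascript
function scopes() {
    const results = [];
    if (true) {
        var a = 'var';   // function-scoped: still visible after the block
        let b = 'let';   // block-scoped: gone once the block ends
    }
    results.push(a);
    try {
        results.push(b);       // no b in this scope
    } catch (e) {
        results.push(e.name);  // 'ReferenceError'
    }
    return results;
}
console.log(scopes());  // [ 'var', 'ReferenceError' ]
```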

There’s one other interesting quirk to let that I can’t find explicitly documented. Consider:

let closures = [];
for (let i = 0; i < 4; i++) {
    closures.push(function() { console.log(i); });
}
for (let j = 0; j < closures.length; j++) {
    closures[j]();
}

If this code had used var i, then it would print 4 four times, because the function-scoped var i means each closure is sharing the same i, whose final value is 4. With let, the output is 0 1 2 3, as you might expect, because each run through the loop gets its own i.

But wait, hang on.

The semantics of a C-style for are that the first expression is only evaluated once, at the very beginning. So there’s only one let i. In fact, it makes no sense for each run through the loop to have a distinct i, because the whole idea of the loop is to modify i each time with i++.

I assume this is simply a special case, since it’s what everyone expects. We expect it so much that I can’t find anyone pointing out that the usual explanation for why it works makes no sense. It has the interesting side effect that for no longer de-sugars perfectly to a while, since this will print all 4s:

closures = [];
let i = 0;
while (i < 4) {
    closures.push(function() { console.log(i); });
    i++;
}
for (let j = 0; j < closures.length; j++) {
    closures[j]();
}

This isn’t a problem — I’m glad let works this way! — it just stands out to me as interesting. Lua doesn’t need a special case here, since it uses an iterator protocol that produces values rather than mutating a visible state variable, so there’s no problem with having the loop variable be truly distinct on each run through the loop.
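One way to picture the special case is a while loop that copies the counter into a fresh block-scoped variable on every pass. This is only a mental model, not the specification's wording, and the names `copies` and `perIteration` are invented for the sketch.

```javascript
const copies = [];
{
    let i = 0;                     // the single loop counter
    while (i < 4) {
        let perIteration = i;      // fresh binding for this pass only
        copies.push(function() { return perIteration; });
        i++;
    }
}
const seen = copies.map(f => f());
console.log(seen);  // [ 0, 1, 2, 3 ]
```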

Classes

MDN docs — supported in Firefox 45, Chrome 42, Safari 9, Edge 13

Prototypical inheritance is pretty cool. The way JavaScript presents it is a little bit opaque, unfortunately, which seems to confuse a lot of people. JavaScript gives you enough functionality to make it work, and even makes it sound like a first-class feature with a property outright called prototype… but to actually use it, you have to do a bunch of weird stuff that doesn’t much look like constructing an object or type.

The funny thing is, people with almost any background get along with Python just fine, and Python uses prototypical inheritance! Nobody ever seems to notice this, because Python tucks it neatly behind a class block that works enough like a Java-style class. (Python also handles inheritance without using the prototype, so it’s a little different… but I digress. Maybe in another post.)

The point is, there’s nothing fundamentally wrong with how JavaScript handles objects; the ergonomics are just terrible.

Lo! They finally added a class keyword. Or, rather, they finally made the class keyword do something; it’s been reserved this entire time.

class Vector {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }

    get magnitude() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    }

    dot(other) {
        return this.x * other.x + this.y * other.y;
    }
}

This is all just sugar for existing features: creating a Vector function to act as the constructor, assigning a function to Vector.prototype.dot, and whatever it is you do to make a property. (Oh, there are properties. I’ll get to that in a bit.)

The class block can be used as an expression, with or without a name. It also supports prototypical inheritance with an extends clause and has a super pseudo-value for superclass calls.
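A minimal sketch of those three things together. `Vector3` and `Point` are hypothetical names, and the `Vector` here is a trimmed copy of the one above so the example runs on its own.

```javascript
class Vector {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }
    dot(other) {
        return this.x * other.x + this.y * other.y;
    }
}

// Hypothetical subclass, purely for illustration.
class Vector3 extends Vector {
    constructor(x, y, z) {
        super(x, y);   // run the Vector constructor first
        this.z = z;
    }
    dot(other) {
        // super.dot calls the overridden method on the parent prototype
        return super.dot(other) + this.z * other.z;
    }
}

// A class block used as an expression, without a name.
const Point = class extends Vector {};

const v = new Vector3(1, 2, 3);
console.log(v.dot(new Vector3(4, 5, 6)));        // 32
console.log(new Point(1, 2) instanceof Vector);  // true
```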

It’s a little weird that the inside of the class block has its own special syntax, with function omitted and whatnot, but honestly you’d have a hard time making a class block without special syntax.

One severe omission here is that you can’t declare values inside the block, i.e. you can’t just drop a bar = 3; in there if you want all your objects to share a default attribute. The workaround is to just do this.bar = 3; inside the constructor, but I find that unsatisfying, since it defeats half the point of using prototypes.
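Another workaround, if the goal is one shared default rather than a copy per object, is to assign to the prototype after the class block. A minimal sketch with a hypothetical `Widget` class:

```javascript
class Widget {
    constructor(name) {
        this.name = name;
    }
}
// Shared default, stored once on the prototype rather than on each object.
Widget.prototype.bar = 3;

const w1 = new Widget('a');
const w2 = new Widget('b');
w2.bar = 10;  // creates an own property that shadows the default

console.log(w1.bar, w2.bar);  // 3 10
console.log(Object.prototype.hasOwnProperty.call(w1, 'bar'));  // false
```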

Properties

MDN docs — supported in Firefox 4, Chrome 5, IE 9, Safari 5.1

JavaScript historically didn’t have a way to intercept attribute access, which is a travesty. And by “intercept attribute access”, I mean that you couldn’t design a value foo such that evaluating foo.bar runs some code you wrote.

Exciting news: now it does. Or, rather, you can intercept specific attributes, like in the class example above. The above magnitude definition is equivalent to:

Object.defineProperty(Vector.prototype, 'magnitude', {
    configurable: true,
    enumerable: false,  // accessors defined in a class block are non-enumerable
    get: function() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    },
});

Beautiful.

And what even are these configurable and enumerable things? It seems that every single key on every single object now has its own set of three Boolean twiddles:

  • configurable means the property itself can be reconfigured with another call to Object.defineProperty.
  • enumerable means the property appears in for..in or Object.keys().
  • writable means the property value can be changed, which only applies to properties with real values rather than accessor functions.

The incredibly wild thing is that for properties defined by Object.defineProperty, configurable and enumerable default to false, meaning that by default accessor properties are immutable and invisible. Super weird.
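A quick way to see those defaults is to define an accessor with nothing but a getter and then read its descriptor back. The object and property names are made up for the example.

```javascript
const holder = {};
Object.defineProperty(holder, 'hidden', {
    get: function() { return 42; },
});

const desc = Object.getOwnPropertyDescriptor(holder, 'hidden');
console.log(desc.configurable, desc.enumerable);  // false false
console.log(Object.keys(holder));                 // []
console.log(holder.hidden);                       // 42
```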

Nice to have, though. And luckily, it turns out the same syntax as in class also works in object literals.

Vector.prototype = {
    get magnitude() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    },
    ...
};

Alas, I’m not aware of a way to intercept arbitrary attribute access.
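For what it's worth, ES6 also added Proxy, which looks like it fills this gap: a get trap runs for every property read made through the proxy. A minimal sketch follows; `target`, `logged`, and `reads` are made-up names, and note that reads on the original target object are not intercepted, only reads on the proxy.

```javascript
const reads = [];
const target = { bar: 1 };
const logged = new Proxy(target, {
    // Called for every property read on the proxy.
    get(obj, key, receiver) {
        reads.push(key);
        if (key in obj) {
            return Reflect.get(obj, key, receiver);
        }
        return `no such key: ${String(key)}`;
    },
});

console.log(logged.bar);   // 1
console.log(logged.quux);  // no such key: quux
console.log(reads);        // [ 'bar', 'quux' ]
```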

Another feature along the same lines is Object.seal(), which marks all of an object’s properties as non-configurable and prevents any new properties from being added to the object. The object is still mutable, but its “shape” can’t be changed. And of course you can just make the object completely immutable if you want, via setting all its properties non-writable, or just using Object.freeze().
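A small sketch of the difference between the two. It uses the Reflect functions only because they report failure as a false return value, which behaves the same in strict and non-strict code; with plain assignment the failed writes are silently ignored in non-strict code and throw a TypeError in strict code.

```javascript
const sealed = Object.seal({ x: 1 });
const frozen = Object.freeze({ x: 1 });

console.log(Reflect.set(sealed, 'x', 2));          // true: still writable
console.log(Reflect.set(sealed, 'y', 3));          // false: no new properties
console.log(Reflect.deleteProperty(sealed, 'x'));  // false: shape is fixed
console.log(Reflect.set(frozen, 'x', 2));          // false: fully immutable

console.log(sealed.x, frozen.x);  // 2 1
console.log(Object.isSealed(sealed), Object.isFrozen(frozen));  // true true
```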

I have mixed feelings about the ability to irrevocably change something about a dynamic runtime. It would certainly solve some gripes of former Haskell-minded colleagues, and I don’t have any compelling argument against it, but it feels like it violates some unwritten contract about dynamic languages — surely any structural change made by user code should also be able to be undone by user code?

Slurpy arguments

MDN docs — supported in Firefox 15, Chrome 47, Edge 12, Safari 10

Officially this feature is called “rest parameters”, but that’s a terrible name, no one cares about “arguments” vs “parameters”, and “slurpy” is a good word. Bless you, Perl.

function foo(a, b, ...args) {
    // ...
}

Now you can call foo with as many arguments as you want, and every argument after the second will be collected in args as a regular array.

You can also do the reverse with the spread operator:

let args = [];
args.push(1);
args.push(2);
args.push(3);
foo(...args);

It even works in array literals, even multiple times:

let args2 = [...args, ...args];
console.log(args2);  // [1, 2, 3, 1, 2, 3]

Apparently there’s also a proposal for allowing the same thing with objects inside object literals.
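That proposal has since been standardized (in ES2018), so in a current engine something like the following should work. The object names are invented for the example.

```javascript
const defaults = { color: 'red', size: 1 };
const overrides = { size: 2 };

// Later spreads win when keys collide.
const merged = { ...defaults, ...overrides, extra: true };
console.log(merged);  // { color: 'red', size: 2, extra: true }
```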

Default arguments

MDN docs — supported in Firefox 15, Chrome 49, Edge 14, Safari 10

Yes, arguments can have defaults now. It’s more like Sass than Python — default expressions are evaluated once per call, and later default expressions can refer to earlier arguments. I don’t know how I feel about that but whatever.

function foo(n = 1, m = n + 1, list = []) {
    ...
}
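To make the "once per call" part concrete, here is a sketch with two hypothetical functions, `append` and `pair`:

```javascript
function append(item, list = []) {
    list.push(item);
    return list;
}
console.log(append(1));  // [ 1 ]
console.log(append(2));  // [ 2 ], because each call gets a fresh array

function pair(n = 1, m = n + 1) {
    return [n, m];
}
console.log(pair());   // [ 1, 2 ]
console.log(pair(5));  // [ 5, 6 ]
```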

Also, unlike Python, you can have an argument with a default and follow it with an argument without a default, since the default default (!) is and always has been defined as undefined. Er, let me just write it out.

function bar(a = 5, b) {
    ...
}
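Calling such a function looks like this. Passing undefined explicitly triggers the default, while null does not. The body of `bar` here is filled in only so the example returns something.

```javascript
function bar(a = 5, b) {
    return [a, b];
}
console.log(bar());              // [ 5, undefined ]
console.log(bar(undefined, 2));  // [ 5, 2 ]
console.log(bar(null, 2));       // [ null, 2 ]
```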

Arrow functions

MDN docs — supported in Firefox 22, Chrome 45, Edge 12, Safari 10

Perhaps the most humble improvement is the arrow function. It’s a slightly shorter way to write an anonymous function.

(a, b, c) => { ... }
a => { ... }
() => { ... }

An arrow function does not set this or some other magical values, so you can safely use an arrow function as a quick closure inside a method without having to rebind this. Hooray!

Otherwise, arrow functions act pretty much like regular functions; you can even use all the features of regular function signatures.

Arrow functions are particularly nice in combination with all the combinator-style array functions that were added a while ago, like Array.forEach.

[7, 8, 9].forEach(value => {
    console.log(value);
});
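And to make the point about this concrete, here is a sketch with a hypothetical `Counter` class whose method uses an arrow function as the callback:

```javascript
class Counter {
    constructor() {
        this.count = 0;
    }
    addAll(values) {
        // The arrow function sees the method's own `this`, so there is
        // no need for `var self = this` or .bind(this).
        values.forEach(value => { this.count += value; });
        return this.count;
    }
}
console.log(new Counter().addAll([7, 8, 9]));  // 24
```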

Symbol

MDN docs — supported in Firefox 36, Chrome 38, Edge 12, Safari 9

This isn’t quite what I’d call an exciting feature, but it’s necessary for explaining the next one. It’s actually… extremely weird.

symbol is a new kind of primitive (like number and string), not an object (like, er, Number and String). A symbol is created with Symbol('foo'). No, not new Symbol('foo'); that throws a TypeError, for, uh, some reason.

The only point of a symbol is as a unique key. You see, symbols have one very special property: they can be used as object keys, and will not be stringified. Remember, only strings can be keys in JavaScript — even the indices of an array are, semantically speaking, still strings. Symbols are a new exception to this rule.

Also, like objects, two separately created symbols don’t compare equal to each other: Symbol('foo') != Symbol('foo').
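A short sketch of both properties, with made-up variable names:

```javascript
const key = Symbol('foo');
const other = Symbol('foo');

const record = { foo: 'string key' };
record[key] = 'symbol key';

console.log(typeof key);           // symbol
console.log(key === other);        // false
console.log(record[key]);          // symbol key
console.log(record[other]);        // undefined
console.log(Object.keys(record));  // [ 'foo' ]
```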

The result is that symbols solve one of the problems that plagues most object systems, something I’ve talked about before: interfaces. Since an interface might be implemented by any arbitrary type, and any arbitrary type might want to implement any number of arbitrary interfaces, all the method names on an interface are effectively part of a single global namespace.

I think I need to take a moment to justify that. If you have IFoo and IBar, both with a method called method, and you want to implement both on the same type… you have a problem. Because most object systems consider “interface” to mean “I have a method called method”, with no way to say which interface’s method you mean. This is a hard problem to avoid, because IFoo and IBar might not even come from the same library. Occasionally languages offer a clumsy way to “rename” one method or the other, but the most common approach seems to be for interface designers to avoid names that sound “too common”. You end up with redundant mouthfuls like IFoo.foo_method.

This incredibly sucks, and the only languages I’m aware of that avoid the problem are the ML family and Rust. In Rust, you define all the methods for a particular trait (interface) in a separate block, away from the type’s “own” methods. It’s pretty slick. You can still do obj.method(), and as long as there’s only one method among all the available traits, you’ll get that one. If not, there’s syntax for explicitly saying which trait you mean, which I can’t remember because I’ve never had to use it.

Symbols are JavaScript’s answer to this problem. If you want to define some interface, you can name its methods with symbols, which are guaranteed to be unique. You just have to make sure you keep the symbol around somewhere accessible so other people can actually use it. (Or… not?)
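As a sketch of how that might look for the IFoo and IBar case: the two interface objects stand in for two unrelated libraries, and `Thing` is a hypothetical type implementing both.

```javascript
// Each "library" exports the symbol that names its method.
const IFoo = { method: Symbol('IFoo.method') };
const IBar = { method: Symbol('IBar.method') };

class Thing {
    [IFoo.method]() { return 'foo behaviour'; }
    [IBar.method]() { return 'bar behaviour'; }
}

const thing = new Thing();
console.log(thing[IFoo.method]());  // foo behaviour
console.log(thing[IBar.method]());  // bar behaviour
```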

The interesting thing is that JavaScript now has several of its own symbols built in, allowing user objects to implement features that were previously reserved for built-in types. For example, you can use the Symbol.hasInstance symbol — which is simply where the language is storing an existing symbol and is not the same as Symbol('hasInstance')! — to override instanceof:

// oh my god don't do this though
class EvenNumber {
    static [Symbol.hasInstance](obj) {
        return obj % 2 == 0;
    }
}
console.log(2 instanceof EvenNumber);  // true
console.log(3 instanceof EvenNumber);  // false

Oh, and those brackets around Symbol.hasInstance are a sort of reverse-quoting — they indicate an expression to use where the language would normally expect a literal identifier. I think they work as object keys, too, and maybe some other places.

The equivalent in Python is to implement a method called __instancecheck__, a name which is not special in any way except that Python has reserved all method names of the form __foo__. That’s great for Python, but doesn’t really help user code. JavaScript has actually outclassed (ho ho) Python here.

Of course, obj[BobNamespace.some_method]() is not the prettiest way to call an interface method, so it’s not perfect. I imagine this would be best implemented in user code by exposing a polymorphic function, similar to how Python’s len(obj) pretty much just calls obj.__len__().

I only bring this up because it’s the plumbing behind one of the most incredible things in JavaScript that I didn’t even know about until I started writing this post. I’m so excited oh my gosh. Are you ready? It’s:

Iteration protocol

MDN docs — supported in Firefox 27, Chrome 39, Safari 10; still experimental in Edge

Yes! Amazing! JavaScript has first-class support for iteration! I can’t even believe this.

It works pretty much how you’d expect, or at least, how I’d expect. You give your object a method called Symbol.iterator, and that returns an iterator.

What’s an iterator? It’s an object with a next() method that returns the next value and whether the iterator is exhausted.
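Driving one by hand shows the shape of what next() returns. This sketch uses the built-in array iterator:

```javascript
const iterator = ['one', 'two'][Symbol.iterator]();
console.log(iterator.next());  // { value: 'one', done: false }
console.log(iterator.next());  // { value: 'two', done: false }
console.log(iterator.next());  // { value: undefined, done: true }
```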

Wait, wait, wait a second. Hang on. The method is called next? Really? You didn’t go for Symbol.next? Python 2 did exactly the same thing, then realized its mistake and changed it to __next__ in Python 3. Why did you do this?

Well, anyway. My go-to test of an iterator protocol is how hard it is to write an equivalent to Python’s enumerate(), which takes a list and iterates over its values and their indices. In Python it looks like this:

for i, value in enumerate(['one', 'two', 'three']):
    print(i, value)
# 0 one
# 1 two
# 2 three

It’s super nice to have, and I’m always amazed when languages with “strong” “support” for iteration don’t have it. Like, C# doesn’t. So if you want to iterate over a list but also need indices, you need to fall back to a C-style for loop. And if you want to iterate over a lazy or arbitrary iterable but also need indices, you need to track it yourself with a counter. Ridiculous.

Here’s my attempt at building it in JavaScript.

function enumerate(iterable) {
    // Return a new iter*able* object with a Symbol.iterator method that
    // returns an iterator.
    return {
        [Symbol.iterator]: function() {
            let iterator = iterable[Symbol.iterator]();
            let i = 0;

            return {
                next: function() {
                    let nextval = iterator.next();
                    if (! nextval.done) {
                        nextval.value = [i, nextval.value];
                        i++;
                    }
                    return nextval;
                },
            };
        },
    };
}
for (let [i, value] of enumerate(['one', 'two', 'three'])) {
    console.log(i, value);
}
// 0 one
// 1 two
// 2 three

Incidentally, for..of (which iterates over a sequence, unlike for..in which iterates over keys — obviously) is finally supported in Edge 12. Hallelujah.

Oh, and let [i, value] is destructuring assignment, which is also a thing now and works with objects as well. You can even use the splat operator with it! Like Python! (And you can use it in function signatures! Like Python! Wait, no, Python decided that was terrible and removed it in 3…)

let [x, y, ...others] = ['apple', 'orange', 'cherry', 'banana'];
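The object form and the function-signature form look roughly like this. `box`, `boxWidth`, and `area` are names invented for the sketch.

```javascript
// Object destructuring, with a nested pattern and a rename.
const box = { label: 'box', size: { width: 3, height: 4 } };
const { label, size: { width: boxWidth } } = box;
console.log(label, boxWidth);  // box 3

// Destructuring in a function signature, with a default for height.
function area({ width, height = 1 }) {
    return width * height;
}
console.log(area({ width: 3, height: 4 }));  // 12
console.log(area({ width: 3 }));             // 3
```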

It’s a Halloween miracle. 🎃

Generators

MDN docs — supported in Firefox 26, Chrome 39, Edge 13, Safari 10

That’s right, JavaScript has goddamn generators now. It’s basically just copying Python and adding a lot of superfluous punctuation everywhere. Not that I’m complaining.

Also, generators are themselves iterable, so I’m going to cut to the chase and rewrite my enumerate() with a generator.

function enumerate(iterable) {
    return {
        [Symbol.iterator]: function*() {
            let i = 0;
            for (let value of iterable) {
                yield [i, value];
                i++;
            }
        },
    };
}
for (let [i, value] of enumerate(['one', 'two', 'three'])) {
    console.log(i, value);
}
// 0 one
// 1 two
// 2 three

Amazing. function* is a pretty strange choice of syntax, but whatever? I guess it also lets them make yield only act as a keyword inside a generator, for ultimate backwards compatibility.

JavaScript generators support everything Python generators do: yield* yields every item from a subsequence, like Python’s yield from; generators can return final values; you can pass values back into the generator if you iterate it by hand. No, really, I wasn’t kidding, it’s basically just copying Python. It’s great. You could now build asyncio in JavaScript!
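Here is a small sketch exercising all three features at once, with two made-up generators, `inner` and `outer`, iterated by hand:

```javascript
function* inner() {
    yield 1;
    yield 2;
    return 'inner done';  // becomes the value of the yield* expression
}

function* outer() {
    const result = yield* inner();
    const reply = yield result;  // receives the argument of the next next()
    return `got ${reply}`;
}

const gen = outer();
const steps = [
    gen.next(),         // { value: 1, done: false }
    gen.next(),         // { value: 2, done: false }
    gen.next(),         // { value: 'inner done', done: false }
    gen.next('hello'),  // { value: 'got hello', done: true }
];
console.log(steps);
```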

In fact, they did that! JavaScript now has async and await. An async function returns a Promise, which is also a built-in type now. Amazing.
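A minimal sketch of the two together. The `delay` helper and `addLater` are hypothetical, and setTimeout stands in for real asynchronous work.

```javascript
// Hypothetical helper: a Promise that resolves to value after ms milliseconds.
function delay(ms, value) {
    return new Promise(resolve => setTimeout(() => resolve(value), ms));
}

async function addLater(a, b) {
    const x = await delay(10, a);  // pauses here without blocking
    const y = await delay(10, b);
    return x + y;
}

const pending = addLater(1, 2);
console.log(pending instanceof Promise);  // true
pending.then(sum => console.log(sum));    // 3, logged once both delays finish
```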

Sets and maps

MDN docs for Map, MDN docs for Set — supported in Firefox 13, Chrome 38, IE 11, Safari 7.1

I did not save the best for last. This is much less exciting than generators. But still exciting.

The only data structure in JavaScript is the object, a map where the keys are strings. (Or now, also symbols, I guess.) That means you can’t readily use custom values as keys, nor simulate a set of arbitrary objects. And you have to worry about people mucking with Object.prototype, yikes.

But now, there’s Map and Set! Wow.

Unfortunately, because JavaScript, Map couldn’t use the indexing operators without losing the ability to have methods, so you have to use a boring old method-based API. But Map has convenient methods that plain objects don’t, like entries() to iterate over pairs of keys and values. In fact, you can use a map with for..of to get key/value pairs. So that’s nice.
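A short sketch of the method-based API, using objects as keys. The variable names are made up for the example.

```javascript
const a = { id: 'a' };
const b = { id: 'b' };

const scores = new Map();
scores.set(a, 1);
scores.set(b, 2);
scores.set('a', 3);  // a string key, distinct from the object a

console.log(scores.get(a), scores.get('a'));  // 1 3
console.log(scores.has({ id: 'a' }));         // false: keys compare by identity
for (const [key, value] of scores) {
    console.log(key, value);  // visits entries in insertion order
}

const unique = new Set([1, 2, 2, 3, 3, 3]);
console.log([...unique]);  // [ 1, 2, 3 ]
```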

Perhaps more interesting, there’s also now a WeakMap and WeakSet, where the keys are weak references. I don’t think JavaScript had any way to do weak references before this, so that’s pretty slick. There’s no obvious way to hold a weak value, but I guess you could substitute a WeakSet with only one item.
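A rough sketch of the usual WeakMap use case, hanging extra data off objects you don’t own. The names metadata and tag are invented here, and the actual garbage collection can’t be observed from a script, so this only shows the API shape:

```javascript
// Hypothetical example: attach metadata to objects without keeping
// those objects alive. Once nothing else references a key, its entry
// becomes eligible for garbage collection.
const metadata = new WeakMap();

function tag(obj, info) {
    metadata.set(obj, info);
    return obj;
}

const widget = tag({ name: 'widget' }, { createdBy: 'example' });
console.log(metadata.get(widget).createdBy);  // example
console.log(metadata.has({}));                // false

// Strings (and other primitives like numbers) are rejected as keys.
let rejected = false;
try {
    metadata.set('a string', 1);
} catch (e) {
    rejected = e instanceof TypeError;
}
console.log(rejected);  // true
```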

Template literals

MDN docs — supported in Firefox 34, Chrome 41, Edge 12, Safari 9

Template literals are JavaScript’s answer to string interpolation, which has historically been a huge pain in the ass because it doesn’t even have string formatting in the standard library.

They’re just strings delimited by backticks instead of quotes. They can span multiple lines and contain expressions.

console.log(`one plus
two is ${1 + 2}`);

Someone decided it would be a good idea to allow nesting more sets of backticks inside a ${} expression, so, good luck to syntax highlighters.

However, someone also had the most incredible idea ever, which was to add syntax allowing user code to do the interpolation — so you can do custom escaping, when absolutely necessary, which is virtually never, because “escaping” means you’re building a structured format by slopping strings together willy-nilly instead of using some API that works with the structure.

// OF COURSE, YOU SHOULDN'T BE DOING THIS ANYWAY; YOU SHOULD BUILD HTML WITH
// THE DOM API AND USE .textContent FOR LITERAL TEXT.  BUT AS AN EXAMPLE:
function html(literals, ...values) {
    let ret = [];
    literals.forEach((literal, i) => {
        if (i > 0) {
            // Is there seriously still not a built-in function for doing this?
            // Well, probably because you SHOULDN'T BE DOING IT
            ret.push(values[i - 1]
                .replace(/&/g, '&amp;')
                .replace(/</g, '&lt;')
                .replace(/>/g, '&gt;')
                .replace(/"/g, '&quot;')
                .replace(/'/g, '&apos;'));
        }
        ret.push(literal);
    });
    return ret.join('');
}
let username = 'Bob<script>';
let result = html`<b>Hello, ${username}!</b>`;
console.log(result);
// <b>Hello, Bob&lt;script&gt;!</b>

It’s a shame this feature is in JavaScript, the language where you are least likely to need it.

Trailing commas

Remember how you couldn’t do this for ages, because ass-old IE considered it a syntax error and would reject the entire script?

{
    a: 'one',
    b: 'two',
    c: 'three',  // <- THIS GUY RIGHT HERE
}

Well now it’s part of the goddamn spec and if there’s anything in this post you can rely on, it’s this. In fact you can use AS MANY GODDAMN TRAILING COMMAS AS YOU WANT. But only in arrays.

[1, 2, 3,,,,,,,,,,,,,,,,,,,,,,,,,]

Apparently that has the bizarre side effect of reserving extra space at the end of the array, without putting values there.
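To see what those extra commas actually do, here is a shorter version of the same thing. The final comma is the ordinary trailing one; each comma before it (after the 3) leaves a hole:

```javascript
const padded = [1, 2, 3,,,];   // two holes, then the trailing comma

console.log(padded.length);    // 5
console.log(3 in padded);      // false: index 3 is a hole, not a stored undefined
console.log(padded[3]);        // undefined when you read it anyway

// forEach skips holes entirely
let visited = 0;
padded.forEach(() => { visited++; });
console.log(visited);          // 3
```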

And more, probably

Like strict mode, which makes a few silent “errors” be actual errors, forces you to declare variables (no implicit globals!), and forbids the completely bozotic with block.
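For instance, the implicit-global thing. In this sketch (function and variable names are made up) the 'use strict' directive turns a silent mistake into a ReferenceError:

```javascript
function assignWithoutDeclaring() {
    'use strict';
    // Hypothetical undeclared name; without strict mode this line would
    // quietly create a global variable instead of failing.
    undeclaredCounter = 1;
}

let caught = null;
try {
    assignWithoutDeclaring();
} catch (e) {
    caught = e;
}
console.log(caught instanceof ReferenceError);  // true
```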

Or String.trim(), which trims whitespace off of strings.

Or… Math.sign()? That’s new? Seriously? Well, okay.

Or the Proxy type, which lets you customize indexing and assignment and calling. Oh. I guess that is possible, though this is a pretty weird way to do it; why not just use symbol-named methods?
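A small sketch of customized indexing with Proxy: Python-style negative indexes on an array. The pythonish wrapper is a made-up name, and this only intercepts reads, not assignment or calling:

```javascript
// Hypothetical example: negative indexes on an array, Python style.
function pythonish(array) {
    return new Proxy(array, {
        get(target, key, receiver) {
            // Property keys arrive as strings (or symbols), so check the
            // string form before treating it as a negative index.
            if (typeof key === 'string' && /^-\d+$/.test(key)) {
                return target[target.length + Number(key)];
            }
            return Reflect.get(target, key, receiver);
        },
    });
}

const letters = pythonish(['a', 'b', 'c']);
console.log(letters[-1]);      // c
console.log(letters[0]);       // a
console.log(letters.length);   // 3
```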

You can write Unicode escapes for astral plane characters in strings (or identifiers!), as \u{XXXXXXXX}.
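A quick illustration, using U+1F600 as an arbitrary astral character. The braces hold the code point in hex, and the resulting string is still stored as a surrogate pair:

```javascript
const face = '\u{1F600}';                       // U+1F600, outside the BMP
console.log(face.length);                       // 2 (UTF-16 code units)
console.log(face.codePointAt(0).toString(16));  // 1f600
console.log(face === '\uD83D\uDE00');           // true: the same surrogate pair
console.log([...face].length);                  // 1: string iteration goes by code point
```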

There’s a const now? I extremely don’t care, just name it in all caps and don’t reassign it, come on.

There’s also a mountain of other minor things, which you can peruse at your leisure via MDN or the ECMAScript compatibility tables (note the links at the top, too).

That’s all I’ve got. I still wouldn’t say I’m a big fan of JavaScript, but it’s definitely making an effort to clean up some goofy inconsistencies and solve common problems. I think I could even write some without yelling on Twitter about it now.

On the other hand, if you’re still stuck supporting IE 10 for some reason… well, er, my condolences.

Grafana 4.5 Released

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/09/13/grafana-4.5-released/

Grafana v4.5 is now available for download. This release has some really significant improvements to Prometheus, Elasticsearch, MySQL and to the Table panel.

Prometheus Query Editor

The new query editor has full syntax highlighting, as well as auto complete for metrics, functions, and range vectors. There are also integrated function docs right in the query editor!

Elasticsearch: Add ad-hoc filters from the table panel

Create column styles that turn cells into links that use the value in the cell (or other row values) to generate a URL to another dashboard or system. This is useful for using the table panel as a way to drill down into a dashboard with more detail, or into a ticket system, for example.

Query Inspector

Query Inspector is a new feature that shows query requests and responses. This can be helpful if a graph is not shown or shows something very different than what you expected.
More information here.

Changelog

New Features

  • Table panel: Render cell values as links that can have a URL template that uses variables from the current table row. #3754
  • Elasticsearch: Add ad hoc filters directly by clicking values in table panel #8052.
  • MySQL: New rich query editor with syntax highlighting
  • Prometheus: New rich query editor with syntax highlighting, metric & range auto complete and integrated function docs. #5117

Enhancements

  • GitHub OAuth: Support for GitHub organizations with 100+ teams. #8846, thx @skwashd
  • Graphite: Calls to Graphite api /metrics/find now include panel or dashboard time range (from & until) in most cases, #8055
  • Graphite: Added new graphite 1.0 functions, available if you set version to 1.0.x in data source settings. New Functions: mapSeries, reduceSeries, isNonNull, groupByNodes, offsetToZero, grep, weightedAverage, removeEmptySeries, aggregateLine, averageOutsidePercentile, delay, exponentialMovingAverage, fallbackSeries, integralByInterval, interpolate, invert, linearRegression, movingMin, movingMax, movingSum, multiplySeriesWithWildcards, pow, powSeries, removeBetweenPercentile, squareRoot, timeSlice, closes #8261
  • Elasticsearch: Ad-hoc filters now use query phrase match filters instead of term filters, works on non keyword/raw fields #9095.

Breaking change

  • InfluxDB/Elasticsearch: The panel & data source option named “Group by time interval” is now named “Min time interval” and now always defines a lower limit for the auto group by time, without having to use the > prefix (that prefix still works). This should in theory have close to zero actual impact on existing dashboards. It does mean that if you used this setting to define a hard group by time interval of, say, “1d”, and you zoom out to a wide enough time range, the interval could increase above “1d”, as the setting is now always considered a lower limit.

This option is now renamed and moved to the Options sub section above your queries.

Data source selection, options and help are now above your metric queries.

Minor Changes

  • InfluxDB: Change time range filter for absolute time ranges to be inclusive instead of exclusive #8319, thx @Oxydros
  • InfluxDB: Added parentheses around tag filters in queries #9131

Bug Fixes

  • Modals: Maintain scroll position after opening/leaving modal #8800
  • Templating: You cannot select data source variables as data source for other template variables #7510
  • Security: Security fix for api vulnerability (in multiple org setups).

Download

Head to the v4.5 download page for download links & instructions.

Thanks

A big thanks to all the Grafana users who contribute by submitting PRs, bug reports, helping out on our community site and providing feedback!

Kotlin and Groovy JVM Languages with AWS Lambda

Post Syndicated from Juan Villa original https://aws.amazon.com/blogs/compute/kotlin-and-groovy-jvm-languages-with-aws-lambda/


Juan Villa – Partner Solutions Architect

 

When most people hear “Java” they think of Java the programming language. Java is a lot more than a programming language; it also implies a larger ecosystem including the Java Virtual Machine (JVM). Java, the programming language, is just one of the many languages that can be compiled to run on the JVM. Some of the most popular JVM languages, other than Java, are Clojure, Groovy, Scala, Kotlin, JRuby, and Jython (see this link for a list of more JVM languages).

Did you know that you can compile and subsequently run all these languages on AWS Lambda?

AWS Lambda supports the Java 8 runtime, but this does not mean you are limited to the Java language. The Java 8 runtime is capable of running JVM languages such as Kotlin and Groovy once they have been compiled and packaged as a “fat” JAR (a JAR file containing all necessary dependencies and classes bundled in).

In this blog post we’ll work through building AWS Lambda functions in both the Kotlin and Groovy programming languages. To compile and package our projects we will use the Gradle build tool.

To follow along, please clone the Git repository available at GitHub here. Also, I recommend using an Integrated Development Environment (IDE) such as JetBrains’ IntelliJ IDEA, which is the IDE I used while working on these projects.

Kotlin

Kotlin is a statically-typed JVM language designed and developed by JetBrains (one of our Amazon Partner Network Technology partners) and the open source community. Compared to Java the programming language, Kotlin has additional powerful language features such as: Data Classes, Default Arguments, Extensions, Elvis Operator, and Destructuring Declarations. This is just a short list of Kotlin’s powerful language features. For a more thorough list of features, and how to use them, refer to the full documentation of the Kotlin language.

Let’s jump right into the code and see what an AWS Lambda function looks like in Kotlin.

package com.aws.blog.jvmlangs.kotlin

import java.io.*
import com.fasterxml.jackson.module.kotlin.*

data class HandlerInput(val who: String)
data class HandlerOutput(val message: String)

class Main {
    val mapper = jacksonObjectMapper()

    fun handler(input: InputStream, output: OutputStream): Unit {
        val inputObj = mapper.readValue<HandlerInput>(input)
        mapper.writeValue(output, HandlerOutput("Hello ${inputObj.who}"))
    }
}

The above example is a very simple Hello World application that accepts as an input a JSON object containing a key called “who” and returns a JSON object containing a key called “message” with a value of “Hello {who}”.

AWS Lambda does not support serializing JSON objects into Kotlin data classes, but don’t worry! AWS Lambda supports passing an input object as a Stream, and also supports an output Stream for returning a result (see this link for more information). Combined with the Input/Output Stream form of the handler function, we are using the Jackson library with a Kotlin extension function to support serialization and deserialization of Kotlin data class types.

To get started with this example, let’s first compile and package the Kotlin project.

git clone https://github.com/awslabs/lambda-kotlin-groovy-example
cd lambda-kotlin-groovy-example/kotlin
./gradlew shadowJar

Once packaged, a JAR file containing all necessary dependencies will be available at “build/libs/jvmlangs-kotlin-1.0-SNAPSHOT-all.jar”. Now let’s deploy this package to AWS Lambda.

To deploy the lambda function, we will be using the AWS Command Line Interface (CLI). You can find information on how to set up the AWS CLI here. This tool allows you to set up and manage AWS services via the command line.

aws lambda create-function --region us-east-1 --function-name kotlin-hello \
--zip-file fileb://build/libs/jvmlangs-kotlin-1.0-SNAPSHOT-all.jar \
--role arn:aws:iam::<account_id>:role/lambda_basic_execution \
--handler com.aws.blog.jvmlangs.kotlin.Main::handler --runtime java8 \
--timeout 15 --memory-size 128

Once deployed, we can test the function by invoking the lambda function from the CLI.

aws lambda invoke --function-name kotlin-hello --payload '{"who": "AWS Fan"}' output.txt
cat output.txt

If successful, you’ll see an output of “{"message":"Hello AWS Fan"}”.

Groovy

Groovy is an optionally typed JVM language with both dynamic and static typing capabilities. Groovy is currently being supported by the Apache Software Foundation. Like Kotlin, Groovy also packs a lot of powerful features such as: Closures, Dynamic Typing, Collection Literals, String Interpolation, and Elvis Operator. This is just a short list, see the full documentation for a list of features and how to use them.

Once again, let’s jump right into the code.

package com.aws.blog.jvmlangs.groovy

class HandlerInput {
    String who
}
class HandlerOutput {
    String message
}

class Main {
    def handler(HandlerInput input) {
        return new HandlerOutput(message: "Hello ${input.who}")
    }
}

Just like the Kotlin example, we have defined a function that takes a simple JSON object containing a “who” key and builds a response containing a “message” key. Note that in this case we are not using the Input/Output Stream form of the handler function, but rather we are letting AWS Lambda deserialize the input JSON object into the type HandlerInput. To accomplish this, AWS Lambda uses the Jackson library and handles the serialization for us.

Let’s go ahead and compile and package this Groovy example.

git clone https://github.com/awslabs/lambda-kotlin-groovy-example
cd lambda-kotlin-groovy-example/groovy
./gradlew shadowJar

Once packaged, a JAR file containing all necessary dependencies will be available at “build/libs/jvmlangs-groovy-1.0-SNAPSHOT-all.jar”. Now let’s deploy this package to AWS Lambda.

aws lambda create-function --region us-east-1 --function-name groovy-hello \
--zip-file fileb://build/libs/jvmlangs-groovy-1.0-SNAPSHOT-all.jar \
--role arn:aws:iam::<account_id>:role/lambda_basic_execution \
--handler com.aws.blog.jvmlangs.groovy.Main::handler --runtime java8 \
--timeout 15 --memory-size 128

Once deployed, we can test the function by invoking the lambda function from the CLI.

aws lambda invoke --function-name groovy-hello --payload '{"who": "AWS Fan"}' output.txt
cat output.txt

If successful, you’ll see an output of “{"message":"Hello AWS Fan"}”.

Gradle Build Tool

Finally, let’s touch on how we built the JAR packages from the Kotlin and Groovy sources above. To build the JARs we used the Gradle build tool. Gradle builds a project by reading instructions from a file called “build.gradle”. This is a file written in Gradle’s Groovy Domain Specific Language (DSL). You can find more information on the Gradle build file by looking at their documentation. Let’s take a look at the Gradle build files we used for this post.

For the Kotlin example, this is the build file we used.

buildscript {
    repositories {
        mavenCentral()
        jcenter()
    }
    dependencies {
        classpath "org.jetbrains.kotlin:kotlin-gradle-plugin:$kotlin_version"
        classpath "com.github.jengelman.gradle.plugins:shadow:1.2.3"
    }
}

group 'com.aws.blog.jvmlangs.kotlin'
version '1.0-SNAPSHOT'

apply plugin: 'kotlin'
apply plugin: 'com.github.johnrengelman.shadow'

repositories {
    mavenCentral()
}

dependencies {
    compile "org.jetbrains.kotlin:kotlin-stdlib:$kotlin_version"
    compile "com.fasterxml.jackson.module:jackson-module-kotlin:2.8.2"
}

For the Groovy example this is the build file we used.

buildscript {
    repositories {
        jcenter()
    }
    dependencies {
        classpath 'com.github.jengelman.gradle.plugins:shadow:1.2.3'
    }
}

group 'com.aws.blog.jvmlangs.groovy'
version '1.0-SNAPSHOT'

apply plugin: 'groovy'
apply plugin: 'com.github.johnrengelman.shadow'

repositories {
    mavenCentral()
}

dependencies {
    compile 'org.codehaus.groovy:groovy-all:2.3.11'
    testCompile group: 'junit', name: 'junit', version: '4.11'
}

As you can see, the build files for the Kotlin and Groovy projects are very similar. For the Kotlin project we define a dependency on the Jackson Kotlin module. Also, for each language we include its supporting library (kotlin-stdlib and groovy-all respectively).

In addition, you will notice that we are using a plugin called “shadow”. We use this plugin to package all the project dependencies into one JAR by using the Gradle task “shadowJar”. You can find more information on Shadow in their documentation.

Final Words

Don’t stop here though! Take a look at other JVM languages and get them running on AWS Lambda with the Java 8 runtime. Maybe start with Clojure or Scala?

Also take a look at the AWS Lambda Java libraries provided by AWS. They provide interfaces and models that make it easier to handle events from event sources.

Grafana 4.3 Release

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/05/23/grafana-4.3-release/

Grafana v4.3 is now available for download.

Release Highlights

  • New Heatmap Panel
  • Graph Panel Histogram Mode
  • Elasticsearch Histogram Aggregation
  • Prometheus Table data format
  • New MySQL Data Source (alpha version to get some early feedback)
  • Dashed lines in the Graph Panel
  • 60+ small fixes and improvements, most of them contributed by our fantastic community!

Check out the New Features in v4.3 Dashboard on the Grafana Play site for a showcase of these new features.

Histogram Support

A Histogram is a kind of bar chart that groups numbers into ranges, often called buckets or bins. Taller bars show that more data falls in that range.

The Graph Panel now supports Histograms.
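To make the bucketing idea concrete, here is a minimal sketch in JavaScript. It is not how the Graph Panel computes its histogram; the histogram function, the sample values, and the bucket size of 50 are all assumptions chosen for illustration.

```javascript
// Hypothetical helper: count how many values fall into each bucket of
// width bucketSize. Map keys are the lower bound of each bucket.
function histogram(values, bucketSize) {
    const counts = new Map();
    for (const value of values) {
        const lower = Math.floor(value / bucketSize) * bucketSize;
        counts.set(lower, (counts.get(lower) || 0) + 1);
    }
    return counts;
}

// Made-up sample data, e.g. response times in milliseconds
const responseTimes = [12, 48, 51, 55, 97, 103, 110, 149];
const buckets = histogram(responseTimes, 50);
console.log([...buckets]);  // [ [ 0, 2 ], [ 50, 3 ], [ 100, 3 ] ]
```

Each entry corresponds to one bar: the 0 to 50 bucket holds two values, and the other two buckets hold three each.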

Histogram Aggregation Support for Elasticsearch

Elasticsearch is the only supported data source that can return pre-bucketed data (data that is already grouped into ranges). With other data sources there is a risk of returning inaccurate data in a histogram due to using already aggregated data rather than raw data. This release adds support for Elasticsearch pre-bucketed data that can be visualized with the new Heatmap Panel.

Heatmap Panel

The Histogram support in the Graph Panel does not show changes over time – it aggregates all the data together for the chosen time range. To visualize a histogram over time, we have built a new Heatmap Panel.

Every column in a Heatmap is a histogram snapshot. Instead of visualizing higher values with higher bars, a heatmap visualizes higher values with color. The histogram shown above is equivalent to one column in the heatmap shown below.
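The relationship between the two can be sketched as follows. This is an illustration only, not the Heatmap Panel's implementation; the heatmap function, the point format ({ time, value }), and the step sizes are assumptions.

```javascript
// Hypothetical helper: group points into time columns, then count values
// per bucket inside each column. Result: time bucket -> (value bucket -> count).
function heatmap(points, timeStep, bucketSize) {
    const columns = new Map();
    for (const { time, value } of points) {
        const t = Math.floor(time / timeStep) * timeStep;
        const v = Math.floor(value / bucketSize) * bucketSize;
        if (!columns.has(t)) {
            columns.set(t, new Map());
        }
        const column = columns.get(t);
        column.set(v, (column.get(v) || 0) + 1);
    }
    return columns;
}

// Made-up sample: time in seconds, value in milliseconds
const points = [
    { time: 0, value: 12 },
    { time: 30, value: 55 },
    { time: 45, value: 60 },
    { time: 70, value: 20 },
    { time: 90, value: 130 },
];
const columns = heatmap(points, 60, 50);
for (const [t, column] of columns) {
    console.log(t, [...column]);
}
// 0 [ [ 0, 1 ], [ 50, 2 ] ]
// 60 [ [ 0, 1 ], [ 100, 1 ] ]
```

Each inner map is one histogram snapshot; a heatmap would draw it as a single column, with the counts mapped to color.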

The Heatmap panel also works with Elasticsearch Histogram Aggregations for more accurate server side bucketing.

MySQL Data Source (alpha)

This release includes a new core data source for MySQL. You can write any possible MySQL query and format it as either Time Series or Table Data, allowing it to be used with the Graph Panel, Table Panel and SingleStat Panel.

We are still working on the MySQL data source. As it’s missing some important features, like templating and macros, and future changes could be breaking, we are labeling the state of the data source as Alpha. Instead of holding up the release of v4.3 we are including it in its current shape to get some early feedback. So please try it out and let us know what you think on twitter or on our community forum. Is this a feature that you would use? How can we make it better?

The query editor can show the generated and interpolated SQL that is sent to the MySQL server.

The query editor will also show any errors that resulted from running the query (very useful when you have a syntax error!).

Dashed Lines in the Graph Panel

A new Dashes option has been added to Series overrides in the Graph Panel.

Health Check Endpoint

Now you can monitor the monitoring with the Health Check Endpoint! The new /api/health endpoint returns HTTP 200 OK if everything is up and HTTP 503 Error if the Grafana database cannot be pinged.
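A monitor for this endpoint could look roughly like the sketch below. Only the /api/health path and the 200 and 503 status codes come from the text above; the checkGrafana function, its return strings, the URLs, and the fakeFetch stand-in are assumptions made so the example runs without a Grafana server.

```javascript
// Hypothetical monitor: maps the status of GET /api/health to a verdict.
// fetchImpl is injectable so the sketch can be exercised without a server;
// pass the platform's fetch function to query a real instance.
async function checkGrafana(baseUrl, fetchImpl) {
    const response = await fetchImpl(`${baseUrl}/api/health`);
    if (response.status === 200) return 'up';
    if (response.status === 503) return 'database unreachable';
    return `unexpected status ${response.status}`;
}

// Stand-in for a real server, for demonstration only.
const fakeFetch = async url => ({
    status: url.startsWith('http://down') ? 503 : 200,
});

checkGrafana('http://localhost:3000', fakeFetch).then(verdict => console.log(verdict));
checkGrafana('http://down.example', fakeFetch).then(verdict => console.log(verdict));
```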

Lazy Load Panels

Grafana now delays loading panels until they become visible (scrolled into view). This means panels out of view are not sending requests thereby reducing the load on your time series database.

Prometheus – Table Data (column per label)

The Prometheus data source now supports the Table Data format by automatically assigning a column to a label. This makes it really easy to browse data in the table panel.

Improved Alerting Annotation

When an alert is fired, the annotation on the graph now shows information about execution errors or if it was fired due to no data.

Other Highlights From The Changelog

Changes:

  • Table: Support to change column header text #3551
  • InfluxDB: influxdb query builder support for ORDER BY and LIMIT (allows TOPN queries) #6065 Support influxdb’s SLIMIT Feature #7232 thx @thuck
  • Graph: Support auto grid min/max when using log scale #3090, thx @bigbenhur
  • Prometheus: Make Prometheus query field a textarea #7663, thx @hagen1778
  • Server: Support listening on a UNIX socket #4030, thx @mitjaziv

Fixes:

  • MySQL: 4-byte UTF8 not supported when using MySQL database (allows Emojis in Dashboard Names) #7958
  • Dashboard: Description tooltip is not fully displayed #7970

Lots more enhancements and fixes can be found in the Changelog.

Download

Head to the v4.3 download page for download links & instructions.

Thanks

A big thanks to all the Grafana users who contribute by submitting PRs, bug reports, helping out on our community site and providing feedback!

Grafana 4.3 Beta Release

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/05/12/grafana-4.3-beta-release/

Grafana v4.3 Beta is now available for download.

Release Highlights

  • New Heatmap Panel
  • Graph Panel Histogram Mode
  • Elasticsearch Histogram Aggregation
  • Prometheus Table data format
  • New MySQL Data Source (alpha version to get some early feedback)
  • 60+ small fixes and improvements, most of them contributed by our fantastic community!

Check out the New Features in v4.3 Dashboard on the Grafana Play site for a showcase of these new features.

Histogram Support

A Histogram is a kind of bar chart that groups numbers into ranges, often called buckets or bins. Taller bars show that more data falls in that range.

The Graph Panel now supports Histograms.

Histogram Aggregation Support for Elasticsearch

Elasticsearch is the only supported data source that can return pre-bucketed data (data that is already grouped into ranges). With other data sources there is a risk of returning inaccurate data in a histogram due to using already aggregated data rather than raw data. This release adds support for Elasticsearch pre-bucketed data that can be visualized with the new Heatmap Panel.

Heatmap Panel

The Histogram support in the Graph Panel does not show changes over time – it aggregates all the data together for the chosen time range. To visualize a histogram over time, we have built a new Heatmap Panel.

Every column in a Heatmap is a histogram snapshot. Instead of visualizing higher values with higher bars, a heatmap visualizes higher values with color. The histogram shown above is equivalent to one column in the heatmap shown below.

The Heatmap panel also works with Elasticsearch Histogram Aggregations for more accurate server side bucketing.

MySQL Data Source (alpha)

This release includes a new core data source for MySQL. You can write any possible MySQL query and format it as either Time Series or Table Data, allowing it to be used with the Graph Panel, Table Panel and SingleStat Panel.

We are still working on the MySQL data source. As it’s missing some important features, like templating and macros, and future changes could be breaking, we are labeling the state of the data source as Alpha. Instead of holding up the release of v4.3 we are including it in its current shape to get some early feedback. So please try it out and let us know what you think on twitter or on our community forum. Is this a feature that you would use? How can we make it better?

The query editor can show the generated and interpolated SQL that is sent to the MySQL server.

The query editor will also show any errors that resulted from running the query (very useful when you have a syntax error!).

Health Check Endpoint

Now you can monitor the monitoring with the Health Check Endpoint! The new /api/health endpoint returns HTTP 200 OK if everything is up and HTTP 503 Error if the Grafana database cannot be pinged.

Lazy Load Panels

Grafana now delays loading panels until they become visible (scrolled into view). This means panels out of view are not sending requests thereby reducing the load on your time series database.

Prometheus – Table Data (column per label)

The Prometheus data source now supports the Table Data format by automatically assigning a column to a label. This makes it really easy to browse data in the table panel.

Other Highlights From The Changelog

Changes:

  • Table: Support to change column header text #3551
  • InfluxDB: influxdb query builder support for ORDER BY and LIMIT (allows TOPN queries) #6065 Support influxdb’s SLIMIT Feature #7232 thx @thuck
  • Graph: Support auto grid min/max when using log scale #3090, thx @bigbenhur
  • Prometheus: Make Prometheus query field a textarea #7663, thx @hagen1778
  • Server: Support listening on a UNIX socket #4030, thx @mitjaziv

Fixes:

  • MySQL: 4-byte UTF8 not supported when using MySQL database (allows Emojis in Dashboard Names) #7958
  • Dashboard: Description tooltip is not fully displayed #7970

Lots more enhancements and fixes can be found in the Changelog.

Download

Head to the v4.3 download page for download links & instructions.

Thanks

A big thanks to all the Grafana users who contribute by submitting PRs, bug reports, helping out on our community site and providing feedback!

A Rebuttal For Python 3

Post Syndicated from Eevee original https://eev.ee/blog/2016/11/23/a-rebuttal-for-python-3/

Zed Shaw, of Learn Python the Hard Way fame, has now written The Case Against Python 3.

I’m not involved with core Python development. The only skin I have in this game is that I like Python 3. It’s a good language. And one of the big factors I’ve seen slowing its adoption is that respected people in the Python community keep grouching about it. I’ve had multiple newcomers tell me they have the impression that Python 3 is some kind of unusable disaster, though they don’t know exactly why; it’s just something they hear from people who sound like they know what they’re talking about. Then they actually use the language, and it’s fine.

I’m sad to see the Python community needlessly sabotage itself, but Zed’s contribution is beyond the pale. It’s not just making a big deal about changed details that won’t affect most beginners; it’s complete and utter nonsense, on a platform aimed at people who can’t yet recognize it as nonsense. I am so mad.

The Case Against Python 3

I give two sets of reasons as I see them now. One for total beginners, and another for people who are more knowledgeable about programming.

Just to note: the two sets of reasons are largely the same ideas presented differently, so I’ll just weave them together below.

The first section attempts to explain the case against starting with Python 3 in non-technical terms so a beginner can make up their own mind without being influenced by propaganda or social pressure.

Having already read through this once, this sentence really stands out to me. The author of a book many beginners read to learn Python in the first place is providing a number of reasons (some outright fabricated) not to use Python 3, often in terms beginners are ill-equipped to evaluate, but believes this is a defense against propaganda or social pressure.

The Most Important Reason

Before getting into the main technical reasons I would like to discuss the one most important social reason for why you should not use Python 3 as a beginner:

THERE IS A HIGH PROBABILITY THAT PYTHON 3 IS SUCH A FAILURE IT WILL KILL PYTHON.

Python 3’s adoption is really only at about 30% whenever there is an attempt to measure it.

Wait, really? Wow, that’s fantastic.

I mean, it would probably be higher if the most popular beginner resources were actually teaching Python 3, but you know.

Nobody is all that interested in finding out what the real complete adoption is, despite there being fairly simple ways to gather metrics on the adoption.

This accusatory sentence conspicuously neglects to mention what these fairly simple ways are, a pattern that repeats throughout. The trouble is that it’s hard to even define what “adoption” means — I write all my code in Python 3 now, but veekun is still Python 2 because it’s in maintenance mode, so what does that say about adoption? You could look at PyPI download stats, but those are thrown way off by caches and system package managers. You could look at downloads from the Python website, but a great deal of Python is written and used on Unix-likes, where Python itself is either bundled or installed from the package manager.

It’s as simple as that. If you learn Python 2, then you can still work with all the legacy Python 2 code in existence until Python dies or you (hopefully) move on. But if you learn Python 3 then your future is very uncertain. You could really be learning a dead language and end up having to learn Python 2 anyway.

You could use Python 2, until it dies… or you could use Python 3, which might die. What a choice.

By some definitions, Python 2 is already dead — it will not see another major release, only security fixes. Python 3 is still actively developed, and its seventh major release is next month. It even contains a new feature that Zed later mentions he prefers to Python 2’s offerings.

It may shock you to learn that I know both Python 2 and Python 3. Amazingly, two versions of the same language are much more similar than they are different. If you learned Python 3 and then a wizard cast a spell that made it vanish from the face of the earth, you’d just have to spend half an hour reading up on what had changed from Python 2.

Also, it’s been over a decade, maybe even multiple decades, and Python 3 still isn’t above about 30% in adoption. Even among the sciences where Python 3 is touted as a “success” it’s still only around 25-30% adoption. After that long it’s time to admit defeat and come up with a new plan.

Python 3.0 came out in 2008. The first couple releases ironed out some compatibility and API problems, so it didn’t start to gain much traction until Python 3.2 came out in 2011. Hell, Python 2.0 came out in 2000, so even Python 2 isn’t multiple decades old. It would be great if this trusted beginner reference could take two seconds to check details like this before using them to scaremonger.

The big early problem was library compatibility: it’s hard to justify switching to a new version of the language if none of the libraries work. Libraries could only port once their own dependencies had ported, of course, and it took a couple years to figure out the best way to maintain compatibility with both Python 2 and Python 3. I’d say we only really hit critical mass a few years ago — for instance, Django didn’t support Python 3 until 2013 — in which case that 30% is nothing to sneeze at.

There are more reasons beyond just the uncertain future of Python 3 even decades later.

In one paragraph, we’ve gone from “maybe even multiple decades” to just “decades”, which is a funny way to spell “eight years”.

Not In Your Best Interests

The Python project’s efforts to convince you to start with Python 3 are not in your best interest, but, rather, are only in the best interests of the Python project.

It’s bad, you see, for the Python project to want people to use the work it produced.

Anyway, please buy Zed Shaw’s book.

Anyway, please pledge to my Patreon.

Ultimately though, if Python 3 were good they wouldn’t need to do any convincing to get you to use it. It would just naturally work for you and you wouldn’t have any problems. Instead, there are serious issues with Python 3 for beginners, and rather than fix those issues the Python project uses propaganda, social pressure, and marketing to convince you to use it. In the world of technology using marketing and propaganda is immediately a sign that the technology is defective in some obvious way.

This use of social pressure and propaganda to convince you to use Python 3 despite its problems, in an attempt to benefit the Python project, is morally unconscionable to me.

Ten paragraphs in, Zed is telling me that I should be suspicious of anything that relies on marketing and propaganda. Meanwhile, there has yet to be a single concrete reason why Python 3 is bad for beginners — just several flat-out incorrect assertions and a lot of handwaving about how inexplicably nefarious the Python core developers are. You know, the same people who made Python 2. But they weren’t evil then, I guess.

You Should Be Able to Run 2 and 3

In the programming language theory there is this basic requirement that, given a “complete” programming language, I can run any other programming language. In the world of Java I’m able to run Ruby, Java, C++, C, and Lua all at the same time. In the world of Microsoft I can run F#, C#, C++, and Python all at the same time. This isn’t just a theoretical thing. There is solid math behind it. Math that is truly the foundation of computer science.

The fact that you can’t run Python 2 and Python 3 at the same time is purely a social and technical decision that the Python project made with no basis in mathematical reality. This means you are working with a purposefully broken platform when you use Python 3, and I personally can’t condone teaching people to use something that is fundamentally broken.

The programmer-oriented section makes clear that the solid math being referred to is Turing-completeness — the section is even titled “Python 3 Is Not Turing Complete”.

First, notice a rhetorical trick here. You can run Ruby, Java, C++, etc. at the same time, so why not Python 2 and Python 3?

But can you run Java and C# at the same time? (I’m sure someone has done this, but it’s certainly much less popular than something like Jython or IronPython.)

Can you run Ruby 1.8 and Ruby 2.3 at the same time? Ah, no, so I guess Ruby 2.3 is fundamentally and purposefully broken.

Can you run Lua 5.1 and 5.3 at the same time? Lua is a spectacular example, because Lua 5.2 made a breaking change to how the details of scope work, and it’s led to a situation where a lot of programs that embed Lua haven’t bothered upgrading from Lua 5.1. Was Lua 5.2 some kind of dark plot to deliberately break the language? No, it’s just slightly more inconvenient than expected for people to upgrade.

Anyway, as for Turing machines:

In computer science a fundamental law is that if I have one Turing Machine I can build any other Turing Machine. If I have COBOL then I can bootstrap a compiler for FORTRAN (as disgusting as that might be). If I have FORTH, then I can build an interpreter for Ruby. This also applies to bytecodes for CPUs. If I have a Turing Complete bytecode then I can create a compiler for any language. The rule then can be extended even further to say that if I cannot create another Turing Machine in your language, then your language cannot be Turing Complete. If I can’t use your language to write a compiler or interpreter for any other language then your language is not Turing Complete.

Yes, this is true.

Currently you cannot run Python 2 inside the Python 3 virtual machine. Since I cannot, that means Python 3 is not Turing Complete and should not be used by anyone.

And this is completely asinine. Worse, it’s flat-out dishonest, and relies on another rhetorical trick. You only “cannot” run Python 2 inside the Python 3 VM because no one has written a Python 2 interpreter in Python 3. The “cannot” is not a mathematical impossibility; it’s a simple matter of the code not having been written. Or perhaps it has, but no one cares anyway, because it would be comically and unusably slow.

I assume this was meant to be sarcastic on some level, since it’s followed by a big blue box that seems unsure about whether to double down or reverse course. But I can’t tell why it was even brought up, because it has absolutely nothing to do with Zed’s true complaint, which is that Python 2 and Python 3 do not coexist within a single environment. Implementing language X using language Y does not mean that X and Y can now be used together seamlessly.

The canonical Python release is written in C (just like with Ruby or Lua), but you can’t just dump a bunch of C code into a Python (or Ruby or Lua) file and expect it to work. You can talk to C from Python and vice versa, but defining how they communicate is a bit of a pain in the ass and requires some level of setup.

I’ll get into this some more shortly.

No Working Translator

Python 3 comes with a tool called 2to3 which is supposed to take Python 2 code and translate it to Python 3 code.

I should point out right off the bat that this is not actually what you want to use most of the time, because you probably want to translate your Python 2 code to Python 2/3 code. 2to3 produces code that most likely will not work on Python 2. Other tools exist to help you port more conservatively.

Translating one programming language into another is a solidly researched topic with solid math behind it. There are translators that convert any number of languages into JavaScript, C, C++, Java, and many times you have no idea the translation is being done. In addition to this, one of the first steps when implementing a new language is to convert the new language into an existing language (like C) so you don’t have to write a full compiler. Translation is a fully solved problem.

This is completely fucking ludicrous. Translating one programming language to another is a common task, though “fully solved” sounds mighty questionable. But do you know what the results look like?

I found a project called “Transcrypt”, which puts Python in the browser by “translating” it to JavaScript. I’ve never used or heard of this before; I just googled for something to convert Python to JavaScript. Here’s their first sample, a demo using jQuery:

def start ():
    def changeColors ():
        for div in S__divs:
            S (div) .css ({
                'color': 'rgb({},{},{})'.format (* [int (256 * Math.random ()) for i in range (3)]),
            })

    S__divs = S ('div')
    changeColors ()
    window.setInterval (changeColors, 500)

And here’s the JavaScript code it compiles to:

(function () {
    var start = function () {
        var changeColors = function () {
            var __iterable0__ = $divs;
            for (var __index0__ = 0; __index0__ < __iterable0__.length; __index0__++) {
                var div = __iterable0__ [__index0__];
                $ (div).css (dict ({'color': 'rgb({},{},{})'.format.apply (null, function () {
                    var __accu0__ = [];
                    for (var i = 0; i < 3; i++) {
                        __accu0__.append (int (256 * Math.random ()));
                    }
                    return __accu0__;
                } ())}));
            }
        };
        var $divs = $ ('div');
        changeColors ();
        window.setInterval (changeColors, 500);
    };
    __pragma__ ('<all>')
        __all__.start = start;
    __pragma__ ('</all>')
}) ();

Well, not quite. That’s actually just a small piece at the end of the full 1861-line file.

You may notice that the emitted JavaScript effectively has to emulate the Python for loop, because JavaScript doesn’t have anything that works exactly the same way. And this is a basic, common language feature translated between two languages in the same general family! Imagine how your code would look if you relied on gritty details of how classes are implemented.

Is this what you want 2to3 to do to your code?

Even if something has been proven to be mathematically possible, that doesn’t mean it’s easy, and it doesn’t mean the results will be pretty (or fast).

The 2to3 translator fails on about 15% of the code it attempts, and does a poor job of translating the code it can handle. The motivations for this are unclear, but keep in mind that a group of people who claim to be programming language experts can’t write a reliable translator from one version of their own language to another. This is also a cause of their porting problems, which adds up to more evidence Python 3’s future is uncertain.

Writing a translator from one language to another is a fully proven and fundamental piece of computer science. Yet, the 2to3 translator cannot translate code 100%. In my own tests it is only about 85% effective, leaving a large amount of code to translate manually. Given that translation is a solved problem this seems to be a decision bordering on malice rather than incredible incompetence.

The programmer-oriented section doubles down on this idea with a title of “Purposefully Crippled 2to3 Translator” — again, accusing the Python project of sabotaging everyone. That doesn’t even make sense; if their goal is to make everyone use Python 3 at any cost, why would they deliberately break their tool that reduces the amount of Python 2 code and increases the amount of Python 3 code?

2to3 sucks because its job is hard. Python is dynamically typed. If it sees d.iteritems(), it might want to change that to d.items(), as it’s called in Python 3 — but it can’t always be sure that d is actually a dict. If d is some user-defined type, renaming the method is wrong.
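To make that concrete, here's a minimal sketch of the ambiguity. The LegacyIndex class and the dump() function are hypothetical names invented for illustration; the point is only that a source-level tool sees the text d.iteritems() and nothing about what d will be at runtime.

```python
class LegacyIndex:
    """Hypothetical user-defined type that happens to expose iteritems()."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def iteritems(self):
        # Deliberately no .items() here; nothing forces this class to mimic dict.
        return iter(self._pairs)


def dump(d):
    # A translator reading this line cannot tell whether d is a dict.
    return [(key, value) for key, value in d.iteritems()]


index = LegacyIndex([("a", 1), ("b", 2)])
pairs = dump(index)                    # works as written
has_items = hasattr(index, "items")    # False: a blind rename would break dump()
```

Renaming iteritems to items in dump() would be right for a dict and wrong for LegacyIndex, and the source alone doesn't say which one you'll get.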

But hey, Turing-completeness, right? It must be mathematically possible. And it is! As long as you’re willing to see this:

for key, value in d.iteritems():
    ...

Get translated to this:

__d = d
for key, value in (__d.items() if isinstance(__d, dict) else __d.iteritems()):
    ...

Would Zed be happier with that, I wonder?

The JVM and CLR Prove It’s Pointless

Yet, for some reason, the Python 3 virtual machine can’t run Python 2? Despite the solidly established mathematics disproving this, the countless examples of running one crazy language inside a Russian doll cascade of other crazy languages, and huge number of languages that can coexist in nearly every other virtual machine? That makes no sense.

This, finally, is the real complaint. It’s not a bad one, and it comes up sometimes, but… it’s not this easy.

The Python 3 VM is fairly similar to the Python 2 VM. The problem isn’t the VM, but the core language constructs and standard library.

Consider: what happens when a Python 2 old-style class instance gets passed into Python 3, which has no such concept? It seems like a value would have to always have the semantics of the language version it came from — that’s how languages usually coexist on the same VM, anyway.

Now, I’m using Python 3, and I load some library written for Python 2. I call a Python 2 function that deals with bytestrings, and I pass it a Python 3 bytestring. Oh no! It breaks because Python 3 bytestrings iterate as integers, whereas the Python 2 library expects them to iterate as characters.
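For anyone who hasn't run into it, the iteration difference looks like this in Python 3. This is just a small sketch of the behavior described above; the variable names are mine.

```python
data = b"abc"

# Iterating over bytes yields integers in Python 3.
as_ints = list(data)

# Slicing one byte at a time keeps each piece as a bytes object,
# which is closer to what Python 2 code expects from iteration.
as_bytes = [data[i:i + 1] for i in range(len(data))]
```

Here as_ints is [97, 98, 99] and as_bytes is [b"a", b"b", b"c"]. Code written for Python 2 assumes the second shape when it loops over a bytestring.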

Okay, well, no big deal, you say. Maybe Python 2 libraries just need to be updated to work either way, before they can be used with Python 3.

But that’s exactly the situation we’re in right now. Syntax changes are trivially fixed by 2to3 and similar tools. It’s libraries that cause the subtler issues.

The same applies the other way, too. I write Python 3 code, and it gets an int from some Python 2 library. I try to use the .to_bytes method on it, but that doesn’t exist on Python 2 integers. So my Python 3 code, written and intended purely for Python 3, now has to deal with Python 2 integers as well.
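For reference, this is the kind of call I mean. It's a tiny sketch using only the int methods that exist in Python 3; the value is arbitrary.

```python
count = 1024

# int.to_bytes exists on Python 3 integers; Python 2 ints have no such method.
packed = count.to_bytes(2, "big")

# int.from_bytes is the matching inverse, also Python 3 only.
restored = int.from_bytes(packed, "big")
```

Here packed is b"\x04\x00" and restored is 1024 again. A Python 2 integer leaking across the boundary would fail on the first of those calls.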

Perhaps “primitive” types should convert automatically, on the boundary? Okay, sure. What about the Python 2 buffer type, which is C-backed and replaced by memoryview in Python 3?

Or how about this very fundamental problem: names of methods and other attributes are str in both versions, but that means they’re bytestrings in Python 2 and text in Python 3. If you’re in Python 3 land, and you call obj.foo() on a Python 2 object, what happens? Python 3 wants a method with the text name foo, but Python 2 wants a method with the bytes name foo. Text and bytes are not implicitly convertible in Python 3. So does it error? Somehow work anyway? What about the other way around?
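You can see Python 3's side of that mismatch without any Python 2 in the picture. This sketch uses a throwaway class (Thing is a made-up name) and simply asks for the same attribute by a text name and by a bytes name.

```python
class Thing:
    def foo(self):
        return "called foo"


obj = Thing()

# Attribute names are text in Python 3, so this lookup works.
text_lookup = getattr(obj, "foo")()

# A bytes name is rejected outright rather than silently converted.
try:
    getattr(obj, b"foo")
    bytes_lookup_error = None
except TypeError as exc:
    bytes_lookup_error = exc
```

The bytes lookup raises TypeError, so a hypothetical mixed runtime would have to pick some conversion rule for every attribute access that crosses versions.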

What about the standard library, which has had a number of improvements in Python 3 that don’t or can’t exist in Python 2? Should Python ship two entire separate copies of its standard library? What about modules like logging, which rely on global state? Does Python 2 and Python 3 code need to set up logging separately within the same process?

There are no good solutions here. The language would double in size and complexity, and you’d still end up with a mess at least as bad as the one we have now when values leak from one version into the other.

We either have two situations here:

  1. Python 3 has been purposefully crippled to prevent Python 2’s execution alongside Python 3 for someone’s professional or ideological gain.
  2. Python 3 cannot run Python 2 due to simple incompetence on the part of the Python project.

I can think of a third.

Difficult To Use Strings

The strings in Python 3 are very difficult to use for beginners. In an attempt to make their strings more “international” they turned them into difficult to use types with poor error messages.

Why is “international” in scare quotes?

Every time you attempt to deal with characters in your programs you’ll have to understand the difference between byte sequences and Unicode strings.

Given that I’m reading part of a book teaching Python, this would be a perfect opportunity to drive this point home by saying “Look! Running exercise N in Python 3 doesn’t work.” Exercise 1, at least, works fine for me with a little extra sprinkle of parentheses:

print("Hello World!")
print("Hello Again")
print("I like typing this.")
print("This is fun.")
print('Yay! Printing.')
print("I'd much rather you 'not'.")
print('I "said" do not touch this.')

Contrast with the actual content of that exercise — at the bottom is a big red warning box telling people from “another country” (relative to where?) that if they get errors about ASCII encodings, they should put an unexplained magical incantation at the top of their scripts to fix “Unicode UTF-8”, whatever that is. I wonder if Zed has read his own book.

Don’t know what that is? Exactly.

If only there were a book that could explain it to beginners in more depth than “you have to fix this if you’re foreign”.

The Python project took a language that is very forgiving to beginners and mostly “just works” and implemented strings that require you to constantly know what type of string they are. Worst of all, when you get an error with strings (which is very often) you get an error message that doesn’t tell you what variable names you need to fix.

The complaint is that this happens in Python 3, whereas it’s accepted in Python 2:

>>> b"hello" + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str

The programmer section is called “Statically Typed Strings”. But this is not static typing. That’s strong typing, a property that sets Python’s type system apart from languages like JavaScript. It’s usually considered a good thing, because the alternative is to silently produce nonsense in some cases, and then that nonsense propagates through your program and is hard to track down when it finally causes problems.

If they’re going to require beginners to struggle with the difference between bytes and Unicode the least they could do is tell people what variables are bytes and what variables are strings.

That would be nice, but it’s not like this is a new problem. Try this in Python 2.

>>> 3 + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

How would Python even report this error when I used literals instead of variables? How could custom types hook into such a thing? Error messages are hard.

By the way, did you know that several error messages are much improved in Python 3? Python 2 is somewhat notorious for the confusing errors it produces when an argument is missing from a method call, but Python 3 is specific about the problem, which is much friendlier to beginners.
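Here's a small sketch of the missing-argument case in Python 3. The Greeter class is a hypothetical example, and the exact wording of the message varies a little between 3.x releases (newer versions prefix the class name), so the code only captures the message instead of assuming its full text.

```python
class Greeter:
    def greet(self, name):
        return "Hello, " + name


try:
    Greeter().greet()          # forgot to pass name
    message = None
except TypeError as exc:
    message = str(exc)

# The message names the missing parameter instead of just counting arguments.
names_the_argument = "missing 1 required positional argument: 'name'" in message
```

Being told which argument is missing, by name, is a lot easier to act on than a bare count of how many arguments were expected and given.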

However, when you point out that this is hard to use they try to claim it’s good for you. It is not. It’s simple blustering covering for a poor implementation.

I don’t know what about this is hard. Why do you have a text string and a bytestring in the first place? Why is it okay to refuse adding a number to a string, but not to refuse adding bytes to a string?

Imagine if one of the Python core developers were just getting into Python 2 and messing around.

# -*- coding: utf8 -*-
print "Hi, my name is Łukasz Langa."
print "Hi, my name is Łukasz Langa."[::-1]

Hi, my name is Łukasz Langa.
.agnaL zsaku�� si eman ym ,iH

Good luck figuring out how to fix that.

This isn’t blustering. Bytes are not text; they are binary data that could encode anything. They happen to look like text sometimes, and you can get away with thinking they’re text if you’re not from “another country”, but that mindset will lead you to write code that is wrong. The resulting bugs will be insidious and confusing, and you’ll have a hard time even reasoning about them because it’ll seem like “Unicode text” is somehow a different beast altogether from “ASCII text”.
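The same experiment in Python 3 shows the distinction. This is a sketch, with the name written using a \u escape so the file's own encoding doesn't matter.

```python
name = "\u0141ukasz Langa"   # "Łukasz Langa" as text

# Reversing text reverses characters, so the result is still valid text.
reversed_text = name[::-1]

# Reversing the encoded bytes splits the two-byte UTF-8 sequence for "Ł".
raw = name.encode("utf-8")
reversed_raw = raw[::-1]
try:
    reversed_raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```

reversed_text comes out as "agnaL zsakuŁ", while the reversed bytes no longer decode at all. That second case is what Python 2 was quietly doing in the example above.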

Exercise 11 mentions at the end that you can use int() to convert a number to an integer. It’s no more complicated to say that you convert bytes to a string using .decode(). It shouldn’t even come up unless you’re explicitly working with binary data, and I don’t see any reading from sockets in LPTHW.
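Side by side, the two conversions look about equally scary. A minimal sketch, assuming the bytes are UTF-8 (which you do have to know or be told):

```python
# Text that looks like a number becomes a number.
age = int("42")

# Bytes that encode text become text, given the encoding.
greeting = b"caf\xc3\xa9".decode("utf-8")

# And back again, if you need the bytes for a file or socket.
round_trip = greeting.encode("utf-8")
```

greeting is the four-character string "café", and round_trip is the original five bytes.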

It’s also not statically compiled as strongly as it could be, so you can’t find these kinds of type errors until you run the code.

This comes a scant few paragraphs after “Dynamic typing is what makes Python easy to use and one of the reasons I advocate it for beginners.”

You can’t find any kinds of type errors until you run the code. Welcome to dynamic typing.

Strings are also most frequently received from an external source, such as a network socket, file, or similar input. This means that Python 3’s statically typed strings and lack of static type safety will cause Python 3 applications to crash more often and have more security problems when compared with Python 2.

On the contrary — Python 3 applications should crash less often. The problem with silently converting between bytestrings and text in Python 2 is that it might fail, depending on the contents. "cafe" + u"hello" works fine, but "café" + u"hello" raises a UnicodeDecodeError. Python 2 makes it very easy to write code that appears to work when tested with ASCII data, but later breaks with anything else, even though the values are still the same types. In Python 3, you get an error the first time you try to run such code, regardless of what’s in the actual values. That’s the biggest reason for the change: it improves things from being intermittent value errors to consistent type errors.
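Here's a rough sketch of that difference, runnable on Python 3. The function join_like_py2 is my own stand-in that models Python 2's implicit ASCII decode; it is an assumption for illustration, not how Python 2 is actually implemented.

```python
def join_py3(raw, text):
    # Python 3 behavior: bytes + str is always a TypeError.
    return raw + text


def join_like_py2(raw, text):
    # Assumed model of Python 2: implicitly decode the bytes as ASCII first.
    return raw.decode("ascii") + text


results = {}
samples = [("ascii", b"cafe"), ("non-ascii", "caf\u00e9".encode("utf-8"))]
for label, raw in samples:
    try:
        join_py3(raw, "hello")
        py3 = "ok"
    except TypeError:
        py3 = "TypeError"
    try:
        join_like_py2(raw, "hello")
        py2 = "ok"
    except UnicodeDecodeError:
        py2 = "UnicodeDecodeError"
    results[label] = (py3, py2)
```

The Python 3 column fails the same way for both inputs, so you find out on the first run. The modelled Python 2 column passes with ASCII data and only blows up once non-ASCII data shows up.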

More security problems? This is never substantiated, and seems to have been entirely fabricated.

Too Many Formatting Options

In addition to that you will have 3 different formatting options in Python 3.6. That means you’ll have to learn to read and use multiple ways to format strings that are all very different. Not even I, an experienced professional programmer, can easily figure out these new formatting systems or keep up with their changing features.

I don’t know what on earth “keep up with their changing features” is supposed to mean, and Zed doesn’t bother to go into details.

Python 3 has three ways to format strings: % interpolation, str.format(), and the new f"" strings in Python 3.6. The f"" strings use the same syntax as str.format(); the difference is that where str.format() uses numbers or names of keyword arguments, f"" strings just use expressions. Compare:

number = 133
print("{n:02x}".format(n=number))
print(f"{number:02x}")

This isn’t “very different”. A frequently-used method is being promoted to syntax.
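For completeness, here are all three mechanisms producing the same result. This is a sketch that needs Python 3.6 or later because of the f"" string; the variable names are mine.

```python
number = 133

percent_style = "%02x" % number              # % interpolation
format_style = "{n:02x}".format(n=number)    # str.format()
fstring_style = f"{number:02x}"              # f"" string, Python 3.6+

all_same = percent_style == format_style == fstring_style
```

All three give "85". The format spec after the colon is shared between str.format() and f"" strings, and the % form uses nearly the same mini-language with a different prefix.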

I really like this new style, and I have no idea why this wasn’t the formatting for Python 3 instead of that stupid .format function. String interpolation is natural for most people and easy to explain.

The problem is that beginners will now have to know all three of these formatting styles, and that’s too many.

I could swear Zed, an experienced professional programmer, just said he couldn’t easily figure out these new formatting systems. Note also that str.format() has existed in Python 2 since Python 2.6 was released in 2008, so I don’t know why Zed said “new formatting systems”, plural.

This is a truly bizarre complaint overall, because the mechanism Zed likes best is the newest one. If Python core had agreed that three mechanisms was too many, we wouldn’t be getting f"" at all.

Even More Versions of Strings

Finally, I’m told there is a new proposal for a string type that is both bytes and Unicode at the same time? That’d be fantastic if this new type brings back the dynamic typing that makes Python easy, but I’m betting it will end up being yet another static type to learn. For that reason I also think beginners should avoid Python 3 until this new “chimera string” is implemented and works reliably in a dynamic way. Until then, you will just be dealing with difficult strings that are statically typed in a dynamically typed language.

I have absolutely no idea what this is referring to, and I can’t find anyone who does. I don’t see any recent PEPs mentioning such a thing, nor anything in the last several months on the python-dev mailing list. I don’t see it in the Python 3.6 release notes.

The closest thing I can think of is the backwards-compatibility shenanigans for PEP 528 and PEP 529 — they switch to the Windows wide-string APIs for console and filesystem encoding, but pretend under the hood that the APIs take UTF-8-encoded bytes to avoid breaking libraries like Twisted. That’s a microscopic detail that should never matter to anyone but authors of Twisted, and is nothing like a new hybrid string type, but otherwise I’m at a loss.

This paragraph really is a perfect summary of the whole article. It speaks vaguely yet authoritatively about something that doesn’t seem to exist, it doesn’t bother actually investigating the thing the entire section talks about, it conjectures that this mysterious feature will be hard just because it’s in Python 3, and it misuses terminology to complain about a fundamental property of Python that’s always existed.

Core Libraries Not Updated

Many of the core libraries included with Python 3 have been rewritten to use Python 3, but have not been updated to use its features. How could they given Python 3’s constant changing status and new features?

What “constant changing status”? The language makes new releases; is that bad? The only mention of “changing” so far was with string formatting, which makes no sense to me, because the only major change has been the addition of syntax that Zed prefers.

There are several libraries that, despite knowing the encoding of data, fail to return proper strings. The worst offender seems to be any libraries dealing with the HTTP protocol, which does indicate the encoding of the underlying byte stream in many cases.

In many cases, yes. Not in all. Some web servers don’t send back an encoding. Some files don’t have an encoding, because they’re images or other binary data. HTML allows the encoding to be given inside the document, instead. urllib has always returned bytes, so it’s not all that unreasonable to keep doing that, rather than… well, I’m not quite sure what this is proposing. Return strings sometimes?
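To show why "return strings sometimes" is awkward, here's a sketch of what decoding only when an encoding is declared might look like. The helper names declared_charset and body_as_text_or_bytes are hypothetical, and the sketch parses a Content-Type value with the standard library's email.message.Message rather than making any real HTTP request.

```python
from email.message import Message


def declared_charset(content_type):
    # Parse a Content-Type header value; returns None if no charset is given.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def body_as_text_or_bytes(body, content_type):
    # Hypothetical policy: decode only when the server declared an encoding.
    charset = declared_charset(content_type)
    if charset is None:
        return body                  # caller still gets bytes
    return body.decode(charset)      # may raise if the declaration is wrong


page = body_as_text_or_bytes(b"caf\xc3\xa9", "text/plain; charset=UTF-8")
image = body_as_text_or_bytes(b"\x89PNG", "image/png")
```

page comes back as text and image comes back as bytes, so every caller now has to check which type it received. Always returning bytes and letting the caller decode is at least predictable.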

The documentation for urllib.request and http.client both advise using the higher-level Requests library instead, in a prominent yellow box right at the top. Requests has distinct mechanisms for retrieving bytes versus text and is vastly easier to use overall, though I don’t think even it understands reading encodings from HTML. Alas, computers.

Good luck to any beginner figuring out how to install Requests on Python 2 — but thankfully, Python 3 now comes bundled with pip, which makes installing libraries much easier. Contrast with the beginning of exercise 46, which apologizes for how difficult this is to explain, lists four things to install, warns that it will be frustrating, and advises watching a video to help figure it out.

What’s even more idiotic about this is Python has a really good Chardet library for detecting the encoding of byte streams. If Python 3 is supposed to be “batteries included” then fast Chardet should be baked into the core of Python 3’s strings making it cake to translate strings to bytes even if you don’t know the underlying encoding. … Call the function whatever you want, but it’s not magic to guess at the encoding of a byte stream, it’s science. The only reason this isn’t done for you is that the Python project decided that you should be punished for not knowing about bytes vs. Unicode, and their arrogance means you have difficult to use strings.

Guessing at the encoding of a byte stream isn’t so much science as, well, guessing. Guessing means that sometimes you’re wrong. Sometimes that’s what you want, and I’m honestly ambivalent about having chardet in the standard library, but it’s hardly arrogant to not want to include a highly-fallible heuristic in your programming language.
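A quick sketch of why it's guessing: the same bytes can decode without error under more than one encoding, and nothing in the bytes says which reading was intended. This uses only built-in codecs, not chardet itself.

```python
data = "caf\u00e9".encode("utf-8")   # the bytes for "café" in UTF-8

as_utf8 = data.decode("utf-8")       # the intended text
as_latin1 = data.decode("latin-1")   # also succeeds, with different text

both_succeed_but_differ = as_utf8 != as_latin1
```

as_latin1 is "cafÃ©", which is wrong but perfectly valid as far as the decoder is concerned. A detector has to fall back on statistics about what text usually looks like, and statistics are sometimes wrong.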

Conclusions and Warnings

I have resisted writing about these problems with Python 3 for 5 versions because I hoped it would become usable for beginners. Each year I would attempt to convert some of my code and write a couple small tests with Python 3 and simply fail. If I couldn’t use Python 3 reliably then there’s no way a total beginner could manage it. So each year I’d attempt it, and fail, and wait until they fix it. I really liked Python and hoped the Python project would drop their stupid stances on usability.

Let us recap the usability problems seen thus far.

  • You can’t add b"hello" to "hello".
  • TypeErrors are phrased exactly the same as they were in Python 2.
  • The type system is exactly as dynamic as it was in Python 2.
  • There is a new formatting mechanism, using the same syntax as one in Python 2, that Zed prefers over the ones in Python 2.
  • urllib.request doesn’t decode for you, just like in Python 2.
  • 档牡敤㽴 isn’t built in. Oh, sorry, I meant chardet.

Currently, the state of strings is viewed as a Good Thing in the Python community. The fact that you can’t run Python 2 inside Python 3 is seen as a weird kind of tough love. The brainwashing goes so far as to outright deny the mathematics behind language translation and compilation in an attempt to motivate the Python community to brute force convert all Python 2 code.

Which is probably why the Python project focuses on convincing unsuspecting beginners to use Python 3. They don’t have a switching cost, so if you get them to fumble their way through the Python 3 usability problems then you have new converts who don’t know any better. To me this is morally wrong and is simply preying on people to prop up a project that needs a full reset to survive. It means beginners will fail at learning to code not because of their own abilities, but because of Python 3’s difficulty.

Now that we’re towards the end, it’s a good time to say this: Zed Shaw, your behavior here is fucking reprehensible.

Half of what’s written here is irrelevant nonsense backed by a vague appeal to “mathematics”. Instead of having even the shred of humility required to step back and wonder if there are complicating factors beyond whether something is theoretically possible, you have invented a variety of conflicting and malicious motivations to ascribe to the Python project.

It’s fine to criticize Python 3. The string changes force you to think about what you’re doing a little more in some cases, and occasionally that’s a pain in the ass. I absolutely get it.

But you’ve gone out of your way to invent a conspiracy out of whole cloth and promote it on your popular platform aimed at beginners, who won’t know how obviously full of it you are. And why? Because you can’t add b"hello" to "hello"? Are you kidding me? No one can even offer to help you, because instead of examples of real problems you’ve had, you gave two trivial toys and then yelled a lot about how the whole Python project is releasing mind-altering chemicals into the air.

The Python 3 migration has been hard enough. It’s taken a lot of work from a lot of people who’ve given enough of a crap to help Python evolve — to make it better to the best of their judgment and abilities. Now we’re finally, finally at the point where virtually all libraries support Python 3, a few new ones only support Python 3, and Python 3 adoption is starting to take hold among application developers.

And you show up to piss all over it, to propagate this myth that Python 3 is hamstrung to the point of unusability, because if the Great And Wise Zed Shaw can’t figure it out in ten seconds then it must just be impossible.

Fuck you.

Sadly, I doubt this will happen, and instead they’ll just rant about how I don’t know what I’m talking about and I should shut up.

This is because you don’t know what you’re talking about, and you should shut up.

A Rebuttal For Python 3

Post Syndicated from Eevee original https://eev.ee/blog/2016/11/23/a-rebuttal-for-python-3/

Zed Shaw, of Learn Python the Hard Way fame, has now written The Case Against Python 3.

I’m not involved with core Python development. The only skin I have in this game is that I like Python 3. It’s a good language. And one of the big factors I’ve seen slowing its adoption is that respected people in the Python community keep grouching about it. I’ve had multiple newcomers tell me they have the impression that Python 3 is some kind of unusable disaster, though they don’t know exactly why; it’s just something they hear from people who sound like they know what they’re talking about. Then they actually use the language, and it’s fine.

I’m sad to see the Python community needlessly sabotage itself, but Zed’s contribution is beyond the pale. It’s not just making a big deal about changed details that won’t affect most beginners; it’s complete and utter nonsense, on a platform aimed at people who can’t yet recognize it as nonsense. I am so mad.

The Case Against Python 3

I give two sets of reasons as I see them now. One for total beginners, and another for people who are more knowledgeable about programming.

Just to note: the two sets of reasons are largely the same ideas presented differently, so I’ll just weave them together below.

The first section attempts to explain the case against starting with Python 3 in non-technical terms so a beginner can make up their own mind without being influenced by propaganda or social pressure.

Having already read through this once, this sentence really stands out to me. The author of a book many beginners read to learn Python in the first place is providing a number of reasons (some outright fabricated) not to use Python 3, often in terms beginners are ill-equipped to evaluate, but believes this is a defense against propaganda or social pressure.

The Most Important Reason

Before getting into the main technical reasons I would like to discuss the one most important social reason for why you should not use Python 3 as a beginner:

THERE IS A HIGH PROBABILITY THAT PYTHON 3 IS SUCH A FAILURE IT WILL KILL PYTHON.

Python 3’s adoption is really only at about 30% whenever there is an attempt to measure it.

Wait, really? Wow, that’s fantastic.

I mean, it would probably be higher if the most popular beginner resources were actually teaching Python 3, but you know.

Nobody is all that interested in finding out what the real complete adoption is, despite there being fairly simple ways to gather metrics on the adoption.

This accusatory sentence conspicuously neglects to mention what these fairly simple ways are, a pattern that repeats throughout. The trouble is that it’s hard to even define what “adoption” means — I write all my code in Python 3 now, but veekun is still Python 2 because it’s in maintenance mode, so what does that say about adoption? You could look at PyPI download stats, but those are thrown way off by caches and system package managers. You could look at downloads from the Python website, but a great deal of Python is written and used on Unix-likes, where Python itself is either bundled or installed from the package manager.

It’s as simple as that. If you learn Python 2, then you can still work with all the legacy Python 2 code in existence until Python dies or you (hopefully) move on. But if you learn Python 3 then your future is very uncertain. You could really be learning a dead language and end up having to learn Python 2 anyway.

You could use Python 2, until it dies… or you could use Python 3, which might die. What a choice.

By some definitions, Python 2 is already dead — it will not see another major release, only security fixes. Python 3 is still actively developed, and its seventh major release is next month. It even contains a new feature that Zed later mentions he prefers to Python 2’s offerings.

It may shock you to learn that I know both Python 2 and Python 3. Amazingly, two versions of the same language are much more similar than they are different. If you learned Python 3 and then a wizard cast a spell that made it vanish from the face of the earth, you’d just have to spend half an hour reading up on what had changed from Python 2.

Also, it’s been over a decade, maybe even multiple decades, and Python 3 still isn’t above about 30% in adoption. Even among the sciences where Python 3 is touted as a “success” it’s still only around 25-30% adoption. After that long it’s time to admit defeat and come up with a new plan.

Python 3.0 came out in 2008. The first couple releases ironed out some compatibility and API problems, so it didn’t start to gain much traction until Python 3.2 came out in 2011. Hell, Python 2.0 came out in 2000, so even Python 2 isn’t multiple decades old. It would be great if this trusted beginner reference could take two seconds to check details like this before using them to scaremonger.

The big early problem was library compatibility: it’s hard to justify switching to a new version of the language if none of the libraries work. Libraries could only port once their own dependencies had ported, of course, and it took a couple years to figure out the best way to maintain compatibility with both Python 2 and Python 3. I’d say we only really hit critical mass a few years ago — for instance, Django didn’t support Python 3 until 2013 — in which case that 30% is nothing to sneeze at.

There are more reasons beyond just the uncertain future of Python 3 even decades later.

In one paragraph, we’ve gone from “maybe even multiple decades” to just “decades”, which is a funny way to spell “eight years”.

Not In Your Best Interests

The Python project’s efforts to convince you to start with Python 3 are not in your best interest, but, rather, are only in the best interests of the Python project.

It’s bad, you see, for the Python project to want people to use the work it produced.

Anyway, please buy Zed Shaw’s book.

Anyway, please pledge to my Patreon.

Ultimately though, if Python 3 were good they wouldn’t need to do any convincing to get you to use it. It would just naturally work for you and you wouldn’t have any problems. Instead, there are serious issues with Python 3 for beginners, and rather than fix those issues the Python project uses propaganda, social pressure, and marketing to convince you to use it. In the world of technology using marketing and propaganda is immediately a sign that the technology is defective in some obvious way.

This use of social pressure and propaganda to convince you to use Python 3 despite its problems, in an attempt to benefit the Python project, is morally unconscionable to me.

Ten paragraphs in, Zed is telling me that I should be suspicious of anything that relies on marketing and propaganda. Meanwhile, there has yet to be a single concrete reason why Python 3 is bad for beginners — just several flat-out incorrect assertions and a lot of handwaving about how inexplicably nefarious the Python core developers are. You know, the same people who made Python 2. But they weren’t evil then, I guess.

You Should Be Able to Run 2 and 3

In the programming language theory there is this basic requirement that, given a “complete” programming language, I can run any other programming language. In the world of Java I’m able to run Ruby, Java, C++, C, and Lua all at the same time. In the world of Microsoft I can run F#, C#, C++, and Python all at the same time. This isn’t just a theoretical thing. There is solid math behind it. Math that is truly the foundation of computer science.

The fact that you can’t run Python 2 and Python 3 at the same time is purely a social and technical decision that the Python project made with no basis in mathematical reality. This means you are working with a purposefully broken platform when you use Python 3, and I personally can’t condone teaching people to use something that is fundamentally broken.

The programmer-oriented section makes clear that the solid math being referred to is Turing-completeness — the section is even titled “Python 3 Is Not Turing Complete”.

First, notice a rhetorical trick here. You can run Ruby, Java, C++, etc. at the same time, so why not Python 2 and Python 3?

But can you run Java and C# at the same time? (I’m sure someone has done this, but it’s certainly much less popular than something like Jython or IronPython.)

Can you run Ruby 1.8 and Ruby 2.3 at the same time? Ah, no, so I guess Ruby 2.3 is fundamentally and purposefully broken.

Can you run Lua 5.1 and 5.3 at the same time? Lua is a spectacular example, because Lua 5.2 made a breaking change to how the details of scope work, and it’s led to a situation where a lot of programs that embed Lua haven’t bothered upgrading from Lua 5.1. Was Lua 5.2 some kind of dark plot to deliberately break the language? No, it’s just slightly more inconvenient than expected for people to upgrade.

Anyway, as for Turing machines:

In computer science a fundamental law is that if I have one Turing Machine I can build any other Turing Machine. If I have COBOL then I can bootstrap a compiler for FORTRAN (as disgusting as that might be). If I have FORTH, then I can build an interpreter for Ruby. This also applies to bytecodes for CPUs. If I have a Turing Complete bytecode then I can create a compiler for any language. The rule then can be extended even further to say that if I cannot create another Turing Machine in your language, then your language cannot be Turing Complete. If I can’t use your language to write a compiler or interpreter for any other language then your language is not Turing Complete.

Yes, this is true.

Currently you cannot run Python 2 inside the Python 3 virtual machine. Since I cannot, that means Python 3 is not Turing Complete and should not be used by anyone.

And this is completely asinine. Worse, it’s flat-out dishonest, and relies on another rhetorical trick. You only “cannot” run Python 2 inside the Python 3 VM because no one has written a Python 2 interpreter in Python 3. The “cannot” is not a mathematical impossibility; it’s a simple matter of the code not having been written. Or perhaps it has, but no one cares anyway, because it would be comically and unusably slow.

I assume this was meant to be sarcastic on some level, since it’s followed by a big blue box that seems unsure about whether to double down or reverse course. But I can’t tell why it was even brought up, because it has absolutely nothing to do with Zed’s true complaint, which is that Python 2 and Python 3 do not coexist within a single environment. Implementing language X using language Y does not mean that X and Y can now be used together seamlessly.

The canonical Python release is written in C (just like with Ruby or Lua), but you can’t just dump a bunch of C code into a Python (or Ruby or Lua) file and expect it to work. You can talk to C from Python and vice versa, but defining how they communicate is a bit of a pain in the ass and requires some level of setup.

I’ll get into this some more shortly.

No Working Translator

Python 3 comes with a tool called 2to3 which is supposed to take Python 2 code and translate it to Python 3 code.

I should point out right off the bat that this is not actually what you want to use most of the time, because you probably want to translate your Python 2 code to Python 2/3 code. 2to3 produces code that most likely will not work on Python 2. Other tools exist to help you port more conservatively.
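As a rough illustration of what "Python 2/3 code" means, here's a minimal sketch of a module written to run unchanged on both versions. The names `text_type`, `iter_pairs`, and `describe` are made up for this example; real projects usually lean on a compatibility library such as six rather than hand-rolling these.

```python
# Pick the text type by probing for the Python 2 name.
try:
    text_type = unicode      # exists on Python 2 only
except NameError:
    text_type = str          # Python 3: str is already text


def iter_pairs(mapping):
    # dict.items() exists on both versions (a list on 2, a view on 3),
    # so wrapping it in iter() behaves the same either way.
    return iter(mapping.items())


def describe(mapping):
    # %-formatting and sorted() work identically on both versions.
    return ", ".join("%s=%s" % pair for pair in sorted(iter_pairs(mapping)))


print(describe({"b": 2, "a": 1}))
```

Nothing here is clever; the point is only that conservative porting mostly means avoiding the handful of names that exist on one side only.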

Translating one programming language into another is a solidly researched topic with solid math behind it. There are translators that convert any number of languages into JavaScript, C, C++, Java, and many times you have no idea the translation is being done. In addition to this, one of the first steps when implementing a new language is to convert the new language into an existing language (like C) so you don’t have to write a full compiler. Translation is a fully solved problem.

This is completely fucking ludicrous. Translating one programming language to another is a common task, though “fully solved” sounds mighty questionable. But do you know what the results look like?

I found a project called “Transcrypt”, which puts Python in the browser by “translating” it to JavaScript. I’ve never used or heard of this before; I just googled for something to convert Python to JavaScript. Here’s their first sample, a demo using jQuery:

def start ():
    def changeColors ():
        for div in S__divs:
            S (div) .css ({
                'color': 'rgb({},{},{})'.format (* [int (256 * Math.random ()) for i in range (3)]),
            })

    S__divs = S ('div')
    changeColors ()
    window.setInterval (changeColors, 500)

And here’s the JavaScript code it compiles to:

(function () {
    var start = function () {
        var changeColors = function () {
            var __iterable0__ = $divs;
            for (var __index0__ = 0; __index0__ < __iterable0__.length; __index0__++) {
                var div = __iterable0__ [__index0__];
                $ (div).css (dict ({'color': 'rgb({},{},{})'.format.apply (null, function () {
                    var __accu0__ = [];
                    for (var i = 0; i < 3; i++) {
                        __accu0__.append (int (256 * Math.random ()));
                    }
                    return __accu0__;
                } ())}));
            }
        };
        var $divs = $ ('div');
        changeColors ();
        window.setInterval (changeColors, 500);
    };
    __pragma__ ('<all>')
        __all__.start = start;
    __pragma__ ('</all>')
}) ();

Well, not quite. That’s actually just a small piece at the end of the full 1861-line file.

You may notice that the emitted JavaScript effectively has to emulate the Python for loop, because JavaScript doesn’t have anything that works exactly the same way. And this is a basic, common language feature translated between two languages in the same general family! Imagine how your code would look if you relied on gritty details of how classes are implemented.

Is this what you want 2to3 to do to your code?

Even if something has been proven to be mathematically possible, that doesn’t mean it’s easy, and it doesn’t mean the results will be pretty (or fast).

The 2to3 translator fails on about 15% of the code it attempts, and does a poor job of translating the code it can handle. The motivations for this are unclear, but keep in mind that a group of people who claim to be programming language experts can’t write a reliable translator from one version of their own language to another. This is also a cause of their porting problems, which adds up to more evidence Python 3’s future is uncertain.

Writing a translator from one language to another is a fully proven and fundamental piece of computer science. Yet, the 2to3 translator cannot translate code 100%. In my own tests it is only about 85% effective, leaving a large amount of code to translate manually. Given that translation is a solved problem this seems to be a decision bordering on malice rather than incredible incompetence.

The programmer-oriented section doubles down on this idea with a title of “Purposefully Crippled 2to3 Translator” — again, accusing the Python project of sabotaging everyone. That doesn’t even make sense; if their goal is to make everyone use Python 3 at any cost, why would they deliberately break their tool that reduces the amount of Python 2 code and increases the amount of Python 3 code?

2to3 sucks because its job is hard. Python is dynamically typed. If it sees d.iteritems(), it might want to change that to d.items(), as it’s called in Python 3 — but it can’t always be sure that d is actually a dict. If d is some user-defined type, renaming the method is wrong.
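To make the ambiguity concrete, here's a small sketch. The `Inventory` class is hypothetical, invented purely for illustration: it defines both `iteritems()` and an unrelated `items()`, so a blind rename would silently change what the loop does.

```python
# Hypothetical user-defined type (not from any real library). It offers
# iteritems() like a Python 2 dict, but its items() means something else.
class Inventory:
    def __init__(self, stock):
        self._stock = dict(stock)

    def iteritems(self):
        # Yields (name, count) pairs, like dict.iteritems() did in Python 2.
        return iter(self._stock.items())

    def items(self):
        # Unrelated meaning: just the item names, not pairs.
        return list(self._stock)


def total(d):
    # A translator looking only at this function cannot tell whether d is a
    # dict (renaming to .items() is right) or an Inventory (renaming is wrong).
    return sum(count for _name, count in d.iteritems())


inv = Inventory({"apple": 3, "pear": 4})
print(total(inv))
```

Rewrite `d.iteritems()` to `d.items()` here and `total()` breaks, even though the same rewrite is exactly right for a real dict.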

But hey, Turing-completeness, right? It must be mathematically possible. And it is! As long as you’re willing to see this:

for key, value in d.iteritems():
    ...

Get translated to this:

__d = d
for key, value in (__d.items() if isinstance(__d, dict) else __d.iteritems()):
    ...

Would Zed be happier with that, I wonder?

The JVM and CLR Prove It’s Pointless

Yet, for some reason, the Python 3 virtual machine can’t run Python 2? Despite the solidly established mathematics disproving this, the countless examples of running one crazy language inside a Russian doll cascade of other crazy languages, and huge number of languages that can coexist in nearly every other virtual machine? That makes no sense.

This, finally, is the real complaint. It’s not a bad one, and it comes up sometimes, but… it’s not this easy.

The Python 3 VM is fairly similar to the Python 2 VM. The problem isn’t the VM, but the core language constructs and standard library.

Consider: what happens when a Python 2 old-style class instance gets passed into Python 3, which has no such concept? It seems like a value would have to always have the semantics of the language version it came from — that’s how languages usually coexist on the same VM, anyway.

Now, I’m using Python 3, and I load some library written for Python 2. I call a Python 2 function that deals with bytestrings, and I pass it a Python 3 bytestring. Oh no! It breaks because Python 3 bytestrings iterate as integers, whereas the Python 2 library expects them to iterate as characters.
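A quick sketch of the difference, run under Python 3 (the variable names are just for illustration):

```python
data = b"abc"

# In Python 3, iterating over or indexing bytes yields integers...
as_ints = list(data)
first = data[0]

# ...while slicing still yields bytes, which is the closest thing to the
# one-character strings that Python 2 code would expect to see.
as_chunks = [data[i:i + 1] for i in range(len(data))]

print(as_ints, first, as_chunks)
```

A Python 2 library doing `for ch in data: if ch == "a": ...` would never match anything when handed this value.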

Okay, well, no big deal, you say. Maybe Python 2 libraries just need to be updated to work either way, before they can be used with Python 3.

But that’s exactly the situation we’re in right now. Syntax changes are trivially fixed by 2to3 and similar tools. It’s libraries that cause the subtler issues.

The same applies the other way, too. I write Python 3 code, and it gets an int from some Python 2 library. I try to use the .to_bytes method on it, but that doesn’t exist on Python 2 integers. So my Python 3 code, written and intended purely for Python 3, now has to deal with Python 2 integers as well.
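For reference, this is roughly what that method looks like in use on Python 3. `int.to_bytes` and `int.from_bytes` are real built-ins; the values are arbitrary.

```python
n = 1024

# Pack the integer into two big-endian bytes, then unpack it again.
packed = n.to_bytes(2, byteorder="big")
restored = int.from_bytes(packed, byteorder="big")

print(packed, restored)
```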

Perhaps “primitive” types should convert automatically, on the boundary? Okay, sure. What about the Python 2 buffer type, which is C-backed and replaced by memoryview in Python 3?

Or how about this very fundamental problem: names of methods and other attributes are str in both versions, but that means they’re bytestrings in Python 2 and text in Python 3. If you’re in Python 3 land, and you call obj.foo() on a Python 2 object, what happens? Python 3 wants a method with the text name foo, but Python 2 wants a method with the bytes name foo. Text and bytes are not implicitly convertible in Python 3. So does it error? Somehow work anyway? What about the other way around?
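Here's a sketch of how Python 3 currently treats a bytes attribute name. The `Thing` class is a throwaway stand-in for "some object".

```python
class Thing:
    def foo(self):
        return "called foo"


obj = Thing()

# Text attribute names work as usual.
by_text = getattr(obj, "foo")()

# A bytes name is rejected outright; Python 3 will not guess an encoding.
try:
    getattr(obj, b"foo")
except TypeError as exc:
    bytes_lookup_error = exc
else:
    bytes_lookup_error = None

print(by_text, type(bytes_lookup_error).__name__)
```

So a mixed runtime would have to invent some new rule for this lookup, in both directions.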

What about the standard library, which has had a number of improvements in Python 3 that don’t or can’t exist in Python 2? Should Python ship two entire separate copies of its standard library? What about modules like logging, which rely on global state? Does Python 2 and Python 3 code need to set up logging separately within the same process?

There are no good solutions here. The language would double in size and complexity, and you’d still end up with a mess at least as bad as the one we have now when values leak from one version into the other.

We either have two situations here:

  1. Python 3 has been purposefully crippled to prevent Python 2’s execution alongside Python 3 for someone’s professional or ideological gain.
  2. Python 3 cannot run Python 2 due to simple incompetence on the part of the Python project.

I can think of a third.

Difficult To Use Strings

The strings in Python 3 are very difficult to use for beginners. In an attempt to make their strings more “international” they turned them into difficult to use types with poor error messages.

Why is “international” in scare quotes?

Every time you attempt to deal with characters in your programs you’ll have to understand the difference between byte sequences and Unicode strings.

Given that I’m reading part of a book teaching Python, this would be a perfect opportunity to drive this point home by saying “Look! Running exercise N in Python 3 doesn’t work.” Exercise 1, at least, works fine for me with a little extra sprinkle of parentheses:

print("Hello World!")
print("Hello Again")
print("I like typing this.")
print("This is fun.")
print('Yay! Printing.')
print("I'd much rather you 'not'.")
print('I "said" do not touch this.')

Contrast with the actual content of that exercise — at the bottom is a big red warning box telling people from “another country” (relative to where?) that if they get errors about ASCII encodings, they should put an unexplained magical incantation at the top of their scripts to fix “Unicode UTF-8”, whatever that is. I wonder if Zed has read his own book.

Don’t know what that is? Exactly.

If only there were a book that could explain it to beginners in more depth than “you have to fix this if you’re foreign”.

The Python project took a language that is very forgiving to beginners and mostly “just works” and implemented strings that require you to constantly know what type of string they are. Worst of all, when you get an error with strings (which is very often) you get an error message that doesn’t tell you what variable names you need to fix.

The complaint is that this happens in Python 3, whereas it’s accepted in Python 2:

>>> b"hello" + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str

The programmer section is called “Statically Typed Strings”. But this is not static typing. That’s strong typing, a property that sets Python’s type system apart from languages like JavaScript. It’s usually considered a good thing, because the alternative is to silently produce nonsense in some cases, and then that nonsense propagates through your program and is hard to track down when it finally causes problems.
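A minimal sketch of the distinction, in Python 3. The exact wording of the error varies between versions, so it's captured rather than quoted.

```python
# Dynamic typing: a name can be rebound to values of different types,
# and nothing checks this ahead of time.
value = 3
value = "three"

# Strong typing: mismatched types are not silently coerced into each other.
try:
    result = "3" + 3
except TypeError as exc:
    result = None
    error = str(exc)

print(type(value).__name__, result, error)
```

JavaScript would happily produce "33" for that last expression, which is the kind of silent nonsense strong typing avoids.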

If they’re going to require beginners to struggle with the difference between bytes and Unicode the least they could do is tell people what variables are bytes and what variables are strings.

That would be nice, but it’s not like this is a new problem. Try this in Python 2.

>>> 3 + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

How would Python even report this error when I used literals instead of variables? How could custom types hook into such a thing? Error messages are hard.

By the way, did you know that several error messages are much improved in Python 3? Python 2 is somewhat notorious for the confusing errors it produces when an argument is missing from a method call, but Python 3 is specific about the problem, which is much friendlier to beginners.
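Here's a small sketch of the case I mean. `Greeter` is a made-up class; the exact text of the Python 3 message differs slightly between releases, so the example prints it rather than hardcoding it.

```python
class Greeter:
    def greet(self, name):
        return "Hello, " + name


try:
    Greeter().greet()          # oops, forgot the 'name' argument
except TypeError as exc:
    message = str(exc)

# Python 2 reported this as "greet() takes exactly 2 arguments (1 given)",
# counting the implicit self. Python 3 names the missing parameter instead.
print(message)
```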

However, when you point out that this is hard to use they try to claim it’s good for you. It is not. It’s simple blustering covering for a poor implementation.

I don’t know what about this is hard. Why do you have a text string and a bytestring in the first place? Why is it okay to refuse adding a number to a string, but not to refuse adding bytes to a string?

Imagine if one of the Python core developers were just getting into Python 2 and messing around.

# -*- coding: utf8 -*-
print "Hi, my name is Łukasz Langa."
print "Hi, my name is Łukasz Langa."[::-1]

Hi, my name is Łukasz Langa.
.agnaL zsaku�� si eman ym ,iH

Good luck figuring out how to fix that.

This isn’t blustering. Bytes are not text; they are binary data that could encode anything. They happen to look like text sometimes, and you can get away with thinking they’re text if you’re not from “another country”, but that mindset will lead you to write code that is wrong. The resulting bugs will be insidious and confusing, and you’ll have a hard time even reasoning about them because it’ll seem like “Unicode text” is somehow a different beast altogether from “ASCII text”.
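For comparison, here's a sketch of the same experiment in Python 3, where string literals are text by default. It also reproduces the Python 2 breakage on purpose by reversing the encoded bytes.

```python
greeting = "Hi, my name is Łukasz Langa."

# str is a sequence of code points, so reversing keeps "Ł" intact.
reversed_text = greeting[::-1]

# Reversing the UTF-8 bytes instead puts the two bytes that encode "Ł"
# in the wrong order, so the result is no longer valid UTF-8.
reversed_bytes = greeting.encode("utf-8")[::-1]
try:
    reversed_bytes.decode("utf-8")
    bytes_still_valid = True
except UnicodeDecodeError:
    bytes_still_valid = False

print(reversed_text, bytes_still_valid)
```

(Reversing by code point still isn't right for every script, since combining characters exist, but it handles this case.)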

Exercise 11 mentions at the end that you can use int() to convert a number to an integer. It’s no more complicated to say that you convert bytes to a string using .decode(). It shouldn’t even come up unless you’re explicitly working with binary data, and I don’t see any reading from sockets in LPTHW.
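Side by side, the two conversions look about equally scary. This is only a sketch; the sample bytes are made up and assumed to be UTF-8.

```python
# str -> int, as in exercise 11.
number = int("42")

# bytes -> str, assuming the bytes are UTF-8 encoded text.
raw = b"caf\xc3\xa9"
text = raw.decode("utf-8")

# And back again, if you need to write binary data out.
round_trip = text.encode("utf-8")

print(number, text, round_trip == raw)
```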

It’s also not statically compiled as strongly as it could be, so you can’t find these kinds of type errors until you run the code.

This comes a scant few paragraphs after “Dynamic typing is what makes Python easy to use and one of the reasons I advocate it for beginners.”

You can’t find any kinds of type errors until you run the code. Welcome to dynamic typing.
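A tiny sketch of what that means in practice. The function is hypothetical and deliberately broken.

```python
def broken(flag):
    if flag:
        return "total: " + 5   # a type error, but only if this line runs
    return "ok"


# Defining the function and taking the other branch both work fine.
fine = broken(False)

# The error only appears once the bad line is actually executed.
try:
    broken(True)
    raised = False
except TypeError:
    raised = True

print(fine, raised)
```

This is identical in Python 2 and Python 3.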

Strings are also most frequently received from an external source, such as a network socket, file, or similar input. This means that Python 3’s statically typed strings and lack of static type safety will cause Python 3 applications to crash more often and have more security problems when compared with Python 2.

On the contrary — Python 3 applications should crash less often. The problem with silently converting between bytestrings and text in Python 2 is that it might fail, depending on the contents. "cafe" + u"hello" works fine, but "café" + u"hello" raises a UnicodeDecodeError. Python 2 makes it very easy to write code that appears to work when tested with ASCII data, but later breaks with anything else, even though the values are still the same types. In Python 3, you get an error the first time you try to run such code, regardless of what’s in the actual values. That’s the biggest reason for the change: it improves things from being intermittent value errors to consistent type errors.
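Here's a sketch of that difference, written in Python 3. The helper `py2_style_concat` is hypothetical: it imitates Python 2's implicit coercion by decoding the bytestring with the ASCII codec (Python 2's default) before concatenating.

```python
def py2_style_concat(raw, text):
    # Hypothetical emulation of Python 2's implicit bytes -> unicode coercion.
    return raw.decode("ascii") + text


# Works, because the data happens to be pure ASCII.
ok = py2_style_concat(b"cafe", "hello")

# Same types, different contents: now it blows up.
try:
    py2_style_concat("café".encode("utf-8"), "hello")
    value_dependent_failure = False
except UnicodeDecodeError:
    value_dependent_failure = True


def py3_concat(raw, text):
    # What Python 3 actually does: no implicit coercion at all.
    return raw + text


# Python 3 refuses both inputs up front, whatever they contain.
outcomes = []
for raw in (b"cafe", "café".encode("utf-8")):
    try:
        py3_concat(raw, "hello")
        outcomes.append("ok")
    except TypeError:
        outcomes.append("TypeError")

print(ok, value_dependent_failure, outcomes)
```

The first failure depends on your test data. The second shows up the first time the line runs.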

More security problems? This is never substantiated, and seems to have been entirely fabricated.

Too Many Formatting Options

In addition to that you will have 3 different formatting options in Python 3.6. That means you’ll have to learn to read and use multiple ways to format strings that are all very different. Not even I, an experienced professional programmer, can easily figure out these new formatting systems or keep up with their changing features.

I don’t know what on earth “keep up with their changing features” is supposed to mean, and Zed doesn’t bother to go into details.

Python 3 has three ways to format strings: % interpolation, str.format(), and the new f"" strings in Python 3.6. The f"" strings use the same syntax as str.format(); the difference is that where str.format() uses numbers or names of keyword arguments, f"" strings just use expressions. Compare:

number = 133
print("{n:02x}".format(n=number))
print(f"{number:02x}")

This isn’t “very different”. A frequently-used method is being promoted to syntax.
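For completeness, here are all three mechanisms producing the same output. This is just a sketch reusing the `number` example above; the f-string line needs Python 3.6 or later.

```python
number = 133

percent_style = "%02x" % number               # % interpolation
format_style = "{n:02x}".format(n=number)     # str.format()
fstring_style = f"{number:02x}"               # f-string

print(percent_style, format_style, fstring_style)
```

The format spec (`02x`) is the same in all three; only the way the value gets named differs.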

I really like this new style, and I have no idea why this wasn’t the formatting for Python 3 instead of that stupid .format function. String interpolation is natural for most people and easy to explain.

The problem is that beginner will now how to know all three of these formatting styles, and that’s too many.

I could swear Zed, an experienced professional programmer, just said he couldn’t easily figure out these new formatting systems. Note also that str.format() has existed in Python 2 since Python 2.6 was released in 2008, so I don’t know why Zed said “new formatting systems”, plural.

This is a truly bizarre complaint overall, because the mechanism Zed likes best is the newest one. If Python core had agreed that three mechanisms was too many, we wouldn’t be getting f"" at all.

Even More Versions of Strings

Finally, I’m told there is a new proposal for a string type that is both bytes and Unicode at the same time? That’d be fantastic if this new type brings back the dynamic typing that makes Python easy, but I’m betting it will end up being yet another static type to learn. For that reason I also think beginners should avoid Python 3 until this new “chimera string” is implemented and works reliably in a dynamic way. Until then, you will just be dealing with difficult strings that are statically typed in a dynamically typed language.

I have absolutely no idea what this is referring to, and I can’t find anyone who does. I don’t see any recent PEPs mentioning such a thing, nor anything in the last several months on the python-dev mailing list. I don’t see it in the Python 3.6 release notes.

The closest thing I can think of is the backwards-compatibility shenanigans for PEP 528 and PEP 529 — they switch to the Windows wide-string APIs for console and filesystem encoding, but pretend under the hood that the APIs take UTF-8-encoded bytes to avoid breaking libraries like Twisted. That’s a microscopic detail that should never matter to anyone but authors of Twisted, and is nothing like a new hybrid string type, but otherwise I’m at a loss.

This paragraph really is a perfect summary of the whole article. It speaks vaguely yet authoritatively about something that doesn’t seem to exist, it doesn’t bother actually investigating the thing the entire section talks about, it conjectures that this mysterious feature will be hard just because it’s in Python 3, and it misuses terminology to complain about a fundamental property of Python that’s always existed.

Core Libraries Not Updated

Many of the core libraries included with Python 3 have been rewritten to use Python 3, but have not been updated to use its features. How could they given Python 3’s constant changing status and new features?

What “constant changing status”? The language makes new releases; is that bad? The only mention of “changing” so far was with string formatting, which makes no sense to me, because the only major change has been the addition of syntax that Zed prefers.

There are several libraries that, despite knowing the encoding of data, fail to return proper strings. The worst offender seems to be any libraries dealing with the HTTP protocol, which does indicate the encoding of the underlying byte stream in many cases.

In many cases, yes. Not in all. Some web servers don’t send back an encoding. Some files don’t have an encoding, because they’re images or other binary data. HTML allows the encoding to be given inside the document, instead. urllib has always returned bytes, so it’s not all that unreasonable to keep doing that, rather than… well, I’m not quite sure what this is proposing. Return strings sometimes?
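To show why "just return strings" isn't straightforward, here's a rough sketch of decoding a response body only when the server declared a charset. Both helper functions are hypothetical; the header parsing borrows the standard library's `email.message.Message`, and the sample headers and bodies are invented.

```python
from email.message import Message


def charset_from_content_type(content_type):
    # Parse a Content-Type header value with the stdlib email machinery.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()   # None if there is no charset parameter


def body_as_text(body, content_type):
    # Hypothetical helper: decode only when a charset was declared.
    charset = charset_from_content_type(content_type)
    if charset is None:
        return None        # the caller is still stuck with raw bytes
    return body.decode(charset)


print(body_as_text(b"caf\xc3\xa9", "text/html; charset=utf-8"))
print(body_as_text(b"\x89PNG", "image/png"))
```

Any API built on this either returns two different types or has to guess, and this sketch doesn't even attempt the in-document HTML case.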

The documentation for urllib.request and http.client both advise using the higher-level Requests library instead, in a prominent yellow box right at the top. Requests has distinct mechanisms for retrieving bytes versus text and is vastly easier to use overall, though I don’t think even it understands reading encodings from HTML. Alas, computers.

Good luck to any beginner figuring out how to install Requests on Python 2 — but thankfully, Python 3 now comes bundled with pip, which makes installing libraries much easier. Contrast with the beginning of exercise 46, which apologizes for how difficult this is to explain, lists four things to install, warns that it will be frustrating, and advises watching a video to help figure it out.

What’s even more idiotic about this is Python has a really good Chardet library for detecting the encoding of byte streams. If Python 3 is supposed to be “batteries included” then fast Chardet should be baked into the core of Python 3’s strings making it cake to translate strings to bytes even if you don’t know the underlying encoding. … Call the function whatever you want, but it’s not magic to guess at the encoding of a byte stream, it’s science. The only reason this isn’t done for you is that the Python project decided that you should be punished for not knowing about bytes vs. Unicode, and their arrogance means you have difficult to use strings.

Guessing at the encoding of a byte stream isn’t so much science as, well, guessing. Guessing means that sometimes you’re wrong. Sometimes that’s what you want, and I’m honestly ambivalent about having chardet in the standard library, but it’s hardly arrogant to not want to include a highly-fallible heuristic in your programming language.
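Here's a small sketch of why it's guessing, using only the standard library's codecs. The sample bytes are made up: each input decodes without error under more than one encoding, and nothing in the bytes themselves says which reading was intended.

```python
raw = b"caf\xe9"

# Both of these succeed, and they disagree about the last character.
as_latin1 = raw.decode("latin-1")    # Western European reading
as_cp1251 = raw.decode("cp1251")     # Cyrillic reading

# Plain ASCII can be misread too: these eight bytes are also perfectly
# valid UTF-16, where they come out as four unrelated CJK characters.
misread = b"chardet?".decode("utf-16-le")

print(as_latin1, as_cp1251, misread)
```

A detector can weigh which reading is more plausible, but plausible is the best it can do.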

Conclusions and Warnings

I have resisted writing about these problems with Python 3 for 5 versions because I hoped it would become usable for beginners. Each year I would attempt to convert some of my code and write a couple small tests with Python 3 and simply fail. If I couldn’t use Python 3 reliably then there’s no way a total beginner could manage it. So each year I’d attempt it, and fail, and wait until they fix it. I really liked Python and hoped the Python project would drop their stupid stances on usability.

Let us recap the usability problems seen thus far.

  • You can’t add b"hello" to "hello".
  • TypeErrors are phrased exactly the same as they were in Python 2.
  • The type system is exactly as dynamic as it was in Python 2.
  • There is a new formatting mechanism, using the same syntax as one in Python 2, that Zed prefers over the ones in Python 2.
  • urllib.request doesn’t decode for you, just like in Python 2.
  • 档牡敤㽴 isn’t built in. Oh, sorry, I meant chardet.

Currently, the state of strings is viewed as a Good Thing in the Python community. The fact that you can’t run Python 2 inside Python 3 is seen as a weird kind of tough love. The brainwashing goes so far as to outright deny the mathematics behind language translation and compilation in an attempt to motivate the Python community to brute force convert all Python 2 code.

Which is probably why the Python project focuses on convincing unsuspecting beginners to use Python 3. They don’t have a switching cost, so if you get them to fumble their way through the Python 3 usability problems then you have new converts who don’t know any better. To me this is morally wrong and is simply preying on people to prop up a project that needs a full reset to survive. It means beginners will fail at learning to code not because of their own abilities, but because of Python 3’s difficulty.

Now that we’re towards the end, it’s a good time to say this: Zed Shaw, your behavior here is fucking reprehensible.

Half of what’s written here is irrelevant nonsense backed by a vague appeal to “mathematics”. Instead of having even the shred of humility required to step back and wonder if there are complicating factors beyond whether something is theoretically possible, you have invented a variety of conflicting and malicious motivations to ascribe to the Python project.

It’s fine to criticize Python 3. The string changes force you to think about what you’re doing a little more in some cases, and occasionally that’s a pain in the ass. I absolutely get it.

But you’ve gone out of your way to invent a conspiracy out of whole cloth and promote it on your popular platform aimed at beginners, who won’t know how obviously full of it you are. And why? Because you can’t add b"hello" to "hello"? Are you kidding me? No one can even offer to help you, because instead of examples of real problems you’ve had, you gave two trivial toys and then yelled a lot about how the whole Python project is releasing mind-altering chemicals into the air.

The Python 3 migration has been hard enough. It’s taken a lot of work from a lot of people who’ve given enough of a crap to help Python evolve — to make it better to the best of their judgment and abilities. Now we’re finally, finally at the point where virtually all libraries support Python 3, a few new ones only support Python 3, and Python 3 adoption is starting to take hold among application developers.

And you show up to piss all over it, to propagate this myth that Python 3 is hamstrung to the point of unusability, because if the Great And Wise Zed Shaw can’t figure it out in ten seconds then it must just be impossible.

Fuck you.

Sadly, I doubt this will happen, and instead they’ll just rant about how I don’t know what I’m talking about and I should shut up.

This is because you don’t know what you’re talking about, and you should shut up.

Iteration in one language, then all the others

Post Syndicated from Eevee original https://eev.ee/blog/2016/11/18/iteration-in-one-language-then-all-the-others/

You may have noticed that I like comparing features across different languages. I hope you like it too, because I’m doing it again.

Python

I’m most familiar with Python, and iteration is one of its major concepts, so it’s a good place to start and a good overview of iteration. I’ll dive into Python a little more deeply, then draw parallels to other languages.

Python only has one form of iteration loop, for. (Note that all of these examples are written for Python 3; in Python 2, some of the names are slightly different, and fewer things are lazy.)

for value in sequence:
    ...

in is also an operator, so value in sequence is also the way you test for containment. This is either very confusing or very satisfying.

When you need indices, or specifically a range of numbers, you can use the built-in enumerate or range functions. enumerate works with lazy iterables as well.

# This makes use of tuple unpacking to effectively return two values at a time
for index, value in enumerate(sequence):
    ...

# Note that the endpoint is exclusive, and the default start point is 0.  This
# matches how list indexing works and fits the C style of numbering.
# 0 1 2 3 4
for n in range(5):
    ...

# Start somewhere other than zero, and the endpoint is still exclusive.
# 1 2 3 4
for n in range(1, 5):
    ...

# Count by 2 instead.  Can also use a negative step to count backwards.
# 1 3 5 7 9
for n in range(1, 11, 2):
    ...

dicts (mapping types) have several methods for different kinds of iteration. Additionally, iterating over a dict directly produces its keys.

for key in mapping:
    ...

for key in mapping.keys():
    ...

for value in mapping.values():
    ...

for key, value in mapping.items():
    ...

Python distinguishes between an iterable, any value that can be iterated over, and an iterator, a value that performs the actual work of iteration. Common iterable types include list, tuple, dict, str, and set. enumerate and range are also iterable.

Since Python code rarely works with iterators directly, and many iterable types also function as their own iterators, it’s common to hear “iterator” used to mean an iterable. To avoid this ambiguity, and because the words are fairly similar already, I’ll refer to iterables as containers like the Python documentation sometimes does. Don’t be fooled — an object doesn’t actually need to contain anything to be iterable. Python’s range type is iterable, but it doesn’t physically contain all the numbers in the range; it generates them on the fly as needed.
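
To make the distinction concrete, here is a minimal sketch using only built-ins. The variable names are mine; nothing here goes beyond iter, next, and range as described above.

```python
numbers = [10, 20, 30]          # a list is iterable, but is not an iterator

first_pass = iter(numbers)      # each call to iter() builds a fresh iterator
second_pass = iter(numbers)

a = next(first_pass)            # 10
b = next(first_pass)            # 20
c = next(second_pass)           # 10 again: the two iterators are independent

# An iterator is also iterable, and simply hands back itself.
same_object = iter(first_pass) is first_pass

# range is lazy: it can report a length without storing a trillion numbers.
big = range(10**12)
big_length = len(big)
```

The list keeps no record of who is iterating over it; all of the "where am I" bookkeeping lives in the iterator objects.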

The fundamental basics of iteration are built on these two ideas. Given a container, ask for an iterator; then repeatedly advance the iterator to get new values. When the iterator runs out of values, it raises StopIteration. That’s it. In Python, those two steps can be performed manually with the iter and next functions. A for loop is roughly equivalent to:

_iterator = iter(container)
_done = False
while not _done:
    try:
        value = next(_iterator)
    except StopIteration:
        _done = True
    else:
        ...

An iterator can only move forwards. Once a value has been produced, it’s lost, at least as far as the iterator is concerned. These restrictions are occasionally limiting, but they allow iteration to be used for some unexpected tasks. For example, iterating over an open file produces its lines — even if the “file” is actually a terminal or pipe, where data only arrives once and isn’t persistently stored anywhere.
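
A small sketch of the "only forwards, only once" behavior. I’m assuming io.StringIO as a stand-in for an open file so the example doesn’t need anything on disk; it iterates over lines the same way a real file object does.

```python
import io

# io.StringIO stands in for an open file (an assumption for this example).
stream = io.StringIO("first\nsecond\nthird\n")

header = next(stream)                      # consume one line by hand
rest = [line.strip() for line in stream]   # the loop resumes after that line
leftover = list(stream)                    # nothing remains; the iterator is spent
```

The for loop never sees the first line, because the stream had already moved past it.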

Generators

A more common form of “only forwards, only once” in Python is the generator, a function containing a yield statement. For example:

def inclusive_range(start, stop):
    val = start
    while val <= stop:
        yield val
        val += 1

# 6 7 8 9
for n in inclusive_range(6, 9):
    ...

Calling a generator function doesn’t execute its code, but immediately creates a generator iterator. Every time the iterator is advanced, the function executes until the next yield, at which point the yielded value is returned as the next value and the function pauses. The next iteration will then resume the function. When the function returns (or falls off the end), the iterator stops.

Since the values here are produced by running code on the fly, it’s of course impossible to rewind a generator.
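
The pause-and-resume behavior is easy to observe by having the generator leave notes as it runs. This is only an illustrative sketch; countdown and log are made-up names.

```python
log = []

def countdown(start):
    log.append("started")
    while start > 0:
        log.append(f"about to yield {start}")
        yield start
        start -= 1
    log.append("finished")

gen = countdown(2)
log_after_call = list(log)       # still empty: calling the function ran none of its body

first = next(gen)                # runs the body up to the first yield
log_after_first = list(log)

remaining = list(gen)            # resumes after the yield and runs to the end
```

Nothing is logged until the first next() call, and "finished" only appears once the iterator has been run to exhaustion.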

The underlying protocol is straightforward. A container must have an __iter__ method that returns an iterator, corresponding to the iter function. An iterator must have a __next__ method that returns the next item, corresponding to the next function. If the iterator is exhausted, __next__ must raise StopIteration. An iterator must also have an __iter__ that returns itself — this is so an iterator can be used directly in a for loop.

The above inclusive range generator might be written out explicitly like this:

class InclusiveRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return InclusiveRangeIterator(self)

class InclusiveRangeIterator:
    def __init__(self, incrange):
        self.incrange = incrange
        self.nextval = incrange.start

    def __iter__(self):
        return self

    def __next__(self):
        if self.nextval > self.incrange.stop:
            raise StopIteration

        val = self.nextval
        self.nextval += 1
        return val

This might seem like a lot of boilerplate, but note that the iterator state (here, nextval) can’t go on InclusiveRange directly, because then it’d be impossible to iterate over the same object twice at the same time. (Some types, like files, do act as their own iterators because they can’t meaningfully be iterated in parallel.)

Even Python’s internals work this way. Try iter([]) in a Python REPL; you’ll get a list_iterator object.

In truth, it is a lot of boilerplate. User code usually uses this trick:

class InclusiveRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        val = self.start
        while val <= self.stop:
            yield val
            val += 1

Nothing about this is special-cased in any way. Now __iter__ is a generator, and calling a generator function returns an iterator, so all the constraints are met. It’s a really easy way to convert a generator function into a type. If this class were named inclusive_range instead, it would even be backwards-compatible; consuming code wouldn’t even have to know it’s a class.
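
A quick check that the shortcut keeps the property the boilerplate version was protecting: the same object can be iterated twice at the same time, because every call to __iter__ creates a fresh generator. The class is repeated here only so the sketch runs on its own.

```python
class InclusiveRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Each call returns a new generator iterator with its own `val`.
        val = self.start
        while val <= self.stop:
            yield val
            val += 1

digits = InclusiveRange(1, 3)

# Nested loops over the *same* object: the inner loop does not disturb the outer.
pairs = [(a, b) for a in digits for b in digits]

# Built-in containers hand out separate iterator objects too.
builtin_iterator_name = type(iter([])).__name__   # 'list_iterator'
```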

Reversal

But why would you do this? One excellent reason is to add support for other sequence-like operations, like reverse iteration support. An iterator can’t be reversed, but a container might support being iterated in reverse:

fruits = ['apple', 'orange', 'pear']
# pear, orange, apple
for value in reversed(fruits):
    ...

Reversing a lazy container doesn’t always make sense, but when it does, it’s easy to implement by returning an iterator from __reversed__.

class InclusiveRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        val = self.start
        while val <= self.stop:
            yield val
            val += 1

    def __reversed__(self):
        val = self.stop
        while val >= self.start:
            yield val
            val -= 1

Note that Python does not have “bi-directional” iterators, which can freely switch between forwards and reverse iteration on the fly. A bidirectional iterator is useful for cases like doubly-linked lists, where it’s easy to get from one value to the next or previous value, but not as easy to start from the beginning and get the tenth item.

Iteration is often associated with sequences, though they’re not quite the same. In Python, a sequence is a value that can be indexed in order as container[0], container[1], etc. (Indexing is implemented with __getitem__.) All sequences are iterable; in fact, if a type implements indexing but not __iter__, the iter function will automatically try indexing it from zero instead. reversed does the same, though it requires that the type implement __len__ as well so it knows what the last item is.
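
Here is a sketch of that fallback in action. Squares is a hypothetical type that implements only __len__ and __getitem__, with no __iter__ or __reversed__ at all.

```python
class Squares:
    """Implements only indexing and length: no __iter__, no __reversed__."""
    def __init__(self, count):
        self.count = count

    def __len__(self):
        return self.count

    def __getitem__(self, index):
        # The fallback iteration stops when indexing raises IndexError.
        if not 0 <= index < self.count:
            raise IndexError(index)
        return index * index

squares = Squares(4)
forwards = list(squares)              # iter() falls back to indexing from 0
backwards = list(reversed(squares))   # reversed() uses __len__ and __getitem__
```

Raising IndexError past the end matters: it is the only signal the fallback has that the sequence is over.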

Much of this is codified more explicitly in the abstract base classes in collections.abc, which also provide default implementations of common methods.
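
As a rough illustration, subclassing collections.abc.Sequence and supplying just __len__ and __getitem__ gets you iteration, reversal, containment, index, and count for free. This version of InclusiveRange is my own sketch, and it only handles integer indexes (no slices).

```python
from collections.abc import Sequence

class InclusiveRange(Sequence):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __len__(self):
        return max(0, self.stop - self.start + 1)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch only supports integer indexes")
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self.start + index

r = InclusiveRange(6, 9)
forwards = list(r)               # __iter__ comes from Sequence
backwards = list(reversed(r))    # so does __reversed__
has_eight = 8 in r               # and __contains__
where = r.index(8)               # and index() / count()
```

The inherited defaults are generic, so they walk the items one by one; a type that can do better (a range can test membership with arithmetic) is free to override them.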

Not all iterables are sequences, and not every value that can be indexed is a sequence! Python’s mapping type, dict, uses indexing to fetch the value for a key; but a dict is indexed by key rather than by position and is not a sequence. However, a dict can still be iterated over, producing its keys (in insertion order as of Python 3.7; in arbitrary order in earlier versions). A set can be iterated over, producing its values in arbitrary order, but it cannot be indexed at all. A type could conceivably use indexing for something more unusual and not be iterable at all.

A common question

It’s not really related to iteration, but people coming to Python from Ruby often ask why len() is a built-in function, rather than a method. The same question could be asked about iter() and next() (and other Python builtins), which more or less delegate directly to a “reserved” __dunder__ method anyway.

I believe the technical reason is simply the order that features were added to the language in very early days, which is not very interesting.

The philosophical reason, imo, is that Python does not reserve method names for fundamental operations. All __dunder__ names are reserved, of course, but everything else is fair game. This makes it obvious when a method is intended to add support for some language-ish-level operation, even if you don’t know what all the method names are. Occasionally a third-party library invents its own __dunder__ name, which is a little naughty, but the same reasoning applies: “this is a completely generic interface that some external mechanism is expected to use”.

This approach also avoids a namespacing problem. In Ruby, a Rectangle class might want to have width and length attributes… but the presence of length means a Rectangle looks like it functions as a sequence! Since “interface” method names aren’t namespaced in any way, there is no way to say that you don’t mean the same thing as Array.length.

It’s a minor quibble, since everything’s dynamically typed anyway, so the real solution is “well don’t try to iterate a rectangle then”. And Python does use keys as a method name in some obscure cases. Oh, well.

Some cute tricks

The distinction between sequences and iterables can cause some subtle problems. A lot of code that only needs to loop over items can be passed, e.g., a generator. But this can take some conscious care. Compare:

# This will NOT work with generators, which don't support len() or indexing
for i in range(len(container)):
    value = container[i]
    ...

# But this will
for i, value in enumerate(container):
    ...

enumerate also has a subtle, unfortunate problem: it cannot be combined with reversed. This has bit me more than once, surprisingly.

# This produces a TypeError from reversed()
for i, value in reversed(enumerate(container)):
    ...

# This almost works, but the index goes forwards while the values go backwards
for i, value in enumerate(reversed(container)):
    ...

The problem is that enumerate can’t, in general, reverse itself. It counts up from zero as it iterates over its argument; reversing it means starting from one less than the number of items, but it doesn’t yet know how many items there are. But if you just want to run over a list or other sequence backwards, this feels very silly. A trivial helper can make it work:

def revenum(iterable, end=0):
    start = len(iterable) + end
    for value in reversed(iterable):
        start -= 1
        yield start, value
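
Another way to get the same effect, sketched under the assumption that the argument is a real sequence (it needs both len() and reversed() support): pair a backwards range with reversed(). The name reversed_enumerate is made up for this example.

```python
def reversed_enumerate(sequence):
    """Yield (index, value) pairs from the last item to the first."""
    indexes = range(len(sequence) - 1, -1, -1)
    return zip(indexes, reversed(sequence))

fruits = ['apple', 'orange', 'pear']
pairs = list(reversed_enumerate(fruits))
# [(2, 'pear'), (1, 'orange'), (0, 'apple')]
```

Like the helper above, this won’t work on a generator, since a generator has neither a length nor a way to be reversed.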

I’ve run into other odd cases where it’s frustrating that a generator doesn’t have a length or indexing. This especially comes up if you make heavy use of generator expressions, which are a very compact way to write a one-off generator. (Python also has list, set, and dict “comprehensions”, which have the same syntax but use brackets or braces instead of parentheses, and are evaluated immediately instead of lazily.)

def get_big_fruits():
    fruits = ['apple', 'orange', 'pear']
    return (fruit.upper() for fruit in fruits)

# Roughly equivalent to:
def get_big_fruits():
    fruits = ['apple', 'orange', 'pear']
    def genexp():
        for fruit in fruits:
            yield fruit.upper()
    return genexp()

If you had thousands of fruits, doing this could save a little memory. The caller is probably just going to loop over them to print them out (or whatever), so using a generator expression means that each uppercase name only exists for a short time; returning a list would mean creating a lot of values all at once.

Ah, but now the caller wants to know how many fruits there are, with minimal fuss. Generators have no length, so that won’t work. Turning this generator expression into a class that also has a __len__ would be fairly ridiculous. So you resort to some slightly ugly trickery.

# Ugh.  Obvious, but feels really silly.
count = 0
for value in container:
    count += 1

# Better, but weird if you haven't seen it before.  Creates another generator
# expression that just yields 1 for every item, then sums them up.
count = sum(1 for _ in container)
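
One catch worth spelling out: either way of counting uses up the generator, since it can only be run through once. A short sketch, reusing the get_big_fruits example from above:

```python
def get_big_fruits():
    fruits = ['apple', 'orange', 'pear']
    return (fruit.upper() for fruit in fruits)

big_fruits = get_big_fruits()
count = sum(1 for _ in big_fruits)   # 3
leftovers = list(big_fruits)         # []: counting consumed every item

# If both the count and the items are needed, materialize the items once.
big_fruit_list = list(get_big_fruits())
count_again = len(big_fruit_list)
```

That gives back the memory savings, of course, so it’s a tradeoff rather than a fix.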

Or perhaps you want the first big fruit? Well, [0] isn’t going to help. This is one of the few cases where using iter and next directly can be handy.

# Oops!  If the container is empty, this raises StopIteration, which you
# probably don't want.
first = next(iter(container))

# Catch the StopIteration explicitly.
try:
    first = next(iter(container))
except StopIteration:
    # This code runs if there are zero items
    ...

# Regular loop that terminates immediately.
# The "else" clause only runs when the container ends naturally (i.e. NOT if
# the loop breaks), which can only happen here if there are zero items.
for value in container:
    first = value
    break
else:
    ...

# next() -- but not __next__()! -- takes a second argument indicating a
# "default" value to return when the iterator is exhausted.  This only makes
# sense if you were going to substitute a default value anyway; doing this and
# then checking for None will do the wrong thing if the container actually
# contained a None.
first = next(iter(container), None)
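
If None is a legitimate item, one common workaround is a private sentinel object as the default. This is a sketch; _MISSING and first_or_raise are hypothetical names, not anything from the standard library.

```python
_MISSING = object()   # a unique object that no container will hold by accident

def first_or_raise(iterable):
    """Return the first item, or raise ValueError if there isn't one."""
    first = next(iter(iterable), _MISSING)
    if first is _MISSING:
        raise ValueError("iterable is empty")
    return first

first_fruit = first_or_raise(['apple', 'orange'])   # 'apple'
first_none = first_or_raise([None, 1])               # None, and that's fine
```

The identity check (is) is what makes this safe: nothing else can be that exact object.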

Other tricks with iter and next include skipping the first item (or any number of initial items, though consider itertools.islice for more complex cases):

it = iter(container)
next(it, None)  # Use second arg to ignore StopIteration
for value in it:
    # Since the first item in the iterator has already been consumed, this loop
    # will start with the second item.  If the container had only one or zero
    # items, the loop will get StopIteration and end immediately.
    ...
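
For skipping more than one item, here is roughly what the itertools.islice version looks like. The skip wrapper is my own name for illustration; islice itself does all the work.

```python
from itertools import islice

def skip(iterable, count):
    """Lazily yield everything after the first `count` items."""
    return islice(iterable, count, None)

letters = list(skip("abcdef", 2))                      # ['c', 'd', 'e', 'f']
short = list(skip([1], 5))                             # []: fewer items than skipped
from_generator = list(skip((n * n for n in range(5)), 3))   # [9, 16]
```

islice works on any iterable, generators included, and quietly produces nothing if the input is too short.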

Iterating two (or more) items at a time:

# Obvious way: call next() inside the loop.
it = iter(container)
for value1 in it:
    # With an odd number of items, this will raise an uncaught StopIteration!
    # Catch it or provide a default value.
    value2 = next(it)
    ...

# Moderately clever way: abuse zip().
# zip() takes some number of containers and iterates over them pairwise.  It
# stores an iterator for each container.  When it's asked for its next item, it
# in turn asks all of its iterators for their next items, and returns them as a
# tuple.  But by giving it the same exact iterator twice, it'll end up advancing
# that iterator twice and returning two consecutive items.
# Note that zip() stops early as soon as an iterator runs dry, so if the
# container has an odd number of items, this will silently skip the last one.
# If you don't want that, use itertools.zip_longest instead.
it = iter(container)
for line1, line2 in zip(it, it):
    ...

# Far too clever way: exactly the same as above, but written as a one-liner.
# zip(iter(), iter()) would create two separate iterators and break the trick.
# List multiplication produces a list containing the same iterator twice.
# One advantage of this is that the 2 can be a variable.
for value1, value2 in zip(*[iter(container)] * 2):
    ...
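
For completeness, a sketch of the zip_longest variant, which pads the final group instead of dropping it. The helper name in_groups and its fill parameter are my own; the same-iterator-repeated trick is the one described above.

```python
from itertools import zip_longest

def in_groups(iterable, size, fill=None):
    """Yield tuples of `size` consecutive items, padding the last one."""
    it = iter(iterable)
    # The same iterator repeated `size` times, so each tuple advances it `size` steps.
    return zip_longest(*[it] * size, fillvalue=fill)

pairs = list(in_groups([1, 2, 3, 4, 5], 2))
# [(1, 2), (3, 4), (5, None)]
triples = list(in_groups("abcdefg", 3, fill="-"))
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', '-', '-')]
```

As with next(), a None fill value is ambiguous if the data can contain None, so pick a fill that can’t be mistaken for a real item.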

Wow, that got pretty weird towards the end. Somehow this turned into Stupid Python Iterator Tricks. Don’t worry; I know far less about these other languages.

C

C is an extreme example with no iterator protocol whatsoever. It barely even supports sequences; arrays are just pointer math. All it has is the humble C-style for loop:

int container[] = {...};
for (int i = 0; i < container_length; i++) {
    int value = container[i];
    ...
}

Unfortunately, it’s really the best C can do. C arrays don’t know their own length, so no matter what, the developer has to provide it some other way. Even without that, a built-in iterator protocol is impossible — iterators require persistent state (the current position) to be bundled alongside code (how to get to the next position). That pretty much means one of two things: closures or objects. C has neither.

Lua

Lua has two forms of for loop. The first is a simple numeric loop.

-- 1 3 5 7 9 11
for value = 1, 11, 2 do
    ...
end

The three values after the = are the start, end, and step. They work similarly to Python’s range(), except that everything in Lua is always inclusive, so for i = 1, 5 will count from 1 to 5.

The generic form uses in.

for value in iterate(container) do
    ...
end

iterate isn’t a special name here, but most of the time a generic for will look like this.

See, Lua doesn’t have objects. It has enough tools that you can build objects fairly easily, but the core language has no explicit concept of objects or method calls. An iterator protocol needs to bundle state and behavior somehow, so Lua uses closures for that. But you still need a way to get that closure, and that means calling a function, and a plain value can’t have functions attached to it. So iterating over a table (Lua’s single data structure) looks like this:

for key, value in pairs(container) do
    ...
end

pairs is a built-in function. Lua also has an ipairs, which iterates over consecutive keys and values starting from key 1. (Lua starts at 1, not 0. Lua also represents sequences as tables with numeric keys.)

Lua does have a way to associate “methods” with values, which is how objects are made, but for loops almost certainly came first. So iteration is almost always over a function call, not a bare value.

Also, because objects are built out of tables, having a default iteration behavior for all tables would mean having the same default for all objects. Nothing’s stopping you from using pairs on an object now, but at least that looks deliberate. It’s easy enough to give objects iteration methods and iterate over obj:iter(), though it’s slightly unfortunate that every type might look slightly different. Unfortunately, Lua has no truly generic interface for “this can produce a sequence of values”.

The iteration protocol is really just calling a function repeatedly to get new values. When the function returns nil, the iteration ends. (That means nil can never be part of an iteration! You can work around this by returning two values and making sure the first one is something else that’s never nil, like an index.) The manual explains the exact semantics of the generic for with Lua code, a move I wish every language would make.

-- This:
for var_1, ···, var_n in explist do block end

-- Is equivalent to this:
do
    local _func, _state, _lastval = explist
    while true do
        local var_1, ···, var_n = _func(_state, _lastval)
        if var_1 == nil then break end
        _lastval = var_1
        block
    end
end

Important to note here is the way multiple-return works in Lua. Lua doesn’t have tuples; multiple assignment is a distinct feature of the language, and multiple return works exactly the same way as multiple assignment. If there are too few values, the extra variables become nil; if there are too many values, the extras are silently discarded.

So in the line local _func, _state, _lastval = explist, the “state” value _state and the “last loop value” _lastval are both optional. Lua doesn’t use them, except to pass them back to the iterator function _func, and they aren’t visible to the for loop body. An iterator can thus be only a function and nothing else, letting _state and _lastval be nil — but they can be a little more convenient at times. Compare:

-- Usual approach: return only a closure, completely ignoring state and lastval
local function inclusive_range(start, stop)
    local nextval = start
    return function()
        if nextval > stop then
            return
        end
        local val = nextval
        nextval = nextval + 1
        return val
    end
end

-- Alternative approach, not using closures at all.  This is the function we
-- return; each time it's called with the same "state" value and whatever it
-- returned last time it was called.
-- This function could even be written exactly like a method (a la Python's
-- __next__), where the state value is the object itself.
local function inclusive_range_iter(stop, prev)
    -- "stop" is the state value; "prev" is the last value we returned
    local val = prev + 1
    if val > stop then
        return
    end
    return val
end
local function inclusive_range(start, stop)
    -- Return the iterator function, and pass it the stop value as its state.
    -- The "last value" is a little weird here; on the first iteration, there
    -- is no last value.  Here we can fake it by subtracting 1 from the
    -- starting number, but in other cases, it might make more sense if the
    -- "state" were a table containing both the start and stop values.
    return inclusive_range_iter, stop, start - 1
end

-- 6 7 8 9 with both implementations
for n in inclusive_range(6, 9) do
    ...
end
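
For readers more at home in Python, here is a rough translation of the same calling convention. It is only a sketch: lua_style_for is a made-up driver, it handles a single loop variable rather than Lua’s multiple returns, and None plays the role of nil.

```python
def lua_style_for(func, state=None, lastval=None):
    """Drive a Lua-style (function, state, last value) triple."""
    while True:
        value = func(state, lastval)
        if value is None:          # Lua's nil ends the loop
            return
        lastval = value
        yield value

# The closure-free iterator from above: all of its state arrives as arguments.
def inclusive_range_iter(stop, prev):
    val = prev + 1
    return val if val <= stop else None

def inclusive_range(start, stop):
    return inclusive_range_iter, stop, start - 1

result = list(lua_style_for(*inclusive_range(6, 9)))   # [6, 7, 8, 9]
```

Written this way, the iterator function itself holds nothing between calls; the loop carries the state for it.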

Lua doesn’t have generators. Surprisingly, it has fully-fledged coroutines — call stacks that can be paused at any time. Lua sometimes refers to them as “threads”, but only one can be running at a time. Effectively they’re like Python generators, except you can call a function which calls a function which calls a function which eventually yields, and the entire call stack from that point up to the top of the coroutine is paused and preserved.

In Python, the mere presence of yield causes a function to become a generator. In Lua, since any function might try to yield the coroutine it’s currently in, a function has to be explicitly called as a coroutine using functions in the coroutine library.
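
The Python side of that contrast, as a small sketch: a generator can only pause its own frame, so a helper that wants to produce values has to be a generator too, and each level has to delegate explicitly with yield from (Python 3.3 and later). The walk function is just an illustration.

```python
def walk(tree):
    """Yield the leaves of a nested list, depth first."""
    for node in tree:
        if isinstance(node, list):
            # A bare walk(node) call would create a generator and throw it
            # away; every level of the recursion must delegate explicitly.
            yield from walk(node)
        else:
            yield node

leaves = list(walk([1, [2, [3, 4]], 5]))   # [1, 2, 3, 4, 5]
```

In Lua, the inner calls could be ordinary functions that call coroutine.yield directly, with no cooperation needed from the frames in between.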

But this post is about iterators, not coroutines. Coroutines don’t function as iterators, but Lua provides a coroutine.wrap() that takes a function, turns it into a coroutine, and returns a function that resumes the coroutine. That’s enough to allow a coroutine to be turned into an iterator. The Lua book even has a section about this.

local function inclusive_range(start, stop)
    local val = start
    while val <= stop do
        coroutine.yield(val)
        val = val + 1
    end
end
-- Unfortunately, coroutine.wrap() doesn't have any way to pass initial
-- arguments to the function it wraps, so we need this dinky wrapper.
-- I should clarify that the ... here is literal syntax for once.
local function iter_coro(entry_point, ...)
    local args = {...}
    return coroutine.wrap(function()
        entry_point(unpack(args))
    end)
end

-- 6 7 8 9
for n in iter_coro(inclusive_range, 6, 9) do
    ...
end

So, that’s cool. Lua doesn’t do a lot for you — unfortunately, list processing tricks can be significantly more painful in Lua — but it has some pretty interesting primitives that compose with each other remarkably well.

Perl 5

Perl has a very straightforward C-style for loop, which looks and works exactly as you might expect. my, which appears frequently in these examples, is just local variable declaration.

for (my $i = 0; $i < 10; $i++) {
    ...
}

Nobody uses it. Everyone uses the iteration-style for loop. (It’s occasionally called foreach, which is extra confusing because both for and foreach can be used for both kinds of loop. Nobody actually uses the foreach keyword.)

for my $value (@container) {
    ...
}

The iteration loop can be used for numbers, as well, since Perl has a .. inclusive range operator. For iterating over an array with indexes, Perl has the slightly odd $#array syntax, which is the index of the last item in @array. Creating something like Python’s enumerate is a little tricky in Perl, because you can’t directly return a list of lists, and the workaround doesn’t support unpacking. It’s complicated.

for my $i (1..10) {
    ...
}

for my $index (0..$#array) {
    my $value = $array[$index];
    ...
}

A hash (Perl’s mapping “shape”) can’t be iterated directly. Or, well, it can, but the loop will alternate between keys and values because Perl is weird. Instead you need the keys or values built-in functions to get the keys or values as regular lists. (These functions also work on arrays as of Perl 5.12.)

for my $key (keys %container) {
    ...
}

For iterating over both keys and values at the same time, Perl has an each function. The behavior is a little weird, since every call to the function advances an internal iterator inside the hash and returns a new pair. If a loop using each terminates early, the next use of each may silently start somewhere in the middle of the hash, skipping a bunch of its keys. This is probably why I’ve never seen each actually used.

while (my ($key, $value) = each %container) {
    ...
}
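
The hazard with a single shared cursor is easier to see in a toy model. The class below is a hypothetical Python imitation of a hash with one built-in cursor; it is not how Perl implements each, just the same shape of problem. It needs Python 3.8 or later for the := operator.

```python
class CursorHash:
    """Hypothetical stand-in for a hash with one shared, built-in cursor."""
    def __init__(self, data):
        self._items = list(data.items())
        self._pos = 0

    def each(self):
        # Return the next pair, or None at the end (which also resets the cursor).
        if self._pos >= len(self._items):
            self._pos = 0
            return None
        pair = self._items[self._pos]
        self._pos += 1
        return pair

h = CursorHash({'a': 1, 'b': 2, 'c': 3})

# The first loop bails out early, leaving the cursor just past 'a'.
while (pair := h.each()) is not None:
    break

# The second loop silently starts in the middle.
seen = []
while (pair := h.each()) is not None:
    seen.append(pair[0])
# seen == ['b', 'c']
```

This is the same reason the Python section kept iteration state out of the container: one cursor per container means one loop at a time, and every loop has to finish.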

Despite being very heavily built on the concept of lists, Perl doesn’t have an explicit iterator protocol, and its support for lazy iteration in general is not great. When they’re used at all, lazy iterators tend to be implemented as ad-hoc closures or callable objects, which require a while loop:

my $iter = custom_iterator($collection);
while (my $value = $iter->()) {
    ...
}

Here be dragons

It is possible to sorta-kinda fake an iterator protocol. If you’re not familiar, Perl’s variables come in several different “shapes” — hash, array, scalar — and it’s possible to “tie” a variable to a backing object which defines the operations for a particular shape. It’s a little like operator overloading, except that Perl also has operator overloading and it’s a completely unrelated mechanism. In fact, you could use operator overloading to make your object return a tied array when dereferenced as an array. I am talking gibberish now.

Anyway, the trick is to tie an array and return a new value for each consecutive fetch of an index. Like so:

use v5.12;
package ClosureIterator;

# This is the tie "constructor" and just creates a regular object to store
# our state
sub TIEARRAY {
    my ($class, $closure) = @_;
    my $self = {
        closure => $closure,
        nextindex => 0,
    };
    return bless $self, $class;
}

# This is called to fetch the item at a particular index; for an iterator,
# only the next item is valid
sub FETCH {
    my ($self, $index) = @_;

    if ($index == 0) {
        # Always allow reading index 0, both to mean a general "get next
        # item" and so that looping over the same array twice will work as
        # expected
        $self->{nextindex} = 0;
    }
    elsif ($index != $self->{nextindex}) {
        die "ClosureIterator does not support random access";
    }

    $self->{nextindex}++;
    return $self->{closure}->();
}

# The built-in shift() function means "remove and return the first item", so
# it's a good fit for a general "advance iterator"
sub SHIFT {
    my ($self) = @_;
    $self->{nextindex} = 0;
    return $self->{closure}->();
}

# Yes, an array has to be able to report its own size...  but luckily, a for
# loop fetches the size on every iteration!  As long as this returns
# increasingly large values, such a loop will continue indefinitely
sub FETCHSIZE {
    my ($self) = @_;
    return $self->{nextindex} + 1;
}

# Most other tied array operations are for modifying the array, which makes no
# sense here.  They're deliberately omitted, so trying to use them will cause a
# "can't locate object method" error.


package main;

# Create an iterator that yields successive powers of 2
tie my @array, 'ClosureIterator', sub {
    # State variables are persistent, like C statics
    state $next = 1;
    my $ret = $next;
    $next *= 2;
    return $ret;
};

# This will print out 1, 2, 4, 8, ... 1024, at which point the loop breaks
for my $i (@array) {
    say $i;
    last if $i > 1000;
}

This transparently works like any other array… sort of. You can loop over it (forever!); you can use shift to pop off the next value; you can stop a loop and then continue reading from it later.

Unfortunately, this is just plain weird, even for Perl, and I very rarely see it used. Ultimately, Perl’s array operations come in a set, and this is an array that pretends not to be able to do half of them. Even Perl developers are likely to be surprised by an array, a fundamental “shape” of the language, with quirky behavior.

The biggest problem is that, as I said, Perl is heavily built on lists. Part of that design is that @arrays are very eager to spill their contents into a surrounding context. Naïvely passing an array to a function, for example, will expand its elements into separate arguments, losing the identity of the array itself (and losing any tied-ness). Interpolating an array into a string automatically space-separates its elements.

Unlike a for loop, these operations only ask the array for its size once — so rather than printing an infinite sequence, they’ll print a completely arbitrary prefix of it. In the case above, spilling a fresh array will read one item; spilling the array after the example loop will read eleven items. So while a tied array works nicely with a for loop, it’s at odds with the most basic rules of Perl syntax.

Also, Perl’s list-based nature means it’s attracted a lot of list-processing utilities — but these naturally expect to receive a spilled list of arguments and cannot work with a lazy iterator.

I found multiple mentions of the List::Gen module while looking into this. I’d never heard of it before and I’ve never seen it used, but it tries to fill this gap (and makes use of array tying, among other things). It’s a bit weird, and its source code is extremely weird, and it took me twenty minutes to figure out how it was using <...> as a quoting construct.

(<...> in Perl does filename globbing, so it’s usually seen as <*.txt>. The same syntax is used for reading from a filehandle, which makes this confusing and ambiguous, so it’s generally discouraged in favor of the built-in glob function which does the same thing. Well, it turns out that <...> must just call glob() at Perl-level, because List::Gen manages to co-opt this syntax simply by exporting its own glob function. Perl is magical.)

Perl 6

Perl 6, a mad experiment to put literally every conceivable feature into one programming language, naturally has a more robust concept of iteration.

At first glance, many of the constructs are similar to those of Perl 5. The C-style for loop still exists for some reason, but has been disambiguated under the loop keyword.

loop (my $i = 1; $i <= 10; $i++) {
    ...
}

# More interestingly, loop can be used completely bare for an infinite loop
loop {
    ...
}

The for block has slightly different syntax and a couple new tricks.

# Unlike in Perl 5, $value is automatically declared and scoped to the block,
# without needing an explicit 'my'
for @container -> $value {
    ...
}

for 1..10 -> $i {
    ...
}

# This doesn't iterate in pairs; it reads two items at a time from a flat list!
for 1..10 -> $a, $b {
    ...
}

Not apparent in the above code is that ranges are lazy in Perl 6, as in Python; the elements are computed on demand. In fact, Perl 6 supports a range like 1..Inf.

Loop variables are also aliases. By default they’re read-only, so this appears to work like Python… but Perl has always had a C-like language-level notion of “slots” that Python does not, and it becomes apparent if the loop variable is made read-write:

my @fruits = «apple orange pear»;
for @fruits -> $fruit is rw {
    # This is "apply method inplace", i.e. shorthand for:
    # $fruit = $fruit.uc;
    # Yes, you can do that.
    $fruit .= uc;
}
say @fruits;  # APPLE ORANGE PEAR

For iterating with indexes, there’s a curious idiom:

# ^Inf is shorthand for 0..Inf, read as "up to Inf".
# Z is the zip operator, which interleaves its arguments' elements into a
# single flat list.
# This makes use of the "two at a time" trick from above.
for ^Inf Z @array -> $index, $value {
    ...
}

Iterating hashes is somewhat simpler; hashes have methods, and the .kv method returns the keys and values. (It actually returns them in a flat list interleaved, which again uses “two at a time” syntax. If you only use a single loop variable, your loop iterations will alternate between a key and a value. Iterating a hash directly produces pairs, which are a first-class data type in Perl 6, but I can’t find any syntax for directly unpacking a pair within a loop header.)

for %container.kv -> $key, $value {
    ...
}

# No surprises here
for %container.keys -> $key {
    ...
}
for %container.values -> $value {
    ...
}

Perl 6 is very big on laziness, which is perhaps why it took fifteen years to see a release. It has the same iterable versus iterator split as Python. Given a container (iterable), ask for an iterator; given an iterator, repeatedly ask for new values. When the iterator is exhausted, it returns the IterationEnd sentinel. Exactly the same ideas. I’m not clear on the precise semantics of the for block and can’t find a simple reference, but they’re probably much like Python’s… plus a thousand special cases.
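
For comparison, here's a rough Python sketch of that sentinel-style protocol. The names InclusiveRangePuller, pull_one, and ITERATION_END are made up for illustration and aren't part of Perl 6 or Python; the only real built-in involved is the two-argument form of iter(), which keeps calling a function until it returns the sentinel.

```python
# Hypothetical stand-in for Perl 6's IterationEnd sentinel
ITERATION_END = object()


class InclusiveRangePuller:
    """Illustrative iterator that signals exhaustion with a sentinel value."""

    def __init__(self, start, stop):
        self.nextval = start
        self.stop = stop

    def pull_one(self):
        # Return the sentinel instead of raising StopIteration
        if self.nextval > self.stop:
            return ITERATION_END
        val = self.nextval
        self.nextval += 1
        return val


puller = InclusiveRangePuller(6, 9)
# iter(callable, sentinel) calls pull_one() until it returns ITERATION_END
values = list(iter(puller.pull_one, ITERATION_END))
print(values)  # [6, 7, 8, 9]
```

The trade-off versus an exception is that the sentinel has to be a value that can never appear in the data, which is why it's a fresh object() here.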

Generators, kinda

Perl 6 also has its own version of generators, though with a few extra twists. Curiously, generators are a block called gather, rather than a kind of function — this means that a one-off gather is easier to create, but a gather factory must be explicitly wrapped in a function. gather can even take a single expression rather than a block, so there’s no need for separate “generator expression” syntax as in Python.

sub inclusive-range($start, $stop) {
    return gather {
        my $val = $start;
        while $val <= $stop {
            take $val;
            $val++;
        }
    };
}

# 6 7 8 9
for inclusive-range(6, 9) -> $n {
    ...
}

Unlike Python’s yield, Perl 6’s take is dynamically scoped — i.e., take can be used anywhere in the call stack, and it will apply to the most recent gather caller. That means arbitrary-depth coroutines, which seems like a big deal to me, but the documentation mentions it almost as an afterthought.
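
To make the contrast with Python concrete: a Python yield only applies to the function it's written in, so a helper has to be a generator itself, and the caller has to forward its values explicitly with yield from. A minimal sketch, where emit_pair and pairs are hypothetical names:

```python
def emit_pair(n):
    # This helper is its own generator; calling it produces nothing by itself
    yield n
    yield -n


def pairs(limit):
    for n in range(1, limit + 1):
        # Without `yield from`, the helper's values would never reach our caller
        yield from emit_pair(n)


result = list(pairs(3))
print(result)  # [1, -1, 2, -2, 3, -3]
```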

The documentation also says gather/take “can generate values lazily, depending on context,” but neglects to clarify how context factors in. The code I wrote above turns out to be lazy, but this ambiguity inclines me to use the explicit lazy marker everywhere.

Ultimately it’s a pretty flexible feature, but has a few quirks that make it a bit clumsier to use as a straightforward generator. Given that the default behavior is an eagerly-evaluated block, I think the original intention was to avoid the slightly unsatisfying pattern of “push onto an array every iteration through a loop” — instead you can now do this:

my @results = gather {
    for @source-data -> $datum {
        next unless some-test($datum);
        take process($datum);
    }
};

Using a simple (syntax-highlighted!) take puts the focus on the value being taken, rather than the details of putting it where it wants to go and how it gets there. It’s an interesting idea and I’m surprised I’ve never seen it demonstrated this way.

With gather and some abuse of Perl’s exceptionally compactable syntax, I can write a much shorter version of the infinite Perl 5 iterator above.

my @powers-of-two = lazy gather take (state $n = 1) *= 2 for ^Inf;

# Binds to $_ by default
for @powers-of-two {
    # Method calls are on $_ by default
    .say;
    last if $_ > 1000;
}

It’s definitely shorter, I’ll give it that. Leaving off the lazy in this case causes an infinite loop as Perl tries to evaluate the entire list; using a $ instead of a @ produces a “Cannot .elems a lazy list” error; using $ without lazy prints a ...-terminated representation of the infinite list and then hangs forever. I don’t quite understand the semantics of stuffing a list into a scalar ($) variable in Perl 6, and to be honest the list/array semantics seem to be far more convoluted than Perl 5, so I have no idea what’s going on here. Perl 6 has a lot of fascinating toys that are very easy to use incorrectly.

Nuts and bolts

Iterables and iterators are encoded explicitly as the Iterable and Iterator roles. An Iterable has an .iterator method that should return an Iterator. An Iterator has a .pull-one method that returns the next value, or the IterationEnd sentinel when the iterator is exhausted. Both roles offer several other methods, but they have suitable default implementations.

inclusive-range might be transformed into a class thusly:

class InclusiveRangeIterator does Iterator {
    has $.range is required;
    has $!nextval = $!range.start;

    method pull-one() {
        if $!nextval > $!range.stop {
            return IterationEnd;
        }

        # Perl people would probably phrase this:
        # ++$!nextval
        # and they are wrong.
        my $val = $!nextval;
        $!nextval++;
        return $val;
    }
}

class InclusiveRange does Iterable {
    has $.start is required;
    has $.stop is required;

    # Don't even ask
    method new($start, $stop) {
        self.bless(:$start, :$stop);
    }

    method iterator() {
        InclusiveRangeIterator.new(range => self);
    }
}

# 6 7 8 9
for InclusiveRange.new(6, 9) -> $n {
    ...
}

Can we use gather to avoid the need for an extra class, just as in Python? We sure can! The only catch is that Perl 6 iterators don’t also pretend to be iterables (remember, in Python, iter(it) should produce it), so we need to explicitly return a gather block’s iterator.

class InclusiveRange does Iterable {
    has $.start is required;
    has $.stop is required;

    # Don't even ask
    method new($start, $stop) {
        self.bless(:$start, :$stop);
    }

    method iterator() {
        gather {
            my $val = $!start;
            while $val <= $!stop {
                take $val;
                $val++;
            }
        }.iterator;  # <- this is important
    }
}

For sequences, Perl 6 has the Seq type. Curiously, even an infinite lazy gather is still a Seq. Indexing and length are not part of Seq — both are implemented as separate methods.

Curiously, even though Perl 6 became much stricter overall, the indexing methods don’t seem to be part of a role; you only need to define them, much like Python’s __dunder__ methods. In fact, in the preceding examples, does Iterator isn’t necessary at all; the for block will blindly try to call an iterator method and doesn’t much care where it came from.

I’m sure there are plenty of cute tricks possible with Perl 6, but, er, I’ll leave those as an exercise for the reader.

Ruby

Ruby is a popular and well-disguised Perl variant, if Perl just went completely all-in on Smalltalk. It has no C-style for, but it does have an infinite loop block and a very Python-esque for:

for value in sequence do
    ...
end

Nobody uses this. No, really, the core language documentation outright says:

The for loop is rarely used in modern ruby programs.

Instead, you’ll probably see this:

sequence.each do |value|
    ...
end

It doesn’t look it, but this is completely backwards from everything seen so far. All of these other languages have used external iterators, where an object is repeatedly asked to produce values and calling code can do whatever it wants with them. Here, something very different is happening. The entire do ... end block acts as a closure whose argument is value; it’s passed to the each method, which calls it once for each value in the sequence. This is an internal iterator.
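
If the distinction is hard to picture, here's a small Python sketch of both styles side by side. The each function is a hypothetical helper I'm making up to imitate Ruby's method; it isn't a Python built-in.

```python
def each(items, block):
    """Internal iteration: the container's side drives the loop and calls `block`."""
    for item in items:
        block(item)


seen_internal = []
# The caller only hands over a callback; it never sees the loop itself
each([1, 2, 3], seen_internal.append)

# External iteration: the calling code drives, asking for one value at a time
seen_external = []
iterator = iter([1, 2, 3])
while True:
    try:
        seen_external.append(next(iterator))
    except StopIteration:
        break

print(seen_internal, seen_external)  # [1, 2, 3] [1, 2, 3]
```

The results are the same; what differs is who owns the loop, and therefore who gets to decide when to stop or pause.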

“Pass a block to a function which can then call it a lot” is a built-in syntactic feature of Ruby, so these kinds of iterators are fairly common. The upside is that they look almost like a custom block, so they fit naturally with the language. The downside is that all of these block-accepting methods are implemented on Array, rather than as generic functions: bsearch, bsearch_index, collect, collect!, combination, count, cycle, delete, delete_if, drop_while, each, each_index, fetch, fill, find_index, index, keep_if, map, map!, permutation, product, reject, reject!, repeated_combination, repeated_permutation, reverse_each, rindex, select, select!, sort, sort!, sort_by!, take_while, uniq, uniq!, zip. Some of those, as well as a number of additional methods, are provided by the Enumerable mixin which can express them in terms of each. I suppose the other upside is that any given type can provide its own more efficient implementation of these methods, if it so desires.

I guess that huge list of methods answers most questions about how to iterate over indices or in reverse. The only bit missing is that .. range syntax exists in Ruby as well, and it produces Range objects which also have an each method. If you don’t care about each index, you can also use the cute 3.times method.

Ruby blocks are a fundamental part of the language and built right into the method-calling syntax. Even break is defined in terms of blocks, and it works with an argument!

# This just doesn't feel like it should work, but it does.  Prints 17.
# Braces are conventionally used for inline blocks, but do/end would work too.
primes = [2, 3, 5, 7, 11, 13, 17, 19]
puts primes.each { |p| break p if p > 16 }

each() doesn’t need to do anything special here; break will just cause its return value to be 17. Somehow. (Honestly, this is the sort of thing that makes me wary of Ruby; it seems so ad-hoc and raises so many questions. A language keyword that changes the return value of a different function? Does the inside of each() know about this or have any control over it? How does it actually work? Is there any opportunity for cleanup? I have no idea, and the documentation doesn’t seem to think this is worth commenting on.)

Using blocks

Anyway, with block-passing as a language feature, the “iterator protocol” is pretty straightforward: just write a method that takes a block.

def each
    for value in self do
        yield value
    end
end

Be careful! Though it’s handy for iteration, that yield is not the same as Python’s yield. Ruby’s yield calls the passed-in block — yields control to the caller — with the given value(s).

I pulled a dirty trick there, because I expressed each in terms of for. So how does for work? Well, ah, it just delegates to each. Oops!

How, then, do you write an iterator completely from scratch? The obvious way is to use yield repeatedly. That gives you something that looks rather a lot like Python, though it doesn’t actually pause execution.

class InclusiveRange
    # This gets you a variety of other iteration methods, all defined in
    # terms of each()
    include Enumerable

    def initialize(start, stop)
        @start = start
        @stop = stop
    end
    def each
        val = @start
        while val <= @stop do
            yield val
            val += 1
        end
    end
end

# 6 7 8 9
# A `for` loop would also work here
InclusiveRange.new(6, 9).each do |n|
    ...
end

Enumerators

Well, that’s nice for creating a whole collection type, but what if I want an ad-hoc custom iterator? Enter the Enumerator class, which allows you to create… ah, enumerators.

Note that the relationship between Enumerable and Enumerator is not the same as the relationship between “iterable” and “iterator”. Most importantly, neither is really an interface. Enumerable is a set of common iteration methods that any collection type may want to have, and it expects an each to exist. Enumerator is a generic collection type, and in fact mixes in Enumerable. Maybe I should just show you some code.

def inclusive_range(start, stop)
    Enumerator.new do |y|
        val = start
        while val <= stop do
            y.yield val
            val += 1
        end
    end
end

# 6 7 8 9
inclusive_range(6, 9).each do |n|
    puts n
end

Enumerator turns a block into a fully-fledged data stream. The block is free to do whatever it wants, and whenever it wants to emit a value, it calls y.yield value. The y argument is a “yielder” object, an opaque magic type; y.yield is a regular method call, unrelated to the yield keyword. (y << value is equivalent; << is Ruby’s “append” operator. And also, yes, bit shift.)

The amazing bit is that you can do this:

# 6
puts inclusive_range(6, 9).first

Enumerator has all of the Enumerable methods, one of which is first. So, that’s nice.

The really amazing bit is that if you stick some debugging code into the block passed to Enumerator.new, you’ll find that… the values are produced lazily. That call to first() doesn’t generate the full sequence and then discard everything after the first item; it only generates the first item, then stops.

(Beware! The values are produced lazily, but many Enumerable methods are eager. I’ll get back to this in a moment.)

Hang on, didn’t I say yield doesn’t pause execution? Didn’t I also say the above yield is just a method call, not the keyword?

I did! And I wasn’t lying. The really truly amazing bit, which I’ve seen shockingly little excitement about while researching this, is that under the hood, this is all using Fibers. Coroutines.

Enumerator.new takes a block and turns it into a coroutine. Every time something wants a value from the enumerator, it resumes the coroutine. The yielder object’s yield method then calls Fiber.yield() to pause the coroutine. It works just like Lua, but it’s designed to work with existing Ruby conventions, like the piles of internal iteration methods developers expect to find.

So Enumerator.new can produce Python-style generators, albeit in a slightly un-native-looking way. There’s also one other significant difference: an Enumerator can restart itself for each method called on it, simply by calling the block again. This code will print 6 three times:

ir = inclusive_range(6, 9)
puts ir.first
puts ir.first
puts ir.first

For something like an inclusive range object, that’s pretty nice. For something like a file, maybe not so nice. It also means you need to be sure to put your setup code inside the block passed to Enumerator.new, or funny things will happen when the block is restarted.
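
Python draws the same line in a different place, which a quick sketch can show. A generator object is single-pass and resumes where it left off, while a container whose __iter__ creates a fresh generator restarts every time, much like the Enumerator above. (InclusiveRange here is just an illustrative class, not anything from the standard library.)

```python
def inclusive_range(start, stop):
    val = start
    while val <= stop:
        yield val
        val += 1


# A generator object is its own iterator, so each next() resumes it
gen = inclusive_range(6, 9)
gen_firsts = [next(iter(gen)) for _ in range(3)]


class InclusiveRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # A brand-new generator per iteration, so every pass starts over
        return inclusive_range(self.start, self.stop)


ir = InclusiveRange(6, 9)
container_firsts = [next(iter(ir)) for _ in range(3)]

print(gen_firsts)        # [6, 7, 8]
print(container_firsts)  # [6, 6, 6]
```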

Something like generators

But wait, there’s more. Specifically, this common pattern, which pretty much lets you ignore Enumerator.new entirely.

def some_iterator_method
    # __method__ is the current method name.  block_given? is straightforward.
    return enum_for(__method__) unless block_given?

    # An extremely accurate simulation of a large list.
    (1..1000).each do |item|
        puts "having a look at #{item}"
        # Blocks are invisible to `yield`; this will yield to the block passed
        # to some_iterator_method.
        yield item if item.even?
    end
end

# having a look at 1
# having a look at 2
# 2
puts some_iterator_method.first

Okay, bear with me.

First, some_iterator_method() is called. It doesn’t have a block attached, so block_given? is false, and it returns enum_for(...), whatever that does. Then first() is called on the result, and that produces a single element and stops.

The above code has no magic yielder object. It uses the straightforward yield keyword. Why doesn’t it loop over the entire range from 1 to 1000?

Remember, Enumerator uses coroutines under the hood. One neat thing coroutines can do is pause code that doesn’t know it’s in a coroutine. Python’s generators pause themselves with yield, and the mere presence of yield turns a function into a generator; but in Lua or Ruby or any other language with coroutines, any function can pause at any time. You can even make a closure that pauses, then pass that closure to another function which calls it, without that function ever knowing anything happened.

(This arguably has some considerable downsides as well — it becomes difficult to know when or where your code might pause, which makes reasoning about the order of operations much harder. That’s why Python and some other languages opted to implement async IO with an await keyword — anyone reading the code knows that it can only pause where an await appears.)

(Also, I’m saying “pause” here instead of “yield” because Ruby has really complicated the hell out of this by already having a yield keyword that does something totally different, and naming its coroutine pause function yield.)

Anyway, that’s exactly what’s happening here. enum_for returns an Enumerator that wraps the whole method. (It doesn’t need to know self, because enum_for is actually a method inherited from Object, goodness gracious.) When the Enumerator needs some items, it calls the method a second time with its own block, running in a coroutine, just like a block passed to Enumerator.new. Eventually the method emits a value using the regular old yield keyword, and that value reaches the block created by Enumerator, and that block pauses the call stack. It doesn’t matter that Range.each is eager, because its iteration is still happening in code somewhere, and that code is part of a call stack in a coroutine, so it can be paused. Eventually the coroutine is no longer useful and gets thrown away, so the eager each call simply stops midway through its work, unaware that anything unusual ever happened.

In fact, despite being an Object method, enum_for isn’t special at all. It can be expressed in pure Ruby very easily:

def my_enum_for(receiver, method)
    # Enumerator.new creates a coroutine-as-iteration-source, as above.
    Enumerator.new do |y|
        # All it does is call the named method with a trivial block.  Every
        # time the method produces a value with the `yield` keyword, we pass it
        # along to the yielder object, which pauses the coroutine.
        # This is nothing more than a bridge between "yield" in the Ruby block
        # sense, and "yield" in the coroutine sense.
        receiver.send method do |value|
            y.yield value
        end
    end
end

So, that’s pretty neat. Incidentally, several built-in methods like Array.each and Enumerable.collect act like this, returning an Enumerator if called without a block.

Full laziness

I mentioned above that while an Enumerator fetches items lazily, many of the methods are eager. To clarify what I mean by that, consider:

inclusive_range(6, 9000).collect {
    |n|
    puts "considering #{n}"
    "a" * n
}.first(3)

collect() is one of those common Enumerable methods. You might know it by its other name, map(). Ruby is big on multiple names for the same thing: one that everyone uses in practice, and another that people who don’t use Ruby will actually recognize.

Even though this code ultimately only needs three items, and even though there’s all this coroutine machinery happening under the hood, this still evaluates the entire range. Why?

The problem is that collect() has always returned an array, and is generally expected to continue doing so. It has no way of knowing that it’s about to be fed into first. Rather than violate this API, Ruby added a new method, Enumerable.lazy. This stops after three items:

inclusive_range(6, 9000).lazy.collect {
    |n|
    puts "considering #{n}"
    "a" * n
}.first(3)

All this does is return an Enumerator::Lazy object, which has lazy implementations of various methods that would usually do a full iteration. Methods like first(3) are still “eager” (in the sense that they just return an array), since their results have a fixed finite size.
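
The same eager-versus-lazy split is easy to reproduce in Python, for what it's worth. This is only a sketch of the general idea, not of how Ruby implements it; stretch and considered are names I've invented to count how many items actually get processed.

```python
from itertools import islice

considered = []


def stretch(n):
    # Record every item that actually gets processed
    considered.append(n)
    return "a" * n


# Eager: the list comprehension builds all fifteen results before slicing
eager = [stretch(n) for n in range(6, 21)][:3]
eager_count = len(considered)

considered.clear()

# Lazy: map() returns an iterator, and islice() stops pulling after three items
lazy = list(islice(map(stretch, range(6, 21)), 3))
lazy_count = len(considered)

print(eager_count, lazy_count)  # 15 3
```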

This seems a little clunky to me, since the end result is still an object with a collect method that doesn’t return an array. I suspect the real reason is just that Enumerator was added first; even though the coroutine support was already there, Enumerator::Lazy only came along later. Changing existing eager methods to be lazy can, ah, cause problems.

The only built-in type that seems to have interesting lazy behavior is Range, which can be infinite.

# Whoops, infinite loop.
(1..Float::INFINITY).select { |n| n.even? }.first(5)
# 2 4 6 8 10
(1..Float::INFINITY).lazy.select { |n| n.even? }.first(5)
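
A rough Python counterpart of the lazy version, using itertools.count as the infinite source. The generator expression does the filtering, and islice is what keeps this from running forever:

```python
from itertools import count, islice

# count(1) never ends, so nothing here may try to consume it eagerly
evens = (n for n in count(1) if n % 2 == 0)
first_five = list(islice(evens, 5))
print(first_five)  # [2, 4, 6, 8, 10]
```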

A loose end

I think the only remaining piece of this puzzle is something I stumbled upon but can’t explain. Enumerator has a next method, which returns the next value or raises StopIteration.

Wow, that sounds awfully familiar.

But I can’t find anything in the language or standard library that uses this, with one single and boring exception: the loop construct. It catches StopIteration and exits the block.

enumerator = [1, 2, 3].each
loop do
    while true do
        puts enumerator.next
    end
end

On the fourth call, next() will be out of items, so it raises StopIteration. Removing the loop block makes this quite obvious.

That’s it. That’s the only use of it in the language, as far as I can tell. It seems almost… vestigial. It’s also a little weird, since it keeps the current iteration state inside the Enumerator, unlike any of its other methods. But it’s also the only form of external iteration that I know of in Ruby, and that’s handy to have sometimes.

And, uh, so on

I intended to foray into a few more languages, including some recent lower-level friends like C++/Rust/Swift, but this post somehow spiraled out of control and hit nine thousand words. No one has read this far.

Handily, it turns out that the above languages pretty much cover the basic ways of approaching iteration; if any of this made sense, other languages will probably seem pretty familiar.

  • C++’s iteration protocol(s) has existed for a long time in the form of ++it to advance an iterator and *it to read the current item, though this was usually written manually in a C-style for loop, and loops were generally terminated with an explicit endpoint.

    C++11 added the range-based for, which does basically the same stuff under the hood. Idiomatic C++ is inscrutable, but maybe you can make sense of this project which provides optionally-infinite iterable ranges.

  • Rust has an entire (extremely well-documented) iter module with numerous iterators and examples of how to create your own. The core of the Iterator trait is just a next method which returns None when exhausted. It also has a lot of handy Ruby-like chainable methods, so working directly with iterators is more common in Rust than in Python.

  • Swift also has (well-documented) simple next-based iterators, which return nil when exhausted, effectively the same API as Rust.

I could probably keep finding more subsequent languages indefinitely, so I’m gonna take a break from this now.

Iteration in one language, then all the others

Post Syndicated from Eevee original https://eev.ee/blog/2016/11/18/iteration-in-one-language-then-all-the-others/

You may have noticed that I like comparing features across different languages. I hope you like it too, because I’m doing it again.

Python

I’m most familiar with Python, and iteration is one of its major concepts, so it’s a good place to start and a good overview of iteration. I’ll dive into Python a little more deeply, then draw parallels to other languages.

Python only has one form of iteration loop, for. (Note that all of these examples are written for Python 3; in Python 2, some of the names are slightly different, and fewer things are lazy.)

for value in sequence:
    ...

in is also an operator, so value in sequence is also the way you test for containment. This is either very confusing or very satisfying.
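
The two uses are more closely related than they look. If a type doesn't define __contains__, Python falls back to iterating over it to answer a containment test. A small sketch, with Countdown as a made-up example class:

```python
class Countdown:
    """Illustrative iterable with no __contains__ method."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


# `in` as part of a loop
values = [n for n in Countdown(5)]
print(values)  # [5, 4, 3, 2, 1]

# `in` as an operator, answered here by iterating until a match is found
print(3 in Countdown(5))  # True
print(7 in Countdown(5))  # False
```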

When you need indices, or specifically a range of numbers, you can use the built-in enumerate or range functions. enumerate works with lazy iterables as well.

# This makes use of tuple unpacking to effectively return two values at a time
for index, value in enumerate(sequence):
    ...

# Note that the endpoint is exclusive, and the default start point is 0.  This
# matches how list indexing works and fits the C style of numbering.
# 0 1 2 3 4
for n in range(5):
    ...

# Start somewhere other than zero, and the endpoint is still exclusive.
# 1 2 3 4
for n in range(1, 5):
    ...

# Count by 2 instead.  Can also use a negative step to count backwards.
# 1 3 5 7 9
for n in range(1, 11, 2):
    ...

dicts (mapping types) have several methods for different kinds of iteration. Additionally, iterating over a dict directly produces its keys.

for key in mapping:
    ...

for key in mapping.keys():
    ...

for value in mapping.values():
    ...

for key, value in mapping.items():
    ...

Python distinguishes between an iterable, any value that can be iterated over, and an iterator, a value that performs the actual work of iteration. Common iterable types include list, tuple, dict, str, and set. enumerate and range are also iterable.

Since Python code rarely works with iterators directly, and many iterable types also function as their own iterators, it’s common to hear “iterator” used to mean an iterable. To avoid this ambiguity, and because the words are fairly similar already, I’ll refer to iterables as containers like the Python documentation sometimes does. Don’t be fooled — an object doesn’t actually need to contain anything to be iterable. Python’s range type is iterable, but it doesn’t physically contain all the numbers in the range; it generates them on the fly as needed.
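
A quick way to convince yourself of that last point is to make a range far too big to store and poke at it. The exact byte count is a CPython implementation detail, so the sketch only checks that it's small:

```python
import sys

big = range(10**9)

# A billion "elements", but the object itself is tiny
print(len(big))                  # 1000000000
print(sys.getsizeof(big) < 1000) # True

# Indexing and containment are computed, not looked up
print(big[-1])                   # 999999999
print(10**8 in big)              # True
```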

The fundamental basics of iteration are built on these two ideas. Given a container, ask for an iterator; then repeatedly advance the iterator to get new values. When the iterator runs out of values, it raises StopIteration. That’s it. In Python, those two steps can be performed manually with the iter and next functions. A for loop is roughly equivalent to:

_iterator = iter(container)
_done = False
while not _done:
    try:
        value = next(_iterator)
    except StopIteration:
        _done = True
    else:
        ...

An iterator can only move forwards. Once a value has been produced, it’s lost, at least as far as the iterator is concerned. These restrictions are occasionally limiting, but they allow iteration to be used for some unexpected tasks. For example, iterating over an open file produces its lines — even if the “file” is actually a terminal or pipe, where data only arrives once and isn’t persistently stored anywhere.
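
Here's a minimal illustration of “only forwards, only once”, using io.StringIO as a stand-in for a real file so the example is self-contained:

```python
import io

fake_file = io.StringIO("first\nsecond\nthird\n")

# File-like objects are their own iterators and produce one line at a time
first_pass = [line.rstrip("\n") for line in fake_file]

# The iterator is now exhausted; a second loop gets nothing
second_pass = [line.rstrip("\n") for line in fake_file]

print(first_pass)   # ['first', 'second', 'third']
print(second_pass)  # []
```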

Generators

A more common form of “only forwards, only once” in Python is the generator, a function containing a yield statement. For example:

def inclusive_range(start, stop):
    val = start
    while val <= stop:
        yield val
        val += 1

# 6 7 8 9
for n in inclusive_range(6, 9):
    ...

Calling a generator function doesn’t execute its code, but immediately creates a generator iterator. Every time the iterator is advanced, the function executes until the next yield, at which point the yielded value is returned as the next value and the function pauses. The next iteration will then resume the function. When the function returns (or falls off the end), the iterator stops.

Since the values here are produced by running code on the fly, it’s of course impossible to rewind a generator.
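
The pausing is easy to observe by having a generator leave a trail as it runs. In this sketch, noisy and log are just illustrative names:

```python
log = []


def noisy():
    log.append("started")
    yield 1
    log.append("resumed")
    yield 2
    log.append("finished")


gen = noisy()
# Nothing has run yet; calling the function only created the generator
print(log)        # []

print(next(gen))  # 1
print(log)        # ['started']

print(next(gen))  # 2
print(log)        # ['started', 'resumed']

# Falling off the end stops the iterator; the default avoids StopIteration
print(next(gen, "done"))  # done
print(log)        # ['started', 'resumed', 'finished']
```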

The underlying protocol is straightforward. A container must have an __iter__ method that returns an iterator, corresponding to the iter function. An iterator must have a __next__ method that returns the next item, corresponding to the next function. If the iterator is exhausted, __next__ must raise StopIteration. An iterator must also have an __iter__ that returns itself — this is so an iterator can be used directly in a for loop.

The above inclusive range generator might be written out explicitly like this:

class InclusiveRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return InclusiveRangeIterator(self)

class InclusiveRangeIterator:
    def __init__(self, incrange):
        self.incrange = incrange
        self.nextval = incrange.start

    def __iter__(self):
        return self

    def __next__(self):
        if self.nextval > self.incrange.stop:
            raise StopIteration

        val = self.nextval
        self.nextval += 1
        return val

This might seem like a lot of boilerplate, but note that the iterator state (here, nextval) can’t go on InclusiveRange directly, because then it’d be impossible to iterate over the same object twice at the same time. (Some types, like files, do act as their own iterators because they can’t meaningfully be iterated in parallel.)

Even Python’s internals work this way. Try iter([]) in a Python REPL; you’ll get a list_iterator object.

In truth, it is a lot of boilerplate. User code usually uses this trick:

class InclusiveRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        val = self.start
        while val <= self.stop:
            yield val
            val += 1

Nothing about this is special-cased in any way. Now __iter__ is a generator, and calling a generator function returns an iterator, so all the constraints are met. It’s a really easy way to convert a generator function into a type. If this class were named inclusive_range instead, it would even be backwards-compatible; consuming code wouldn’t even have to know it’s a class.
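
A quick check, repeating the class so the snippet stands alone: because every call to `__iter__` creates a fresh generator, the same object can be iterated more than once, even by nested loops.

```python
class InclusiveRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        val = self.start
        while val <= self.stop:
            yield val
            val += 1

r = InclusiveRange(1, 3)

# Nested loops over the same object: each `for` gets its own generator.
pairs = [(a, b) for a in r for b in r]

# The container is not used up by being iterated.
first_pass = list(r)
second_pass = list(r)
```

A bare generator function can't do the second part; once its iterator is exhausted, you'd have to call the function again.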

Reversal

But why would you do this? One excellent reason is to add support for other sequence-like operations, like reverse iteration support. An iterator can’t be reversed, but a container might support being iterated in reverse:

fruits = ['apple', 'orange', 'pear']
# pear, orange, apple
for value in reversed(fruits):
    ...

Reverse iteration over a lazy container doesn’t always make sense, but when it does, it’s easy to implement by returning an iterator from __reversed__.

class InclusiveRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        val = self.start
        while val <= self.stop:
            yield val
            val += 1

    def __reversed__(self):
        val = self.stop
        while val >= self.start:
            yield val
            val -= 1

Note that Python does not have “bi-directional” iterators, which can freely switch between forwards and reverse iteration on the fly. A bidirectional iterator is useful for cases like doubly-linked lists, where it’s easy to get from one value to the next or previous value, but not as easy to start from the beginning and get the tenth item.

Iteration is often associated with sequences, though they’re not quite the same. In Python, a sequence is a value that can be indexed in order as container[0], container[1], etc. (Indexing is implemented with __getitem__.) All sequences are iterable; in fact, if a type implements indexing but not __iter__, the iter function will automatically try indexing it from zero instead. reversed does the same, though it requires that the type implement __len__ as well so it knows what the last item is.
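
Here's a minimal sketch of that fallback. `Squares` is a made-up class with only `__len__` and `__getitem__`, and no `__iter__` or `__reversed__` at all.

```python
class Squares:
    """Hypothetical sequence of the first n squares, with no __iter__."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)   # this is what ends the fallback iteration
        return index * index

squares = Squares(4)
forwards = list(squares)              # iter() indexes 0, 1, 2, ... until IndexError
backwards = list(reversed(squares))   # reversed() uses __len__ and __getitem__
```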

Much of this is codified more explicitly in the abstract base classes in collections.abc, which also provide default implementations of common methods.
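
As a sketch of what those defaults buy you: subclassing `collections.abc.Sequence` and supplying only `__len__` and `__getitem__` gets you `__iter__`, `__reversed__`, `__contains__`, `index`, and `count` for free. This `InclusiveRange` variant is my own illustration and only handles non-negative integer indexes.

```python
from collections.abc import Sequence

class InclusiveRange(Sequence):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __len__(self):
        return max(0, self.stop - self.start + 1)

    def __getitem__(self, index):
        # Only non-negative integer indexes are handled in this sketch.
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self.start + index

r = InclusiveRange(6, 9)
forwards = list(r)               # __iter__ comes from Sequence
backwards = list(reversed(r))    # so does __reversed__
```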

Not all iterables are sequences, and not every value that can be indexed is a sequence! Python’s mapping type, dict, uses indexing to fetch the value for a key; but a dict is indexed by key rather than by position and is not a sequence. However, a dict can still be iterated over, producing its keys (in insertion order as of Python 3.7; in arbitrary order before that). A set can be iterated over, producing its values in arbitrary order, but it cannot be indexed at all. A type could conceivably use indexing for something more unusual and not be iterable at all.

A common question

It’s not really related to iteration, but people coming to Python from Ruby often ask why len() is a built-in function, rather than a method. The same question could be asked about iter() and next() (and other Python builtins), which more or less delegate directly to a “reserved” __dunder__ method anyway.

I believe the technical reason is simply the order that features were added to the language in very early days, which is not very interesting.

The philosophical reason, imo, is that Python does not reserve method names for fundamental operations. All __dunder__ names are reserved, of course, but everything else is fair game. This makes it obvious when a method is intended to add support for some language-ish-level operation, even if you don’t know what all the method names are. Occasionally a third-party library invents its own __dunder__ name, which is a little naughty, but the same reasoning applies: “this is a completely generic interface that some external mechanism is expected to use”.

This approach also avoids a namespacing problem. In Ruby, a Rectangle class might want to have width and length attributes… but the presence of length means a Rectangle looks like it functions as a sequence! Since “interface” method names aren’t namespaced in any way, there is no way to say that you don’t mean the same thing as Array.length.

It’s a minor quibble, since everything’s dynamically typed anyway, so the real solution is “well don’t try to iterate a rectangle then”. And Python does use keys as a method name in some obscure cases. Oh, well.

Some cute tricks

The distinction between sequences and iterables can cause some subtle problems. A lot of code that only needs to loop over items can be passed, e.g., a generator. But this can take some conscious care. Compare:

# This will NOT work with generators, which don't support len() or indexing
for i in range(len(container)):
    value = container[i]
    ...

# But this will
for i, value in enumerate(container):
    ...

enumerate also has a subtle, unfortunate problem: it cannot be combined with reversed. This has bitten me more than once, surprisingly.

# This produces a TypeError from reversed()
for i, value in reversed(enumerate(container)):
    ...

# This almost works, but the index goes forwards while the values go backwards
for i, value in enumerate(reversed(container)):
    ...

The problem is that enumerate can’t, in general, reverse itself. It counts up from zero as it iterates over its argument; reversing it means starting from one less than the number of items, but it doesn’t yet know how many items there are. But if you just want to run over a list or other sequence backwards, this feels very silly. A trivial helper can make it work:

def revenum(iterable, end=0):
    start = len(iterable) + end
    # Walk the sequence backwards so each index stays paired with its own value
    for value in reversed(iterable):
        start -= 1
        yield start, value
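
The same idea can be sketched without a manual counter, by pairing a descending range with reversed(). The name `reversed_enumerate` is made up here, and like any reverse enumeration it needs a real sequence, not a generator.

```python
def reversed_enumerate(sequence):
    """Yield (index, value) pairs from the last item to the first.

    Hypothetical helper; requires len() and reversed() to work on the argument.
    """
    indexes = range(len(sequence) - 1, -1, -1)
    return zip(indexes, reversed(sequence))

fruits = ['apple', 'orange', 'pear']
result = list(reversed_enumerate(fruits))
```

Each index stays attached to its own value, so `result` starts with `(2, 'pear')` and ends with `(0, 'apple')`.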

I’ve run into other odd cases where it’s frustrating that a generator doesn’t have a length or indexing. This especially comes up if you make heavy use of generator expressions, which are a very compact way to write a one-off generator. (Python also has list, set, and dict “comprehensions”, which have the same syntax but use brackets or braces instead of parentheses, and are evaluated immediately instead of lazily.)

def get_big_fruits():
    fruits = ['apple', 'orange', 'pear']
    return (fruit.upper() for fruit in fruits)

# Roughly equivalent to:
def get_big_fruits():
    fruits = ['apple', 'orange', 'pear']
    def genexp():
        for fruit in fruits:
            yield fruit.upper()
    return genexp()

If you had thousands of fruits, doing this could save a little memory. The caller is probably just going to loop over them to print them out (or whatever), so using a generator expression means that each uppercase name only exists for a short time; returning a list would mean creating a lot of values all at once.

Ah, but now the caller wants to know how many fruits there are, with minimal fuss. Generators have no length, so that won’t work. Turning this generator expression into a class that also has a __len__ would be fairly ridiculous. So you resort to some slightly ugly trickery.

# Ugh.  Obvious, but feels really silly.
count = 0
for value in container:
    count += 1

# Better, but weird if you haven't seen it before.  Creates another generator
# expression that just yields 1 for every item, then sums them up.
count = sum(1 for _ in container)

Or perhaps you want the first big fruit? Well, [0] isn’t going to help. This is one of the few cases where using iter and next directly can be handy.

# Oops!  If the container is empty, this raises StopIteration, which you
# probably don't want.
first = next(iter(container))

# Catch the StopIteration explicitly.
try:
    first = next(iter(container))
except StopIteration:
    # This code runs if there are zero items
    ...

# Regular loop that terminates immediately.
# The "else" clause only runs when the container ends naturally (i.e. NOT if
# the loop breaks), which can only happen here if there are zero items.
for value in container:
    first = value
    break
else:
    ...

# next() -- but not __next__()! -- takes a second argument indicating a
# "default" value to return when the iterator is exhausted.  This only makes
# sense if you were going to substitute a default value anyway; doing this and
# then checking for None will do the wrong thing if the container actually
# contained a None.
first = next(iter(container), None)
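
If None is a legitimate item, one common workaround is a private sentinel object as the default. This is a sketch; `_MISSING` and `first_or_raise` are names I've invented for it.

```python
_MISSING = object()   # hypothetical sentinel; nothing else can be this object

def first_or_raise(iterable):
    """Return the first item, or raise ValueError if there are no items."""
    first = next(iter(iterable), _MISSING)
    if first is _MISSING:
        raise ValueError("iterable is empty")
    return first

first_of_list = first_or_raise([None, 1])             # a real None, not "empty"
first_of_generator = first_or_raise(c for c in "ab")
```

The `is` comparison matters: the sentinel is compared by identity, so an item that merely compares equal to something can't be mistaken for it.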

Other tricks with iter and next include skipping the first item (or any number of initial items, though consider itertools.islice for more complex cases):

it = iter(container)
next(it, None)  # Use second arg to ignore StopIteration
for value in it:
    # Since the first item in the iterator has already been consumed, this loop
    # will start with the second item.  If the container had only one or zero
    # items, the loop will get StopIteration and end immediately.
    ...
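
For skipping more than one item, the itertools.islice version might look like this. `skip` is a hypothetical wrapper; `islice(iterable, count, None)` means "start at item `count` and keep going to the end".

```python
from itertools import islice

def skip(iterable, count):
    """Yield everything after the first `count` items (hypothetical helper)."""
    return islice(iterable, count, None)

after_two = list(skip(['a', 'b', 'c', 'd'], 2))
too_short = list(skip(['a'], 2))                          # fewer items than skipped
from_generator = list(skip((n * n for n in range(5)), 3))
```

Like the next() version, it works on generators and quietly produces nothing if the input is too short.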

Iterating two (or more) items at a time:

# Obvious way: call next() inside the loop.
it = iter(container)
for value1 in it:
    # With an odd number of items, this will raise an uncaught StopIteration!
    # Catch it or provide a default value.
    value2 = next(it)
    ...

# Moderately clever way: abuse zip().
# zip() takes some number of containers and iterates over them pairwise.  It
# stores an iterator for each container.  When it's asked for its next item, it
# in turn asks all of its iterators for their next items, and returns them as a
# tuple.  But by giving it the same exact iterator twice, it'll end up advancing
# that iterator twice and returning two consecutive items.
# Note that zip() stops early as soon as an iterator runs dry, so if the
# container has an odd number of items, this will silently skip the last one.
# If you don't want that, use itertools.zip_longest instead.
it = iter(container)
for line1, line2 in zip(it, it):
    ...

# Far too clever way: exactly the same as above, but written as a one-liner.
# zip(iter(), iter()) would create two separate iterators and break the trick.
# List multiplication produces a list containing the same iterator twice.
# One advantage of this is that the 2 can be a variable.
for value1, value2 in zip(*[iter(container)] * 2):
    ...
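
Here's a sketch of the same trick with a variable group size and itertools.zip_longest, so a short final group is padded instead of dropped. `in_chunks` is a made-up name.

```python
from itertools import zip_longest

def in_chunks(iterable, size, fillvalue=None):
    """Group items `size` at a time, padding the last group (hypothetical helper)."""
    iterators = [iter(iterable)] * size   # the same iterator, `size` times
    return zip_longest(*iterators, fillvalue=fillvalue)

padded = list(in_chunks(range(5), 2))
truncated = list(zip(*[iter(range(5))] * 2))   # plain zip drops the leftover 4
```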

Wow, that got pretty weird towards the end. Somehow this turned into Stupid Python Iterator Tricks. Don’t worry; I know far less about these other languages.

C

C is an extreme example with no iterator protocol whatsoever. It barely even supports sequences; arrays are just pointer math. All it has is the humble C-style for loop:

int container[] = {...};
for (int i = 0; i < container_length; i++) {
    int value = container[i];
    ...
}

Unfortunately, it’s really the best C can do. C arrays don’t know their own length, so no matter what, the developer has to provide it some other way. Even without that, a built-in iterator protocol is impossible — iterators require persistent state (the current position) to be bundled alongside code (how to get to the next position). That pretty much means one of two things: closures or objects. C has neither.

Lua

Lua has two forms of for loop. The first is a simple numeric loop.

-- 1 3 5 7 9 11
for value = 1, 11, 2 do
    ...
end

The three values after the = are the start, end, and step. They work similarly to Python’s range(), except that everything in Lua is always inclusive, so for i = 1, 5 will count from 1 to 5.

The generic form uses in.

for value in iterate(container) do
    ...
end

iterate isn’t a special name here, but most of the time a generic for will look like this.

See, Lua doesn’t have objects. It has enough tools that you can build objects fairly easily, but the core language has no explicit concept of objects or method calls. An iterator protocol needs to bundle state and behavior somehow, so Lua uses closures for that. But you still need a way to get that closure, and that means calling a function, and a plain value can’t have functions attached to it. So iterating over a table (Lua’s single data structure) looks like this:

for key, value in pairs(container) do
    ...
end

pairs is a built-in function. Lua also has an ipairs, which iterates over consecutive keys and values starting from key 1. (Lua starts at 1, not 0. Lua also represents sequences as tables with numeric keys.)

Lua does have a way to associate “methods” with values, which is how objects are made, but for loops almost certainly came first. So iteration is almost always over a function call, not a bare value.

Also, because objects are built out of tables, having a default iteration behavior for all tables would mean having the same default for all objects. Nothing’s stopping you from using pairs on an object now, but at least that looks deliberate. It’s easy enough to give objects iteration methods and iterate over obj:iter(), though it’s slightly unfortunate that every type might look slightly different. Unfortunately, Lua has no truly generic interface for “this can produce a sequence of values”.

The iteration protocol is really just calling a function repeatedly to get new values. When the function returns nil, the iteration ends. (That means nil can never be part of an iteration! You can work around this by returning two values and making sure the first one is something else that’s never nil, like an index.) The manual explains the exact semantics of the generic for with Lua code, a move I wish every language would make.

-- This:
for var_1, ···, var_n in explist do block end

-- Is equivalent to this:
do
    local _func, _state, _lastval = explist
    while true do
        local var_1, ···, var_n = _func(_state, _lastval)
        if var_1 == nil then break end
        _lastval = var_1
        block
    end
end

Important to note here is the way multiple-return works in Lua. Lua doesn’t have tuples; multiple assignment is a distinct feature of the language, and multiple return works exactly the same way as multiple assignment. If there are too few values, the extra variables become nil; if there are too many values, the extras are silently discarded.

So in the line local _func, _state, _lastval = explist, the “state” value _state and the “last loop value” _lastval are both optional. Lua doesn’t use them, except to pass them back to the iterator function _func, and they aren’t visible to the for loop body. An iterator can thus be only a function and nothing else, letting _state and _lastval be nil — but they can be a little more convenient at times. Compare:

-- Usual approach: return only a closure, completely ignoring state and lastval
local function inclusive_range(start, stop)
    local nextval = start
    return function()
        if nextval > stop then
            return
        end
        local val = nextval
        nextval = nextval + 1
        return val
    end
end

-- Alternative approach, not using closures at all.  This is the function we
-- return; each time it's called with the same "state" value and whatever it
-- returned last time it was called.
-- This function could even be written exactly like a method (a la Python's
-- __next__), where the state value is the object itself.
local function inclusive_range_iter(stop, prev)
    -- "stop" is the state value; "prev" is the last value we returned
    local val = prev + 1
    if val > stop then
        return
    end
    return val
end
local function inclusive_range(start, stop)
    -- Return the iterator function, and pass it the stop value as its state.
    -- The "last value" is a little weird here; on the first iteration, there
    -- is no last value.  Here we can fake it by subtracting 1 from the
    -- starting number, but in other cases, it might make more sense if the
    -- "state" were a table containing both the start and stop values.
    return inclusive_range_iter, stop, start - 1
end

-- 6 7 8 9 with both implementations
for n in inclusive_range(6, 9) do
    ...
end
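
For readers more at home in Python, the stateless variant can be modelled there too. This is only a sketch of the generic for as described above, cut down to a single loop variable; `lua_generic_for` is a made-up name and None stands in for nil.

```python
def lua_generic_for(func, state=None, lastval=None):
    """Model of Lua's generic `for`, restricted to one loop variable."""
    while True:
        value = func(state, lastval)
        if value is None:        # stands in for Lua's nil
            break
        lastval = value
        yield value

def inclusive_range_iter(stop, prev):
    # "stop" is the state value; "prev" is the value returned last time
    val = prev + 1
    return val if val <= stop else None

def inclusive_range(start, stop):
    # Same trick as the Lua version: fake the "last value" with start - 1
    return inclusive_range_iter, stop, start - 1

values = list(lua_generic_for(*inclusive_range(6, 9)))
```

Note that `inclusive_range_iter` holds no state of its own; everything it needs is passed back in on each call.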

Lua doesn’t have generators. Surprisingly, it has fully-fledged coroutines — call stacks that can be paused at any time. Lua sometimes refers to them as “threads”, but only one can be running at a time. Effectively they’re like Python generators, except you can call a function which calls a function which calls a function which eventually yields, and the entire call stack from that point up to the top of the coroutine is paused and preserved.

In Python, the mere presence of yield causes a function to become a generator. In Lua, since any function might try to yield the coroutine it’s currently in, a function has to be explicitly called as a coroutine using functions in the coroutine library.
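
On the Python side, the closest thing to yielding from deep inside a call stack is for every function along the way to be a generator and delegate with `yield from`. A minimal sketch, with invented function names:

```python
def leaf(values):
    for value in values:
        yield value

def middle(values):
    # A plain call like leaf(values) would only create a generator and throw
    # it away; `yield from` passes its values up to our own caller.
    yield from leaf(values)

def top():
    yield 'start'
    yield from middle([1, 2])
    yield 'end'

result = list(top())
```

Every layer has to opt in, which is exactly the restriction Lua's coroutines don't have.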

But this post is about iterators, not coroutines. Coroutines don’t function as iterators, but Lua provides a coroutine.wrap() that takes a function, turns it into a coroutine, and returns a function that resumes the coroutine. That’s enough to allow a coroutine to be turned into an iterator. The Lua book even has a section about this.

local function inclusive_range(start, stop)
    local val = start
    while val <= stop do
        coroutine.yield(val)
        val = val + 1
    end
end
-- Unfortunately, coroutine.wrap() doesn't have any way to pass initial
-- arguments to the function it wraps, so we need this dinky wrapper.
-- I should clarify that the ... here is literal syntax for once.
local function iter_coro(entry_point, ...)
    local args = {...}
    return coroutine.wrap(function()
        entry_point(unpack(args))  -- table.unpack in Lua 5.2 and later
    end)
end

-- 6 7 8 9
for n in iter_coro(inclusive_range, 6, 9) do
    ...
end

So, that’s cool. Lua doesn’t do a lot for you — unfortunately, list processing tricks can be significantly more painful in Lua — but it has some pretty interesting primitives that compose with each other remarkably well.

Perl 5

Perl has a very straightforward C-style for loop, which looks and works exactly as you might expect. my, which appears frequently in these examples, is just local variable declaration.

for (my $i = 0; $i < 10; $i++) {
    ...
}

Nobody uses it. Everyone uses the iteration-style for loop. (It’s occasionally called foreach, which is extra confusing because both for and foreach can be used for both kinds of loop. Nobody actually uses the foreach keyword.)

for my $value (@container) {
    ...
}

The iteration loop can be used for numbers, as well, since Perl has a .. inclusive range operator. For iterating over an array with indexes, Perl has the slightly odd $#array syntax, which is the index of the last item in @array. Creating something like Python’s enumerate is a little tricky in Perl, because you can’t directly return a list of lists, and the workaround doesn’t support unpacking. It’s complicated.

for my $i (1..10) {
    ...
}

for my $index (0..$#array) {
    my $value = $array[$index];
    ...
}

A hash (Perl’s mapping “shape”) can’t be iterated directly. Or, well, it can, but the loop will alternate between keys and values because Perl is weird. Instead you need the keys or values built-in functions to get the keys or values as regular lists. (These functions also work on arrays as of Perl 5.12.)

for my $key (keys %container) {
    ...
}

For iterating over both keys and values at the same time, Perl has an each function. The behavior is a little weird, since every call to the function advances an internal iterator inside the hash and returns a new pair. If a loop using each terminates early, the next use of each may silently start somewhere in the middle of the hash, skipping a bunch of its keys. This is probably why I’ve never seen each actually used.

while (my ($key, $value) = each %container) {
    ...
}

Despite being very heavily built on the concept of lists, Perl doesn’t have an explicit iterator protocol, and its support for lazy iteration in general is not great. When they’re used at all, lazy iterators tend to be implemented as ad-hoc closures or callable objects, which require a while loop:

my $iter = custom_iterator($collection);
while (my $value = $iter->()) {
    ...
}
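
For comparison, Python can wrap the same "call a closure until it gives up" pattern with the two-argument form of iter, which takes a callable and a sentinel. A sketch, with `make_countdown` invented for the example:

```python
def make_countdown(start):
    """Return a closure producing start, start - 1, ..., 1, and then None."""
    remaining = start
    def step():
        nonlocal remaining
        if remaining <= 0:
            return None
        value = remaining
        remaining -= 1
        return value
    return step

# iter(callable, sentinel) keeps calling the closure until it returns the sentinel.
values = list(iter(make_countdown(3), None))
```

One difference worth knowing: the Perl while loop above stops at any false value, including 0 and the empty string, whereas this form stops only when the result equals the sentinel.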

Here be dragons

It is possible to sorta-kinda fake an iterator protocol. If you’re not familiar, Perl’s variables come in several different “shapes” — hash, array, scalar — and it’s possible to “tie” a variable to a backing object which defines the operations for a particular shape. It’s a little like operator overloading, except that Perl also has operator overloading and it’s a completely unrelated mechanism. In fact, you could use operator overloading to make your object return a tied array when dereferenced as an array. I am talking gibberish now.

Anyway, the trick is to tie an array and return a new value for each consecutive fetch of an index. Like so:

use v5.12;
package ClosureIterator;

# This is the tie "constructor" and just creates a regular object to store
# our state
sub TIEARRAY {
    my ($class, $closure) = @_;
    my $self = {
        closure => $closure,
        nextindex => 0,
    };
    return bless $self, $class;
}

# This is called to fetch the item at a particular index; for an iterator,
# only the next item is valid
sub FETCH {
    my ($self, $index) = @_;

    if ($index == 0) {
        # Always allow reading index 0, both to mean a general "get next
        # item" and so that looping over the same array twice will work as
        # expected
        $self->{nextindex} = 0;
    }
    elsif ($index != $self->{nextindex}) {
        die "ClosureIterator does not support random access";
    }

    $self->{nextindex}++;
    return $self->{closure}->();
}

# The built-in shift() function means "remove and return the first item", so
# it's a good fit for a general "advance iterator"
sub SHIFT {
    my ($self) = @_;
    $self->{nextindex} = 0;
    return $self->{closure}->();
}

# Yes, an array has to be able to report its own size...  but luckily, a for
# loop fetches the size on every iteration!  As long as this returns
# increasingly large values, such a loop will continue indefinitely
sub FETCHSIZE {
    my ($self) = @_;
    return $self->{nextindex} + 1;
}

# Most other tied array operations are for modifying the array, which makes no
# sense here.  They're deliberately omitted, so trying to use them will cause a
# "can't locate object method" error.


package main;

# Create an iterator that yields successive powers of 2
tie my @array, 'ClosureIterator', sub {
    # State variables are persistent, like C statics
    state $next = 1;
    my $ret = $next;
    $next *= 2;
    return $ret;
};

# This will print out 1, 2, 4, 8, ... 1024, at which point the loop breaks
for my $i (@array) {
    say $i;
    last if $i > 1000;
}

This transparently works like any other array… sort of. You can loop over it (forever!); you can use shift to pop off the next value; you can stop a loop and then continue reading from it later.

Unfortunately, this is just plain weird, even for Perl, and I very rarely see it used. Ultimately, Perl’s array operations come in a set, and this is an array that pretends not to be able to do half of them. Even Perl developers are likely to be surprised by an array, a fundamental “shape” of the language, with quirky behavior.

The biggest problem is that, as I said, Perl is heavily built on lists. Part of that design is that @arrays are very eager to spill their contents into a surrounding context. Naïvely passing an array to a function, for example, will expand its elements into separate arguments, losing the identity of the array itself (and losing any tied-ness). Interpolating an array into a string automatically space-separates its elements.

Unlike a for loop, these operations only ask the array for its size once — so rather than printing an infinite sequence, they’ll print a completely arbitrary prefix of it. In the case above, spilling a fresh array will read one item; spilling the array after the example loop will read eleven items. So while a tied array works nicely with a for loop, it’s at odds with the most basic rules of Perl syntax.

Also, Perl’s list-based nature means it’s attracted a lot of list-processing utilities — but these naturally expect to receive a spilled list of arguments and cannot work with a lazy iterator.

I found multiple mentions of the List::Gen module while looking into this. I’d never heard of it before and I’ve never seen it used, but it tries to fill this gap (and makes use of array tying, among other things). It’s a bit weird, and its source code is extremely weird, and it took me twenty minutes to figure out how it was using <...> as a quoting construct.

(<...> in Perl does filename globbing, so it’s usually seen as <*.txt>. The same syntax is used for reading from a filehandle, which makes this confusing and ambiguous, so it’s generally discouraged in favor of the built-in glob function which does the same thing. Well, it turns out that <...> must just call glob() at Perl-level, because List::Gen manages to co-opt this syntax simply by exporting its own glob function. Perl is magical.)

Perl 6

Perl 6, a mad experiment to put literally every conceivable feature into one programming language, naturally has a more robust concept of iteration.

At first glance, many of the constructs are similar to those of Perl 5. The C-style for loop still exists for some reason, but has been disambiguated under the loop keyword.

loop (my $i = 1; $i <= 10; $i++) {
    ...
}

# More interestingly, loop can be used completely bare for an infinite loop
loop {
    ...
}

The for block has slightly different syntax and a couple new tricks.

# Unlike in Perl 5, $value is automatically declared and scoped to the block,
# without needing an explicit 'my'
for @container -> $value {
    ...
}

for 1..10 -> $i {
    ...
}

# This doesn't iterate in pairs; it reads two items at a time from a flat list!
for 1..10 -> $a, $b {
    ...
}

Not apparent in the above code is that ranges are lazy in Perl 6, as in Python; the elements are computed on demand. In fact, Perl 6 supports a range like 1..Inf.

Loop variables are also aliases. By default they’re read-only, so this appears to work like Python… but Perl has always had a C-like language-level notion of “slots” that Python does not, and it becomes apparent if the loop variable is made read-write:

my @fruits = «apple orange pear»;
for @fruits -> $fruit is rw {
    # This is "apply method inplace", i.e. shorthand for:
    # $fruit = $fruit.uc;
    # Yes, you can do that.
    $fruit .= uc;
}
say @fruits;  # APPLE ORANGE PEAR

For iterating with indexes, there’s a curious idiom:

# ^Inf is shorthand for 0..Inf, read as "up to Inf".
# Z is the zip operator, which interleaves its arguments' elements into a
# single flat list.
# This makes use of the "two at a time" trick from above.
for ^Inf Z @array -> $index, $value {
    ...
}
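
The Python analogue of zipping against an infinite range is itertools.count, sketched below. Python's zip produces tuples rather than a flat interleaved list, so ordinary unpacking does the job of the "two at a time" trick.

```python
from itertools import count

fruits = ['apple', 'orange', 'pear']

# count() is an endless 0, 1, 2, ...; zip stops when the shorter input ends.
indexed = list(zip(count(), fruits))
same_as_enumerate = indexed == list(enumerate(fruits))
```

In other words, enumerate is this idiom with a name.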

Iterating hashes is somewhat simpler; hashes have methods, and the .kv method returns the keys and values. (It actually returns them in a flat list interleaved, which again uses “two at a time” syntax. If you only use a single loop variable, your loop iterations will alternate between a key and a value. Iterating a hash directly produces pairs, which are a first-class data type in Perl 6, but I can’t find any syntax for directly unpacking a pair within a loop header.)

for %container.kv -> $key, $value {
    ...
}

# No surprises here
for %container.keys -> $key {
    ...
}
for %container.values -> $value {
    ...
}

Perl 6 is very big on laziness, which is perhaps why it took fifteen years to see a release. It has the same iterable versus iterator split as Python. Given a container (iterable), ask for an iterator; given an iterator, repeatedly ask for new values. When the iterator is exhausted, it returns the IterationEnd sentinel. Exactly the same ideas. I’m not clear on the precise semantics of the for block and can’t find a simple reference, but they’re probably much like Python’s… plus a thousand special cases.

Generators, kinda

Perl 6 also has its own version of generators, though with a few extra twists. Curiously, generators are a block called gather, rather than a kind of function — this means that a one-off gather is easier to create, but a gather factory must be explicitly wrapped in a function. gather can even take a single expression rather than a block, so there’s no need for separate “generator expression” syntax as in Python.

sub inclusive-range($start, $stop) {
    return gather {
        my $val = $start;
        while $val <= $stop {
            take $val;
            $val++;
        }
    };
}

# 6 7 8 9
for inclusive-range(6, 9) -> $n {
    ...
}

Unlike Python’s yield, Perl 6’s take is dynamically scoped — i.e., take can be used anywhere in the call stack, and it will apply to the most recent gather caller. That means arbitrary-depth coroutines, which seems like a big deal to me, but the documentation mentions it almost as an afterthought.

The documentation also says “gather/take can generate values lazily, depending on context,” but neglects to clarify how context factors in. The code I wrote above turns out to be lazy, but this ambiguity inclines me to use the explicit lazy marker everywhere.

Ultimately it’s a pretty flexible feature, but has a few quirks that make it a bit clumsier to use as a straightforward generator. Given that the default behavior is an eagerly-evaluated block, I think the original intention was to avoid the slightly unsatisfying pattern of “push onto an array every iteration through a loop” — instead you can now do this:

my @results = gather {
    for @source-data -> $datum {
        next unless some-test($datum);
        take process($datum);
    }
};

Using a simple (syntax-highlighted!) take puts the focus on the value being taken, rather than the details of putting it where it wants to go and how it gets there. It’s an interesting idea and I’m surprised I’ve never seen it demonstrated this way.
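
The closest Python spelling I can think of is a small generator function collected with list(), where yield plays the part of take. `some_test` and `process` below are placeholder implementations I've made up so the sketch runs.

```python
def some_test(datum):
    return datum % 2 == 0      # hypothetical filter: keep even numbers

def process(datum):
    return datum * 10          # hypothetical transformation

def gather_results(source_data):
    for datum in source_data:
        if not some_test(datum):
            continue
        yield process(datum)   # the "take"

results = list(gather_results(range(6)))
```

It has the same shape, though Python needs a named function where Perl 6 gets away with a bare block.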

With gather and some abuse of Perl’s exceptionally compactable syntax, I can write a much shorter version of the infinite Perl 5 iterator above.

my @powers-of-two = lazy gather take (state $n = 1) *= 2 for ^Inf;

# Binds to $_ by default
for @powers-of-two {
    # Method calls are on $_ by default
    .say;
    last if $_ > 1000;
}

It’s definitely shorter, I’ll give it that. Leaving off the lazy in this case causes an infinite loop as Perl tries to evaluate the entire list; using a $ instead of a @ produces a “Cannot .elems a lazy list” error; using $ without lazy prints a ...-terminated representation of the infinite list and then hangs forever. I don’t quite understand the semantics of stuffing a list into a scalar ($) variable in Perl 6, and to be honest the list/array semantics seem to be far more convoluted than Perl 5, so I have no idea what’s going on here. Perl 6 has a lot of fascinating toys that are very easy to use incorrectly.

Nuts and bolts

Iterables and iterators are encoded explicitly as the Iterable and Iterator roles. An Iterable has an .iterator method that should return an Iterator. An Iterator has a .pull-one method that returns the next value, or the IterationEnd sentinel when the iterator is exhausted. Both roles offer several other methods, but they have suitable default implementations.

inclusive-range might be transformed into a class thusly:

class InclusiveRangeIterator does Iterator {
    has $.range is required;
    has $!nextval = $!range.start;

    method pull-one() {
        if $!nextval > $!range.stop {
            return IterationEnd;
        }

        # Perl people would probably phrase this:
        # ++$!nextval
        # and they are wrong.
        my $val = $!nextval;
        $!nextval++;
        return $val;
    }
}

class InclusiveRange does Iterable {
    has $.start is required;
    has $.stop is required;

    # Don't even ask
    method new($start, $stop) {
        self.bless(:$start, :$stop);
    }

    method iterator() {
        InclusiveRangeIterator.new(range => self);
    }
}

# 6 7 8 9
for InclusiveRange.new(6, 9) -> $n {
    ...
}

Can we use gather to avoid the need for an extra class, just as in Python? We sure can! The only catch is that Perl 6 iterators don’t also pretend to be iterables (remember, in Python, iter(it) should produce it), so we need to explicitly return a gather block’s iterator.

class InclusiveRange does Iterable {
    has $.start is required;
    has $.stop is required;

    # Don't even ask
    method new($start, $stop) {
        self.bless(:$start, :$stop);
    }

    method iterator() {
        gather {
            my $val = $!start;
            while $val <= $!stop {
                take $val;
                $val++;
            }
        }.iterator;  # <- this is important
    }
}

For sequences, Perl 6 has the Seq type. Curiously, even an infinite lazy gather is still a Seq. Indexing and length are not part of Seq — both are implemented as separate methods.

Curiously, even though Perl 6 became much stricter overall, the indexing methods don’t seem to be part of a role; you only need to define them, much like Python’s __dunder__ methods. In fact, in the preceding examples, does Iterator isn’t necessary at all; the for block will blindly try to call an iterator method and doesn’t much care where it came from.

I’m sure there are plenty of cute tricks possible with Perl 6, but, er, I’ll leave those as an exercise for the reader.

Ruby

Ruby is a popular and well-disguised Perl variant, if Perl just went completely all-in on Smalltalk. It has no C-style for, but it does have an infinite loop block and a very Python-esque for:

for value in sequence do
    ...
end

Nobody uses this. No, really, the core language documentation outright says:

The for loop is rarely used in modern ruby programs.

Instead, you’ll probably see this:

sequence.each do |value|
    ...
end

It doesn’t look it, but this is completely backwards from everything seen so far. All of these other languages have used external iterators, where an object is repeatedly asked to produce values and calling code can do whatever it wants with them. Here, something very different is happening. The entire do ... end block acts as a closure whose argument is value; it’s passed to the each method, which calls it once for each value in the sequence. This is an internal iterator.

“Pass a block to a function which can then call it a lot” is a built-in syntactic feature of Ruby, so these kinds of iterators are fairly common. The upside is that they look almost like a custom block, so they fit naturally with the language. The downside is that all of these block-accepting methods are implemented on Array, rather than as generic functions: bsearch, bsearch_index, collect, collect!, combination, count, cycle, delete, delete_if, drop_while, each, each_index, fetch, fill, find_index, index, keep_if, map, map!, permutation, product, reject, reject!, repeated_combination, repeated_permutation, reverse_each, rindex, select, select!, sort, sort!, sort_by!, take_while, uniq, uniq!, zip. Some of those, as well as a number of additional methods, are provided by the Enumerable mixin which can express them in terms of each. I suppose the other upside is that any given type can provide its own more efficient implementation of these methods, if it so desires.

I guess that huge list of methods answers most questions about how to iterate over indices or in reverse. The only bit missing is that .. range syntax exists in Ruby as well, and it produces Range objects which also have an each method. If you don’t care about each index, you can also use the cute 3.times method.

Ruby blocks are a fundamental part of the language and built right into the method-calling syntax. Even break is defined in terms of blocks, and it works with an argument!

# This just doesn't feel like it should work, but it does.  Prints 17.
# Braces are conventionally used for inline blocks, but do/end would work too.
primes = [2, 3, 5, 7, 11, 13, 17, 19]
puts primes.each { |p| break p if p > 16 }

each() doesn’t need to do anything special here; break will just cause its return value to be 17. Somehow. (Honestly, this is the sort of thing that makes me wary of Ruby; it seems so ad-hoc and raises so many questions. A language keyword that changes the return value of a different function? Does the inside of each() know about this or have any control over it? How does it actually work? Is there any opportunity for cleanup? I have no idea, and the documentation doesn’t seem to think this is worth commenting on.)
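
Some of those questions can at least be poked at with a small experiment. The sketch below uses a made-up method, noisy_each, and a made-up EVENTS log (neither comes from any library). Under those assumptions it suggests that break unwinds out of the method that was handed the block, that an ensure clause in that method still runs, and that the whole call evaluates to break's argument.

```ruby
# Hypothetical helper for illustration: logs what happens around each yield.
EVENTS = []

def noisy_each(items)
    items.each do |item|
        EVENTS << "yielding #{item}"
        yield item
    end
    EVENTS << "finished normally"   # skipped when the block breaks
ensure
    EVENTS << "cleanup ran"         # runs whether or not the block breaks
end

# The block breaks on the second item, so 5 is never yielded.
result = noisy_each([2, 3, 5]) { |p| break p * 10 if p > 2 }

p result   # the value handed to break
p EVENTS   # which steps actually ran
```

So the method being broken out of does get a chance to clean up, even though it never sees the break itself.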

Using blocks

Anyway, with block-passing as a language feature, the “iterator protocol” is pretty straightforward: just write a method that takes a block.

def each
    for value in self do
        yield value
    end
end

Be careful! Though it’s handy for iteration, that yield is not the same as Python’s yield. Ruby’s yield calls the passed-in block — yields control to the caller — with the given value(s).
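
A minimal sketch of that difference, using a hypothetical method named twice: because yield is just a call into the block, it hands back whatever the block returned, and the method keeps running immediately afterwards.

```ruby
# Hypothetical method for illustration: `yield` behaves like a function call,
# so its value is the block's return value.
def twice
    first = yield 1
    second = yield 2
    [first, second]
end

results = twice { |n| n * 100 }
p results
```

Nothing is suspended here; both calls to the block complete before twice returns.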

I pulled a dirty trick there, because I expressed each in terms of for. So how does for work? Well, ah, it just delegates to each. Oops!

How, then, do you write an iterator completely from scratch? The obvious way is to use yield repeatedly. That gives you something that looks rather a lot like Python, though it doesn’t actually pause execution.

class InclusiveRange
    # This gets you a variety of other iteration methods, all defined in
    # terms of each()
    include Enumerable

    def initialize(start, stop)
        @start = start
        @stop = stop
    end
    def each
        val = @start
        while val <= @stop do
            yield val
            val += 1
        end
    end
end

# 6 7 8 9
# A `for` loop would also work here
InclusiveRange.new(6, 9).each do |n|
    ...
end
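
As a quick usage sketch, here is the same class again with a few of the methods Enumerable contributes. The class is repeated only so the snippet runs on its own; every call below is answered by Enumerable in terms of the single each method.

```ruby
class InclusiveRange
    include Enumerable

    def initialize(start, stop)
        @start = start
        @stop = stop
    end

    def each
        val = @start
        while val <= @stop
            yield val
            val += 1
        end
    end
end

range = InclusiveRange.new(6, 9)
doubled = range.map { |n| n * 2 }   # built from each by Enumerable
evens = range.select(&:even?)
has_seven = range.include?(7)
largest = range.max

p doubled, evens, has_seven, largest
```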

Enumerators

Well, that’s nice for creating a whole collection type, but what if I want an ad-hoc custom iterator? Enter the Enumerator class, which allows you to create… ah, enumerators.

Note that the relationship between Enumerable and Enumerator is not the same as the relationship between “iterable” and “iterator”. Most importantly, neither is really an interface. Enumerable is a set of common iteration methods that any collection type may want to have, and it expects an each to exist. Enumerator is a generic collection type, and in fact mixes in Enumerable. Maybe I should just show you some code.

def inclusive_range(start, stop)
    Enumerator.new do |y|
        val = start
        while val <= stop do
            y.yield val
            val += 1
        end
    end
end

# 6 7 8 9
inclusive_range(6, 9).each do |n|
    puts n
end

Enumerator turns a block into a fully-fledged data stream. The block is free to do whatever it wants, and whenever it wants to emit a value, it calls y.yield value. The y argument is a “yielder” object, an opaque magic type; y.yield is a regular method call, unrelated to the yield keyword. (y << value is equivalent; << is Ruby’s “append” operator. And also, yes, bit shift.)

The amazing bit is that you can do this:

# 6
puts inclusive_range(6, 9).first

Enumerator has all of the Enumerable methods, one of which is first. So, that’s nice.

The really amazing bit is that if you stick some debugging code into the block passed to Enumerator.new, you’ll find that… the values are produced lazily. That call to first() doesn’t generate the full sequence and then discard everything after the first item; it only generates the first item, then stops.

(Beware! The values are produced lazily, but many Enumerable methods are eager. I’ll get back to this in a moment.)
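
The debugging experiment described above might look something like the following sketch. The PRODUCED log is an assumption added purely for illustration; it records each value just before it is handed to the yielder.

```ruby
# Hypothetical log of every value the enumerator's block generates.
PRODUCED = []

def inclusive_range(start, stop)
    Enumerator.new do |y|
        val = start
        while val <= stop
            PRODUCED << val
            y.yield val
            val += 1
        end
    end
end

first_item = inclusive_range(6, 9).first
after_first = PRODUCED.dup     # only what `first` needed

PRODUCED.clear
all_items = inclusive_range(6, 9).map { |n| n }
after_map = PRODUCED.dup       # an eager method walks the whole range

p first_item, after_first, all_items, after_map
```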

Hang on, didn’t I say yield doesn’t pause execution? Didn’t I also say the above yield is just a method call, not the keyword?

I did! And I wasn’t lying. The really truly amazing bit, which I’ve seen shockingly little excitement about while researching this, is that under the hood, this is all using Fibers. Coroutines.

Enumerator.new takes a block and turns it into a coroutine. Every time something wants a value from the enumerator, it resumes the coroutine. The yielder object’s yield method then calls Fiber.yield() to pause the coroutine. It works just like Lua, but it’s designed to work with existing Ruby conventions, like the piles of internal iteration methods developers expect to find.

So Enumerator.new can produce Python-style generators, albeit in a slightly un-native-looking way. There’s also one other significant difference: an Enumerator can restart itself for each method called on it, simply by calling the block again. This code will print 6 three times:

ir = inclusive_range(6, 9)
puts ir.first
puts ir.first
puts ir.first

For something like an inclusive range object, that’s pretty nice. For something like a file, maybe not so nice. It also means you need to be sure to put your setup code inside the block passed to Enumerator.new, or funny things will happen when the block is restarted.
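
Here is a small sketch of the "funny things" in question, using two hypothetical factory methods that differ only in where the counter is initialized.

```ruby
# Hypothetical example: state set up OUTSIDE the block survives restarts.
def counter_outside
    val = 0
    Enumerator.new { |y| 3.times { y << (val += 1) } }
end

# State set up INSIDE the block is recreated every time the block is re-run.
def counter_inside
    Enumerator.new do |y|
        val = 0
        3.times { y << (val += 1) }
    end
end

outside = counter_outside
outside_first = outside.to_a
outside_second = outside.to_a   # the counter kept going

inside = counter_inside
inside_first = inside.to_a
inside_second = inside.to_a     # fresh counter each time

p outside_first, outside_second, inside_first, inside_second
```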

Something like generators

But wait, there’s more. Specifically, this common pattern, which pretty much lets you ignore Enumerator.new entirely.

def some_iterator_method
    # __method__ is the current method name.  block_given? is straightforward.
    return enum_for(__method__) unless block_given?

    # An extremely accurate simulation of a large list.
    (1..1000).each do |item|
        puts "having a look at #{item}"
        # Blocks are invisible to `yield`; this will yield to the block passed
        # to some_iterator_method.
        yield item if item.even?
    end
end

# having a look at 1
# having a look at 2
# 2
puts some_iterator_method.first

Okay, bear with me.

First, some_iterator_method() is called. It doesn’t have a block attached, so block_given? is false, and it returns enum_for(...), whatever that does. Then first() is called on the result, and that produces a single element and stops.

The above code has no magic yielder object. It uses the straightforward yield keyword. Why doesn’t it loop over the entire range from 1 to 1000?

Remember, Enumerator uses coroutines under the hood. One neat thing coroutines can do is pause code that doesn’t know it’s in a coroutine. Python’s generators pause themselves with yield, and the mere presence of yield turns a function into a generator; but in Lua or Ruby or any other language with coroutines, any function can pause at any time. You can even make a closure that pauses, then pass that closure to another function which calls it, without that function ever knowing anything happened.

(This arguably has some considerable downsides as well — it becomes difficult to know when or where your code might pause, which makes reasoning about the order of operations much harder. That’s why Python and some other languages opted to implement async IO with an await keyword — anyone reading the code knows that it can only pause where an await appears.)

(Also, I’m saying “pause” here instead of “yield” because Ruby has really complicated the hell out of this by already having a yield keyword that does something totally different, and naming its coroutine pause function yield.)

Anyway, that’s exactly what’s happening here. enum_for returns an Enumerator that wraps the whole method. (It doesn’t need to know self, because enum_for is actually a method inherited from Object, goodness gracious.) When the Enumerator needs some items, it calls the method a second time with its own block, running in a coroutine, just like a block passed to Enumerator.new. Eventually the method emits a value using the regular old yield keyword, and that value reaches the block created by Enumerator, and that block pauses the call stack. It doesn’t matter that Range.each is eager, because its iteration is still happening in code somewhere, and that code is part of a call stack in a coroutine, so it can be paused. Eventually the coroutine is no longer useful and gets thrown away, so the eager each call simply stops midway through its work, unaware that anything unusual ever happened.

In fact, despite being an Object method, enum_for isn’t special at all. It can be expressed in pure Ruby very easily:

def my_enum_for(receiver, method)
    # Enumerator.new creates a coroutine-as-iteration-source, as above.
    Enumerator.new do |y|
        # All it does is call the named method with a trivial block.  Every
        # time the method produces a value with the `yield` keyword, we pass it
        # along to the yielder object, which pauses the coroutine.
        # This is nothing more than a bridge between "yield" in the Ruby block
        # sense, and "yield" in the coroutine sense.
        receiver.send method do |value|
            y.yield value
        end
    end
end

So, that’s pretty neat. Incidentally, several built-in methods like Array.each and Enumerable.collect act like this, returning an Enumerator if called without a block.
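
To check that the hand-written bridge really does behave like the built-in, here is a short comparison. The Countdown class and its each_step method are invented for this sketch, and my_enum_for is repeated (in compact form) so the snippet stands alone.

```ruby
def my_enum_for(receiver, method)
    Enumerator.new do |y|
        receiver.send(method) { |value| y.yield value }
    end
end

# Hypothetical class with a block-taking method that is not named `each`.
class Countdown
    def initialize(from)
        @from = from
    end

    def each_step
        @from.downto(1) { |n| yield n }
    end
end

countdown = Countdown.new(3)
mine = my_enum_for(countdown, :each_step).to_a
builtin = countdown.enum_for(:each_step).to_a

# A built-in method called without a block hands back an Enumerator too.
blockless = [1, 2, 3].each
indexed = blockless.with_index.to_a

p mine, builtin, blockless.class, indexed
```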

Full laziness

I mentioned above that while an Enumerator fetches items lazily, many of the methods are eager. To clarify what I mean by that, consider:

inclusive_range(6, 9000).collect {
    |n|
    puts "considering #{n}"
    "a" * n
}.first(3)

collect() is one of those common Enumerable methods. You might know it by its other name, map(). Ruby is big on multiple names for the same thing: one that everyone uses in practice, and another that people who don’t use Ruby will actually recognize.

Even though this code ultimately only needs three items, and even though there’s all this coroutine machinery happening under the hood, this still evaluates the entire range. Why?

The problem is that collect() has always returned an array, and is generally expected to continue doing so. It has no way of knowing that it’s about to be fed into first. Rather than violate this API, Ruby added a new method, Enumerable.lazy. This stops after three items:

inclusive_range(6, 9000).lazy.collect {
    |n|
    puts "considering #{n}"
    "a" * n
}.first(3)

All this does is return an Enumerator::Lazy object, which has lazy implementations of various methods that would usually do a full iteration. Methods like first(3) are still “eager” (in the sense that they just return an array), since their results have a fixed finite size.
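
A sketch of how far the laziness carries through a chain: the endless enumerator and the considered log below are assumptions made up for the example. Each lazy step only pulls what the step after it asks for, so the whole pipeline stops as soon as first(3) is satisfied.

```ruby
considered = []   # hypothetical log of every value the filter looked at

# An enumerator that never ends on its own.
naturals = Enumerator.new do |y|
    n = 1
    loop do
        y << n
        n += 1
    end
end

evens = naturals.lazy.select { |n| considered << n; n.even? }
pipeline = evens.map { |n| n * n }

first_three = pipeline.first(3)

p pipeline.class, first_three, considered
```

Without the lazy call, the select would try to consume the entire endless enumerator before map ever ran.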

This seems a little clunky to me, since the end result is still an object with a collect method that doesn’t return an array. I suspect the real reason is just that Enumerator was added first; even though the coroutine support was already there, Enumerator::Lazy only came along later. Changing existing eager methods to be lazy can, ah, cause problems.

The only built-in type that seems to have interesting lazy behavior is Range, which can be infinite.

# Whoops, infinite loop.
(1..Float::INFINITY).select { |n| n.even? }.first(5)
# 2 4 6 8 10
(1..Float::INFINITY).lazy.select { |n| n.even? }.first(5)

A loose end

I think the only remaining piece of this puzzle is something I stumbled upon but can’t explain. Enumerator has a next method, which returns the next value or raises StopIteration.

Wow, that sounds awfully familiar.

But I can’t find anything in the language or standard library that uses this, with one single and boring exception: the loop construct. It catches StopIteration and exits the block.

enumerator = [1, 2, 3].each
loop do
    while true do
        puts enumerator.next
    end
end

On the fourth call, next() will be out of items, so it raises StopIteration. Removing the loop block makes this quite obvious.

That’s it. That’s the only use of it in the language, as far as I can tell. It seems almost… vestigial. It’s also a little weird, since it keeps the current iteration state inside the Enumerator, unlike any of its other methods. But it’s also the only form of external iteration that I know of in Ruby, and that’s handy to have sometimes.
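
One place external iteration earns its keep is walking two sequences in lockstep, which is awkward to express with nested each blocks. The interleave method below is a hypothetical helper written for this sketch; it leans on loop quietly swallowing the StopIteration raised by whichever enumerator runs out first.

```ruby
# Hypothetical helper: alternates items from two sequences until one is exhausted.
def interleave(left, right)
    left_enum = left.to_enum
    right_enum = right.to_enum
    merged = []
    loop do
        merged << left_enum.next    # raises StopIteration when left is done
        merged << right_enum.next   # likewise for right
    end
    merged
end

even = interleave([1, 2, 3], %w[a b c])
uneven = interleave([1, 2, 3], ["a"])

p even, uneven
```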

And, uh, so on

I intended to foray into a few more languages, including some recent lower-level friends like C++/Rust/Swift, but this post somehow spiraled out of control and hit nine thousand words. No one has read this far.

Handily, it turns out that the above languages pretty much cover the basic ways of approaching iteration; if any of this made sense, other languages will probably seem pretty familiar.

  • C++’s iteration protocol(s) has existed for a long time in the form of ++it to advance an iterator and *it to read the current item, though this was usually written manually in a C-style for loop, and loops were generally terminated with an explicit endpoint.

    C++11 added the range-based for, which does basically the same stuff under the hood. Idiomatic C++ is inscrutable, but maybe you can make sense of this project which provides optionally-infinite iterable ranges.

  • Rust has an entire (extremely well-documented) iter module with numerous iterators and examples of how to create your own. The core of the Iterator trait is just a next method which returns None when exhausted. It also has a lot of handy Ruby-like chainable methods, so working directly with iterators is more common in Rust than in Python.

  • Swift also has (well-documented) simple next-based iterators, though these return nil when exhausted, which means (like Lua) that an iterator cannot produce nil as a value. (This isn’t the case with Rust, where next returns an Option<T> — a valid None would be returned as Some(None).)

I could probably keep finding more languages indefinitely, so I’m gonna take a break from this now.

Bringing the Viewer In: The Video Opportunity in Virtual Reality

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/151940036881

By Satender Saroha, Video Engineering

Virtual reality (VR) 360° videos are the next frontier of how we engage with and consume content. Unlike a traditional scenario in which a person views a screen in front of them, VR places the user inside an immersive experience. A viewer is “in” the story, and not on the sidelines as an observer.

Ivan Sutherland, widely regarded as the father of computer graphics, laid out the vision for virtual reality in his famous speech, “Ultimate Display” in 1965 [1]. In that he said, “You shouldn’t think of a computer screen as a way to display information, but rather as a window into a virtual world that could eventually look real, sound real, move real, interact real, and feel real.”

Over the years, significant advancements have been made to bring reality closer to that vision. With the advent of headgear capable of rendering 3D spatial audio and video, realistic sound and visuals can be virtually reproduced, delivering immersive experiences to consumers.

When it comes to entertainment and sports, streaming in VR has become the new 4K HEVC/UHD of 2016. This has been accelerated by the release of new camera capture hardware like GoPro and streaming capabilities such as 360° video streaming from Facebook and YouTube. Yahoo streams lots of engaging sports, finance, news, and entertainment video content to tens of millions of users. Producing and streaming such content in 360° VR gives Yahoo a unique opportunity to offer new types of engagement, and to bring users a sense of depth and visceral presence.

While this is not an experience that is live in product, it is an area we are actively exploring. In this blog post, we take a look at what’s involved in building an end-to-end VR streaming workflow for both Live and Video on Demand (VOD). Our experiments and research go from camera rig setup, to video stitching, to encoding, to the eventual rendering of videos on video players on desktop and VR headsets. We also discuss challenges yet to be solved and the opportunities they present in streaming VR.

1. The Workflow

Yahoo’s video platform has a workflow that is used internally to enable streaming to an audience of tens of millions with the click of a few buttons. During experimentation, we enhanced this same proven platform and set of APIs to build a complete 360°/VR experience. The diagram below shows the end-to-end workflow for streaming 360°/VR that we built on Yahoo’s video platform.

Figure 1: VR Streaming Workflow at Yahoo

1.1. Capturing 360° video

In order to capture a virtual reality video, you need access to a 360°-capable video camera. Such a camera either uses fish-eye lenses or has an array of wide-angle lenses to collectively cover a 360 (θ) by 180 (ϕ) sphere as shown below.

Though it sounds simple, there is a real challenge in capturing a scene in 3D 360° as most of the 360° video cameras offer only 2D 360° video capture.

In initial experiments, we tried capturing 3D video using two cameras side-by-side, for left and right eyes and arranging them in a spherical shape. However this required too many cameras – instead we use view interpolation in the stitching step to create virtual cameras.

Another important consideration with 360° video is the number of axes the camera is capturing video with. In traditional 360° video that is captured using only a single axis (what we refer to as horizontal video), a user can turn their head from left to right. But this camera setup does not support a user tilting their head at 90°.

To achieve true 3D in our setup, we went with 6-12 GoPro cameras having 120° field of view (FOV) arranged in a ring, and an additional camera each on top and bottom, with each one outputting 2.7K at 30 FPS.

1.2. Stitching 360° video

Projection Layouts

Because a 360° view is a spherical video, the surface of this sphere needs to be projected onto a planar surface in 2D so that video encoders can process it. There are two popular layouts:

Equirectangular layout: This is the most widely-used format in computer graphics to represent spherical surfaces in a rectangular form with an aspect ratio of 2:1. This format has redundant information at the poles which means some pixels are over-represented, introducing distortions at the poles compared to the equator (as can be seen in the equirectangular mapping of the sphere below).

Figure 2: Equirectangular Layout [2]

CubeMap layout: CubeMap layout is a format that has also been used in computer graphics. It contains six individual 2D textures that map to six sides of a cube. The figure below is a typical cubemap representation. In a cubemap layout, the sphere is projected onto six faces and the images are folded out into a 2D image, so pieces of a video frame map to different parts of a cube, which leads to extremely efficient compact packing. Cubemap layouts require about 25% fewer pixels compared to equirectangular layouts.

Figure 3: CubeMap Layout [3]
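
The roughly 25% figure can be reproduced with a little arithmetic. This is a sketch under one stated assumption: each cube face spans 90° and is sampled at the same angular resolution as the equator of the equirectangular frame, so a face edge is a quarter of the equirectangular width. The method names and the 3840-pixel width are illustrative only.

```ruby
# Pixels in a 2:1 equirectangular frame of the given width.
def equirect_pixels(width)
    width * (width / 2)
end

# Pixels in a cubemap whose faces each cover 90 degrees of the same sphere
# (assumption: face edge = equirectangular width / 4).
def cubemap_pixels(equirect_width)
    face_edge = equirect_width / 4
    6 * face_edge * face_edge
end

width = 3840
equirect = equirect_pixels(width)
cubemap = cubemap_pixels(width)
savings = 1.0 - cubemap.to_f / equirect

puts "equirectangular: #{equirect} px, cubemap: #{cubemap} px"
puts "fraction saved: #{savings}"
```

In general the ratio is 6(W/4)² / (W²/2) = 0.75, independent of the width chosen.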

Stitching Videos

In our setup, we experimented with a couple of stitching software packages. One was from Vahana VR [4], and the other was a modified version of the open-source Surround360 technology that works with a GoPro rig [5]. Both output equirectangular panoramas for the left and the right eye. Here are the steps involved in stitching together a 360° image:

Raw frame image processing: Converts uncompressed raw video data to RGB, which involves several steps, starting with black-level adjustment and then applying demosaicing algorithms to work out the RGB color components of each pixel from the surrounding pixels. This also involves gamma correction, color correction, and anti-vignetting (undoing the reduction in brightness on the image periphery). Finally, this stage applies sharpening and noise-reduction algorithms to enhance the image and suppress the noise.

Calibration: During the calibration step, stitching software takes steps to avoid vertical parallax while stitching overlapping portions in adjacent cameras in the rig. The purpose is to align everything in the scene, so that both eyes see every point at the same vertical coordinate. This step essentially matches the key points in images among adjacent camera pairs. It uses computer vision algorithms for feature detection like Binary Robust Invariant Scalable Keypoints (BRISK) [6] and AKAZE [7].

Optical Flow: During stitching, optical flow is used to create virtual cameras that cover the gaps between adjacent real cameras and provide an interpolated view. The optical flow algorithm finds the pattern of apparent motion of image objects between two consecutive frames caused by the movement of the object or camera. It uses OpenCV algorithms to find the optical flow [8].

Below are the frames produced by the GoPro camera rig:

Figure 4: Individual frames from 12-camera rig

Figure 5: Stitched frame output with PtGui

Figure 6: Stitched frame with barrel distortion using Surround360

Figure 7: Stitched frame after removing barrel distortion using Surround360

To get the full depth in stereo, the rig is set up so that i = r * sin(FOV/2 - 360/n), where:

  • i = IPD/2, where IPD is the inter-pupillary distance between the eyes.
  • r = Radius of the rig.
  • FOV = Field of view of GoPro cameras, 120 degrees.
  • n = Number of cameras which is 12 in our setup.

Given that IPD is normally 6.4 cm, i should be greater than 3.2 cm. This implies that with a 12-camera setup, the radius of the rig comes to 14 cm. Usually, if there are more cameras it is easier to avoid black stripes.
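
As a worked check of the formula with the values listed above (FOV = 120°, n = 12, IPD = 6.4 cm), the sketch below evaluates it in both directions. The method names are invented for illustration. With these inputs the angle term is 60° - 30° = 30°, so i = r / 2: the requirement i > 3.2 cm translates to r > 6.4 cm, and a 14 cm radius gives i of about 7 cm, which satisfies that bound.

```ruby
# Hypothetical helpers evaluating i = r * sin(FOV/2 - 360/n), angles in degrees.
def stereo_angle_rad(fov_deg, cameras)
    (fov_deg / 2.0 - 360.0 / cameras) * Math::PI / 180
end

# Half-baseline i (cm) obtained from a rig of the given radius.
def half_baseline_cm(radius_cm, fov_deg, cameras)
    radius_cm * Math.sin(stereo_angle_rad(fov_deg, cameras))
end

# Smallest radius (cm) for which i reaches IPD / 2.
def minimum_radius_cm(ipd_cm, fov_deg, cameras)
    (ipd_cm / 2.0) / Math.sin(stereo_angle_rad(fov_deg, cameras))
end

min_radius = minimum_radius_cm(6.4, 120, 12)
i_at_14 = half_baseline_cm(14, 120, 12)

puts "minimum radius: #{min_radius.round(2)} cm"
puts "i at r = 14 cm: #{i_at_14.round(2)} cm"
```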

Reducing Bandwidth – FOV-based adaptive transcoding

For a truly immersive experience, users expect 4K (3840 x 2160) quality resolution at 60 frames per second (FPS) or higher. Given that typical HMDs have a FOV of 120 degrees, a full 360° video needs a resolution of at least 12K (11520 x 6480). 4K streaming needs a bandwidth of 25 Mbps [9]. So for 12K resolution, this effectively translates to > 75 Mbps and even more for higher framerates. However, the average Wi-Fi connection in the US has a bandwidth of 15 Mbps [10].
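
The arithmetic behind those numbers can be sketched as follows, using only the figures quoted above. Scaling the 4K bitrate by the per-axis factor of 3 reproduces the 75 Mbps lower bound; scaling by pixel count instead (a naive assumption that ignores codec efficiency) would give a considerably larger number, consistent with the "greater than" in the text.

```ruby
HMD_FOV_DEG = 120
VIEW_WIDTH = 3840
VIEW_HEIGHT = 2160
MBPS_4K = 25
AVERAGE_WIFI_MBPS = 15

scale = 360 / HMD_FOV_DEG                       # viewports per full turn
full_width = VIEW_WIDTH * scale
full_height = VIEW_HEIGHT * scale

per_axis_estimate = MBPS_4K * scale             # the lower bound quoted above
per_pixel_estimate = MBPS_4K * scale * scale    # naive: bitrate proportional to pixels
shortfall = per_axis_estimate / AVERAGE_WIFI_MBPS.to_f

puts "full sphere: #{full_width} x #{full_height}"
puts "bitrate estimate: #{per_axis_estimate} to #{per_pixel_estimate} Mbps"
puts "at least #{shortfall}x the average connection"
```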

One way to address the bandwidth issue is by reducing the resolution of areas that are out of the field of view. Spatial sub-sampling is used during transcoding to produce multiple viewport-specific streams. Each viewport-specific stream has high resolution in a given viewport and low resolution in the rest of the sphere.

On the player side, we can modify traditional adaptive streaming logic to take the field of view into account. Depending on the video, if the user moves their head around a lot, it could result in multiple buffer fetches and in rebuffering. Ideally, this will work best in videos where the excessive motion happens in one field of view at a time and does not span multiple fields of view at the same time. This work is still in an experimental stage.
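
To make the idea concrete, here is a purely hypothetical sketch of the selection step, not the Yahoo player's actual logic. It assumes the transcoder produced a fixed number of viewport-specific streams whose high-resolution regions are centered at evenly spaced yaw angles starting at 0°, and it simply picks the stream whose center is nearest to the viewer's current yaw.

```ruby
# Hypothetical: index of the viewport stream whose center is closest to yaw_deg,
# assuming centers at 0, 360/count, 2 * 360/count, ... degrees.
def viewport_index(yaw_deg, viewport_count)
    width = 360.0 / viewport_count
    ((yaw_deg % 360) / width).round % viewport_count
end

# With four streams centered at 0, 90, 180 and 270 degrees:
looking_ahead = viewport_index(10, 4)
turning_right = viewport_index(50, 4)
almost_full_turn = viewport_index(350, 4)
turned_left = viewport_index(-100, 4)

p looking_ahead, turning_right, almost_full_turn, turned_left
```

Each change in the returned index corresponds to a stream switch, which is where the extra buffer fetches mentioned above would come from.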

The default output format of both stitching packages, Surround360 and Vahana VR, is equirectangular. In order to reduce the size further, we pass it through a cubemap filter transform integrated into ffmpeg to get an additional pixel reduction of ~25% [11] [12].

At the end of the above steps, the stitching pipeline produces high-resolution stereo 3D panoramas, which are then ingested into the existing Yahoo Video transcoding pipeline to produce multi-bitrate HLS streams.

1.3. Adding a stitching step to the encoding pipeline

Live – In order to prepare for multi-bitrate streaming over the Internet, a live 360° video-stitched stream in RTMP is ingested into Yahoo’s video platform. A live Elemental encoder was used to re-encode and package the live input into multiple bit-rates for adaptive streaming on any device (iOS, Android, Browser, Windows, Mac, etc.)

Video on Demand – The existing Yahoo video transcoding pipeline was used to package multi-bitrate HLS streams from raw equirectangular mp4 source videos.

1.4. Rendering 360° video into the player

The spherical video stream is delivered to the Yahoo player in multiple bit rates. As a user changes their viewing angle, different portions of the frame are shown, presenting a 360° immersive experience. There are two types of VR players currently supported at Yahoo:

WebVR based Javascript Player – The Web community has been very active in enabling VR experiences natively without plugins from within browsers. The W3C has a Javascript proposal [13], which describes support for accessing virtual reality (VR) devices, including sensors and head-mounted displays on the Web. VR Display is the main starting point for all the device APIs supported. Some of the key interfaces and attributes exposed are:

  • VR Display Capabilities: Has attributes that indicate position support, orientation support, and the presence of an external display.
  • VR Layer: Contains the HTML5 canvas element which is presented by VR Display when its submit frame method is called. It also contains attributes defining the left bound and right bound textures within the source canvas for presenting to an eye.
  • VR Eye Parameters: Has the information required to correctly render a scene for a given eye. For each eye, it has offset, the distance from the middle of the user’s eyes to the center point of one eye, which is half of the interpupillary distance (IPD). In addition, it maintains the current FOV of the eye, and the recommended renderWidth and renderHeight of each eye viewport.
  • Get VR Displays: Returns a list of the VR Display(s) (HMDs) accessible to the browser.

We implemented a subset of the WebVR spec in the Yahoo player (not in production yet) that lets you watch monoscopic and stereoscopic 3D video on supported web browsers (Chrome, Firefox, Samsung), including Oculus Gear VR-enabled phones. The Yahoo player takes the equirectangular video and maps its individual frames onto the Canvas JavaScript element. It uses the WebGL and three.js libraries to do computations for detecting the orientation and extracting the corresponding frames to display.

For web devices which support only monoscopic rendering, like desktop browsers without an HMD, it creates a single PerspectiveCamera object specifying the FOV and aspect ratio. As the device’s requestAnimationFrame is called, it renders the new frames. As part of rendering the frame, it first calculates the projection matrix for the FOV and sets the X (user’s right), Y (up), Z (behind the user) coordinates of the camera position.

For devices that support stereoscopic rendering, like mobile phones used with Samsung Gear VR, the WebVR player creates two PerspectiveCamera objects, one for the left eye and one for the right eye. Each PerspectiveCamera queries the VR device capabilities to get eye parameters like FOV, renderWidth and renderHeight every time a frame needs to be rendered at the native refresh rate of the HMD. The key difference between stereoscopic and monoscopic is the perceived sense of depth that the user experiences, as the video frames separated by an offset are rendered by separate canvas elements to each individual eye.

Cardboard VR – Google provides a VR SDK for both iOS and Android [14]. This simplifies common VR tasks like lens distortion correction, spatial audio, head tracking, and stereoscopic side-by-side rendering. For iOS, we integrated Cardboard VR functionality into our Yahoo Video SDK, so that users can watch stereoscopic 3D videos on iOS using Google Cardboard.

2. Results

With all the pieces in place, and experimentation done, we were able to successfully do a 360° live streaming of an internal company-wide event.

Figure 8: 360° Live streaming of Yahoo internal event

In addition to demonstrating our live streaming capabilities, we are also experimenting with showing 360° VOD videos produced with a GoPro-based camera rig. Here is a screenshot of one of the 360° videos being played in the Yahoo player.

Figure 9: Yahoo Studios produced 360° VOD content in the Yahoo Player

3. Challenges and Opportunities

3.1. Enormous amounts of data

As we alluded to in the video processing section of this post, delivering 4K resolution videos for each eye for each FOV at a high frame-rate remains a challenge. While FOV-adaptive streaming does reduce the size by providing high resolution streams separately for each FOV, providing an impeccable 60 FPS or more viewing experience still requires a lot more data than the current internet pipes can handle. Some of the other possible options which we are closely paying attention to are:

Compression efficiency with HEVC and VP9 – New codecs like HEVC and VP9 have the potential to provide significant compression gains. HEVC open source codecs like x265 have shown a 40% compression performance gain compared to the currently ubiquitous H.264/AVC codec. Likewise, the VP9 codec from Google has shown similar 40% compression performance gains. The key challenge is the hardware decoding support and the browser support. But with Apple and Microsoft very much behind HEVC and Firefox and Chrome already supporting VP9, we believe most browsers would support HEVC or VP9 within a year.

Using 10 bit color depth vs 8 bit color depth – Traditional monitors support 8 bpc (bits per channel) for displaying images. Given each pixel has 3 channels (RGB), 8 bpc maps to 256x256x256 color/luminosity combinations to represent 16 million colors. With 10 bit color depth, you have the potential to represent even more colors. But the biggest stated advantage of using 10 bit color depth is with respect to compression during encoding even if the source only uses 8 bits per channel. Both x264 and x265 codecs support 10 bit color depth, with ffmpeg already supporting encoding at 10 bit color depth.
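
The color counts follow directly from the bit depth. Here is a minimal sketch of that arithmetic, assuming three channels (RGB) per pixel as stated above; the helper name color_count is made up for illustration.

```python
def color_count(bits_per_channel, channels=3):
    """Number of distinct colors for a given bit depth (illustrative helper)."""
    levels = 2 ** bits_per_channel    # distinct values one channel can take
    return levels ** channels         # combinations across all channels

colors_8bit = color_count(8)      # 256 ** 3
colors_10bit = color_count(10)    # 1024 ** 3

print(f"{colors_8bit:,}")             # 16,777,216
print(f"{colors_10bit:,}")            # 1,073,741,824
print(colors_10bit // colors_8bit)    # 64
```

So moving from 8 to 10 bits per channel multiplies the number of representable colors by 64.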

3.2. Six degrees of freedom

With current camera rig workflows, users viewing the streams through HMD are able to achieve three degrees of freedom (DoF), i.e., the ability to move up/down, clockwise/anti-clockwise, and swivel. But you still can’t get a different perspective when you move inside it, i.e., move forward/backward. Until now, this true six DoF immersive VR experience has only been possible in CG VR games. In video streaming, LightField technology-based video cameras produced by Lytro are the first ones to capture light field volume data from all directions [15]. But LightField-based videos require an order of magnitude more data than traditional fixed FOV, fixed IPD, fixed lens camera rigs like GoPro. As bandwidth problems get resolved via better compression and better networks, achieving true immersion should be possible.

4. Conclusion

VR streaming is an emerging medium and with the addition of 360° VR playback capability, Yahoo’s video platform provides us a great starting point to explore the opportunities in video with regard to virtual reality. As we continue to work to delight our users by showing immersive video content, we remain focused on optimizing the rendering of high-quality 4K content in our players. We’re looking at building FOV-based adaptive streaming capabilities and better compression during delivery. These capabilities, and the enhancement of our webvr player to play on more HMDs like HTC Vive and Oculus Rift, will set us on track to offer streaming capabilities across the entire spectrum. At the same time, we are keeping a close watch on advancements in supporting spatial audio experiences, as well as advancements in the ability to stream volumetric lightfield videos to achieve true six degrees of freedom, with the aim of realizing the true potential of VR.

Glossary – VR concepts:

VR – Virtual reality, commonly referred to as VR, is an immersive computer-simulated reality experience that places viewers inside an experience. It “transports” viewers from their physical reality into a closed virtual reality. VR usually requires a headset device that takes care of sights and sounds, while the most-involved experiences can include external motion tracking, and sensory inputs like touch and smell. For example, when you put on VR headgear you suddenly start feeling immersed in the sounds and sights of another universe, like the deck of the Star Trek Enterprise. Though you remain physically at your place, VR technology is designed to manipulate your senses in a manner that makes you truly feel as if you are on that ship, moving through the virtual environment and interacting with the crew.

360 degree video – A 360° video is created with a camera system that simultaneously records all 360 degrees of a scene. It is a flat equirectangular video projection that is morphed into a sphere for playback on a VR headset. A standard world map is an example of equirectangular projection, which maps the surface of the world (sphere) onto orthogonal coordinates.

Spatial Audio – Spatial audio gives the creator the ability to place sound around the user. Unlike traditional mono/stereo/surround audio, it responds to head rotation in sync with video. While listening to spatial audio content, the user receives a real-time binaural rendering of an audio stream [17].

FOV – A human can naturally see 170 degrees of viewable area (field of view). Most consumer-grade head-mounted displays (HMDs) like Oculus Rift and HTC Vive now display 90 degrees to 120 degrees.

Monoscopic video – A monoscopic video means that both eyes see a single flat image, or video file. A common camera setup involves six cameras filming six different fields of view. Stitching software is used to form a single equirectangular video. Max output resolution on monoscopic videos on Gear VR is 3480×1920 at 30 frames per second.

Presence – Presence is a kind of immersion where the low-level systems of the brain are tricked to such an extent that they react just as they would to non-virtual stimuli.

Latency – It’s the time between when you move your head, and when you see physical updates on the screen. An acceptable latency is anywhere from 11 ms (for games) to 20 ms (for watching 360 vr videos).

Head Tracking – There are two forms:

  • Positional tracking – movements and related translations of your body, eg: sway side to side.
  • Traditional head tracking – left, right, up, down, roll like clock rotation.

References:

[1] Ultimate Display Speech as reminisced by Fred Brooks: http://www.roadtovr.com/fred-brooks-ivan-sutherlands-1965-ultimate-display-speech/

[2] Equirectangular Layout Image: https://www.flickr.com/photos/[email protected]/10111691364/

[3] CubeMap Layout: http://learnopengl.com/img/advanced/cubemaps_skybox.png

[4] Vahana VR: http://www.video-stitch.com/

[5] Surround360 Stitching software: https://github.com/facebook/Surround360

[6] Computer Vision Algorithm BRISK: https://www.robots.ox.ac.uk/~vgg/rg/papers/brisk.pdf

[7] Computer Vision Algorithm AKAZE: http://docs.opencv.org/3.0-beta/doc/tutorials/features2d/akaze_matching/akaze_matching.html

[8] Optical Flow: http://docs.opencv.org/trunk/d7/d8b/tutorial_py_lucas_kanade.html

[9] 4K connection speeds: https://help.netflix.com/en/node/306

[10] Average connection speeds in US: https://www.akamai.com/us/en/about/news/press/2016-press/akamai-releases-fourth-quarter-2015-state-of-the-internet-report.jsp

[11] CubeMap transform filter for ffmpeg: https://github.com/facebook/transform

[12] FFMPEG software: https://ffmpeg.org/

[13] WebVR Spec: https://w3c.github.io/webvr/

[14] Google Daydream SDK: https://vr.google.com/cardboard/developers/

[15] Lytro LightField Volume for six DoF: https://www.lytro.com/press/releases/lytro-immerge-the-worlds-first-professional-light-field-solution-for-cinematic-vr

[16] 10 bit color depth: https://gist.github.com/l4n9th4n9/4459997

Python FAQ: Why should I use Python 3?

Post Syndicated from Eevee original https://eev.ee/blog/2016/07/31/python-faq-why-should-i-use-python-3/

Part of my Python FAQ, which is doomed to never be finished.

The short answer is: because it’s the actively-developed version of the language, and you should use it for the same reason you’d use 2.7 instead of 2.6.

If you’re here, I’m guessing that’s not enough. You need something to sweeten the deal. Well, friend, I have got a whole mess of sugar cubes just for you.

And once you’re convinced, you may enjoy the companion article, how to port to Python 3! It also has some more details on the differences between Python 2 and 3, whereas this article doesn’t focus too much on the features removed in Python 3.

Some background

If you aren’t neck-deep in Python, you might be wondering what the fuss is all about, or why people keep telling you that Python 3 will set your computer on fire. (It won’t.)

Python 2 is a good language, but it comes with some considerable baggage. It has two integer types; it may or may not be built in a way that completely mangles 16/17 of the Unicode space; it has a confusing mix of lazy and eager functional tools; it has a standard library that takes “batteries included” to lengths beyond your wildest imagination; it boasts strong typing, then casually insists that None < 3 < "2"; overall, it’s just full of little dark corners containing weird throwbacks to the days of Python 1.

(If you’re really interested, Nick Coghlan has written an exhaustive treatment of the slightly different question of why Python 3 was created. This post is about why Python 3 is great, so let’s focus on that.)

Fixing these things could break existing code, whereas virtually all code written for 2.0 will still work on 2.7. So Python decided to fix them all at once, producing a not-quite-compatible new version of the language, Python 3.

Nothing like this has really happened with a mainstream programming language before, and it’s been a bit of a bumpy ride since then. Python 3 was (seemingly) designed with the assumption that everyone would just port to Python 3, drop Python 2, and that would be that. Instead, it’s turned out that most libraries want to continue to run on both Python 2 and Python 3, which was considerably difficult to make work at first. Python 2.5 was still in common use at the time, too, and it had none of the helpful backports that showed up in Python 2.6 and 2.7; likewise, Python 3.0 didn’t support u'' strings. Writing code that works on both 2.5 and 3.0 was thus a ridiculous headache.

The porting effort also had a dependency problem: if your library or app depends on library A, which depends on library B, which depends on C, which depends on D… then none of those projects can even think about porting until D’s porting effort is finished. Early days were very slow going.

Now, though, things are looking brighter. Most popular libraries work with Python 3, and those that don’t are working on it. Python 3’s Unicode handling, one of its most contentious changes, has had many of its wrinkles ironed out. Python 2.7 consists largely of backported Python 3 features, making it much simpler to target 2 and 3 with the same code — and both 2.5 and 2.6 are no longer supported.

Don’t get me wrong, Python 2 will still be around for a while. A lot of large applications have been written for Python 2 — think websites like Yelp, YouTube, Reddit, Dropbox — and porting them will take some considerable effort. I happen to know that at least one of those websites was still running 2.6 last year, years after 2.6 had been discontinued, if that tells you anything about the speed of upgrades for big lumbering software.

But if you’re just getting started in Python, or looking to start a new project, there aren’t many reasons not to use Python 3. There are still some, yes — but unless you have one specifically in mind, they probably won’t affect you.

I keep having Python beginners tell me that all they know about Python 3 is that some tutorial tried to ward them away from it for vague reasons. (Which is ridiculous, since especially for beginners, Python 2 and 3 are fundamentally not that different.) Even the #python IRC channel has a few people who react, ah, somewhat passive-aggressively towards mentions of Python 3. Most of the technical hurdles have long since been cleared; it seems like one of the biggest roadblocks now standing in the way of Python 3 adoption is the community’s desire to sabotage itself.

I think that’s a huge shame. Not many people seem to want to stand up for Python 3, either.

Well, here I am, standing up for Python 3. I write all my new code in Python 3 now — because Python 3 is great and you should use it. Here’s why.

Hang on, let’s be real for just a moment

None of this is going to 💥blow your mind💥. It’s just a programming language. I mean, the biggest change to Python 2 in the last decade was probably the addition of the with statement, which is nice, but hardly an earth-shattering innovation. The biggest changes in Python 3 are in the same vein: they should smooth out some points of confusion, help avoid common mistakes, and maybe give you a new toy to play with.

Also, if you’re writing a library that needs to stay compatible with Python 2, you won’t actually be able to use any of this stuff. Sorry. In that case, the best reason to port is so application authors can use this stuff, rather than citing your library as the reason they’re trapped on Python 2 forever. (But hey, if you’re starting a brand new library that will blow everyone’s socks off, do feel free to make it Python 3 exclusive.)

Application authors, on the other hand, can go wild.

Unicode by default

Let’s get the obvious thing out of the way.

In Python 2, there are two string types: str is a sequence of bytes (which I would argue makes it not a string), and unicode is a sequence of Unicode codepoints. A literal string in source code is a str, a bytestring. Reading from a file gives you bytestrings. Source code is assumed ASCII by default. It’s an 8-bit world.

If you happen to be an English speaker, it’s very easy to write Python 2 code that seems to work perfectly, but chokes horribly if fed anything outside of ASCII. The right thing involves carefully specifying encodings everywhere and using u'' for virtually all your literal strings, but that’s very tedious and easily forgotten.

Python 3 reshuffles this to put full Unicode support front and center.

Most obviously, the str type is a real text type, similar to Python 2’s unicode. Literal strings are still str, but now that makes them Unicode strings. All of the “structural” strings — names of types, functions, modules, etc. — are likewise Unicode strings. Accordingly, identifiers are allowed to contain any Unicode “letter” characters. repr() no longer escapes printable Unicode characters, though there’s a new ascii() (and corresponding !a format cast and %a placeholder) that does. Unicode completely pervades the language, for better or worse.

And just for the record: this is way better. It is so much better. It is incredibly better. Do you know how much non-ASCII garbage I type? Every single em dash in this damn post was typed by hand, and Python 2 would merrily choke on them.
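
To make that a little more concrete, here is a small sketch of how text behaves in Python 3. The sample word and the variable names are arbitrary choices for illustration; the non-ASCII characters are written as escapes so the snippet does not depend on how your editor saves the file.

```python
word = "na\u00efve"    # "naive" with a diaeresis on the i

# str is a sequence of code points, not bytes.
print(len(word))                    # 5
print(len(word.encode("utf-8")))    # 6, since the accented letter takes two bytes

# repr() leaves printable non-ASCII characters alone; ascii() escapes them.
print(repr(word))
print(ascii(word))                  # 'na\xefve'

# The !a conversion and the %a placeholder are shorthand for ascii().
print("{!a}".format(word))          # 'na\xefve'
print("%a" % word)                  # 'na\xefve'

# Identifiers may contain Unicode letters.
café = "open"
print(café)                         # open
```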

Source files are now assumed to be UTF-8 by default, so adding an em dash in a comment will no longer break your production website. (I have seen this happen.) You’re still free to specify another encoding explicitly if you want, using a magic comment.

There is no attempted conversion between bytes and text, as in Python 2; b'a' + 'b' is a TypeError. Some modules require you to know what you’re dealing with: zlib.compress only accepts bytes, because zlib is defined in terms of bytes; json.loads only accepts str, because JSON is defined in terms of Unicode codepoints. Calling str() on some bytes will defer to repr, producing something like "b'hello'". (But see -b and -bb below.) Overall it’s pretty obvious when you’ve mixed bytes with text.
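
A quick sketch of what that looks like in practice, using only the standard library. The payload is made up for illustration.

```python
import json
import zlib

# Mixing bytes and text raises instead of guessing an encoding.
try:
    b"a" + "b"
except TypeError as exc:
    mixing_error = exc
    print("TypeError:", exc)

# Conversions are always explicit: encode text to bytes, decode bytes to text.
payload = json.dumps({"dash": "\u2014"})               # str
compressed = zlib.compress(payload.encode("utf-8"))     # bytes in, bytes out
restored = zlib.decompress(compressed).decode("utf-8")  # back to str
print(json.loads(restored) == {"dash": "\u2014"})       # True: the round trip is lossless

# str() on bytes falls back to repr() rather than decoding.
print(str(b"hello"))                                    # b'hello'
```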

Oh, and two huge problem children are fixed: both the csv module and urllib.parse (formerly urlparse) can handle text. If you’ve never tried to make those work, trust me, this is miraculous.

I/O does its best to make everything Unicode. On Unix, this is a little hokey, since the filesystem is explicitly bytes with no defined encoding; Python will trust the various locale environment variables, which on most systems will make everything UTF-8. The default encoding of text-mode file I/O is derived the same way and thus usually UTF-8. (If it’s not what you expect, run locale and see what you get.) Files opened in binary mode, with a 'b', will still read and write bytes.
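
If you would rather not depend on the locale at all, you can pass an encoding explicitly. A minimal sketch, writing to a throwaway temporary file (the path and sample text are arbitrary):

```python
import os
import tempfile

text = "caf\u00e9 \u2014 na\u00efve"    # 12 characters, three of them non-ASCII

# A throwaway file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Text mode: str in, str out. encoding= overrides the locale-derived default.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    as_text = f.read()

# Binary mode: the same file comes back as raw bytes.
with open(path, "rb") as f:
    as_bytes = f.read()

print(type(as_text).__name__, len(as_text))      # str 12
print(type(as_bytes).__name__, len(as_bytes))    # bytes 16
```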

Python used to come in “narrow” and “wide” builds, where “narrow” builds actually stored Unicode as UTF-16, and this distinction could leak through to user code in subtle ways. On a narrow build, unichr(0x1F4A3) raises ValueError, and the length of u'💣' is 2. Surprise! Maybe your code will work on someone else’s machine, or maybe it won’t. Python 3.3 eliminated narrow builds.
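
For comparison, this is what the same character looks like on Python 3.3 or later, where every build behaves the same way:

```python
bomb = chr(0x1F4A3)    # U+1F4A3, outside the Basic Multilingual Plane

print(len(bomb))                       # 1: one code point, on every build
print(len(bomb.encode("utf-16-le")))   # 4: two 16-bit code units (a surrogate pair)
print(len(bomb.encode("utf-8")))       # 4
```

The surrogate pair still exists in the UTF-16 encoding, but it no longer leaks into len() or indexing.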

I think those are the major points. For the most part, you should be able to write code as though encodings don’t exist, and the right thing will happen more often. And the wrong thing will immediately explode in your face. It’s good for you.

If you work with binary data a lot, you might be frowning at me at this point; it was a bit of a second-class citizen in Python 3.0. I think things have improved, though: a number of APIs support both bytes and text, the bytes-to-bytes codec issue has largely been resolved, we have bytes.hex() and bytes.fromhex(), bytes and bytearray both support % now, and so on. They’re listening!
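
Here is a short sketch of a few of those conveniences, assuming Python 3.5 or later (where bytes.hex() and % formatting on bytes are available). The values are arbitrary.

```python
data = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# Hex round trip without reaching for binascii.
print(data.hex())                           # deadbeef
print(bytes.fromhex("deadbeef") == data)    # True

# %-formatting works directly on bytes.
header = b"Content-Length: %d\r\n" % len(data)
print(header)                               # b'Content-Length: 4\r\n'

# bytearray, the mutable counterpart, supports it too.
buf = bytearray(b"id=%04x") % (255,)
print(buf)                                  # bytearray(b'id=00ff')
```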

Refs: Python 3.0 release notes; myriad mentions all over the documentation

Backported features

Python 3.0 was released shortly after Python 2.6, and a number of features were then backported to Python 2.7. You can use these if you’re only targeting Python 2.7, but if you were stuck with 2.6 for a long time, you might not have noticed them.

  • Set literals:

    {1, 2, 3}
    
  • Dict and set comprehensions:

    {word.lower() for word in words}
    {value: key for (key, value) in dict_to_invert.items()}
    
  • Multi-with:

    with open("foo") as f1, open("bar") as f2:
        ...
    
  • print is now a function, with a couple bells and whistles added: you can change the delimiter with the sep argument, you can change the terminator to whatever you want (including nothing) with the end argument, and you can force a flush with the flush argument. In Python 2.6 and 2.7, you still have to opt into this with from __future__ import print_function.

  • The string representation of a float now uses the shortest decimal number that has the same underlying value — for example, repr(1.1) was '1.1000000000000001' in Python 2.6, but is just '1.1' in Python 2.7 and 3.1+, because both are represented the same way in a 64-bit float.

  • collections.OrderedDict is a dict-like type that remembers the order of its keys.

    Note that you cannot do OrderedDict(a=1, b=2), because the constructor still receives its keyword arguments in a regular dict, losing the order. You have to pass in a sequence of 2-tuples or assign keys one at a time.

  • collections.Counter is a dict-like type for counting a set of things. It has some pretty handy operations that allow it to be used like a multiset.

  • The entire argparse module is a backport from 3.2.

  • str.format learned a , formatting specifier for numbers, which always uses commas and groups of three digits. This is wrong for many countries, and the correct solution involves using the locale module, but it’s useful for quick output of large numbers.

  • re.sub, re.subn, and re.split accept a flags argument. Minor, but, thank fucking God.
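
Several of the items in this list are easier to appreciate in action. The following is a rough sketch written for Python 3; the sample values are arbitrary.

```python
import re
from collections import Counter, OrderedDict

# print() as a function: custom separator and terminator, plus an explicit flush.
print("a", "b", "c", sep="-", end="!\n", flush=True)    # a-b-c!

# repr() of a float is the shortest string that round-trips.
print(repr(1.1))                                         # 1.1

# OrderedDict remembers insertion order; here it is built from 2-tuples.
od = OrderedDict([("b", 2), ("a", 1)])
print(list(od))                                          # ['b', 'a']

# Counter behaves like a multiset.
letters = Counter("abracadabra")
print(letters["a"])                                      # 5
common = Counter("aab") & Counter("abb")                 # intersection keeps minimum counts
print(sorted(common.items()))                            # [('a', 1), ('b', 1)]

# The "," format specifier groups digits in threes.
grouped = "{:,}".format(1234567)
print(grouped)                                           # 1,234,567

# re.sub() with a flags argument.
fixed = re.sub("python", "Python", "PYTHON 3", flags=re.IGNORECASE)
print(fixed)                                             # Python 3
```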

Ref: Python 2.7 release notes

Iteration improvements

Everything is lazy

Python 2 has a lot of pairs of functions that do the same thing, except one is eager and one is lazy: range and xrange, map and itertools.imap, dict.keys and dict.iterkeys, and so on.

Python 3.0 eliminated all of the lazy variants and instead made the default versions lazy. Iterating over them works exactly the same way, but no longer creates an intermediate list — for example, range(1000000000) won’t eat all your RAM. If you need to index them or store them for later, you can just wrap them in list(...).

Even better, the dict methods are now “views“. You can keep them around, and they’ll reflect any changes to the underlying dict. They also act like sets, so you can do a.keys() & b.keys() to get the set of keys that exist in both dicts.
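
A small sketch of both ideas, with made-up dicts:

```python
# range() is a lazy sequence: a huge range costs almost nothing to create.
big = range(1000000000)
print(len(big), big[-1])      # 1000000000 999999999
print(500 in big)             # True, computed without iterating

# map() is lazy too; wrap it in list() when you need a real list.
squares = map(lambda n: n * n, range(5))
print(list(squares))          # [0, 1, 4, 9, 16]

# Dict views track the underlying dict and support set operations.
a = {"x": 1, "y": 2}
b = {"y": 20, "z": 30}
keys = a.keys()
a["w"] = 0                    # the existing view sees the new key
print(sorted(keys))           # ['w', 'x', 'y']
print(a.keys() & b.keys())    # {'y'}
```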

Refs: dictionary view docs; Python 3.0 release notes

Unpacking

Unpacking got a huge boost. You could always do stuff like this in Python 2:

a, b, c = range(3)  # a = 0, b = 1, c = 2

Python 3.0 introduces:

a, b, *c = range(5)  # a = 0, b = 1, c = [2, 3, 4]
a, *b, c = range(5)  # a = 0, b = [1, 2, 3], c = 4

Python 3.5 additionally allows use of the * and ** unpacking operators in literals, or multiple times in function calls:

print(*range(3), *range(3))  # 0 1 2 0 1 2

x = [*range(3), *range(3)]  # x = [0, 1, 2, 0, 1, 2]
y = {*range(3), *range(3)}  # y = {0, 1, 2}  (it's a set, remember!)
z = {**dict1, **dict2}  # finally, syntax for dict merging!

Refs: Python 3.0 release notes; PEP 3132; Python 3.5 release notes; PEP 448

yield from

yield from is an extension of yield. Where yield produces a single value, yield from yields an entire sequence.

def flatten(*sequences):
    for seq in sequences:
        yield from seq

list(flatten([1, 2], [3, 4]))  # [1, 2, 3, 4]

Of course, for a simple example like that, you could just do some normal yielding in a for loop. The magic of yield from is that it can also take another generator or other lazy iterable, and it’ll effectively pause the current generator until the given one has been exhausted. It also takes care of passing values back into the generator using .send() or .throw().

def foo():
    a = yield 1
    b = yield from bar(a)
    print("foo got back", b)
    yield 4

def bar(a):
    print("in bar", a)
    x = yield 2
    y = yield 3
    print("leaving bar")
    return x + y

gen = foo()
val = None
while True:
    try:
        newval = gen.send(val)
    except StopIteration:
        break
    print("yielded", newval)
    val = newval * 10

# yielded 1
# in bar 10
# yielded 2
# yielded 3
# leaving bar
# foo got back 50
# yielded 4

Oh yes, and you can now return a value from a generator. The return value becomes the result of a yield from, or if the caller isn’t using yield from, it’s available as the argument to the StopIteration exception.
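
Both ways of getting at the return value, in one small sketch (the generator names are invented for illustration):

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1
    return "liftoff"             # ends up as StopIteration.value

def wrapper():
    result = yield from countdown(2)   # result receives the return value
    yield result

print(list(wrapper()))           # [2, 1, 'liftoff']

# Without yield from, the value rides along on the StopIteration exception.
gen = countdown(1)
next(gen)                        # 1
try:
    next(gen)
except StopIteration as stop:
    returned = stop.value
print(returned)                  # liftoff
```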

A small convenience, perhaps. The real power here isn’t in the use of generators as lazy iterators, but in the use of generators as coroutines.

A coroutine is a function that can “suspend” itself, like yield does, allowing other code to run until the function is resumed. It’s kind of like an alternative to threading, but only one function is actively running at any given time, and that function has to deliberately relinquish control (or end) before anything else can run.

Generators could do this already, more or less, but only one stack frame deep. That is, you can yield from a generator to suspend it, but if the generator calls another function, that other function has no way to suspend the generator. This is still useful, but significantly less powerful than the coroutine functionality in e.g. Lua, which lets any function yield anywhere in the call stack.

With yield from, you can create a whole chain of generators that yield from one another, and as soon as the one on the bottom does a regular yield, the entire chain will be suspended.
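
Here is a minimal sketch of such a chain, three generators deep. The function names are arbitrary; the point is that a single plain yield at the bottom suspends everything above it, and a send() from outside lands directly at the bottom.

```python
def bottom():
    received = yield "bottom is waiting"    # the only plain yield in the chain
    return received * 2

def middle():
    value = yield from bottom()
    return value + 1

def top():
    value = yield from middle()
    return "result: {}".format(value)

gen = top()
message = next(gen)      # runs top -> middle -> bottom, then suspends all three
print(message)           # bottom is waiting

try:
    gen.send(20)         # resumes bottom; return values bubble back up the chain
except StopIteration as stop:
    final = stop.value
print(final)             # result: 41
```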

This laid the groundwork for making the asyncio module possible. I’ll get to that later.

Refs: docs; Python 3.3 release notes; PEP 380

Syntactic sugar

Keyword-only arguments

Python 3.0 introduces “keyword-only” arguments, which must be given by name. As a corollary, you can now accept a list of args and have more arguments afterwards. The full syntax now looks something like this:

def foo(a, b=None, *args, c=None, d, **kwargs):
    ...

Here, a and d are required, b and c are optional. c and d must be given by name.

foo(1)                      # TypeError: missing d
foo(1, 2)                   # TypeError: missing d
foo(d=4)                    # TypeError: missing a
foo(1, d=4)                 # a = 1, d = 4
foo(1, 2, d=4)              # a = 1, b = 2, d = 4
foo(1, 2, 3, d=4)           # a = 1, b = 2, args = (3,), d = 4
foo(1, 2, c=3, d=4)         # a = 1, b = 2, c = 3, d = 4
foo(1, b=2, c=3, d=4, e=5)  # a = 1, b = 2, c = 3, d = 4, kwargs = {'e': 5}

This is extremely useful for functions with a lot of arguments, functions with boolean arguments, functions that accept *args (or may do so in the future) but also want some options, etc. I use it a lot!

If you want keyword-only arguments, but you don’t want to accept *args, you just leave off the variable name:

def foo(*, arg=None):
    ...
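
As a rough illustration of the bare * form, here is a hypothetical connect function (not from any real library) whose options can only be passed by name:

```python
def connect(host, *, timeout=10, retries=3):
    # Hypothetical function, for illustration only.
    return (host, timeout, retries)

print(connect("example.org", timeout=5))    # ('example.org', 5, 3)

try:
    connect("example.org", 5)               # positional use is rejected
except TypeError as exc:
    positional_error = exc
    print("TypeError:", exc)
```

The nice side effect is that call sites stay readable: nobody has to guess what a bare 5 means.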

Refs: Python 3.0 release notes; PEP 3102

Format strings

Python 3.6 (not yet out) will finally bring us string interpolation, more or less, using the str.format() syntax:

a = 0x133
b = 0x352
print(f"The answer is {a + b:04x}.")

It’s pretty much the same as str.format(), except that instead of a position or name, you can give an entire expression. The formatting suffixes with : still work, the special built-in conversions like !r still work, and __format__ is still invoked.
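
A slightly fuller sketch of those three points, which needs Python 3.6 or later to run. The Temperature class is made up purely to show __format__ being called.

```python
class Temperature:
    """Hypothetical class, only here to show that __format__ is invoked."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # The text after the colon arrives here as spec.
        if spec == "F":
            return "{:.1f} F".format(self.celsius * 9 / 5 + 32)
        return "{:.1f} C".format(self.celsius)

name = "Zo\u00eb"
t = Temperature(21.5)

quoted = f"{name!r} has {len(name)} letters"    # !r conversion plus an expression
temps = f"{t} is {t:F}"                         # custom __format__, with and without a spec
hexed = f"{0x133 + 0x352:04x}"                  # expression with a format suffix

print(quoted)
print(temps)    # 21.5 C is 70.7 F
print(hexed)    # 0485
```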

Refs: docs; Python 3.6 release notes; PEP 498

async and friends

Right, so, about coroutines.

Python 3.4 introduced the asyncio module, which offers building blocks for asynchronous I/O (and brings together the myriad third-party modules that do it already).

The design is based around coroutines, which are really generators using yield from. The idea, as I mentioned above, is that you can create a stack of generators that all suspend at once:

@coroutine
def foo():
    # do some stuff
    yield from bar()
    # do more stuff

@coroutine
def bar():
    # do some stuff
    response = yield from get_url("https://eev.ee/")
    # do more stuff

When this code calls get_url() (not actually a real function, but see aiohttp), get_url will send a request off into the æther, and then yield. The entire stack of generators — get_url, bar, and foo — will all suspend, and control will return to whatever first called foo, which with asyncio will be an “event loop”.

The event loop’s entire job is to notice that get_url yielded some kind of “I’m doing a network request” thing, remember it, and resume other coroutines in the meantime. (Or just twiddle its thumbs, if there’s nothing else to do.) When a response comes back, the event loop will resume get_url and send it the response. get_url will do some stuff and return it up to bar, who continues on, none the wiser that anything unusual happened.

The magic of this is that you can call get_url several times, and instead of having to wait for each request to completely finish before the next one can even start, you can do other work while you’re waiting. No threads necessary; this is all one thread, with functions cooperatively yielding control when they’re waiting on some external thing to happen.

Now, notice that you do have to use yield from each time you call another coroutine. This is nice in some ways, since it lets you see exactly when and where your function might be suspended out from under you, which can be important in some situations. There are also arguments about why this is bad, and I don’t care about them.

However, yield from is a really weird phrase to be sprinkling all over network-related code. It’s meant for use with iterables, right? Lists and tuples and things. get_url is only one thing. What are we yielding from it? Also, what’s this @coroutine decorator that doesn’t actually do anything?

Python 3.5 smoothed over this nonsense by introducing explicit syntax for these constructs, using new async and await keywords:

async def foo():
    # do some stuff
    await bar()
    # do more stuff

async def bar():
    # do some stuff
    response = await get_url("https://eev.ee/")
    # do more stuff

async def clearly identifies a coroutine, even one that returns immediately. (Before, you’d have a generator with no yield, which isn’t actually a generator, which causes some problems.) await explains what’s actually happening: you’re just waiting for another function to be done.

async for and async with are also available, replacing some particularly clumsy syntax you’d need to use before. And, handily, you can only use any of these things within an async def.

The new syntax comes with corresponding new special methods like __await__, whereas the previous approach required doing weird things with __iter__, which is what yield from ultimately calls.

I could fill a whole post or three with stuff about asyncio, and can’t possibly do it justice in just a few paragraphs. The short version is: there’s built-in syntax for doing network stuff in parallel without threads, and that’s cool.
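
For a taste of what that looks like end to end, here is a small runnable sketch. It makes a few assumptions: get_url is still not a real function, so this version fakes the network with asyncio.sleep; the URLs and delays are invented; and it uses asyncio.run, which only arrived in Python 3.7.

```python
import asyncio

finished = []    # records completion order, to show the two requests overlap

async def get_url(url, delay):
    # Hypothetical stand-in for a network request: sleeps instead of doing I/O.
    await asyncio.sleep(delay)
    finished.append(url)
    return "response from " + url

async def main():
    # Both coroutines run concurrently on one thread. gather() returns results
    # in the order the awaitables were passed, not the order they finished.
    return await asyncio.gather(
        get_url("https://example.com/slow", 0.2),
        get_url("https://example.com/fast", 0.05),
    )

results = asyncio.run(main())
print(results)
print(finished)    # the fast request completes first
```

The total running time is roughly that of the slowest request, rather than the sum of both.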

Refs for asyncio: docs (asyncio); Python 3.4 release notes; PEP 3156

Refs for async and await: docs (await); docs (async); docs (special methods); Python 3.5 release notes; PEP 492

Function annotations

Function arguments and return values can have annotations:

def foo(a: "hey", b: "what's up") -> "whoa":
    ...

The annotations are accessible via the function’s __annotations__ attribute. They have no special meaning to Python, so you’re free to experiment with them.

Well…

You were free to experiment with them, but the addition of the typing module (mentioned below) has hijacked them for type hints. There’s no clear way to attach a type hint and some other value to the same argument, so you’ll have a tough time making function annotations part of your API.

There’s still no hard requirement that annotations be used exclusively for type hints (and it’s not like Python does anything with type hints, either), but the original PEP suggests it would like that to be the case someday. I guess we’ll see.

If you want to see annotations preserved for other uses as well, it would be a really good idea to do some creative and interesting things with them as soon as possible. Just saying.
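
If you want to poke at them, the raw material is just a dict. A minimal sketch, with a made-up function:

```python
def scale(value: "number to scale", factor: float = 2.0) -> "scaled number":
    return value * factor

# Annotations live in a plain dict keyed by parameter name,
# with the return annotation stored under "return".
notes = scale.__annotations__
for name, note in notes.items():
    print(name, "->", note)

print(scale(3))    # 6.0; the annotations have no effect on the call itself
```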

Refs: docs; Python 3.0 release notes; PEP 3107

Matrix multiplication

Python 3.5 learned a new infix operator for matrix multiplication, spelled @. It doesn’t do anything for any built-in types, but it’s supported in NumPy. You can implement it yourself with the __matmul__ special method and its r and i variants.

Shh. Don’t tell anyone, but I suspect there are fairly interesting things you could do with an operator called @ — some of which have nothing to do with matrix multiplication at all!

Refs: Python 3.5 release notes; PEP 465

Ellipsis

... is now valid syntax everywhere. It evaluates to the Ellipsis singleton, which does nothing. (This exists in Python 2, too, but it’s only allowed when slicing.)

It’s not of much practical use, but you can use it to indicate an unfinished stub, in a way that’s clearly not intended to be final but will still parse and run:

class ReallyComplexFiddlyThing:
    # fuck it, do this later
    ...

Refs: docs; Python 3.0 release notes

Enhanced exceptions

A slightly annoying property of Python 2’s exception handling is that if you want to do your own error logging, or otherwise need to get at the traceback, you have to use the slightly funky sys.exc_info() API and carry the traceback around separately. As of Python 3.0, exceptions automatically have a __traceback__ attribute, as well as a .with_traceback() method that sets the traceback and returns the exception itself (so you can use it inline).
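
A small sketch of what that buys you; the risky function is made up for illustration:

```python
import traceback

def risky():
    raise ValueError("nope")

try:
    risky()
except ValueError as e:
    # "e" is unbound when the block ends, so keep another reference.
    caught = e

# No sys.exc_info() needed: the traceback rides along on the exception.
frames = traceback.extract_tb(caught.__traceback__)
function_names = [frame.name for frame in frames]

# with_traceback() returns the exception itself, so it works inline.
replacement = KeyError("wrapped").with_traceback(caught.__traceback__)
```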

This makes some APIs a little silly — __exit__ still accepts the exception type and value and traceback, even though all three are readily available from just the exception object itself.

A much more annoying property of Python 2’s exception handling was that custom exception handling would lose track of where the problem actually occurred. Consider the following call stack.

A
B
C
D
E

Now say an exception happens in E, and it’s caught by code like this in C.

try:
    D()
except Exception as e:
    raise CustomError("Failed to call D")

Because this creates and raises a new exception, the traceback will start from this point and not even mention E. The best workaround for this involves manually creating a traceback between C and E, formatting it as a string, and then including that in the error message. Preposterous.

Python 3.0 introduced exception chaining, which allows you to do this:

raise CustomError("Failed to call D") from e

Now, if this exception reaches the top level, Python will format it as:

Traceback (most recent call last):
File C, blah blah
File D, blah blah
File E, blah blah
SomeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File A, blah blah
File B, blah blah
File C, blah blah
CustomError: Failed to call D

The best part is that you don’t need to explicitly say from e at all — if you do a plain raise while there’s already an active exception, Python will automatically chain them together. Even internal Python exceptions will have this behavior, so a broken exception handler won’t lose the original exception. (In the implicit case, the intermediate text becomes “During handling of the above exception, another exception occurred:”.)

The chained exception is stored on the new exception as either __cause__ (if from an explicit raise ... from) or __context__ (if automatic).

If you direly need to hide the original exception, Python 3.3 introduced raise ... from None.
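
To make the three flavors concrete, here’s a sketch that pokes at the attributes directly. CustomError is the same made-up exception as above, and the capture helper is only there so the exceptions can be inspected.

```python
class CustomError(Exception):
    pass

def explicit():
    try:
        1 / 0
    except ZeroDivisionError as e:
        raise CustomError("explicit chain") from e

def implicit():
    try:
        1 / 0
    except ZeroDivisionError:
        raise CustomError("implicit chain")

def hidden():
    try:
        1 / 0
    except ZeroDivisionError:
        raise CustomError("original hidden") from None

def capture(func):
    # Hypothetical helper: run func and hand back the CustomError it raises.
    try:
        func()
    except CustomError as e:
        return e

explicit_error = capture(explicit)
implicit_error = capture(implicit)
hidden_error = capture(hidden)

print(type(explicit_error.__cause__).__name__)    # ZeroDivisionError
print(type(implicit_error.__context__).__name__)  # ZeroDivisionError
print(hidden_error.__cause__)                     # None
```

Note that from None doesn’t erase anything: __context__ is still set, and the traceback printer just skips it because __suppress_context__ is True.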

Speaking of exceptions, the error messages for missing arguments have been improved. Python 2 does this:

TypeError: foo() takes exactly 1 argument (0 given)

Python 3 does this:

TypeError: foo() missing 1 required positional argument: 'a'

Refs:

Cooler classes

super() with no arguments

You can call super() with no arguments. It Just Works. Hallelujah.

Also, you can call super() with no arguments. That’s so great that I could probably just fill the rest of this article with it and be satisfied.

Did I mention you can call super() with no arguments?

Refs: docs; Python 3.0 release notes; PEP 3135

New metaclass syntax and kwargs for classes

Compared to that, everything else in this section is going to sound really weird and obscure.

For example, __metaclass__ is gone. It’s now a keyword-only argument to the class statement.

class Foo(metaclass=FooMeta):
    ...

That doesn’t sound like much, right? Just some needless syntax change that makes porting harder, right?? Right??? Haha nope watch this because it’s amazing but it barely gets any mention at all.

class Foo(metaclass=FooMeta, a=1, b=2, c=3):
    ...

You can include arbitrary keyword arguments in the class statement, and they will be passed along to the metaclass call as keyword arguments. (You have to catch them in both __new__ and __init__, since they always get the same arguments.) (Also, the class statement now has the general syntax of a function call, so you can put *args and **kwargs in it.)

This is pretty slick. Consider SQLAlchemy, which uses a metaclass to let you declare a table with a class.

class SomeTable(TableBase):
    __tablename__ = 'some_table'
    id = Column()
    ...

Note that SQLAlchemy has you put the name of the table in the clumsy __tablename__ attribute, which it invented. Why not just name? Well, because then you couldn’t declare a column called name! Any “declarative” metaclass will have the same problem of separating the actual class contents from configuration. Keyword arguments offer an easy way out.

# only hypothetical, alas
class SomeTable(TableBase, name='some_table'):
    id = Column()
    ...
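
For the curious, the metaclass side of that might look something like the sketch below. TableMeta and the _table_name attribute are invented for illustration; this is not how SQLAlchemy works.

```python
class TableMeta(type):
    # Hypothetical metaclass that accepts a "name" keyword from the
    # class statement.
    def __new__(meta, clsname, bases, namespace, *, name=None, **kwargs):
        cls = super().__new__(meta, clsname, bases, namespace, **kwargs)
        if name is not None:
            cls._table_name = name
        return cls

    def __init__(cls, clsname, bases, namespace, *, name=None, **kwargs):
        # __init__ is handed the same keyword arguments, so it has to
        # accept them as well.
        super().__init__(clsname, bases, namespace, **kwargs)

class TableBase(metaclass=TableMeta):
    pass

class SomeTable(TableBase, name='some_table'):
    # The configuration no longer collides with a column called "name".
    name = 'pretend this is a Column'

print(SomeTable._table_name)  # some_table
```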

Refs: docs; Python 3.0 release notes; PEP 3115

__prepare__

Another new metaclass feature is the introduction of the __prepare__ method.

You may have noticed that the body of a class is just a regular block, which can contain whatever code you want. Before decorators were a thing, you’d actually declare class methods in two stages:

class Foo:
    def do_the_thing(cls):
        ...
    do_the_thing = classmethod(do_the_thing)

That’s not magical class-only syntax; that’s just regular code assigning to a variable. You can put ifs and fors and whiles and dels inside a class body, too; you just don’t see it very often because there aren’t very many useful reasons to do it.

A class body is a kind of weird pseudo-scope. It can create locals, and it can read values from outer scopes, but methods don’t see the class body as an outer scope. Once the class body reaches its end, any remaining locals are passed to the type constructor and become the new class’s attributes. (This is why, for example, you can’t refer to a class directly within its own body — the class doesn’t and can’t exist until after the body has executed.)

All of this is to say: __prepare__ is a new hook that returns the dict the class body’s locals go into.

Maybe that doesn’t sound particularly interesting, but consider: the value you return doesn’t have to be an actual dict. It can be anything that understands __setitem__. You could, say, use an OrderedDict, and keep track of the order your attributes were declared. That’s useful for declarative metaclasses, where the order of attributes may be important (consider a C struct).

But you can go further. You might allow more than one attribute of the same name. You might do something special with the attributes as soon as they’re assigned, rather than at the end of the body. You might predeclare some attributes. __prepare__ is passed the class’s kwargs, so you might alter the behavior based on those.
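
As a sketch of the “more than one attribute of the same name” idea, here’s a namespace that collects every assignment into a list instead of overwriting. All the names are made up, and it subclasses dict because type() insists on one.

```python
class MultiNamespace(dict):
    """Hypothetical namespace that keeps every value assigned to a name."""

    def __setitem__(self, key, value):
        if key.startswith('__') and key.endswith('__'):
            # Leave __module__, __qualname__, and friends alone.
            super().__setitem__(key, value)
        elif key in self:
            super().__getitem__(key).append(value)
        else:
            super().__setitem__(key, [value])

class MultiMeta(type):
    @classmethod
    def __prepare__(meta, name, bases, **kwargs):
        return MultiNamespace()

class Handlers(metaclass=MultiMeta):
    def on_event(self):
        return 'first'

    def on_event(self):
        return 'second'

# Both definitions survive, in order, as a list on the class.
results = [handler(None) for handler in Handlers.on_event]
print(results)  # ['first', 'second']
```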

For a nice practical example, consider the new enum module, which I briefly mention later on. One drawback of this module is that you have to specify a value for every variant, since variants are defined as class attributes, which must have a value. There’s an example of automatic numbering, but it still requires assigning a dummy value like (). Clever use of __prepare__ would allow lifting this restriction:

# XXX: Should prefer MutableMapping here, but the ultimate call to type()
# raises a TypeError if you pass a namespace object that doesn't inherit
# from dict!  Boo.
class EnumLocals(dict):
    def __init__(self):
        self.nextval = 1

    def __getitem__(self, key):
        if key not in self and not key.startswith('_') and not key.endswith('_'):
            self[key] = self.nextval
            self.nextval += 1
        return super().__getitem__(key)

class EnumMeta(type):
    @classmethod
    def __prepare__(meta, name, bases):
        return EnumLocals()

class Colors(metaclass=EnumMeta):
    red
    green
    blue

print(Colors.red, Colors.green, Colors.blue)
# 1 2 3

Deciding whether this is a good idea is left as an exercise.

This is an exceptionally obscure feature that gets very little attention — it’s not even mentioned explicitly in the 3.0 release notes — but there’s nothing else like it in the language. Between __prepare__ and keyword arguments, the class statement has transformed into a much more powerful and general tool for creating all kinds of objects. I almost wish it weren’t still called class.

Refs: docs; Python 3.0 release notes; PEP 3115

Attribute definition order

If that’s still too much work, don’t worry: a proposal was just accepted for Python 3.6 that makes this even easier. Now every class will have a __definition_order__ attribute, a tuple listing the names of all the attributes assigned within the class body, in order. (To make this possible, the default return value of __prepare__ will become an OrderedDict, but the __dict__ attribute will remain a regular dict.)

Now you don’t have to do anything at all: you can always check to see what order any class’s attributes were defined in.


Additionally, descriptors can now implement a __set_name__ method. When a class is created, any descriptor implementing the method will have it called with the containing class and the name of the descriptor.

I’m very excited about this, but let me try to back up. A descriptor is a special Python object that can be used to customize how a particular class attribute works. The built-in property decorator is a descriptor.

class MyClass:
    foo = SomeDescriptor()

c = MyClass()
c.foo = 5  # calls SomeDescriptor.__set__!
print(c.foo)  # calls SomeDescriptor.__get__!

This is super cool and can be used for all sorts of DSL-like shenanigans.

Now, most descriptors ultimately want to store a value somewhere, and the obvious place to do that is in the object’s __dict__. Above, SomeDescriptor might want to store its value in c.__dict__['foo'], which is fine since Python will still consult the descriptor first. If that weren’t fine, it could also use the key '_foo', or whatever. It probably wants to use its own name somehow, because otherwise… what would happen if you had two SomeDescriptors in the same class?

Therein lies the problem, and one of my long-running and extremely minor frustrations with Python. Descriptors have no way to know their own name! There are only really two solutions to this:

  1. Require the user to pass the name in as an argument, too: foo = SomeDescriptor('foo'). Blech!

  2. Also have a metaclass (or decorator, or whatever), which can iterate over all the class’s attributes, look for SomeDescriptor objects, and tell them what their names are. Needing a metaclass means you can’t make general-purpose descriptors meant for use in arbitrary classes; a decorator would work, but boy is that clumsy.

Both of these suck and really detract from what could otherwise be very neat-looking syntax trickery.

But now! Now, when MyClass is created, Python will have a look through its attributes. If it sees that the foo object has a __set_name__ method, it’ll call that method automatically, passing it both the owning class and the name 'foo'! Huzzah!

This is so great I am so happy you have no idea.
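
Here’s a sketch of a descriptor that learns its own name this way. Typed is a made-up example, and it needs Python 3.6 or later to run.

```python
class Typed:
    """Hypothetical descriptor that type-checks assignments."""

    def __init__(self, kind):
        self.kind = kind

    def __set_name__(self, owner, name):
        # Called once, automatically, when the owning class is created.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.kind):
            raise TypeError(
                '{} must be {}'.format(self.name, self.kind.__name__))
        # Store under our own name; the descriptor is still consulted first.
        instance.__dict__[self.name] = value

class Point:
    x = Typed(int)
    y = Typed(int)

p = Point()
p.x = 3
p.y = 4
print(p.__dict__)  # {'x': 3, 'y': 4}
```

No name passed in by hand, no metaclass, and two Typed attributes in the same class don’t step on each other.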


Lastly, there’s now an __init_subclass__ class method, which is called when the class is subclassed. A great many metaclasses exist just to do a little bit of work for each new subclass; now, you don’t need a metaclass at all in many simple cases. You want a plugin registry? No problem:

class Plugin:
    _known_plugins = {}

    def __init_subclass__(cls, *, name, **kwargs):
        cls._known_plugins[name] = cls
        super().__init_subclass__(**kwargs)

    @classmethod
    def get_plugin(cls, name):
        return cls._known_plugins[name]

    # ...probably some interface stuff...

class FooPlugin(Plugin, name="foo"):
    ...

No metaclass needed at all.

Again, none of this stuff is available yet, but it’s all slated for Python 3.6, due out in mid-December. I am super pumped.

Refs: docs (customizing class creation); docs (descriptors); Python 3.6 release notes; PEP 520 (attribute definition order); PEP 487 (__init_subclass__ and __set_name__)

Math stuff

int and long have been merged, and there is no longer any useful distinction between small and very large integers. I’ve actually run into code that breaks if you give it 1 instead of 1L, so, good riddance. (Python 3.0 release notes; PEP 237)

The / operator always does “true” division, i.e., gives you a float. If you want floor division, use //. Accordingly, the __div__ magic method is gone; it’s split into two parts, __truediv__ and __floordiv__. (Python 3.0 release notes; PEP 238)

decimal.Decimal, fractions.Fraction, and floats now interoperate a little more nicely: numbers of different types hash to the same value; all three types can be compared with one another; and most notably, the Decimal and Fraction constructors can accept floats directly. (docs (decimal); docs (fractions); Python 3.2 release notes)

math.gcd returns the greatest common divisor of two integers. This existed before, but was in the fractions module, where nobody knew about it. (docs; Python 3.5 release notes)

math.inf is the floating-point infinity value. Previously, this was only available by writing float('inf'). There’s also a math.nan, but let’s not? (docs; Python 3.5 release notes)

math.isclose (and the corresponding complex version, cmath.isclose) determines whether two values are “close enough”. Intended to do the right thing when comparing floats. (docs; Python 3.5 release notes; PEP 485)
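
A few of these in action, as a quick sketch:

```python
import math

divisor = math.gcd(12, 18)
bigger = math.inf > 10 ** 100
naive = 0.1 + 0.2 == 0.3
close = math.isclose(0.1 + 0.2, 0.3)

print(divisor)  # 6
print(bigger)   # True
print(naive)    # False
print(close)    # True
```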

More modules

The standard library has seen quite a few improvements. In fact, Python 3.2 was developed with an explicit syntax freeze, so it consists almost entirely of standard library enhancements. There are far more changes across six and a half versions than I can possibly list here; these are the ones that stood out to me.

The module shuffle

Python 2, rather inexplicably, had a number of top-level modules that were named after the single class they contained, CamelCase and all. StringIO and SimpleHTTPServer are two obvious examples. In Python 3, the StringIO class lives in io (along with BytesIO), and SimpleHTTPServer has been renamed to http.server. If you’re anything like me, you’ll find this deeply satisfying.

Wait, wait, there’s a practical upside here. Python 2 had several pairs of modules that did the same thing with the same API, but one was pure Python and one was much faster C: pickle/cPickle, profile/cProfile, and StringIO/cStringIO. I’ve seen code (cough, older versions of Babel, cough) that spent a considerable amount of its startup time reading pickles with the pure Python version, because it did the obvious thing and used the pickle module. Now, most of these pairs have been merged: importing pickle gives you the faster C implementation when it’s available, and BytesIO/StringIO are the fast C implementations in the io module. (The exception is profile/cProfile, which remain separate modules, so you still have to ask for cProfile by name.)

Refs: docs (sort of); Python 3.0 release notes; PEP 3108 (exhaustive list of removed and renamed modules)

Additions to existing modules

A number of file format modules, like bz2 and gzip, went through some cleanup and modernization in 3.2 through 3.4: some learned a more straightforward open function, some gained better support for the bytes/text split, and several learned to use their file types as context managers (i.e., with with).

collections.ChainMap is a mapping type that consults some number of underlying mappings in order, allowing for a “dict with defaults” without having to merge them together. (docs; Python 3.3 release notes)
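
A minimal sketch of the “dict with defaults” pattern; the setting names are made up:

```python
from collections import ChainMap

defaults = {'color': 'blue', 'size': 'medium'}
overrides = {'color': 'red'}

# Lookups try each mapping in order; nothing is copied or merged.
settings = ChainMap(overrides, defaults)
color = settings['color']
size = settings['size']
print(color, size)  # red medium

# Writes only ever touch the first mapping.
settings['size'] = 'large'
```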

configparser dropped its ridiculous distinction between ConfigParser and SafeConfigParser; there is now only ConfigParser, which is safe. The parsed data now preserves order by default and can be read or written using normal mapping syntax. Also there’s a fancier alternative interpolation parser. (docs; Python 3.2 release notes)

contextlib.ContextDecorator is some sort of devilry that allows writing a context manager which can also be used as a decorator. It’s used to implement the @contextmanager decorator, so those can be used as decorators as well. (docs; Python 3.2 release notes)

contextlib.ExitStack offers cleaner and more fine-grained handling of multiple context managers, as well as resources that don’t have their own context manager support. (docs; Python 3.3 release notes)

contextlib.suppress is a context manager that quietly swallows a given type of exception. (docs; Python 3.4 release notes)

contextlib.redirect_stdout is a context manager that replaces sys.stdout for the duration of a block. (docs; Python 3.4 release notes)
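
Here’s a short sketch touching the last three contextlib additions at once; the values are arbitrary.

```python
import contextlib
import io

# suppress: quietly swallow one specific exception type.
with contextlib.suppress(KeyError):
    {}['missing']

# redirect_stdout: capture whatever gets printed inside the block.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print('captured')
captured = buffer.getvalue()

# ExitStack: register cleanup as you go; it runs in reverse order on exit.
events = []
with contextlib.ExitStack() as stack:
    stack.callback(events.append, 'registered first')
    stack.callback(events.append, 'registered second')
```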

datetime.timedelta already existed, of course, but now it supports being multiplied and divided by numbers or divided by other timedeltas. The upshot of this is that timedelta finally, finally has a .total_seconds() method which does exactly what it says on the tin. (docs; Python 3.2 release notes)

datetime.timezone is a new concrete type that can represent fixed offsets from UTC. There has long been a datetime.tzinfo, but it was a useless interface, and you were left to write the actual class yourself. datetime.timezone.utc is a predefined instance that represents UTC, an offset of zero. (docs; Python 3.2 release notes)

functools.lru_cache is a decorator that caches the results of a function, keyed on the arguments. It also offers cache usage statistics and a method for emptying the cache. (docs; Python 3.2 release notes)
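
The classic demonstration is a naive recursive Fibonacci, sketched here with an unbounded cache:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

value = fib(30)
info = fib.cache_info()   # named tuple: hits, misses, maxsize, currsize
print(value)              # 832040
print(info.misses)        # 31, one per distinct argument

fib.cache_clear()
emptied = fib.cache_info()
```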

functools.partialmethod is like functools.partial, but the resulting object can be used as a descriptor (read: method). (docs; Python 3.4 release notes)

functools.singledispatch allows function overloading, based on the type of the first argument. (docs; Python 3.4 release notes; PEP 443)
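
A small sketch, with a made-up describe function:

```python
from functools import singledispatch

@singledispatch
def describe(value):
    # Fallback for any type without a registered implementation.
    return 'something else'

@describe.register(int)
def _(value):
    return 'an integer'

@describe.register(list)
def _(value):
    return 'a list of {} items'.format(len(value))

print(describe(3))          # an integer
print(describe([1, 2, 3]))  # a list of 3 items
print(describe('hi'))       # something else
```

Dispatch follows the usual inheritance rules, so a bool lands on the int implementation.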

functools.total_ordering is a class decorator that allows you to define only __eq__ and __lt__ (or any other) and defines the other comparison methods in terms of them. Note that since Python 3.0, __ne__ is automatically the inverse of __eq__ and doesn’t need defining. Note also that total_ordering doesn’t correctly support NotImplemented until Python 3.4. For an even easier way to do this, consider my classtools.keyed_ordering decorator. (docs; Python 3.2 release notes)

inspect.getattr_static fetches an attribute like getattr but avoids triggering dynamic lookup like @property. (docs; Python 3.2 release notes)

inspect.signature fetches the signature of a function as the new and more featureful Signature object. It also knows to follow the __wrapped__ attribute set by functools.wraps since Python 3.2, so it can see through well-behaved wrapper functions to the “original” signature. (docs; Python 3.3 release notes; PEP 362)
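
To see the “sees through wrappers” part, here’s a sketch with a do-nothing decorator. The greet and logged names are invented.

```python
import functools
import inspect

def greet(name, greeting='hello', *, shout=False):
    ...

def logged(func):
    # A well-behaved wrapper: functools.wraps sets __wrapped__.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

wrapped = logged(greet)
sig = inspect.signature(wrapped)
text = str(sig)
names = list(sig.parameters)
print(text)  # (name, greeting='hello', *, shout=False)
```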

The logging module can, finally, use str.format-style string formatting by passing style='{' to Formatter. (docs; Python 3.2 release notes)

The logging module spits warnings and higher to stderr if logging hasn’t been otherwise configured. This means that if your app doesn’t use logging, but it uses a library that does, you’ll get actual output rather than the completely useless “No handlers could be found for logger ‘foo’”. (docs; Python 3.2 release notes)

os.scandir lists the contents of a directory while avoiding stat calls as much as possible, making it significantly faster. (docs; Python 3.5 release notes; PEP 471)

re.fullmatch checks for a match against the entire input string, not just a substring. (docs; Python 3.4 release notes)

reprlib.recursive_repr is a decorator for __repr__ implementations that can detect recursive calls to the same object and replace them with ..., just like the built-in structures. Believe it or not, reprlib is an existing module, though in Python 2 it was called repr. (docs; Python 3.2 release notes)

shutil.disk_usage returns disk space statistics for a given path with no fuss. (docs; Python 3.3 release notes)

shutil.get_terminal_size tries very hard to detect the size of the terminal window. (docs; Python 3.3 release notes)

subprocess.run is a new streamlined function that consolidates several other helpers in the subprocess module. It returns an object that describes the final state of the process, and it accepts arguments for a timeout, requiring that the process return success, and passing data as stdin. This is now the recommended way to run a single subprocess. (docs; Python 3.5 release notes)
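
A sketch using those arguments. The child process here is just another Python interpreter running a one-liner, chosen so the example doesn’t depend on any particular external program.

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.stdin.read().upper())'],
    input=b'hello',          # sent to the child's stdin
    stdout=subprocess.PIPE,  # capture output instead of inheriting it
    timeout=10,              # raise TimeoutExpired if it takes too long
    check=True,              # raise CalledProcessError on a nonzero exit
)

print(result.returncode)
print(result.stdout)
```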

tempfile.TemporaryDirectory is a context manager that creates a temporary directory, then destroys it and its contents at the end of the block. (docs; Python 3.2 release notes)

textwrap.indent can add an arbitrary prefix to every line in a string. (docs; Python 3.3 release notes)

time.monotonic returns the value of a monotonic clock — i.e., it will never go backwards. You should use this for measuring time durations within your program; using time.time() will produce garbage results if the system clock changes due to DST, a leap second, NTP, manual intervention, etc. (docs; Python 3.3 release notes; PEP 418)

time.perf_counter returns the value of the highest-resolution clock available, but is only suitable for measuring a short duration. (docs; Python 3.3 release notes; PEP 418)

time.process_time returns the total system and user CPU time for the process, excluding sleep. Note that the starting time is undefined, so only durations are meaningful. (docs; Python 3.3 release notes; PEP 418)

traceback.walk_stack and traceback.walk_tb are small helper functions that walk back along a stack or traceback, so you can use simple iteration rather than the slightly clumsier linked-list approach. (docs; Python 3.5 release notes)

types.MappingProxyType offers a read-only proxy to a dict. Since it holds a reference to the dict in C, you can return MappingProxyType(some_dict) to effectively create a read-only dict, as the original dict will be inaccessible from Python code. This is the same type Python uses for a class’s __dict__. Note that this has existed in various forms for a while, but wasn’t publicly exposed or documented; see my module dictproxyhack for something that does its best to work on every Python version. (docs; Python 3.3 release notes)

types.SimpleNamespace is a blank type for sticking arbitrary unstructured attributes to. Previously, you would have to make a dummy subclass of object to do this. (docs; Python 3.3 release notes)

weakref.finalize allows you to add a finalizer function to an arbitrary (weakrefable) object from the “outside”, without needing to add a __del__. The finalize object will keep itself alive, so there’s no need to hold onto it. (docs; Python 3.4 release notes)

New modules with backports

These are less exciting, since they have backports on PyPI that work in Python 2 just as well. But they came from Python 3 development, so I credit Python 3 for them, just like I credit NASA for inventing the microwave.

asyncio is covered above, but it’s been backported as trollius for 2.6+, with the caveat that Pythons before 3.3 don’t have yield from and you have to use yield From(...) as a workaround. That caveat means that third-party asyncio libraries will almost certainly not work with trollius! For this and other reasons, the maintainer is no longer supporting it. Alas. Guess you’ll have to upgrade to Python 3, then.

enum finally provides an enumeration type, something which has long been desired in Python and solved in myriad ad-hoc ways. The variants become instances of a class, can be compared by identity, can be converted between names and values (but only explicitly), can have custom methods, and can implement special methods as usual. There’s even an IntEnum base class whose values end up as subclasses of int (!), making them perfectly compatible with code expecting integer constants. Enums have a surprising amount of power, far more than any approach I’ve seen before; I heartily recommend that you skim the examples in the documentation. Backported as enum34 for 2.4+. (docs; Python 3.4 release notes; PEP 435)
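
A brief sketch of those properties, using made-up Color and Priority enums:

```python
from enum import Enum, IntEnum

class Color(Enum):
    red = 1
    green = 2

    def describe(self):
        # Custom methods work like on any other class.
        return 'the color {}'.format(self.name)

class Priority(IntEnum):
    low = 1
    high = 2

by_value = Color(1)       # value -> member, explicitly
by_name = Color['green']  # name -> member, explicitly

print(by_value is Color.red)  # True
print(Color.red == 1)         # False: plain enums aren't their values
print(Priority.high > 1)      # True: IntEnum members really are ints
```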

ipaddress offers types for representing IPv4 and IPv6 addresses and subnets. They can convert between several representations, perform a few set-like operations on subnets, identify special addresses, and so on. Backported as ipaddress for 2.6+. (There’s also a py2-ipaddress, but its handling of bytestrings differs from Python 3’s built-in module, which is likely to cause confusing compatibility problems.) (docs; Python 3.3 release notes; PEP 3144)

pathlib provides the Path type, representing a filesystem path that you can manipulate with methods rather than the mountain of functions in os.path. It also overloads / so you can do path / 'file.txt', which is kind of cool. PEP 519 intends to further improve interoperability of Paths with classic functions for the not-yet-released Python 3.6. Backported as pathlib2 for 2.6+; there’s also a pathlib, but it’s no longer maintained, and I don’t know what happened there. (docs; Python 3.4 release notes; PEP 428)
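
A quick sketch of the flavor. It uses PurePosixPath rather than Path so that it behaves the same on every platform and never touches the filesystem; the paths themselves are made up.

```python
from pathlib import PurePosixPath

base = PurePosixPath('/srv/app')
config = base / 'conf' / 'settings.ini'
backup = config.with_suffix('.bak')

print(config)         # /srv/app/conf/settings.ini
print(config.name)    # settings.ini
print(config.suffix)  # .ini
print(config.parent)  # /srv/app/conf
print(backup)         # /srv/app/conf/settings.bak
```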

selectors (created as part of the work on asyncio) attempts to wrap select in a high-level interface that doesn’t make you want to claw your eyes out. A noble pursuit. Backported as selectors34 for 2.6+. (docs; Python 3.4 release notes)

statistics contains a number of high-precision statistical functions. Backported as backports.statistics for 2.6+. (docs; Python 3.4 release notes; PEP 450)

unittest.mock provides multiple ways for creating dummy objects, temporarily (with a context manager or decorator) replacing an object or some of its attributes, and verifying that some sequence of operations was performed on a dummy object. I’m not a huge fan of mocking so much that your tests end up mostly testing that your source code hasn’t changed, but if you have to deal with external resources or global state, some light use of unittest.mock can be very handy — even if you aren’t using the rest of unittest. Backported as mock for 2.6+. (docs; Python 3.3, but no release notes)

New modules without backports

Perhaps more exciting because they’re Python 3 exclusive! Perhaps less exciting because they’re necessarily related to plumbing.

faulthandler

faulthandler is a debugging aid that can dump a Python traceback during a segfault or other fatal signal. It can also be made to hook on an arbitrary signal, and can intervene even when Python code is deadlocked. You can use the default behavior with no effort by passing -X faulthandler on the command line, by setting the PYTHONFAULTHANDLER environment variable, or by using the module API manually.

I think -X itself is new as of Python 3.2, though it’s not mentioned in the release notes. It’s reserved for implementation-specific options; there are a few others defined for CPython, and the options can be retrieved from Python code via sys._xoptions.

Refs: docs; Python 3.3 release notes

importlib

importlib is the culmination of a whole lot of work, performed in multiple phases across numerous Python releases, to extend, formalize, and cleanly reimplement the entire import process.

I can’t possibly describe everything the import system can do and what Python versions support what parts of it. Suffice to say, it can do a lot of things: Python has built-in support for importing from zip files, and I’ve seen third-party import hooks that allow transparently importing modules written in another programming language.

If you want to mess around with writing your own custom importer, importlib has a ton of tools for helping you do that. It’s possible in Python 2, too, using the imp module, but that’s a lot rougher around the edges.

If not, the main thing of interest is the import_module function, which imports a module by name without all the really weird semantics of __import__. Seriously, don’t use __import__. It’s so weird. It probably doesn’t do what you think. importlib.import_module even exists in Python 2.7.

Refs: docs; Python 3.3 release notes; PEP 302?

tracemalloc

tracemalloc is another debugging aid which tracks Python’s memory allocations. It can also compare two snapshots, showing how much memory has been allocated or released between two points in time, and who was responsible. If you have rampant memory use issues, this is probably more helpful than having Python check its own RSS.

Technically, tracemalloc can be used with Python 2.7… but that involves patching and recompiling Python, so I hesitate to call it a backport. Still, if you really need it, give it a whirl.

Refs: docs; Python 3.4 release notes; PEP 454

typing

typing offers a standard way to declare type hints — the expected types of arguments and return values. Type hints are given using the function annotation syntax.

Python itself doesn’t do anything with the annotations, though they’re accessible and inspectable at runtime. An external tool like mypy can perform static type checking ahead of time, using these standard types. mypy is an existing project that predates typing (and works with Python 2), but the previous syntax relied on magic comments; typing formalizes the constructs and puts them in the standard library.

I haven’t actually used either the type hints or mypy myself, so I can’t comment on how helpful or intrusive they are. Give them a shot if they sound useful to you.

Refs: docs; Python 3.5 release notes; PEP 484

venv and ensurepip

I mean, yes, of course, virtualenv and pip are readily available in Python 2. The whole point of these is that they are bundled with Python, so you always have them at your fingertips and never have to worry about installing them yourself.

Installing Python should now give you pipX and pipX.Y commands automatically, corresponding to the latest stable release of pip when that Python version was first released. You’ll also get pyvenv, which is effectively just virtualenv.

There’s also a module interface: python -m ensurepip will install pip (hopefully not necessary), python -m pip runs pip with a specific Python version (a feature of pip and not new to the bundling), and python -m venv runs the bundled copy of virtualenv with a specific Python version.

There was a time when these were completely broken on Debian, because Debian strongly opposes vendoring (the rationale being that it’s easiest to push out updates if there’s only one copy of a library in the Debian package repository), so they just deleted ensurepip and venv? Which completely defeated the point of having them in the first place? I think this has been fixed by now, but it might still bite you if you’re on Ubuntu 14.04 LTS.

Refs: ensurepip docs; pyvenv docs; Python 3.4 release notes; PEP 453

zipapp

zipapp makes it easy to create executable zip applications, which have been a thing since 2.6 but have languished in obscurity. Well, no longer.

This wasn’t particularly difficult before: you just zip up some code, make sure there’s a __main__.py in the root, and pass it to Python. Optionally, you can set it executable and add a shebang line, since the ZIP format ignores any leading junk in the file. That’s basically all zipapp does. (It does not magically infer your dependencies and bundle them as well; you’re on your own there.)

I can’t find a backport, which is a little odd, since I don’t think this module does anything too special.

Refs: docs; Python 3.5 release notes; PEP 441

Miscellaneous nice enhancements

There were a lot of improvements to language semantics that don’t fit anywhere else above, but make me a little happier.

The interactive interpreter does tab-completion by default. I say “by default” because I’ve been told that it was supported before, but you had to do some kind of goat blood sacrifice to get it to work. Also, command history persists between runs. (docs; Python 3.4 release notes)

The -b command-line option produces a warning when calling str() on a bytes or bytearray, or when comparing text to bytes. -bb produces an error. (docs)
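To see the difference, here is a small sketch that runs the same one-liner in two child interpreters, once normally and once with -bb. The snippet itself is invented for the demonstration.

```python
import subprocess
import sys

# Comparing text to bytes is almost always a bug.
snippet = "a = 'abc'; b = b'abc'; print(a == b)"

# Without -b the comparison quietly evaluates to False.
plain = subprocess.run(
    [sys.executable, "-c", snippet], capture_output=True, text=True
)

# With -bb the same comparison raises BytesWarning as an error.
strict = subprocess.run(
    [sys.executable, "-bb", "-c", snippet], capture_output=True, text=True
)

plain_output = plain.stdout.strip()
strict_failed = strict.returncode != 0
strict_mentions_warning = "BytesWarning" in strict.stderr
print(plain_output, strict_failed, strict_mentions_warning)
```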

The -I command-line option runs Python in “isolated mode”: it ignores all PYTHON* environment variables and leaves the current directory and user site-packages directories off of sys.path. The idea is to use this when running a system script (or in the shebang line of a system script) to insulate it from any weird user-specific stuff. (docs; Python 3.4 release notes)
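A quick way to watch isolated mode ignore the environment is to set PYTHONPATH to some directory and check whether the child interpreter picks it up. This is only a sketch; the temporary directory stands in for whatever user-specific path you might have lying around.

```python
import os
import subprocess
import sys
import tempfile

# Reports the isolated flag and whether the extra directory made it onto sys.path.
check = "import sys; print(sys.flags.isolated, sys.argv[1] in sys.path)"

with tempfile.TemporaryDirectory() as extra_dir:
    env = dict(os.environ, PYTHONPATH=extra_dir)

    normal = subprocess.run(
        [sys.executable, "-c", check, extra_dir],
        env=env, capture_output=True, text=True, check=True,
    ).stdout.split()

    isolated = subprocess.run(
        [sys.executable, "-I", "-c", check, extra_dir],
        env=env, capture_output=True, text=True, check=True,
    ).stdout.split()

print("normal:  ", normal)
print("isolated:", isolated)
```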

Functions and classes learned a __qualname__ attribute, which is a dotted name describing (lexically) where they were defined. For example, a method’s __name__ might be foo, but its __qualname__ would be something like SomeClass.foo. Similarly, a class or function defined within another function will list that containing function in its __qualname__. (docs; Python 3.3 release notes; PEP 3155)
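For instance, with a throwaway class and a nested function (both names invented here):

```python
class SomeClass:
    def foo(self):
        pass


def outer():
    def inner():
        pass
    return inner


method_name = SomeClass.foo.__name__          # 'foo'
method_qualname = SomeClass.foo.__qualname__  # 'SomeClass.foo'
nested_qualname = outer().__qualname__        # 'outer.<locals>.inner'
print(method_name, method_qualname, nested_qualname)
```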

Generators signal their end by raising StopIteration internally, but it was also possible to raise StopIteration directly within a generator — most notably, when calling next() on an exhausted iterator. This would cause the generator to end prematurely and silently. Now, raising StopIteration inside a generator will produce a warning, which will become a RuntimeError in Python 3.7. You can opt into the fatal behavior early with from __future__ import generator_stop. (Python 3.5 release notes; PEP 479)
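Here is a sketch of the classic accident, assuming Python 3.7 or later, where the error is already fatal. The generator is a made-up example; the bug is the bare next() call on what turns out to be an empty iterable.

```python
def first_of_each(iterables):
    """Yield the first item of every iterable (buggy on empty ones)."""
    for iterable in iterables:
        # next() raises StopIteration for an empty iterable, and that
        # exception escapes into the generator body.
        yield next(iter(iterable))


caught = None
try:
    firsts = list(first_of_each([[1, 2], [], [3]]))
except RuntimeError as exc:
    # Python 3.7+ turns the stray StopIteration into a RuntimeError
    # instead of silently truncating the output to [1].
    caught = exc

print(type(caught).__name__, caught)
```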

Implicit namespace packages allow a package to span multiple directories. The most common example is a plugin system, foo.plugins.*, where plugins may come from multiple libraries, but all want to share the foo.plugins namespace. Previously, they would collide, and some sys.path tricks were necessary to make it work; now, support is built in. (This feature also allows you to have a regular package without an __init__.py, but I’d strongly recommend still having one.) (Python 3.3 release notes; PEP 420)
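A minimal sketch of the plugin case: two pretend libraries each ship one module under the same demo_ns.plugins package, with no __init__.py anywhere. All the names here (lib_one, lib_two, demo_ns, alpha, beta) are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    roots = []
    for library, plugin in [("lib_one", "alpha"), ("lib_two", "beta")]:
        # Each "library" has its own demo_ns/plugins directory.
        plugin_dir = Path(tmp) / library / "demo_ns" / "plugins"
        plugin_dir.mkdir(parents=True)
        (plugin_dir / f"{plugin}.py").write_text(f"NAME = {plugin!r}\n")
        roots.append(str(Path(tmp) / library))

    sys.path[:0] = roots
    importlib.invalidate_caches()
    try:
        alpha = importlib.import_module("demo_ns.plugins.alpha")
        beta = importlib.import_module("demo_ns.plugins.beta")
        plugin_names = [alpha.NAME, beta.NAME]
        # The namespace package's __path__ lists every contributing directory.
        portion_count = len(list(sys.modules["demo_ns.plugins"].__path__))
    finally:
        for root in roots:
            sys.path.remove(root)

print(plugin_names, portion_count)
```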

Object finalization behaves in less quirky ways when destroying an isolated reference cycle. Also, modules no longer have their contents changed to None during shutdown, which fixes a long-running type of error when a __del__ method tries to call, say, os.path.join(). If you were unlucky, os.path would already have had its contents replaced with Nones, and you’d get an extremely confusing TypeError from trying to call a standard library function. (Python 3.4 release notes; PEP 442)

str.format_map is like str.format, but it accepts a mapping object directly (instead of having to flatten it with **kwargs). This allows some fancy things that weren’t previously possible, like passing a fake map that creates values on the fly based on the keys looked up in it. (docs; Python 3.2 release notes)
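One way to build such a fake map is a dict subclass with __missing__. The class below is invented for illustration and just shouts any key it doesn't have.

```python
class Shouting(dict):
    """Mapping that invents a value for any key that is missing."""

    def __missing__(self, key):
        return key.upper()


template = "{greeting}, {name}!"
rendered = template.format_map(Shouting(name="world"))
print(rendered)  # GREETING, world!
```

Doing the same with str.format(**mapping) would first copy the mapping into a plain dict, so the __missing__ hook would never run.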

When a blocking system call is interrupted by a signal, it returns EINTR, indicating that the calling code should try the same system call again. In Python, this becomes OSError or InterruptedError. I have never in my life seen any C or Python code that actually deals with this correctly. Now, Python will do it for you: all the built-in and standard library functions that make use of system calls will automatically retry themselves when interrupted. (Python 3.5 release notes; PEP 475)

File descriptors created by Python code are now flagged “non-inheritable”, meaning they’re closed automatically when spawning a child process. (docs; Python 3.4 release notes; PEP 446)
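You can inspect and flip the flag yourself with os.get_inheritable and os.set_inheritable. A small sketch using a pipe:

```python
import os

read_fd, write_fd = os.pipe()
try:
    # Freshly created descriptors are non-inheritable by default.
    default_flag = os.get_inheritable(read_fd)

    # Opting back in is explicit, for the cases where a child needs the fd.
    os.set_inheritable(read_fd, True)
    opted_in_flag = os.get_inheritable(read_fd)
finally:
    os.close(read_fd)
    os.close(write_fd)

print(default_flag, opted_in_flag)  # False True
```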

A number of standard library functions now accept file descriptors in addition to paths. (docs; Python 3.3 release notes)
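For example, os.stat is one of the functions that will take an open descriptor instead of a path. Which functions support this varies by platform, and os.supports_fd tells you at runtime. A sketch:

```python
import os
import tempfile

with tempfile.TemporaryFile() as handle:
    handle.write(b"hello")
    handle.flush()

    # Pass the descriptor where a path would normally go.
    size_via_fd = os.stat(handle.fileno()).st_size

# os.supports_fd is the set of os functions accepting a descriptor here.
stat_takes_fd = os.stat in os.supports_fd
print(size_via_fd, stat_takes_fd)
```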

Several different OS and I/O exceptions were merged into a single and more fine-grained hierarchy, rooted at OSError. Code can now catch a specific subclass in most cases, rather than examine .errno. (docs; Python 3.3 release notes; PEP 3151)
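In practice that means code like the following sketch, where the helper name and the default value are made up:

```python
import errno
import os
import tempfile


def read_or_default(path, default=""):
    """Return the file's text, or a default if the file does not exist."""
    try:
        with open(path) as handle:
            return handle.read()
    except FileNotFoundError:
        # No need to catch OSError and compare exc.errno to errno.ENOENT.
        return default


with tempfile.TemporaryDirectory() as tmp:
    missing_path = os.path.join(tmp, "does-not-exist.txt")
    missing_result = read_or_default(missing_path, default="(missing)")

    # The old errno is still there if you want it.
    try:
        open(missing_path)
    except OSError as exc:
        same_error = exc.errno == errno.ENOENT and isinstance(exc, FileNotFoundError)

print(missing_result, same_error)
```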

ResourceWarning is a new kind of warning for issues with resource cleanup. One is produced if a file object is destroyed, but was never closed, which can cause issues on Windows or with garbage-collected Python implementations like PyPy; one is also produced if uncollectable objects still remain when Python shuts down, indicating some severe finalization problems. The warning is ignored by default, but can be enabled with -W default on the command line. (Python 3.2 release notes)

hasattr() only catches (and returns False for) AttributeErrors. Previously, any exception would be considered a sign that the attribute doesn’t exist, even though an unusual exception like an OSError usually means the attribute is computed dynamically, and that code is broken somehow. Now, exceptions other than AttributeError are allowed to propagate to the caller. (docs; Python 3.2 release notes)
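A contrived sketch of the difference, using a property that fails for reasons that have nothing to do with the attribute being absent:

```python
class Flaky:
    @property
    def value(self):
        # Stand-in for a computed attribute whose backend is broken.
        raise OSError("backing store unavailable")


has_missing = hasattr(Flaky(), "missing")  # genuinely absent: False

propagated = None
try:
    hasattr(Flaky(), "value")
except OSError as exc:
    # Python 2 would have swallowed this and returned False.
    propagated = exc

print(has_missing, repr(propagated))
```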

Hash randomization is on by default, meaning that dict and set iteration order differs between Python runs. This protects against some DoS attacks, but more importantly, it spitefully forces you not to rely on incidental ordering. (docs; Python 3.3 release notes)

List comprehensions no longer leak their loop variables into the enclosing scope. (Python 3.0 release notes)
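A two-line illustration:

```python
x = "outer"
squares = [x * x for x in range(4)]

# In Python 2, x would now be 3. In Python 3 it is untouched.
print(x, squares)
```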

nonlocal allows writing to a variable in an enclosing (but non-global) scope. (docs; Python 3.0 release notes; PEP 3104)
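The usual example is a closure that keeps a counter; the function names are arbitrary:

```python
def make_counter():
    count = 0

    def increment():
        # Without this declaration, "count += 1" would try to create a
        # new local variable and fail with UnboundLocalError.
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
first = counter()
second = counter()
print(first, second)  # 1 2
```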

Comparing objects of incompatible types now produces a TypeError, rather than using Python 2’s very silly fallback. (Python 3.0 release notes)
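Sorting a mixed list is where this tends to show up:

```python
mixed = [3, "two", 1]

comparison_error = None
try:
    sorted(mixed)
except TypeError as exc:
    # Python 2 would have returned some arbitrary but consistent order.
    comparison_error = exc

print(comparison_error)
```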

!= defaults to returning the opposite of ==. (Python 3.0 release notes)
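So a class only has to define __eq__, as in this sketch with a made-up Point class:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    # No __ne__ needed: Python 3 derives it from __eq__.


same_differs = Point(1, 2) != Point(1, 2)   # False
other_differs = Point(1, 2) != Point(3, 4)  # True
print(same_differs, other_differs)
```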

Accessing a method as a class attribute now gives you a regular function, not an “unbound method” object. (Python 3.0 release notes)

The input builtin no longer performs an eval (!), removing a huge point of confusion for beginners. This is the behavior of raw_input in Python 2. (docs; Python 3.0 release notes; PEP 3111)

Fast and furious

These aren’t necessarily compelling, and they may not even make any appreciable difference for your code, but I think they’re interesting technically.

Objects’ __dict__s can now share their key storage internally. Instances of the same type generally have the same attribute names, so this provides a modest improvement in speed and memory usage for programs that create a lot of user-defined objects. (Python 3.3 release notes; PEP 412)

OrderedDict is now implemented in C, making it “4 to 100” (!) times faster. Note that the backport in the 2.7 standard library is pure Python. So, there’s a carrot. (Python 3.5 release notes)

The GIL was made more predictable. My understanding is that the old behavior was to yield after some number of Python bytecode operations, which could take wildly varying amounts of time; the new behavior yields after a given duration, by default 5ms. (Python 3.2 release notes)
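The duration is exposed through sys.getswitchinterval and sys.setswitchinterval, in seconds. A small sketch that reads it, changes it, and puts it back:

```python
import sys

original_interval = sys.getswitchinterval()  # 0.005 unless something changed it

# Ask for a 10ms interval instead, then restore the previous value.
sys.setswitchinterval(0.01)
changed_interval = sys.getswitchinterval()
sys.setswitchinterval(original_interval)

print(original_interval, changed_interval)
```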

The io library was rewritten in C, making it much faster. (The C version was backported to Python 2.7 as well; the 2.6 implementation is pure Python.) (Python 3.1 release notes)

Tuples and dicts containing only immutable objects — i.e., objects that cannot possibly contain circular references — are ignored by the garbage collector. This was backported to Python 2.7, too, but I thought it was super interesting. (Python 3.1 release notes)

That’s all I’ve got

Huff, puff.

I hope something here appeals to you as a reason to at least experiment with Python 3. It’s fun over here. Give it a try.

LPC Audio BoF Notes

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/audio-bof-notes.html

Here are some very short notes from the Audio BoF
at the Linux Plumbers
Conference
in Portland two weeks ago. Sorry for the delay!

Biggest issue discussed was audio routing. On embedded devices this gets
more complex each day, and there are a lot of open questions on the desktop,
too. Different DSP scenarios; how do mixer controls match up with PCM streams
and jack sensing? How do we determine which of the volume control sliders in
the pipeline we are currently interested in? How does that relate to policy
decisions? Format to store audio routing in?

The ALSA scenario subsystem
currently being worked on by Liam Girdwood and the folks at SlimLogic and
currently on its way to being integrated into ALSA proper hopefully helps us,
so that we can strip a lot of complexity related to the routing logic from
PulseAudio and move it into a lower level which naturally knows more about the
hardware’s internal routing.

Does it make sense for some apps to bypass the ALSA userspace layer and
to talk to the kernel drivers via ioctl()s directly (i.e. thus not depending on ALSA’s
LISP interpreter, and a lot of other complexities)? Probably yes, but certainly
not in the short term future. Salsa? libsydney?

Should the timing deviation estimation/interpolation be moved from
PulseAudio into the kernel? Might be a good idea. Particularly interesting
when we try to monitor not only the system and audio clocks, but the video
output and particularly the video input (i.e. video4linux) clocks, too. A
unified kernel-based timing system has advantages in accuracy, allows better
handling of (pseudo-) atomic timing snapshots, and would centralize timing
handling not only between different applications (PA and JACK) but also
between different subsystems. Problem: current timing stuff in PulseAudio
might be a bit too homegrown for moving it 1:1 into the kernel. Also, depends
on FP. Needs someone to push this. Apple does the clock handling in the
kernel. How does this relate to ALSA’s timer API?

Seems Ubuntu is going to kill OSS pretty soon too, following Fedora’s lead. Yay!

And that’s all I have. Should be the biggest points raised. Ping me if I
forgot something.

FOMS/LCA Recap

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/foms-lca-recap.html

Finally, here’s my linux.conf.au 2007 and FOMS 2007
recap. Maybe a little bit late, but better late than never.

FOMS was a very well organized conference with a packed schedule
and a lot of high-profile attendees. To my surprise PulseAudio has been accepted by the
attendees without any opposition (at least none was expressed
aloud). After a few “discussions” on a few mailing lists (including
GNOME MLs) and some personal emails I got, I had thought that more
people were opposed to the idea of having a userspace sound
daemon for the desktop. Apparently, I was overly pessimistic. Good
news, that!

During the FOMS conference we discussed the problems audio on Linux
currently has. One of the major issues still is that we’re lacking a
cross-platform PCM audio API everyone agrees on. ALSA is Linux-specific
and complicated to use. The only real contender is PortAudio. However,
PortAudio has its share of problems and hasn’t reached wide adoption
yet. Right now most larger software projects implement an audio
abstraction layer of some kind, and mostly in a very dirty, simplistic
and limited fashion. MPlayer does it, Xine does it, Flash does
it. Everyone does it, and it sucks. (Note: this is only a very short
overview of why audio on Linux sucks right now. For a longer one, please
have a look at the first 15 minutes of my PulseAudio talk at LCA, linked
below.)

Several people asked why we don’t make the PulseAudio API the
new “standard” PCM API for Linux. For several reasons that would be a
bad idea. First of all, the PulseAudio API cannot be used on anything
else but PulseAudio. While PulseAudio has been ported to Win32, Vista
already has a userspace desktop sound server, hence running PulseAudio
on top of that doesn’t make much sense. Thus the API is not exactly
cross-platform. Secondly, I – as the guy who designed it – am not
happy with the current PulseAudio API. While it is very powerful it is
also very difficult to use and easy to misuse, mostly due to its fully
asynchronous nature. In addition it is also not exactly the smallest
API around.

So, what could be done about this? We agreed on a (maybe)
controversial solution: defining yet another abstracted PCM audio
API. Yes, fixing the problem that we have too many conflicting,
competing sound systems by defining yet another API sounds like a
paradox, but I do believe this is the right path to follow. Why?
Because none of the currently available solutions is suitable for all
application areas we have on Linux. Either the current APIs are not
portable, or they are horribly difficult to use properly, or have a
strange license, or are too simple in their functionality. MacOSX
managed to establish a single audio API (CoreAudio) that makes almost
everyone happy on that system – and we should be able to do the same for
Linux. Secondly, none of the current APIs has been designed with
network sound servers in mind. However, proper networking support
reflects back into the API, and in a non-trivial way. An API which
works fine in a networked environment needs to eliminate roundtrips
where possible, be open for time interpolation and have flexible
buffering (besides other minor things). Thirdly, none of the current
APIs offers enough functionality to properly support all the needs of
modern desktop sound systems, such as per-stream volumes, stream names
and notifications about external state changes.

During FOMS and LCA, Mikko Leppanen (from Nokia), Jean-Marc Valin
(from Xiph) and I sat down and designed a draft API for the
functionality we would like to see in this API. For the time being we
dubbed it libsydney, after the city where we started this
project. I plan to make this the only supported audio API for
PulseAudio, eventually. Thus, if you code against PulseAudio you
will get cross-platform support for free. In addition, because
PulseAudio is now being integrated into the major distributions (at
least Ubuntu and Fedora), this library will be made available on most
systems through the backdoor.

So, what will this new API offer? Firstly, the buffering model is
much more powerful than that of any current sound API. The buffering model
mostly follows PulseAudio’s internal buffering model which
(theoretically) can offer zero-latency streaming and has been
pioneered by Jim Gettys’ AF sound server. It allows you to seek around
in the playback buffer very flexibly. This is very useful to allow
very fast reaction to the user’s playback control commands while still
allowing large buffers, which are good to deal with high network
lag. In addition it is very handy for the programmer, such as when
implementing streaming clients where packets may arrive
out-of-order. The API will emulate this buffering model on top of
traditional audio devices, and when used on top of PulseAudio it will use
its native implementation. The API will also clearly define which
sound formats are guaranteed to be available, thus making it a lot
easier to code without thinking of different hardware supporting
different formats all the time. Of course, the API will be easier to
use than PulseAudio’s current API. It will be very portable, scaling
from FPU-less architectures to pro-audio machines with a massive
number of synchronised channels. There are several modes available to
deal with XRUNs semi-automatically, one of them guaranteeing that the
time axis stays linear and monotonic in all cases.

The list of features of this new API is much longer, however,
enough of these grand plans! We didn’t write any real code for this
yet. To make sure that this project is not another one of those which
are announced grandiosely without ever producing any code I will stop
listing features here now. We will eventually publish a first draft of
our C API for public discussion. Stay tuned.

Side-by-side with libsydney I discussed an abstract API
for desktop event sounds with Mikko (i.e. those annoying “bing” sounds
when you click a button and the like). Dubbed libcanberra
(named after the city which one of the developers visited after
Sydney), this will hopefully be for the PulseAudio sample cache API
what libsydney is for the PulseAudio streaming API: a total
replacement.

As a by-product of the libsydney discussion Jean-Marc
coded a
fast C resampling library
supporting both floating point and fixed
point and being licensed under BSD. (In contrast to
libsamplerate which is GPL and floating-point-only, but which
probably has better quality). PulseAudio will make use of this new
library, as will libsydney. And I sincerely hope that ALSA,
GStreamer and other projects replace their crappy home-grown
resamplers with this one!

For PulseAudio I was looking for a CODEC which we could use to
encode audio if we have to transfer it over the network. Such a CODEC
would need to have low CPU requirements and allow low-latency
operation, while providing hifi audio. Compression ratio is not such a
high requirement. Unfortunately, it seems no such CODEC exists,
especially not a “Free” one. However, the Xiph people recommended
hacking up a special version of FLAC for this task. FLAC is fast, has
(obviously) good quality and if hacked up could provide low-latency
encoding. However, FLAC doesn’t compress that well. Current PulseAudio
thin-client installations require 170kB network bandwidth for each
client if hifi audio is used. Encoding this in FLAC could cut
this in half. Not perfect, but better than nothing.
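As a back-of-the-envelope check on those numbers, here is a small Python calculation. It assumes that “hifi” means uncompressed CD-quality PCM (44.1 kHz, 16-bit, stereo) and that the figure is per second; neither is stated above, so treat both as assumptions.

```python
# Assumed stream parameters (CD quality); not taken from the post itself.
sample_rate_hz = 44100
bytes_per_sample = 2  # 16-bit samples
channels = 2          # stereo

raw_bytes_per_second = sample_rate_hz * bytes_per_sample * channels
raw_kib_per_second = raw_bytes_per_second / 1024

# "Cut this in half" as a rough estimate for the FLAC-encoded stream.
flac_bytes_per_second = raw_bytes_per_second / 2

print(f"raw:  {raw_bytes_per_second} bytes/s (~{raw_kib_per_second:.0f} KiB/s)")
print(f"FLAC: ~{flac_bytes_per_second:.0f} bytes/s")
```

Under those assumptions the raw stream comes to 176,400 bytes per second, which is in line with the roughly 170kB quoted above.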

So, that was FOMS! FOMS is definitely a highly recommended
conference. If you have the chance to attend next year, don’t miss it!
I’ve never been to a more productive, packed conference in my life!

At LCA I met fellow Avahi coder Trent Lloyd for the first time. Our
talk about Avahi went very well. During my flights to and back from
.au I hacked up avahi-ui
which I also announced during that talk. Also, in related news,
tedp started to work on an implementation of NAT-PMP
(aka “reverse firewall piercing”; both client and server) for
inclusion in Avahi. This will hopefully make the upcoming Wide-Area
DNS support in Avahi much more useful.

linux.conf.au was a very exciting conference. As a speaker
you’re treated like a rock star, with stuff like the speakers dinner,
the speakers adventure (climbing on top of Sydney’s AMP tower) and
the penguin dinner. Heck, the organizers even picked me up at the
airport, something I really didn’t expect when I landed in Sydney,
which however is quite nice after a 27h flight.

Two talks I particularly enjoyed at LCA:

And just for the sake of completeness, here are the links to my presentations:

Ok, that’s it for now. Thanks go to Silvia Pfeiffer, the rest of
the FOMS team and the Seven Team for organizing these two amazing
conferences!