
[$] Breaking Libgcrypt RSA via a side channel

Post Syndicated from jake original https://lwn.net/Articles/727179/rss

A recent paper [PDF] by a group of eight cryptography researchers shows, once again, how cryptographic breakthroughs are made. They often start small, with just a reduction in the strength of a cipher or key search space, say, but then grow over time to reach the point of a full-on breaking of a cipher or the implementation of one. In this case, the RSA implementation in Libgcrypt for 1024-bit keys has been fully broken using a side-channel attack against the operation of the library—2048-bit keys are also susceptible, but not with the same reliability, at least using this exact technique.

Why LÖVE?

Post Syndicated from Eevee original https://eev.ee/blog/2017/03/23/why-love/

This month, IndustrialRobot asked my opinion of FOSS game engines — or, more specifically, why I chose LÖVE.

The short version is that it sort of landed in my lap, I tried it, I liked it, and I don’t know of anything I might like better. The long version is…

LÖVE

I’d already made a couple of games (Under Construction, Isaac’s Descent) for the PICO-8, a fantasy 8-bit-ish console powered by Lua. I’ve got a few strong criticisms of Lua, but we’ve formed an uneasy truce. It makes a better JavaScript than JavaScript, at least.

Coming off of those, I was pretty familiar with Lua, so I was already naturally gravitating towards LÖVE. I also knew one or two people who’d used it before, which helped. And it had the faint ring of familiarity, which I’ve since realized is only because I’d once seen a trailer for Love, the procedurally-generated adventure MMO with zero relationship to the LÖVE engine.

Hmm. Perhaps not the most compelling criteria.

I stayed with LÖVE because it hits a very nice sweet spot for me. It’s not a framework or anything, just an engine. Its API has tidy coverage of use cases at every tier of complexity, if that makes sense? It can do the obvious basics, like drawing immediate-mode-style circles or sprites, but it can also batch together those sprite draws for a ridiculous speedup. Without having to know or care or see any details of OpenGL. If you do care about OpenGL, that’s cool: you can write your own shaders, too. But you don’t have to.

LÖVE also has some nice touches like using a virtual filesystem that overlays the game itself with the player’s save directory (which is created for you). That means games are automatically moddable! You can create whatever new assets you want and drop them in your save directory; they’ll replace existing assets of the same name and be picked up by a directory scan. It’s a simple idea that eliminates a whole class of dumb problem I don’t want to think about — where do I put save data? — and at the same time opens the door to some really interesting applications.

Distribution takes a similar approach. You can just concatenate a zipped up project to a LÖVE binary and distribute that. Nothing to build, and games automatically make their source and assets available. (That’s a perk for me; it may not be for you.) You can also point LÖVE at a separate zip file, or even a directory; the latter effectively gives you development mode.

Overall, it’s pretty well thought-out and simple to get into without being opinionated, but with a lot of hidden depth. I dig that kind of approach. My one major criticism is that the API is neither forwards- nor backwards-compatible. My games work on 0.10.2, not 0.9.x (or even 0.10.1), and I can tell from the dev log that they won’t work on 0.11.0 either. It’s unfortunate, but willingness to churn is probably how LÖVE ended up with as nice an API as it has. Maybe things will calm down whenever it hits 1.0.

Bearing all this in mind, let’s look at the competition.

pygame

I’m a huge Python dweeb, so pygame seems like an obvious choice.

I’ve never actually tried it. One of the biggest turn-offs for me is the website, which is admittedly frivolous to care about, but prominent use of lime green sets off alarm bells in my head.

Every time I’ve looked into pygame, it’s felt almost abandoned. I think part of this is that the website has always been using a very generic CMS, where everything is “nodes” and there are tons of superfluous features that don’t seem to belong. Those setups always feel big and empty and vaguely abstract to me. I see there’s now a site revamp in progress, but it’s basically made out of stock Bootstrap, which gives exactly the same impression. I feel like any link I click has a 50–50 chance of being broken or leading to a page that’s been outdated for ten years.

None of this has anything to do with the quality of pygame itself, nor with any concrete facts. The way pygame presents itself just inspires irrational feelings of “if you use this, your project is already obsolete”.

It doesn’t help that I’ve been up to my eyeballs in Python for years, and I’ve seen plenty of people suggest using pygame for game development, yet I’ve never known anyone who has actually used pygame. I can’t name a single game made with pygame. At least I have an acquaintance who’s made a bunch of Ludum Dare games with LÖVE; that’s infinitely more.

I’m also wary of distributing Python software — I know there are lots of tools for doing this, and I’ve seen it done, and many moons ago I even used Python software on a Windows machine without Python installed. But I still expect it to be a pain in the ass in some surprising way.

I know I’m being massively unfair here, and I should really give it a chance sometime. I just, um, don’t want to.

cocos2d

No, not the iPhone one.

That’s actually one of the biggest problems with cocos2d: it’s the name of both a Python library and a vastly more popular iOS library. Have fun Googling for solutions to problems! Oh, and the APIs are almost completely different. And the Python version is much skimpier. And very nearly abandoned, having received only two point releases since the last minor release three years ago.

I have used cocos2d before, on a stub of a game that was abandoned ages ago. I enjoyed it enough, but something about it always felt… clumsy, like everything I tried to do took more effort than necessary. I don’t know if it’s because I was still figuring out game development, because I had to learn cocos’s little framework (versus writing my own scaffolding in LÖVE), because the game I was working on was a bit nebulous, or because something about the design of cocos itself is wrong. I do remember that I had to just about dip into bare vertex-buffer-style OpenGL just to draw lines on the screen for debugging purposes, which I found incredibly annoying. (Don’t tell me it’s faster. I know. If I thought performance were a grave concern, I probably wouldn’t be writing the thing in Python in the first place.)

I did borrow some of cocos2d’s ideas for Under Construction and later games, so I don’t regret using it or anything. It has some good ideas, and it has some handy touches like built-in vector and AABB types. I just wasn’t charmed enough to try using it again. (Yet?)

GameMaker

Closed-source. Windows only. Eh.

Their website says “no barriers to entry” and “making games is for everyone”. Uh huh.

Unity

Ah, yeah, that thing everyone uses.

I don’t have strong opinions of Unity. It’s closed-source, but it has some open source parts, so at least they care a bit and engage with the wider development community.

It’s based on Mono, which gives me pause. Obviously Mono is open source, and I think large chunks of .NET itself are now too, but I’m still very wary of the .NET ecosystem. I don’t have any concrete reason for this; I think living through the IE6 era left me deeply skeptical of anything developer-oriented that has Microsoft fingerprints on it. I’m sure their tools are very nice — plenty of people swear by Visual Studio — but I don’t trust them to actually give a damn about not-Windows. Homebrew software that can’t work on Mono just because it makes a bunch of DirectX calls has not left me particularly impressed with the cross-platform support, either.

But more importantly: Unity for Linux is still experimental, or beta, or something? I think? The actual download page still claims you need Windows 7 or OS X 10.8, and the Linux builds are only available via a forum thread, which doesn’t scream “stable” to me. The thread claims it’s supported, but… it’s still only in a thread, two and a half years after it was first released. I don’t really want to start getting into a huge platform for the first time when the maintainers aren’t confident enough of their Linux port to actually mention it anywhere.

Various web things

There are plenty of these nowadays, like Pixi and… um… others. The distribution story is obviously pretty nice, too: just have a web browser. I’ve been meaning to try one out.

I only haven’t because, well, JavaScript. I don’t particularly enjoy JavaScript. It doesn’t even have a module story yet, unless you tack a bunch of third-party stuff onto it, and I don’t want the first step of writing a game to be “fix the language I’m writing it in”. Also I’ve written more than enough JavaScript in my life already and would like to do something not web-based for a change.

There’s also something about JavaScript that feels clumsy to debug, even though that obviously makes no sense, since every web browser now has gobs of interactive debugging tools baked right in. Maybe it’s that everything is built out of async and promises and event handlers, and those are all still a bit painful to inspect. Or maybe my soul is weary from trying to use debuggers on production sites with minified libraries.

Writing from scratch

Ugh.

I know of SFML, and Allegro, and SDL, and various other libraries somewhere between “game engine” and “generic media handling”. I could definitely make something happen with one of them. I could bind to them from Python if I wanted. Or Rust. Or, hell, Lua.

But I don’t want to? That sounds like I’d spend a bunch of time writing plumbing and not so much time writing game. I mean, yes, okay, I wrote my own physics system even though LÖVE has box2d bindings built in. But I chose to do that because I thought the result would be better, not because I had to invent the universe just to get off the ground.

This is also the approach that would make me care the most about distribution, possibly even in the form of compiling stuff, which I do not enjoy.

In conclusion

My reasoning is probably not as, er, directly rational as readers may have hoped.

In my defense: there were a lot of possible choices here. There are dozens of hyperpopular game engines alone, and far greater numbers of less popular one-offs.

Now, there are two situations I want to avoid more than anything else here.

  1. Spending all of my time looking at game engines and none of my time actually making a game.
  2. Getting halfway through a game only to run into a brick wall, because the chosen engine simply cannot do some very straightforward thing I want.

So I don’t want to get stuck with a dud of an engine, but I also don’t want to spend inordinate amounts of time evaluating dozens of candidates in excruciating detail. (I have enough trouble just deciding what brand of RAM to buy.) The solution is to prune extremely aggressively, discarding anything that has even a whiff of possible inconvenience later. Worst case, I run out of engines and just start again, being less picky on the second round.

pygame? Unclear how much it’s still maintained. Pass.

cocos2d? Not confident about distribution. Pass.

Unity? In beta. Pass.

XNA? Eh on Microsoft. Also apparently discontinued four years ago. Pass.

GameMaker? Don’t want to rely on Wine. Pass.

When all was said and done, not too many contenders remained! So I gave LÖVE a whirl and I liked it well enough. It’s entirely possible I’ve been unfair to one of the things I listed above, or that there’s some amazing game thing I’ve never even heard of. I definitely don’t claim that LÖVE is the best possible tool for all problems, or that everyone should use it — but I’m enjoying it and have successfully made things with it.

I might write a followup to this sometime that comes from the other direction, listing game engines and why you might want to use them, rather than why I weeded them out.

Community Profile: Matt Reed

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/community-profile-matt-reed/

This column is from The MagPi issue 51. You can download a PDF of the full issue for free, or subscribe to receive the print edition in your mailbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve its charitable goals.

Matt Reed‘s background is in web design and development, extending into graphic design, in which he earned his BFA at the University of Tennessee, Knoxville. In his youth, his passion was car stereo systems, and he designed elaborate builds that his wallet couldn’t afford. That enthusiasm enriched his maker skill set by introducing woodwork, electronics, and fabrication into his creations.


Matt hosts the redpepper ‘Touch of Tech’ online series, highlighting the latest in interesting and unusual tech releases

Having joined the integrated marketing agency redpepper eight years ago, Matt originally worked in the design and production of microsites. However, as his interests continued to grow, demand began to evolve, and products such as the Arduino and Raspberry Pi came into the mix. Matt soon found himself moving away from the screen toward physical builds.

“I’m interested in anything that uses tech in a clever way, whether it be AR, VR, front-end, back-end, app dev, servers, hardware, UI, UX, motion graphics, art, science, or human behaviour. I really enjoy coming up with ideas people can relate to.”

Matt’s passion is to make tech seem cool, creative, empowering, and approachable, and his projects reflect this. Away from the Raspberry Pi, Matt has built some amazing creations such as the Home Alone Holidaython, an app that lets you recreate the famous curtain shadow party in Kevin McCallister’s living room. Pick the shadow you want to appear, and projectors illuminate the design against a sheet across the redpepper office window. Christmas on Tweet Street LIVE! captures hilariously negative Christmas-themed tweets from Twitter, displaying them across a traditional festive painting, while DOOR8ELL allows office visitors the opportunity to Slack-message their required staff member via an arcade interface, complete with 8-bit graphics. There’s also been a capacitive piano built with jelly keys, a phone app to simulate the destruction of cars as you sit in traffic, and a working QR code made entirely from Oreos.


The BoomIlluminator, an interactive art installation for the Red Bull Creation Qualifier, used LEDs within empty Red Bull cans that reacted to the bass of any music played. A light show across the cans was then relayed to people’s phones, extending the experience.

Playing the ‘technology advocate’ role at redpepper, Matt continues to bridge the gap between the company’s day-to-day business and the fun, intuitive uses of tech. Not only do they offer technological marketing solutions via their rpLab, they have continued to grow, incorporating Google’s Sprint methodology into idea-building and brainstorming within days of receiving a request, “so having tools that are powerful, flexible, and cost-effective like the Pi is invaluable.”


Walk into a room with Doorjam enabled, and suddenly your favourite tune is playing via boombox speakers. Simply select your favourite song from Spotify, walk within range of a Bluetooth iBeacon, and you’re ready to make your entrance in style.

“I just love the intersection of art and science,” Matt explains when discussing his passion for tech. “Having worked with Linux servers for most of my career, the Pi was the natural extension for my interest in hardware. Running Node.js on the Pi has become my go-to toolset.”


Slackbot Bot: Users of the multi-channel messenger service Slack will appreciate this one. Beacons throughout the office allow users to locate Slackbot Bot, which features a tornado siren mounted on a Roomba, and send it to predetermined locations to deliver messages. “It was absolutely hilarious to test in the office.”

We’ve seen Matt’s Raspberry Pi-based portfolio grow over the last couple of years. A few of his builds have been featured in The MagPi, and his Raspberry Preserve was placed 13th in the Top 50 Raspberry Pi Builds in issue 50.


Matt Reed’s ‘Raspberry Preserve’ build allows users to store their precious photos in a unique memory jar

There’s no denying that Matt will continue to be ‘one to watch’ in the world of quirky, original tech builds. You can follow his work at his website or via his Twitter account.

The post Community Profile: Matt Reed appeared first on Raspberry Pi.

Secure Amazon EMR with Encryption

Post Syndicated from Sai Sriparasa original https://aws.amazon.com/blogs/big-data/secure-amazon-emr-with-encryption/

In the last few years, there has been a rapid rise in enterprises adopting the Apache Hadoop ecosystem for critical workloads that process sensitive or highly confidential data. Because these workloads are so critical, enterprises implement organization- or industry-wide policies, along with regulatory and compliance policies, designed to protect sensitive data from unauthorized access.

A common requirement within such policies is encrypting data at rest and in flight. Amazon EMR uses “security configurations” to make it easy to specify the encryption keys and certificates, ranging from AWS Key Management Service to supplying your own custom encryption materials provider.

You create a security configuration that specifies encryption settings and then use the configuration when you create a cluster. This makes it easy to build the security configuration one time and use it for any number of clusters.


In this post, I go through the process of setting up the encryption of data at multiple levels using security configurations with EMR. Before I dive deep into encryption, here are the different phases where data needs to be encrypted.

Data at rest

  • Data residing on Amazon S3—S3 client-side encryption with EMR
  • Data residing on disk—the Amazon EC2 instance store volumes (except boot volumes) and the attached Amazon EBS volumes of cluster instances are encrypted using Linux Unified Key Setup (LUKS)

Data in transit

  • Data in transit from EMR to S3, or vice versa—S3 client side encryption with EMR
  • Data in transit between nodes in a cluster—in-transit encryption via Secure Sockets Layer (SSL) for MapReduce and Simple Authentication and Security Layer (SASL) for Spark shuffle encryption
  • Data being spilled to disk or cached during a shuffle phase—Spark shuffle encryption or LUKS encryption

Encryption walkthrough

For this post, you create a security configuration that implements encryption in transit and at rest. To achieve this, you create the following resources:

  • KMS keys for LUKS encryption and S3 client-side encryption for data exiting EMR to S3
  • SSL certificates to be used for MapReduce shuffle encryption
  • The environment into which the EMR cluster is launched. For this post, you launch EMR in private subnets and set up an S3 VPC endpoint to get the data from S3.
  • An EMR security configuration

All of the scripts and code snippets used for this walkthrough are available on the aws-blog-emrencryption GitHub repo.

Generate KMS keys

For this walkthrough, you use AWS KMS, a managed service that makes it easy for you to create and control the encryption keys used to encrypt your data and disks.

You generate two KMS master keys: one for S3 client-side encryption, to encrypt data going out of EMR, and the other for LUKS encryption, to encrypt the local disks. The Hadoop MapReduce framework uses HDFS, and Spark uses the local file system on each worker instance, for intermediate data throughout a workload; that data can be spilled to disk when it overflows memory, which is why the local disks need encryption too.

To generate the keys, use the kms.json AWS CloudFormation script.  As part of this script, provide an alias name, or display name, for the keys. An alias must be in the “alias/aliasname” format, and can only contain alphanumeric characters, an underscore, or a dash.
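The kms.json template handles key creation for you. For readers who prefer the SDK, here is a rough boto3 sketch of the same two keys; the function name, alias names, and descriptions are my own, and the commented-out calls require AWS credentials.

```python
import re

# Aliases must look like "alias/aliasname" and may contain only
# alphanumeric characters, underscores, and dashes (per the constraint
# described above).
ALIAS_RE = re.compile(r"^alias/[A-Za-z0-9_-]+$")

def valid_alias(name):
    """Return True if `name` is an acceptable KMS alias."""
    return bool(ALIAS_RE.match(name))

def create_key_with_alias(kms, alias, description):
    """Create a KMS master key and attach an alias to it.

    `kms` is a boto3 KMS client; returns the new key's ARN.
    """
    if not valid_alias(alias):
        raise ValueError("alias must look like alias/aliasname")
    key = kms.create_key(Description=description)
    arn = key["KeyMetadata"]["Arn"]
    kms.create_alias(AliasName=alias, TargetKeyId=arn)
    return arn

# Example (requires AWS credentials):
# import boto3
# kms = boto3.client("kms")
# s3_arn = create_key_with_alias(kms, "alias/emr-s3-cse",
#                                "S3 client-side encryption for EMR")
# luks_arn = create_key_with_alias(kms, "alias/emr-luks",
#                                  "LUKS local-disk encryption for EMR")
```

Keep the two ARNs handy; they go into the security configuration later.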


After you finish generating the keys, the ARNs are available as part of the outputs.


Generate SSL certificates

The SSL certificates allow the encryption of the MapReduce shuffle using HTTPS while the data is in transit between nodes.


For this walkthrough, use OpenSSL to generate a self-signed X.509 certificate with a 2048-bit RSA private key that allows access to the issuer’s EMR cluster instances. This prompts you to provide subject information to generate the certificates.

Use the cert-create.sh script to generate SSL certificates that are compressed into a zip file. Upload the zipped certificates to S3 and keep a note of the S3 prefix. You use this S3 prefix when you build your security configuration.
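The cert-create.sh script does all of this for you. As a rough illustration of the same steps, here is a sketch in Python; the subject fields are placeholders, and the file names assume EMR's convention of a privateKey.pem and certificateChain.pem inside the zip.

```python
import subprocess
import zipfile

def build_openssl_cmd(key_path="privateKey.pem",
                      cert_path="certificateChain.pem",
                      cn="*.ec2.internal", days=365):
    """Compose the openssl invocation for a self-signed X.509 cert with
    a 2048-bit RSA key. The wildcard CN covers the cluster's internal
    EC2 hostnames; adjust it for your region's internal domain."""
    return [
        "openssl", "req", "-x509", "-newkey", "rsa:2048",
        "-keyout", key_path, "-out", cert_path,
        "-days", str(days), "-nodes",  # -nodes: no passphrase on the key
        "-subj", "/C=US/ST=WA/L=Seattle/O=example/CN=" + cn,
    ]

def make_cert_zip(zip_path="certs.zip"):
    """Generate the key pair and zip it for upload to S3."""
    subprocess.run(build_openssl_cmd(), check=True)
    with zipfile.ZipFile(zip_path, "w") as z:
        z.write("privateKey.pem")
        z.write("certificateChain.pem")
```

After running this, upload certs.zip to S3 and note the prefix, exactly as with the shell script.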

Important

This example is a proof-of-concept demonstration only. Using self-signed certificates is not recommended and presents a potential security risk. For production systems, use a trusted certification authority (CA) to issue certificates.

To implement certificates from custom providers, use the TLSArtifacts provider interface.

Build the environment

For this walkthrough, launch an EMR cluster into a private subnet. If you already have a VPC and would like to launch this cluster into a public subnet, skip this section and jump to the Create a Security Configuration section.

To launch the cluster into a private subnet, the environment must include the following resources:

  • VPC
  • Private subnet
  • Public subnet
  • Bastion
  • Managed NAT gateway
  • S3 VPC endpoint

As the EMR cluster is launched into a private subnet, you need a bastion or a jump server to SSH onto the cluster. After the cluster is running, you need access to the Internet to request the data keys from KMS. Private subnets do not have access to the Internet directly, so route this traffic via the managed NAT gateway. Use an S3 VPC endpoint to provide a highly reliable and a secure connection to S3.


In the CloudFormation console, create a new stack for this environment and use the environment.json CloudFormation template to deploy it.

As part of the parameters, pick an instance family for the bastion and an EC2 key pair to be used to SSH onto the bastion. Provide an appropriate stack name and add the appropriate tags. For example, the following screenshot is the review step for a stack that I created.

[Screenshot: review step for the environment stack]

After creating the environment stack, look at the Output tab and make a note of the VPC ID, bastion, and private subnet IDs, as you will use them when you launch the EMR cluster resources.


Create a security configuration

The final step before launching the secure EMR cluster is to create a security configuration. For this walkthrough, create a security configuration with S3 client-side encryption using EMR, and LUKS encryption for local volumes using the KMS keys created earlier. You also use the SSL certificates generated and uploaded to S3 earlier for encrypting the MapReduce shuffle.
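If you would rather script this step than click through the console, a security configuration is just a JSON document. Below is a sketch of one matching this walkthrough, with CSE-KMS for S3, KMS-backed LUKS for local disks, and the PEM certificates for in-transit encryption; the ARNs and S3 path are placeholders for the values you created earlier.

```python
import json

def build_security_config(s3_kms_arn, luks_kms_arn, cert_zip_s3):
    """Encryption settings matching this walkthrough: S3 client-side
    encryption with KMS, LUKS local-disk encryption with KMS, and TLS
    certificates pulled from the zip uploaded to S3."""
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            "EnableAtRestEncryption": True,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": cert_zip_s3,
                }
            },
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": {
                    "EncryptionMode": "CSE-KMS",
                    "AwsKmsKey": s3_kms_arn,
                },
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": luks_kms_arn,
                },
            },
        }
    }

config_json = json.dumps(
    build_security_config(
        "arn:aws:kms:us-east-1:123456789012:key/s3-key-id",    # placeholder
        "arn:aws:kms:us-east-1:123456789012:key/luks-key-id",  # placeholder
        "s3://mybucket/certs.zip",                             # placeholder
    ),
    indent=2,
)

# To register it (requires AWS credentials):
# import boto3
# boto3.client("emr").create_security_configuration(
#     Name="emr-encryption-walkthrough",
#     SecurityConfiguration=config_json)
```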


Launch an EMR cluster

Now, you can launch an EMR cluster in the private subnet. First, verify that the service role being used for EMR has access to the AmazonElasticMapReduceRole managed service policy. The default service role is EMR_DefaultRole. For more information, see Configuring User Permissions Using IAM Roles.

From the Build the environment section, you have the VPC ID and the subnet ID for the private subnet into which the EMR cluster should be launched. Select those values for the Network and EC2 Subnet fields. In the next step, provide a name and tags for the cluster.


The last step is to select the private key, assign the security configuration that was created in the Create a security configuration section, and choose Create Cluster.
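The same launch can be scripted. A boto3 sketch follows; the instance types, counts, release label, and cluster name are arbitrary choices of mine, not values from the walkthrough.

```python
def build_cluster_request(subnet_id, key_name, security_config_name):
    """Request parameters for a small encrypted cluster in the private
    subnet, using the default EMR service and instance roles."""
    return {
        "Name": "encrypted-emr-walkthrough",
        "ReleaseLabel": "emr-5.4.0",
        "Applications": [{"Name": "Hive"}, {"Name": "Spark"}],
        "Instances": {
            "Ec2SubnetId": subnet_id,   # private subnet from the stack
            "Ec2KeyName": key_name,
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m4.large",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m4.large",
                 "InstanceCount": 2},
            ],
        },
        "SecurityConfiguration": security_config_name,
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }

# To launch (requires AWS credentials):
# import boto3
# boto3.client("emr").run_job_flow(
#     **build_cluster_request("subnet-0abc1234", "my-keypair",
#                             "emr-encryption-walkthrough"))
```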


Now that you have the environment and the cluster up and running, you can get onto the master node to run scripts. You need the IP address, which you can retrieve from the EMR console page. Choose Hardware, Master Instance group and note the private IP address of the master node.


As the master node is in a private subnet, SSH onto the bastion instance first and then jump from the bastion instance to the master node. For information about how to SSH onto the bastion and then to the Hadoop master, open the ssh-commands.txt file. For more information about how to get onto the bastion, see the Securely Connect to Linux Instances Running in a Private Amazon VPC post.
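The ssh-commands.txt file has the exact commands, but conceptually this is a two-hop connection, which you can also express as a single command with SSH's ProxyCommand. A sketch follows; the key path and bastion username are assumptions (the bastion login user depends on its AMI, while EMR nodes use hadoop).

```python
def ssh_via_bastion(bastion_host, master_ip,
                    key_path="~/.ssh/emr-key.pem",
                    bastion_user="ec2-user", master_user="hadoop"):
    """Compose one ssh command that tunnels through the bastion to the
    master node's private IP using ProxyCommand (-W is ssh's built-in
    netcat mode for forwarding stdio to the target host)."""
    proxy = "ssh -i {k} -W %h:%p {u}@{h}".format(
        k=key_path, u=bastion_user, h=bastion_host)
    return ["ssh", "-i", key_path,
            "-o", "ProxyCommand=" + proxy,
            "{u}@{ip}".format(u=master_user, ip=master_ip)]

# Example: print the command to run from your workstation.
print(" ".join(ssh_via_bastion("bastion.example.com", "10.0.1.25")))
```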

After you are on the master node, bring your own Hive or Spark scripts. For testing purposes, the GitHub /code directory includes the test.py PySpark and test.q Hive scripts.

Summary

As part of this post, I’ve identified the different phases where data needs to be encrypted and walked through how data in each phase can be encrypted. Then, I described a step-by-step process to achieve all the encryption prerequisites, such as building the KMS keys, building SSL certificates, and launching the EMR cluster with a strong security configuration. As part of this walkthrough, you also secured the data by launching your cluster in a private subnet within a VPC, and used a bastion instance for access to the EMR cluster.

If you have questions or suggestions, please comment below.


About the Author

Sai Sriparasa is a Big Data Consultant for AWS Professional Services. He works with our customers to provide strategic & tactical big data solutions with an emphasis on automation, operations & security on AWS. In his spare time, he follows sports and current affairs.


Let’s stop copying C

Post Syndicated from Eevee original https://eev.ee/blog/2016/12/01/lets-stop-copying-c/

Ah, C. The best lingua franca we have… because we have no other lingua francas. Linguae franca. Surgeons general?

C is fairly old — 44 years, now! — and comes from a time when there were possibly more architectures than programming languages. It works well for what it is, and what it is is a relatively simple layer of indirection atop assembly.

Alas, the popularity of C has led to a number of programming languages’ taking significant cues from its design, and parts of its design are… slightly questionable. I’ve gone through some common features that probably should’ve stayed in C and my justification for saying so. The features are listed in rough order from (I hope) least to most controversial. The idea is that C fans will give up when I call it “weakly typed” and not even get to the part where I rag on braces. Wait, crap, I gave it away.

I’ve listed some languages that do or don’t take the same approach as C. Plenty of the listed languages have no relation to C, and some even predate it — this is meant as a cross-reference of the landscape (and perhaps a list of prior art), not a genealogy. The language selections are arbitrary and based on what I could cobble together from documentation, experiments, Wikipedia, and attempts to make sense of Rosetta Code. I don’t know everything about all of them, so I might be missing some interesting quirks. Things are especially complicated for very old languages like COBOL or Fortran, which by now have numerous different versions and variants and de facto standard extensions.

“Unix shells” means some handwaved combination that probably includes bash and its descendants; for expressions, it means the (( ... )) syntax. I didn’t look too closely into, say, fish. Unqualified “Python” means both 2 and 3; likewise, unqualified “Perl” means both 5 and 6. Also some of the puns are perhaps a little too obtuse, but the first group listed is always C-like.

Textual inclusion

#include is not a great basis for a module system. It’s not even a module system. You can’t ever quite tell what symbols came from which files, or indeed whether particular files are necessary at all. And in languages with C-like header files, most headers include other headers include more headers, so who knows how any particular declaration is actually ending up in your code? Oh, and there’s the whole include guards thing.

It’s a little tricky to pick on individual languages here, because ultimately even the greatest module system in the world boils down to “execute this other file, and maybe do some other stuff”. I think the true differentiating feature is whether including/importing/whatevering a file creates a new namespace. If a file gets dumped into the caller’s namespace, that looks an awful lot like textual inclusion; if a file gets its own namespace, that’s a good sign of something more structured happening behind the scenes.

This tends to go hand-in-hand with how much the language relies on a global namespace. One surprising exception is Lua, which can compartmentalize required files quite well, but dumps everything into a single global namespace by default.

Quick test: if you create a new namespace and import another file within that namespace, do its contents end up in that namespace?
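Python, from the “Excluded” list below, makes the difference concrete. This sketch runs the same file both ways: a structured import, which gives the file its own module namespace, and an exec into a caller-supplied namespace, which is essentially what textual inclusion does.

```python
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # A throwaway "library" file to pull in.
    path = os.path.join(d, "helper.py")
    with open(path, "w") as f:
        f.write("GREETING = 'hi'\n")

    # Structured import: the file runs inside its own module namespace.
    spec = importlib.util.spec_from_file_location("helper", path)
    helper = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(helper)

    # Textual-inclusion style: the file's source is dumped straight
    # into the caller's namespace, #include-style.
    caller_ns = {}
    exec(open(path).read(), caller_ns)

print(helper.GREETING)          # the name lives on the module object
print(caller_ns["GREETING"])    # the name was dumped on the "caller"
print("GREETING" in globals())  # the import leaked nothing here: False
```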

Included: ACS, awk, COBOL, Erlang, Forth, Fortran, most older Lisps, Perl 5 (despite that required files must return true), PHP, Unix shells.

Excluded: Ada, Clojure, D, Haskell, Julia, Lua (the file’s return value is returned from require), Nim, Node (similar to Lua), Perl 6, Python, Rust.

Special mention: ALGOL appears to have been designed with the assumption that you could include other code by adding its punch cards to your stack. C#, Java, OCaml, and Swift all have some concept of “all possible code that will be in this program”, sort of like C with inferred headers, so imports are largely unnecessary; Java’s import really just does aliasing. Inform 7 has no namespacing, but does have a first-class concept of external libraries, but doesn’t have a way to split a single project up between multiple files. Ruby doesn’t automatically give required files their own namespace, but doesn’t evaluate them in the caller’s namespace either.

Optional block delimiters

Old and busted and responsible for gotofail:

if (condition)
    thing;

New hotness, which reduces the amount of punctuation overall and eliminates this easy kind of error:

if condition {
    thing;
}

To be fair, and unlike most of these complaints, the original idea was a sort of clever consistency: the actual syntax was merely if (expr) stmt, and also, a single statement could always be replaced by a block of statements. Unfortunately, the cuteness doesn’t make up for the ease with which errors sneak in. If you’re stuck with a language like this, I advise you always use braces, possibly excepting the most trivial cases like immediately returning if some argument is NULL. Definitely do not do this nonsense, which I saw in actual code not 24 hours ago.

for (x = ...)
    for (y = ...) {
        ...
    }

    // more code

    for (x = ...)
        for (y = ...)
            buffer[y][x] = ...

The only real argument for omitting the braces is that the braces take up a lot of vertical space, but that’s mostly a problem if you put each { on its own line, and you could just not do that.

Some languages use keywords instead of braces, and in such cases it’s vanishingly rare to make the keywords optional.

Blockheads: ACS, awk, C#, D, Erlang (kinda?), Java, JavaScript.

New kids on the block: Go, Perl 6, Rust, Swift.

Had their braces removed: Ada, ALGOL, BASIC, COBOL, CoffeeScript, Forth, Fortran (but still requires parens), Haskell, Lua, Ruby.

Special mention: Inform 7 has several ways to delimit blocks, none of them vulnerable to this problem. Perl 5 requires both the parentheses and the braces… but it lets you leave off the semicolon on the last statement. Python just uses indentation to delimit blocks in the first place, so you can’t have a block look wrong. Lisps exist on a higher plane of existence where the very question makes no sense.

Bitwise operator precedence

For ease of transition from B, in C, the bitwise operators & | ^ have lower precedence than the comparison operators == and friends. That means they happen later. For binary math operators, this is nonsense.

1 + 2 == 3  // (1 + 2) == 3
1 * 2 == 3  // (1 * 2) == 3
1 | 2 == 3  // 1 | (2 == 3)

Many other languages have copied C’s entire set of operators and their precedence, including this. Because a new language is easier to learn if its rules are familiar, you see. Which is why we still, today, have extremely popular languages maintaining compatibility with a language from 1969 — so old that it probably couldn’t get a programming job.

Honestly, if your language is any higher-level than C, I’m not sure bit operators deserve to be operators at all. Free those characters up to do something else. Consider having a first-class bitfield type; then 99% of the use of bit operations would go away.

Quick test: 1 & 2 == 2 evaluates to 1 with C precedence, false otherwise. Or just look at a precedence table: if equality appears between bitwise ops and other math ops, that’s C style.
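The quick test is easy to run for real. Python, which fixed the precedence, gives the opposite answer from C:

```python
# In C, `1 & 2 == 2` parses as `1 & (2 == 2)`, i.e. 1 & 1, which is 1 (truthy).
# Python gives & higher precedence than ==, so the same text parses as
# (1 & 2) == 2, i.e. 0 == 2, which is False.
print(1 & 2 == 2)     # False in Python; 1 in C
print((1 & 2) == 2)   # explicitly what Python does
print(1 & (2 == 2))   # explicitly what C does (True coerces to 1 here)
```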

A bit wrong: D, expr, JavaScript, Perl 5, PHP.

Wisened up: F# (ops are &&& ||| ^^^), Go, Julia, Lua (bitwise ops are new in 5.3), Perl 6 (ops are +& +| +^), Python, Ruby, Rust, SQL, Swift, Unix shells.

Special mention: C# and Java have C’s precedence, but forbid using bitwise operators on booleans, so the quick test is a compile-time error. Lisp-likes have no operator precedence.

Negative modulo

The modulo operator, %, finds the remainder after division. Thus you might think that this always holds:

0 <= a % b < abs(b)

But no — if a is negative, C will produce a negative value. (Well, since C99; before that it was unspecified, which is probably worse.) This is so a / b * b + a % b is always equal to a. Truncating integer division rounds towards zero, so the sign of a % b always needs to be away from zero.

I’ve never found this behavior (or the above equivalence) useful. An easy example is that checking for odd numbers with x % 2 == 1 will fail for negative numbers, which produce -1. But the opposite behavior can be pretty handy.

Consider the problem of having n items that you want to arrange into rows with c columns. A calendar, say; you want to include enough empty cells to fill out the last row. n % c gives you the number of items on the last row, so c - n % c seems like it will give you the number of empty spaces. But if the last row is already full, then n % c is zero, and c - n % c equals c! You’ll have either a double-width row or a spare row of empty cells. Fixing this requires treating n % c == 0 as a special case, which is unsatisfying.

Ah, but if we have positive %, the answer is simply… -n % c! Consider this number line for n = 5 and c = 3:

-6      -3       0       3       6
 | - x x | x x x | x x x | x x - |

a % b tells you how far to count down to find a multiple of b. For positive a, that means “backtracking” over a itself and finding a smaller number. For negative a, that means continuing further away from zero. If you look at negative numbers as the mirror image of positive numbers, then % on a positive number tells you how much to file off to get a multiple, whereas % on a negative number tells you how much further to go to get a multiple. 5 % 3 is 2, but -5 % 3 is 1. And of course, -6 % 3 is still zero, so that’s not a special case.

Positive % effectively lets you choose whether to round up or down. It doesn’t come up often, but when it’s handy, it’s really handy.

(I have no strong opinion on what 5 % -3 should be; I don’t think I’ve ever tried to use % with a negative divisor. Python makes it negative; Pascal makes it positive. Wikipedia has a whole big chart.)

Quick test: -5 % 3 is -2 with C semantics, 1 with “positive” semantics.
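Here’s the calendar-padding trick above, run in Python, which has the “positive” semantics (C would need the n % c == 0 special case):

```python
def pad_for_grid(n, c):
    """Empty cells needed to fill the last row of an n-item, c-column grid.
    Relies on % taking the sign of the divisor, as in Python."""
    return -n % c

print(-5 % 3)               # 1 in Python; -2 in C
print(pad_for_grid(5, 3))   # 1 empty cell on the last row
print(pad_for_grid(6, 3))   # 0 -- a full last row needs no special case
```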

Leftovers: C#, D, expr, Go, Java, JavaScript, OCaml, PowerShell, PHP, Rust, Scala, SQL, Swift, Unix shells, VimL, Visual Basic. Notably, some of these languages don’t even have integer division.

Paying dividends: Dart, MUMPS (#), Perl, Python, R (%%), Ruby, Smalltalk (\\), Standard ML, Tcl.

Special mention: Ada, Haskell, Julia, many Lisps, MATLAB, VHDL, and others have separate mod (Python-like) and rem (C-like) operators. CoffeeScript has separate % (C-like) and %% (Python-like) operators.

Leading zero for octal

Octal notation like 0777 has three uses.

One: to make a file mask to pass to chmod().

Two: to confuse people when they write 013 and it comes out as 11.

Three: to confuse people when they write 018 and get a syntax error.

If you absolutely must have octal (?!) in your language, it’s fine to use 0o777. Really. No one will mind. Or you can go the whole distance and allow literals written in any base, as several languages do.
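Python 3 is one of the languages that went this route; a quick sketch of the difference:

```python
# Python 3 rejects a bare leading zero outright, so 013 is a syntax error
# rather than a trap.  Explicit prefixes and int() with a base still work:
print(0o777)           # 511 -- unambiguous octal
print(int('777', 8))   # 511 -- parse a string in any base from 2 to 36
print(0o13)            # 11, while 013 would not even compile
```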

Gets a zero: awk (gawk only), Clojure, Go, Groovy, Java, JavaScript, m4, Perl 5, PHP, Python 2, Unix shells.

G0od: ECMAScript 6, Eiffel (0c — cute!), F#, Haskell, Julia, Nemerle, Nim, OCaml, Perl 6, Python 3, Ruby, Rust, Scheme (#o), Swift, Tcl.

Based literals: Ada (8#777#), Bash (8#777), Erlang (8#777), Icon (8r777), J (8b777), Perl 6 (:8<777>), PostScript (8#777), Smalltalk (8r777).

Special mention: BASIC uses &O and &H prefixes for octal and hex. bc and Forth allow the base used to interpret literals to be changed on the fly, via ibase and BASE respectively. C#, D, expr, Lua, Scala, and Standard ML have no octal literals at all. Some COBOL extensions use O# and H#/X# prefixes for octal and hex. Fortran uses the slightly odd O'777' syntax.

No power operator

Perhaps omitting it makes sense in C, since exponentiation doesn’t correspond to an actual instruction on most CPUs, but in JavaScript? If you can make + work for strings, I think you can add a **.

If you’re willing to ditch the bitwise operators (or lessen their importance a bit), you can even use ^, as most people would write in regular ASCII text.
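For comparison, Python went with ** (and keeps a pow builtin, whose three-argument form does modular exponentiation):

```python
print(2 ** 10)           # 1024
print(pow(2, 10))        # same thing as a function
print(pow(2, 10, 1000))  # (2 ** 10) % 1000, computed efficiently: 24
```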

Powerless: ACS, C#, Eiffel, Erlang, expr, Forth, Go.

Two out of two stars: Ada, ALGOL (↑ works too), COBOL, CoffeeScript, ECMAScript 7, Fortran, F#, Groovy, OCaml, Perl, PHP, Python, Ruby, Unix shells.

I tip my hat: awk, BASIC, bc, COBOL, fish, Lua.

Otherwise powerful: APL (*), D (^^).

Special mention: Lisps tend to have a named function rather than a dedicated operator (e.g. Math/pow in Clojure, expt in Common Lisp), but since operators are regular functions, this doesn’t stand out nearly so much. Haskell uses all three of ^, ^^, and ** for typing reasons.

C-style for loops

This construct is bad. It very rarely matches what a human actually wants to do, which 90% of the time is “go through this list of stuff” or “count from 1 to 10”. A C-style for obscures those wishes. The syntax is downright goofy, too: nothing else in the language uses ; as a delimiter and repeatedly executes only part of a line. It’s like a tuple of statements.

I said in my previous post about iteration that having an iteration protocol requires either objects or closures, but I realize that’s not true. I even disproved it in the same post. Lua’s own iteration protocol can be implemented without closures — the semantics of for involve keeping a persistent state value and passing it to the iterator function every time. It could even be implemented in C! Awkwardly. And with a bunch of macros. Which aren’t hygienic in C. Hmm, well.
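The two things people actually want — counting and walking a list — look like this in Python, with no three-part header in sight:

```python
# "count from 1 to 10"
for i in range(1, 11):
    print(i)

# "go through this list of stuff"
for word in ['sun', 'moon', 'stars']:
    print(word)

# when you genuinely need both index and item, there's still no C-style for:
for i, word in enumerate(['sun', 'moon', 'stars']):
    print(i, word)
```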

Loopy: ACS, bc, Fortran.

Cool and collected: C#, Clojure, D, Delphi (recent), ECMAScript 6, Eiffel (recent), Go, Groovy, Icon, Inform 7, Java, Julia, Logo, Lua, Nemerle, Nim, Objective-C, Perl, PHP, PostScript, Prolog, Python, R, Rust, Scala, Smalltalk, Swift, Tcl, Unix shells, Visual Basic.

Special mention: Functional languages and Lisps are laughing at the rest of us here. awk has for...in, but it doesn’t iterate arrays in order which makes it rather less useful. JavaScript (pre ES6) has both for...in and for each...in, but both are differently broken, so you usually end up using C-style for or external iteration. BASIC has an ergonomic numeric loop, but no iteration loop. Ruby mostly uses external iteration, and its for block is actually expressed in those terms.

Switch with default fallthrough

We’ve been through this before. Wanting completely separate code per case is, by far, the most common thing to want to do. It makes no sense to have to explicitly opt out of the more obvious behavior.

Breaks my heart: Java, JavaScript.

Follows through: Ada, BASIC, CoffeeScript, Go (has a fallthrough statement), Lisps, Ruby, Swift (has a fallthrough statement), Unix shells.

Special mention: C# and D require an explicit break (or some other jump) one way or the other — implicit fallthrough is disallowed except for empty cases. Perl 5 historically had no switch block built in, but it comes with a Switch module, and the last seven releases have had an experimental given block which I stress is still experimental. Python has no switch block. Erlang, Haskell, and Rust have pattern-matching instead (which doesn’t allow fallthrough at all).

Type first

int foo;

In C, this isn’t too bad. You get into problems when you remember that it’s common for type names to be all lowercase.

foo * bar;

Is that a useless expression, or a declaration? It depends entirely on whether foo is a variable or a type.

It gets a little weirder when you consider that there are type names with spaces in them. And storage classes. And qualifiers. And sometimes part of the type comes after the name.

extern const volatile _Atomic unsigned long long int * restrict foo[];

That’s not even getting into the syntax for types of function pointers, which might have arbitrary amounts of stuff after the variable name.

And then C++ came along with generics, which means a type name might also have other type names nested arbitrarily deep.

extern const volatile std::unordered_map<unsigned long long int, std::unordered_map<const long double * const, const std::vector<std::basic_string<char>>::const_iterator>> foo;

And that’s just a declaration! Imagine if there were an assignment in there too.

The great thing about static typing is that I know the types of all the variables, but that advantage is somewhat lessened if I can’t tell what the variables are.

Between type-first, function pointer syntax, Turing-complete duck-typed templates, and C++’s initialization syntax, there are several ways where parsing C++ is ambiguous or even undecidable! “Undecidable” here means that there exist C++ programs which cannot even be parsed into a syntax tree, because the same syntax means two different things depending on whether some expression is a value or a type, and that question can depend on an endlessly recursive template instantiation. (This is also a great example of ambiguity, where x * y(z) could be either an expression or a declaration.)

Contrast with, say, Rust:

let x: ... = ...;

This is easy to parse, both for a human and a computer. The thing before the colon must be a variable name, and it stands out immediately; the thing after the colon must be a type name. Even better, Rust has pretty good type inference, so the type is probably unnecessary anyway.

Of course, languages with no type declarations whatsoever are immune to this problem.

Most vexing: ACS, ALGOL, C#, D (though [] goes on the type), Fortran, Java, Perl 6.

Looks Lovely: Ada, Boo, F#, Go, Python 3 (via annotation syntax and the typing module), Rust, Swift, TypeScript.

Special mention: BASIC uses trailing type sigils to indicate scalar types.

Weak typing

Please note: this is not the opposite of static typing. Weak typing is more about the runtime behavior of values — if I try to use a value of type T as though it were of type U, will it be implicitly converted?

C lets you assign pointers to int variables and then take square roots of them, which seems like a bad idea to me. C++ agreed and nixed this, but also introduced the ability to make your own custom types implicitly convertible to as many other types you want.

This one is pretty clearly a spectrum, and I don’t have a clear line. For example, I don’t fault Python for implicitly converting between int and float, because int is infinite-precision and float is 64-bit, so it’s usually fine. But I’m a lot more suspicious of C, which lets you assign an int to a char without complaint. (Well, okay. Literal integers in C are ints, which poses a slight problem.)

I do count a combined addition/concatenation operator that accepts different types of arguments as a form of weak typing.
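As a sketch of the “strong” end of the spectrum, here’s Python’s behavior:

```python
# int/float mixing is allowed (and mostly safe, since int is arbitrary
# precision):
print(1 + 2.5)          # 3.5

# but mixing numbers and strings with + is a TypeError, not silent coercion:
try:
    '1' + 1
except TypeError as e:
    print('refused:', e)

# JavaScript, by contrast, would happily produce the string '11' here.
```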

Weak: JavaScript (+), PHP, Unix shells (almost everything’s a string, but even arrays/scalars are somewhat interchangeable).

Strong: F#, Go (explicit numeric casts), Haskell, Python, Rust (explicit numeric casts).

Special mention: ACS only has integers; even fixed-point values are stored in integers, and the compiler has no notion of a fixed-point type, making it the weakest language imaginable. C++ and Scala both allow defining implicit conversions, for better or worse. Perl 5 is weak, but it avoids most of the ambiguity by having entirely separate sets of operators for string vs numeric operations. Python 2 is mostly strong, but that whole interchangeable bytes/text thing sure caused some ruckus. Tcl only has strings.

Integer division

“Hey, new programmers!” you may find yourself saying. “Don’t worry, it’s just like math, see? Here’s how to use $LANGUAGE as a calculator.”

“Oh boy!” says your protégé. “Let’s see what 7 ÷ 2 is! Oh, it’s 3. I think the computer is broken.”

They’re right! It is broken. I have genuinely seen a non-trivial number of people come into #python thinking division is “broken” because of this.

To be fair, C is pretty consistent about making math operations always produce a value whose type matches one of the arguments. It’s also unclear whether such division should produce a float or a double. Inferring from context would make sense, but that’s not something C is really big on.

Quick test: 7 / 2 is 3½, not 3.
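Python 3 passes the quick test while keeping a separate, explicit operator for when you really do want integer results:

```python
print(7 / 2)          # 3.5 -- true division, even between two ints
print(7 // 2)         # 3   -- explicit floor division when you ask for it
print(divmod(7, 2))   # (3, 1) -- quotient and remainder in one call
# note: // floors rather than truncates, so -7 // 2 is -4, which is
# consistent with Python's sign-of-divisor %
```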

Integrous: bc, C#, D, expr, F#, Fortran, Go, OCaml, Python 2, Ruby, Rust (hard to avoid), Unix shells.

Afloat: awk (no integers), Clojure (produces a rational!), Groovy, JavaScript (no integers), Lua (no integers until 5.3), Nim, Perl 5 (no integers), Perl 6, PHP, Python 3.

Special mention: Haskell disallows / on integers. Nim, Haskell, Perl 6, Python, and probably others have separate integral division operators: div, div, div, and //, respectively.

Bytestrings

“Strings” in C are arrays of 8-bit characters. They aren’t really strings at all, since they can’t hold the vast majority of characters without some further form of encoding. Exactly what the encoding is and how to handle it is left entirely up to the programmer. This is a pain in the ass.

Some languages caught wind of this Unicode thing in the 90s and decided to solve this problem once and for all by making “wide” strings with 16-bit characters. (Even C95 has this, in the form of wchar_t* and L"..." literals.) Unicode, you see, would never have more than 65,536 characters.

Whoops, so much for that. Now we have strings encoded as UTF-16 rather than UTF-8, so we’re paying extra storage cost and we still need to write extra code to do basic operations right. Or we forget, and then later we have to track down a bunch of wonky bugs because someone typed a 💩.

Note that handling characters/codepoints is very different from handling glyphs, i.e. the distinct shapes you see on screen. Handling glyphs doesn’t even really make sense outside the context of a font, because fonts are free to make up whatever ligatures they want. Remember “diverse” emoji? Those are ligatures of three to seven characters, completely invented by a font vendor. A programming language can’t reliably count the display length of that, especially when new combining behaviors could be introduced at any time.

Also, it doesn’t matter how you solve this problem, as long as it appears to be solved. I believe Ruby uses bytestrings, for example, but they know their own encoding, so they can be correctly handled as sequences of codepoints. Having a separate non-default type or methods does not count, because everyone will still use the wrong thing first — sorry, Python 2.

Quick test: what’s the length of “💩”? If 1, you have real unencoded strings. If 2, you have UTF-16 strings. If 4, you have UTF-8 strings. If something else, I don’t know what the heck is going on.
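The quick test in Python 3, where str is a sequence of codepoints and byte counts only appear after you explicitly encode:

```python
s = '💩'
print(len(s))                       # 1 -- one codepoint
print(len(s.encode('utf-8')))       # 4 -- four bytes in UTF-8
print(len(s.encode('utf-16-le')))   # 4 bytes, i.e. two UTF-16 code units
```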

Totally bytes: Go, Lua, Python 2 (separate unicode type).

Comes up short: Java, JavaScript.

One hundred emoji: Python 3, Ruby, Rust, Swift (even gets combining characters right!).

Special mention: Go’s strings are explicitly arbitrary byte sequences, but iterating over a string with for..range decodes UTF-8 code points. Perl 5 gets the quick test right if you put use utf8; at the top of the file, but Perl 5’s Unicode support is such a confusing clusterfuck that I can’t really give it a 💯.

Hmm. This one is kind of hard to track down for sure without either knowing a lot about internals or installing fifty different interpreters/compilers.

Increment and decrement

I don’t think there are too many compelling reasons to have ++. It means the same as += 1, which is still nice and short. The only difference is that people can do stupid unreadable tricks with ++.

One exception: it is possible to overload ++ in ways that don’t make sense as += 1 — for example, C++ uses ++ to advance iterators, which may do any arbitrary work under the hood.

Double plus ungood: ACS, awk, C#, D, Go, Java, JavaScript, Perl, Unix shells, Vala.

Double plus good: Lua (which doesn’t have += either), Python, Ruby, Rust, Swift (removed in v3).

Special mention: Perl 5 and PHP both allow ++ on strings, in which case it increments letters or something, but I don’t know if much real code has ever used this.

!

A pet peeve. Spot the difference:

if (looks_like_rain()) {
    ...
}
if (!looks_like_rain()) {
    ...
}

That single ! is ridiculously subtle, which seems wrong to me when it makes an expression mean its polar opposite. Surely it should stick out like a sore thumb. The left parenthesis makes it worse, too; it blends in slightly as just noise.

It helps a bit to space after the ! in cases like this:

if (! looks_like_rain()) {
    ...
}

But this seems to be curiously rare. The easy solution is to just spell the operator not. At which point the other two might as well be and and or.

Interestingly enough, C95 specifies and, or, not, and some others as standard alternative spellings — they’re macros in the <iso646.h> header — though I’ve never seen them in any C code and I suspect existing projects would prefer I not use them.

Not right: ACS, awk, C#, D, Go, Groovy, Java, JavaScript, Nemerle, PHP, R, Rust, Scala, Swift, Tcl, Vala.

Spelled out: Ada, ALGOL, BASIC, COBOL, Erlang, F#, Fortran, Haskell, Inform 7, Lisps, Lua, Nim, OCaml, Pascal, PostScript, Python, Smalltalk, Standard ML.

Special mention: APL and Julia both use ~, which is at least easier to pick out, which is more than I can say for most of APL. bc and expr, which are really calculators, have no concept of Boolean operations. Forth and Icon, which are not calculators, don’t seem to either. Inform 7 often blends the negation into the verb, e.g. if the player does not have.... Perl and Ruby have both symbolic and named Boolean operators (Perl 6 has even more), with different precedence (which inside if won’t matter); I believe Perl 5 prefers the words and Ruby prefers the symbols. Perl and Ruby also both have a separate unless block, with the opposite meaning to if. Python has is not and not in operators.

Single return and out parameters

Because C can only return a single value, and that value is often an indication of failure for the sake of an if, “out” parameters are somewhat common.

double x, y;
get_point(&x, &y);

It’s not immediately clear whether x and y are input or output. Sometimes they might function as both. (And of course, in this silly example, you’d be better off returning a single point struct. Or would you use a point out parameter because returning structs is potentially expensive?)

Some languages have doubled down on this by adding syntax to declare “out” parameters, which removes the ambiguity in the function definition, but makes it worse in function calls. In the above example, using & on an argument is at least a decent hint that the function wants to write to those values. If you have implicit out parameters or pass-by-reference or whatever, that would just be get_point(x, y) and you’d have no indication that those arguments are special in any way.

The vast majority of the time, this can be expressed in a more straightforward way by returning multiple values:

x, y = get_point()

That was intended as Python, but technically, Python doesn’t have multiple returns! It seems to, but it’s really a combination of several factors: a tuple type, the ability to make a tuple literal with just commas, and the ability to unpack a tuple via multiple assignment. In the end it works just as well. Also this is a way better use of the comma operator than in C.
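Here’s that tuple machinery spelled out, including the single-assignment case where Python and Lua diverge:

```python
def get_point():
    return 1, 2          # really builds the tuple (1, 2)

x, y = get_point()       # tuple unpacking via multiple assignment
print(x, y)              # 1 2

point = get_point()      # no unpacking: you keep the whole tuple
print(point)             # (1, 2) -- in Lua this would just be the x value
```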

But the exact same code could appear in Lua, which has multiple return/assignment as an explicit feature… and no tuples. The difference becomes obvious if you try to assign the return value to a single variable instead:

point = get_point()

In Python, point would be a tuple containing both return values. In Lua, point would be the x value, and y would be silently discarded. I don’t tend to be a fan of silently throwing data away, but I have to admit that Lua makes pretty good use of this in several places for “optional” return values that the caller can completely ignore if desired. An existing function can even be extended to return more values than before — that would break callers in Python, but work just fine in Lua.

(Also, to briefly play devil’s advocate: I once saw Python code that returned 14 values, all with very complicated types and semantics. Maybe don’t do that. I think I cleaned it up to return an object, which simplified the calling code considerably too.)

It’s also possible to half-ass this. ECMAScript 6:

function get_point() {
    return [1, 2];
}

var [x, y] = get_point();

It works, but it doesn’t actually look like multiple return. The trouble is that JavaScript has C’s comma operator and C’s variable declaration syntax, so neither of the above constructs could’ve left off the brackets without significantly changing the syntax:

function get_point() {
    // Whoops!  This uses the comma operator, which evaluates to its last
    // operand, so it just returns 2
    return 1, 2;
}

// Whoops!  This is multiple declaration, where each variable gets its own "=",
// so it assigns nothing to x and the return value to y
var x, y = get_point();
// Now x is undefined and y is 2

This is still better than either out parameters or returning an explicit struct that needs manual unpacking, but it’s not as good as comma-delimited tuples. Note that some languages require parentheses around tuples (and also call them tuples), and I’m arbitrarily counting that as better than brackets.

Single return: Ada, ALGOL, BASIC, C#, COBOL, Fortran, Groovy, Java, Smalltalk.

Half-assed multiple return: C++11, D, ECMAScript 6, Erlang, PHP.

Multiple return via tuples: F#, Haskell, Julia, Nemerle, Nim, OCaml, Perl (just lists really), Python, Ruby, Rust, Scala, Standard ML, Swift, Tcl.

Native multiple return: Common Lisp, Go, Lua.

Special mention: C# has explicit syntax for out parameters, but it’s a compile-time error to not assign to all of them, which is slightly better than C. Forth is stack-based, and all return values are simply placed on the stack, so multiple return isn’t a special case. Unix shell functions don’t return values. Visual Basic sets a return value by assigning to the function’s name (?!), so good luck fitting multiple return in there.

Silent errors

Most runtime errors in C are indicated by one of two mechanisms: returning an error code, or segfaulting. Segfaulting is pretty noisy, so that’s okay, except for the exploit potential and all.

Returning an error code kinda sucks. Those tend to be important, but nothing in the language actually reminds you to check them, and of course we silly squishy humans have the habit of assuming everything will succeed at all times. Which is how I segfaulted git two days ago: I found a spot where it didn’t check for a NULL returned as an error.

There are several alternatives here: exceptions, statically forcing the developer to check for an error code, or using something monad-like to statically force the developer to distinguish between an error and a valid return value. Probably some others. In the end I was surprised by how many languages went the exception route.
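The practical difference between the two styles, sketched in Python (the exception route), with the C way in a comment:

```python
# C style: a parse failure would come back as an error code or sentinel that
# you can simply forget to check.  In Python the failure is an exception:
# ignoring it doesn't silently continue with garbage, it stops the program.
def parse_or_default(text, default=0):
    try:
        return int(text)
    except ValueError:
        return default

print(parse_or_default('42'))     # 42
print(parse_or_default('oops'))   # 0 -- the error had to be handled to get here
```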

Quietly wrong: Unix shells. Wow, yeah, I’m having a hard time naming anything else. Good job, us! And even Unix shells have set -e; it’s just opt-in.

Exceptional: Ada, C++, C#, D, Erlang, Forth, Java (exceptions are even part of function signature), JavaScript, Nemerle, Nim, Objective-C, OCaml, Perl 6, Python, Ruby, Smalltalk, Standard ML, Visual Basic.

Monadic: Haskell (Either), Rust (Result).

Special mention: ACS doesn’t really have many operations that can error, and those that do simply halt the script. ALGOL apparently has something called “mending” that I don’t understand. Go tends to use secondary return values, which calling code has to unpack, making them slightly harder to forget about; it also allows both the assignment and the error check together in the header of an if. Lisps have conditions and call/cc, which are different things entirely. Lua and Perl 5 handle errors by taking down the whole program, but offer a construct that can catch that further up the stack, which is clumsy but enough to emulate try..catch. PHP has exceptions, and errors (which are totally different), and a lot of builtin functions that return error codes. Swift has something that looks like exceptions, but it doesn’t involve stack unwinding and does require some light annotation — apparently sugar for an “out” parameter holding an error. Visual Basic, and I believe some other BASICs, decided C wasn’t bad enough and introduced the bizarre On Error Resume Next construct which does exactly what it sounds like.

Nulls

The billion dollar mistake.

I think it’s considerably worse in a statically typed language like C, because the whole point is that you can rely on the types. But a double* might be NULL, which is not actually a pointer to a double; it’s a pointer to a segfault. Other kinds of bad pointers are possible, of course, but those are more an issue of memory safety; allowing any reference to be null violates type safety. The root of the problem is treating null as a possible value of any type, when really it’s its own type entirely.

The alternatives tend to be either opt-in nullability or an “optional” generic type (a monad!) which eliminates null as its own value entirely. Notably, Swift does it both ways: optional types are indicated by a trailing ?, but that’s just syntactic sugar for Option<T>.

On the other hand, while it’s annoying to get a None where I didn’t expect one in Python, it’s not like I’m surprised. I occasionally get a string where I expected a number, too. The language explicitly leaves type concerns in my hands. My real objection is to having a static type system that lies. So I’m not going to list every single dynamic language here, because not only is it consistent with the rest of the type system, but they don’t really have any machinery to prevent this anyway.

Nothing doing: C#, D, Go, Java, Nim (non-nullable types are opt in).

Nullable types: Swift (sugar for a monad).

Monads: F# (Option — though technically F# also inherits null from .NET), Haskell (Maybe), Rust (Option), Swift (Optional).

Special mention: awk, Tcl, and Unix shells only have strings, so in a surprising twist, they have no concept of null whatsoever. Java recently introduced an Optional<T> type which explicitly may or may not contain a value, but since it’s still a non-primitive, it could also be null. C++17 doesn’t quite have the same problem with std::optional<T>, since non-reference values can’t be null. Inform 7’s nothing value is an object (the root of half of its type system), which means any object variable might be nothing, but any value of a more specific type cannot be nothing. JavaScript has two null values, null and undefined. Perl 6 is really big on static types, but claims its Nil object doesn’t exist, and I don’t know how to even begin to unpack that. R and SQL have a more mathematical kind of NULL, which tends to e.g. vanish from lists.

Assignment as expression

How common a mistake is this:

if (x = 3) {
    ...
}

Well, I don’t know, actually. Maybe not that common, save for among beginners. But I sort of wonder whether allowing this buys us anything. I can only think of two cases where it does. One is with something like iteration:

// Typical linked list
while (p = p->next) {
    ...
}

But this is only necessary in C in the first place because it has no first-class notion of iteration. The other is shorthand for checking that a function returned a useful value:

if (ptr = get_pointer()) {
    ...
}

But if a function returns NULL, that’s really an error condition, and presumably you have some other way to handle that too.

What does that leave? The only time I remotely miss this in Python (where it’s illegal) is when testing a regex. You tend to see this a lot instead.

m = re.match('x+y+z+', some_string)
if m:
    ...

re treats failure as an acceptable possibility and returns None, rather than raising an exception. I’m not sure whether this was the right thing to do or not, but off the top of my head I can’t think of too many other Python interfaces that sometimes return None.
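For completeness, here’s that idiom in full, as it typically appears:

```python
import re

some_string = 'xxyyz rest'
m = re.match('x+y+z+', some_string)
if m:                    # None (no match) is falsy, a match object is truthy
    print(m.group())     # 'xxyyz'
else:
    print('no match')
```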

Freedom of expression: ACS, C#, Java, JavaScript, Perl, PHP, Swift.

Makes a statement: Inform 7, Lua, Python, Unix shells.

Special mention: BASIC uses = for both assignment and equality testing — the meaning is determined from context. D allows variable declaration as an expression, so if (int x = 3) is allowed, but regular assignment is not. Functional languages generally don’t have an assignment operator. Go disallows assignment as an expression, but assignment and a test can appear together in an if condition, and this is an idiomatic way to check success. Ruby makes everything an expression, so assignment might as well be too. Rust makes everything an expression, but assignment evaluates to the useless () value (due to ownership rules), so it’s not actually useful. Rust and Swift both have a special if let block that explicitly combines assignment with pattern matching, which is way nicer than the C approach.

No hyphens in identifiers

snake_case requires dancing on the shift key (unless you rearrange your keyboard, which is perfectly reasonable). It slows you down slightly and leads to occasional mistakes like snake-Case. The alternative is dromedaryCase, which is objectively wrong and doesn’t actually solve this problem anyway.

Why not just allow hyphens in identifiers, so we can avoid this argument and use kebab-case?

Ah, but then it’s ambiguous whether you mean an identifier or the subtraction operator. No problem: require spaces for subtraction. I don’t think a tiny way you’re allowed to make your code harder to read is really worth this clear advantage.

Low scoring: ACS, C#, D, Java, JavaScript, OCaml, Pascal, Perl 5, PHP, Python, Ruby, Rust, Swift, Unix shells.

Nicely-designed: COBOL, CSS (and thus Sass), Forth, Inform 7, Lisps, Perl 6, XML.

Special mention: Perl has a built-in variable called $-, and Ruby has a few called $-n for various values of “n”, but these are very special cases.

Braces and semicolons

Okay. Hang on. Bear with me.

C code looks like this.

some block header {
    line 1;
    line 2;
    line 3;
}

The block is indicated two different ways here. The braces are for the compiler; the indentation is for humans.

Having two different ways to say the same thing means they can get out of sync. They can disagree. And that can be, as previously mentioned, really bad. This is really just a more general form of the problem of optional block delimiters.

The only solution is to eliminate one of the two. Programming languages exist for the benefit of humans, so we obviously can’t get rid of the indentation. Thus, we should get rid of the braces. QED.

As an added advantage, we reclaim all the vertical space wasted on lines containing only a }, and we can stop squabbling about where to put the {.

If you accept this, you might start to notice that there are also two different ways of indicating where a line ends: with semicolons for the compiler, and with vertical whitespace for humans. So, by the same reasoning, we should lose the semicolons.

Right? Awesome. Glad we’re all on the same page.

Some languages use keywords instead of braces, but the effect is the same. I’m not aware of any languages that use keywords instead of semicolons.

Bracing myself: C#, D, Erlang, Java, Perl, Rust.

Braces, but no semicolons: Go (ASI), JavaScript (ASI — see below), Lua, Ruby, Swift.

Free and clear: CoffeeScript, Haskell, Python.

Special mention: Lisp, just, in general. Inform 7 has an indented style, but it still requires semicolons. MUMPS doesn’t support nesting at all, but I believe there are extensions that use dots to indicate it.

Here’s some interesting trivia. JavaScript, Lua, and Python all optionally allow semicolons at the end of a statement, but the way each language determines line continuation is very different.

JavaScript takes an “opt-out” approach: it continues reading lines until it hits a semicolon, or until reading the next line would cause a syntax error. (This approach is called automatic semicolon insertion.) That leaves a few corner cases like starting a new line with a (, which could look like the last thing on the previous line is a function you’re trying to call. Or you could have -foo on its own line, and it would parse as subtraction rather than unary negation. You might wonder why anyone would do that, but using unary + is one way to make function parse as an expression rather than a statement! I’m not so opposed to semicolons that I want to be debugging where the language thinks my lines end, so I just always use semicolons in JavaScript.

Python takes an “opt-in” approach: it assumes, by default, that a statement ends at the end of a line. However, newlines inside parentheses or brackets are ignored, which takes care of 99% of cases — long lines are most frequently caused by function calls (which have parentheses!) with a lot of arguments. If you really need it, you can explicitly escape a newline with \, but this is widely regarded as incredibly ugly.
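Both approaches, illustrated in Python (the numbers here are arbitrary):

```python
# Newlines inside parentheses are ignored, so long expressions
# can wrap freely with no continuation marker at all.
total = sum([1, 2, 3,
             4, 5, 6])

# The explicit backslash continuation works too, but is widely shunned.
also_total = 1 + 2 + 3 + \
             4 + 5 + 6
```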

Lua avoids the problem almost entirely. I believe Lua’s grammar is designed such that it’s almost always unambiguous where a statement ends, even if you have no newlines at all. This has a few weird side effects: void expressions are syntactically forbidden in Lua, for example, so you just can’t have -foo as its own statement. Also, you can’t have code immediately following a return, because it’ll be interpreted as a return value. The upside is that Lua can treat newlines just like any other whitespace, but still not need semicolons. In fact, semicolons aren’t statement terminators in Lua at all — they’re their own statement, which does nothing. Alas, Lua does have the same ( ambiguity as JavaScript (and parses it the same way), but I don’t think any of the others exist.

Oh, and the colons that Python has at the end of its block headers, like if foo:? As far as I can tell, they serve no syntactic purpose whatsoever. Purely aesthetic.

Blaming the programmer

Perhaps one of the worst misfeatures of C is the ease with which responsibility for problems can be shifted to the person who wrote the code. “Oh, you segfaulted? I guess you forgot to check for NULL.” If only I had a computer to take care of such tedium for me!

Clearly, computers can’t be expected to do everything for us. But they can be expected to do quite a bit. Programming languages are built for humans, and they ought to eliminate the sorts of rote work humans are bad at whenever possible. A programmer is already busy thinking about the actual problem they want to solve; it’s no surprise that they’ll sometimes forget some tedious detail the language forces them to worry about.

So if you’re designing a language, don’t just copy C. Don’t just copy C++ or Java. Hell, don’t even just copy Python or Ruby. Consider your target audience, consider the problems they’re trying to solve, and try to get as much else out of the way as possible. If the same “mistake” tends to crop up over and over, look for a way to modify the language to reduce or eliminate it. And be sure to look at a lot of languages for inspiration — even ones you hate, even weird ones no one uses! A lot of clever people have had a lot of other ideas in the last 44 years.


I hope you enjoyed this accidental cross-reference of several dozen languages! I enjoyed looking through them all, though it was incredibly time-consuming. Some of them look pretty interesting; maybe give them a whirl.

Also, dammit, now I’m thinking about language design again.

Let’s stop copying C

Post Syndicated from Eevee original https://eev.ee/blog/2016/12/01/lets-stop-copying-c/

Ah, C. The best lingua franca we have… because we have no other lingua francas.

C is fairly old — 44 years, now! — and comes from a time when there were possibly more architectures than programming languages. It works well for what it is, and what it is is a relatively simple layer of indirection atop assembly.

Alas, the popularity of C has led to a number of programming languages’ taking significant cues from its design, and parts of its design are… slightly questionable. I’ve gone through some common features that probably should’ve stayed in C and my justification for saying so. The features are listed in rough order from (I hope) least to most controversial. The idea is that C fans will give up when I complain about argument order and not even get to the part where I rag on braces. Wait, crap, I gave it away.

I’ve listed some languages that do or don’t take the same approach as C. Plenty of the listed languages have no relation to C, and some even predate it — this is meant as a cross-reference of the landscape (and perhaps a list of prior art), not a genealogy. The language selections are arbitrary and based on what I could cobble together from documentation, experiments, Wikipedia, and attempts to make sense of Rosetta Code. I don’t know everything about all of them, so I might be missing some interesting quirks. Things are especially complicated for very old languages like COBOL or Fortran, which by now have numerous different versions and variants and de facto standard extensions.

“Bash” generally means zsh and ksh and other derivatives as well, and when referring to expressions, means the $(( ... )) syntax; “Unix shells” means Bourne and thus almost certainly everything else as well. I didn’t look too closely into, say, fish. Unqualified “Python” means both 2 and 3; likewise, unqualified “Perl” means both 5 and 6. Also, some of the puns are perhaps a little too obtuse, but the first group listed is always C-like.

Textual inclusion

#include is not a great basis for a module system. It’s not even a module system. You can’t ever quite tell what symbols came from which files, or indeed whether particular files are necessary at all. And in languages with C-like header files, most headers include other headers include more headers, so who knows how any particular declaration is actually ending up in your code? Oh, and there’s the whole include guards thing.

It’s a little tricky to pick on individual languages here, because ultimately even the greatest module system in the world boils down to “execute this other file, and maybe do some other stuff”. I think the true differentiating feature is whether including/importing/whatevering a file creates a new namespace. If a file gets dumped into the caller’s namespace, that looks an awful lot like textual inclusion; if a file gets its own namespace, that’s a good sign of something more structured happening behind the scenes.

This tends to go hand-in-hand with how much the language relies on a global namespace. One surprising exception is Lua, which can compartmentalize required files quite well, but dumps everything into a single global namespace by default.

Quick test: if you create a new namespace and import another file within that namespace, do its contents end up in that namespace?
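Here’s a sketch of that test in Python, using exec on a made-up snippet to fake textual inclusion — Python itself only offers the namespaced kind:

```python
import types

# A pretend "file" to be included/imported.
library_source = "def greet():\n    return 'hi'\n"

# Textual inclusion: the code runs directly in our namespace,
# so greet appears here with no hint of where it came from.
exec(library_source, globals())

# A namespaced import: run the same code inside a fresh module
# object, so its names stay attached to that module.
fakelib = types.ModuleType("fakelib")
exec(library_source, fakelib.__dict__)
```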

Included: ACS, awk, COBOL, Erlang, Forth, Fortran, most older Lisps, Perl 5 (despite that required files must return true), PHP, Ruby, Unix shells.

Excluded: Ada, Clojure, D, Haskell, Julia, Lua (the file’s return value is returned from require), Nim, Node (similar to Lua), Perl 6, Python, Rust.

Special mention: ALGOL appears to have been designed with the assumption that you could include other code by adding its punch cards to your stack. C#, Java, OCaml, and Swift all have some concept of “all possible code that will be in this program”, sort of like C with inferred headers, so imports are largely unnecessary; Java’s import really just does aliasing. Inform 7 has no namespacing, but does have a first-class concept of external libraries, though it doesn’t have a way to split a single project up between multiple files.

Optional block delimiters

Old and busted and responsible for gotofail:

if (condition)
    thing;

New hotness, which reduces the amount of punctuation overall and eliminates this easy kind of error:

if condition {
    thing;
}

To be fair, and unlike most of these complaints, the original idea was a sort of clever consistency: the actual syntax was merely if (expr) stmt, and also, a single statement could always be replaced by a block of statements. Unfortunately, the cuteness doesn’t make up for the ease with which errors sneak in. If you’re stuck with a language like this, I advise you always use braces, possibly excepting the most trivial cases like immediately returning if some argument is NULL. Definitely do not do this nonsense, which I saw in actual code not 24 hours ago.

for (x = ...)
    for (y = ...) {
        ...
    }

    // more code

    for (x = ...)
        for (y = ...)
            buffer[y][x] = ...

The only real argument for omitting the braces is that the braces take up a lot of vertical space, but that’s mostly a problem if you put each { on its own line, and you could just not do that.

Some languages use keywords instead of braces, and in such cases it’s vanishingly rare to make the keywords optional.

Blockheads: ACS, awk, C#, D, Erlang (kinda?), Java, JavaScript.

New kids on the block: Go, Perl 6, Rust, Swift.

Had their braces removed: Ada, ALGOL, BASIC, COBOL, CoffeeScript, Forth, Fortran (but still requires parens), Haskell, Lua, Ruby.

Special mention: Inform 7 has several ways to delimit blocks, none of them vulnerable to this problem. Perl 5 requires both the parentheses and the braces… but it lets you leave off the semicolon on the last statement. Python just uses indentation to delimit blocks in the first place, so you can’t have a block look wrong. Lisps exist on a higher plane of existence where the very question makes no sense.

Bitwise operator precedence

For ease of transition from B, in C, the bitwise operators & | ^ have lower precedence than the comparison operators == and friends. That means they happen later. For binary math operators, this is nonsense.

1 + 2 == 3  // (1 + 2) == 3
1 * 2 == 3  // (1 * 2) == 3
1 | 2 == 3  // 1 | (2 == 3)

Many other languages have copied C’s entire set of operators and their precedence, including this. Because a new language is easier to learn if its rules are familiar, you see. Which is why we still, today, have extremely popular languages maintaining compatibility with a language from 1969 — so old that it probably couldn’t get a programming job.

Honestly, if your language is any higher-level than C, I’m not sure bit operators deserve to be operators at all. Free those characters up to do something else. Consider having a first-class bitfield type; then 99% of the use of bit operations would go away.

Quick test: 1 & 2 == 2 evaluates to 1 with C precedence, false otherwise. Or just look at a precedence table: if equality appears between bitwise ops and other math ops, that’s C style.
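The quick test, run in Python (one of the wisened-up languages):

```python
# Python groups this as (1 & 2) == 2: bitwise ops bind tighter
# than comparisons, so the whole thing is a boolean.
sane = 1 & 2 == 2

# C's grouping, written out explicitly.  True is just the int 1,
# so 1 & True stays 1 -- a truthy result for a nonsense question.
c_grouping = 1 & (2 == 2)
```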

A bit wrong: C#, D, expr, JavaScript, Perl 5, PHP.

Wisened up: Bash, F# (ops are &&& ||| ^^^), Go, Julia, Lua (bitwise ops are new in 5.3), Perl 6 (ops are ?& ?| ?^), Python, Ruby, SQL, Swift.

Special mention: Java has C’s precedence, but forbids using bitwise operators on booleans, so the quick test is a compile-time error. Lisp-likes have no operator precedence.

Negative modulo

The modulo operator, %, finds the remainder after division. Thus you might think that this always holds:

0 <= a % b < abs(b)

But no — if a is negative, C will produce a negative value. This is so a / b * b + a % b is always equal to a. Truncating integer division rounds towards zero, so the sign of a % b always needs to match the sign of a.

I’ve never found this behavior (or the above equivalence) useful. An easy example is that checking for odd numbers with x % 2 == 1 will fail for negative numbers, which produce -1. But the opposite behavior can be pretty handy.

Consider the problem of having n items that you want to arrange into rows with c columns. A calendar, say; you want to include enough empty cells to fill out the last row. n % c gives you the number of items on the last row, so c - n % c seems like it will give you the number of empty spaces. But if the last row is already full, then n % c is zero, and c - n % c equals c! You’ll have either a double-width row or a spare row of empty cells. Fixing this requires treating n % c == 0 as a special case, which is unsatisfying.

Ah, but if we have positive %, the answer is simply… -n % c! Consider this number line for n = 5 and c = 3:

-6      -3       0       3       6
 | - x x | x x x | x x x | x x - |

a % b tells you how far to count down to find a multiple of b. For positive a, that means “backtracking” over a itself and finding a smaller number. For negative a, that means continuing further away from zero. If you look at negative numbers as the mirror image of positive numbers, then % on a positive number tells you how much to file off to get a multiple, whereas % on a negative number tells you how much further to go to get a multiple. 5 % 3 is 2, but -5 % 3 is 1. And of course, -6 % 3 is still zero, so that’s not a special case.

Positive % effectively lets you choose whether to round up or down. It doesn’t come up often, but when it’s handy, it’s really handy.

(I have no strong opinion on what 5 % -3 should be; I don’t think I’ve ever tried to use % with a negative divisor. Python makes it negative; Pascal makes it positive. Wikipedia has a whole big chart.)
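Here’s the calendar example in Python, which has the positive semantics (n and c are sample values):

```python
n, c = 5, 3   # five items in rows of three columns

# Python's % takes the sign of the divisor, so -n % c is exactly
# the number of empty cells needed to fill out the last row.
padding = -n % c

# A full last row needs zero padding, with no special case.
full_row_padding = -6 % c
```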

Quick test: -5 % 3 is -2 with C semantics, 1 with “positive” semantics.

Leftovers: Bash, C#, D, expr, Go, Java, JavaScript, OCaml, PowerShell, PHP, Rust, Scala, SQL, Swift, VimL, Visual Basic. Notably, some of these languages don’t even have integer division.

Paying dividends: Dart, MUMPS (#), Perl, Python, R (%%), Ruby, Smalltalk (\\), Standard ML, Tcl.

Special mention: Ada, Haskell, Julia, many Lisps, MATLAB, VHDL, and others have separate mod (Python-like) and rem (C-like) operators. CoffeeScript has separate % (C-like) and %% (Python-like) operators.

Leading zero for octal

Octal notation like 0777 has three uses.

One: to make a file mask to pass to chmod().

Two: to confuse people when they write 013 and it comes out as 11.

Three: to confuse people when they write 018 and get a syntax error.

If you absolutely must have octal (?!) in your language, it’s fine to use 0o777. Really. No one will mind. Or you can go the whole distance and allow literals written in any base, as several languages do.
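Python 3 went this route — a bare 0777 is a syntax error there:

```python
mask = 0o777                  # the unambiguous octal prefix
from_string = int("777", 8)   # any base from 2 to 36, if you need it
in_hex = 0x1FF                # the same value, for comparison
```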

Gets a zero: awk (gawk only), Bash, Clojure, Go, Groovy, Java, JavaScript, m4, Perl 5, PHP, Python 2, Scala.

G0od: ECMAScript 6, Eiffel (0c — cute!), F#, Haskell, Julia, Nemerle, Nim, OCaml, Perl 6, Python 3, Racket (#o), Ruby, Scheme (#o), Swift, Tcl.

Based literals: Ada (8#777#), Bash (8#777), Erlang (8#777), Icon (8r777), J (8b777), Perl 6 (:8<777>), PostScript (8#777), Smalltalk (8r777).

Special mention: BASIC uses &O and &H prefixes for octal and hex. bc and Forth allow the base used to interpret literals to be changed on the fly, via ibase and BASE respectively. C#, D, expr, Lua, and Standard ML have no octal literals at all. Some COBOL extensions use O# and H#/X# prefixes for octal and hex. Fortran uses the slightly odd O'777' syntax.

No power operator

Perhaps this makes sense in C, since it doesn’t correspond to an actual instruction on most CPUs, but in JavaScript? If you can make + work for strings, I think you can add a **.

If you’re willing to ditch the bitwise operators (or lessen their importance a bit), you can even use ^, as most people would write in regular ASCII text.

Powerless: ACS, C#, Eiffel, Erlang, expr, Forth, Go.

Two out of two stars: Ada, ALGOL (↑ works too), Bash, COBOL, CoffeeScript, Fortran, F#, Groovy, OCaml, Perl, PHP, Python, Ruby.

I tip my hat: awk, BASIC, bc, COBOL, fish, Lua.

Otherwise powerful: APL (*), D (^^).

Special mention: Lisps tend to have a named function rather than a dedicated operator (e.g. Math/pow in Clojure, expt in Common Lisp), but since operators are regular functions, this doesn’t stand out nearly so much. Haskell uses all three of ^, ^^, and ** for typing reasons.

C-style for loops

This construct is bad. It very rarely matches what a human actually wants to do, which 90% of the time is “go through this list of stuff” or “count from 1 to 10”. A C-style for obscures those wishes. The syntax is downright goofy, too: nothing else in the language uses ; as a delimiter and repeatedly executes only part of a line. It’s like a tuple of statements.

I said in my previous post about iteration that having an iteration protocol requires either objects or closures, but I realize that’s not true. I even disproved it in the same post. Lua’s own iteration protocol can be implemented without closures — the semantics of for involve keeping a persistent state value and passing it to the iterator function every time. It could even be implemented in C! Awkwardly. And with a bunch of macros. Which aren’t hygienic in C. Hmm, well.
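For the curious, here’s a rough Python sketch of that closure-free protocol — the names inext and generic_for are mine, not Lua’s, but the shape (a stateless iterator function, an invariant state, and a control value passed back in each time) mirrors Lua’s generic for:

```python
def inext(state, control):
    # Stateless list iterator: state is the list itself, control is
    # the previous index.  Returns (new_control, value), or None when
    # exhausted -- mirroring Lua's nil.
    i = control + 1
    if i >= len(state):
        return None
    return i, state[i]

def generic_for(iterator, state, control):
    # Roughly what Lua's `for v in f, s, c do ... end` does under the hood.
    results = []
    while True:
        step = iterator(state, control)
        if step is None:
            break
        control, value = step
        results.append(value)
    return results

values = generic_for(inext, ["a", "b", "c"], -1)
```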

Loopy: ACS, bc, Fortran.

Cool and collected: C#, Clojure, D, Delphi (recent), Eiffel (recent), Go, Groovy, Icon, Inform 7, Java, Julia, Logo, Lua, Nemerle, Nim, Objective-C, Perl, PHP, PostScript, Prolog, Python, R, Rust, Scala, Smalltalk, Swift, Tcl, Unix shells, Visual Basic.

Special mention: Functional languages and Lisps are laughing at the rest of us here. awk has for...in, but it doesn’t iterate arrays in order, which makes it rather less useful. JavaScript has both for...in and for...of, but both are differently broken, so you usually end up using C-style for or external iteration. BASIC has an ergonomic numeric loop, but no iteration loop. Ruby mostly uses external iteration, and its for block is actually expressed in those terms.

Switch with default fallthrough

We’ve been through this before. Wanting completely separate code per case is, by far, the most common thing to want to do. It makes no sense to have to explicitly opt out of the more obvious behavior.

Breaks my heart: C#, Java, JavaScript.

Follows through: Ada, BASIC, CoffeeScript, Go (has a fallthrough statement), Lisps, Swift (has a fallthrough statement), Unix shells.

Special mention: D requires break, but requires something one way or the other — implicit fallthrough is disallowed except for empty cases. Perl 5 historically had no switch block built in, but it comes with a Switch module, and the last seven releases have had an experimental given block which I stress is still experimental. Python has no switch block. Erlang, Haskell, and Rust have pattern-matching instead (which doesn’t allow fallthrough at all).

Type first

int foo;

In C, this isn’t too bad. You get into problems when you remember that it’s common for type names to be all lowercase.

foo * bar;

Is that a useless expression, or a declaration? It depends entirely on whether foo is a variable or a type.

It gets a little weirder when you consider that there are type names with spaces in them. And storage classes. And qualifiers. And sometimes part of the type comes after the name.

extern const volatile _Atomic unsigned long long int * restrict foo[];

That’s not even getting into the syntax for types of function pointers, which might have arbitrary amounts of stuff after the variable name.

And then C++ came along with generics, which means a type name might also have other type names nested arbitrarily deep.

extern const volatile std::unordered_map<unsigned long long int, std::unordered_map<const long double * const, const std::vector<std::basic_string<char>>::const_iterator>> foo;

And that’s just a declaration! Imagine if there were an assignment in there too.

The great thing about static typing is that I know the types of all the variables, but that advantage is somewhat lessened if I can’t tell what the variables are.

Between type-first, function pointer syntax, Turing-complete duck-typed templates, and C++’s initialization syntax, there are several ways where parsing C++ is ambiguous or even undecidable! “Undecidable” here means that there exist C++ programs which cannot even be parsed into a syntax tree, because the same syntax means two different things depending on whether some expression is a value or a type, and that question can depend on an endlessly recursive template instantiation. (This is also a great example of ambiguity, where x * y(z) could be either an expression or a declaration.)

Contrast with, say, Rust:

let x: ... = ...;

This is easy to parse, both for a human and a computer. The thing before the colon must be a variable name, and it stands out immediately; the thing after the colon must be a type name. Even better, Rust has pretty good type inference, so the type is probably unnecessary anyway.

Of course, languages with no type declarations whatsoever are immune to this problem.

Most vexing: Java, Perl 6.

Looks lovely: Python 3 (annotation syntax and the typing module), Rust, Swift, TypeScript.
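Python 3’s annotation syntax has the same name-first shape, and the annotations are even inspectable at runtime (scale is a made-up example function):

```python
def scale(factor: float, label: str = "x") -> str:
    # Name first, then a colon and the type -- the variable always
    # stands out, unlike C's type-first declarations.
    return label * int(factor)

hints = scale.__annotations__
```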

Weak typing

Please note: this is not the opposite of static typing. Weak typing is more about the runtime behavior of values — if I try to use a value of type T as though it were of type U, will it be implicitly converted?

C lets you assign pointers to int variables and then take square roots of them, which seems like a bad idea to me. C++ agreed and nixed this, but also introduced the ability to make your own custom types implicitly convertible to as many other types as you want.

This one is pretty clearly a spectrum, and I don’t have a clear line. For example, I don’t fault Python for implicitly converting between int and float, because int is infinite-precision and float is 64-bit, so it’s usually fine. But I’m a lot more suspicious of C, which lets you assign an int to a char without complaint. (Well, okay. Literal integers in C are ints, which poses a slight problem.)
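A small Python demonstration of where that line falls:

```python
# int and float mix freely -- both are numbers, so this is harmless.
mixed = 1 + 2.5

# str and int do not: Python raises TypeError instead of guessing
# whether you wanted "11" or 2.
try:
    "1" + 1
    coerced = True
except TypeError:
    coerced = False
```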

I do count a combined addition/concatenation operator that accepts different types of arguments as a form of weak typing.

Weak: JavaScript (+), Unix shells (everything’s a string, but even arrays/scalars are somewhat interchangeable).

Strong: Rust (even numeric upcasts must be explicit).

Special mention: Perl 5 is weak, but it avoids most of the ambiguity by having entirely separate sets of operators for string vs numeric operations. Python 2 is mostly strong, but that whole interchangeable bytes/text thing sure caused some ruckus.

Integer division

“Hey, new programmers!” you may find yourself saying. “Don’t worry, it’s just like math, see? Here’s how to use $LANGUAGE as a calculator.”

“Oh boy!” says your protégé. “Let’s see what 7 ÷ 2 is! Oh, it’s 3. I think the computer is broken.”

They’re right! It is broken. I have genuinely seen a non-trivial number of people come into #python thinking division is “broken” because of this.

To be fair, C is pretty consistent about making math operations always produce a value whose type matches one of the arguments. It’s also unclear whether such division should produce a float or a double. Inferring from context would make sense, but that’s not something C is really big on.

Quick test: 7 / 2 is 3½, not 3.
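Python 3 passes, and the // operator is there when you really do want the floor:

```python
true_quotient = 7 / 2    # real division, even between two ints
floor_quotient = 7 // 2  # explicit integer division
```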

Integrous: Bash, bc, C#, D, expr, F#, Fortran, Go, OCaml, Python 2, Ruby, Rust (hard to avoid).

Afloat: awk (no integers), Clojure (produces a rational!), Groovy, JavaScript (no integers), Lua (no integers until 5.3), Nim, Perl 5 (no integers), Perl 6, PHP, Python 3.

Special mention: Haskell disallows / on integers. Nim, Perl 6, Python, and probably others have separate integral division operators: div, div, and //, respectively.

Bytestrings

“Strings” in C are arrays of 8-bit characters. They aren’t really strings at all, since they can’t hold the vast majority of characters without some further form of encoding. Exactly what the encoding is and how to handle it is left entirely up to the programmer. This is a pain in the ass.

Some languages caught wind of this Unicode thing in the 90s and decided to solve this problem once and for all by making “wide” strings with 16-bit characters. (Even C95 has this, in the form of wchar_t* and L"..." literals.) Unicode, you see, would never have more than 65,536 characters.

Whoops, so much for that. Now we have strings encoded as UTF-16 rather than UTF-8, so we’re paying extra storage cost and we still need to write extra code to do basic operations right. Or we forget, and then later we have to track down a bunch of wonky bugs because someone typed a 💩.

Note that handling characters/codepoints is very different from handling glyphs, i.e. the distinct shapes you see on screen. Handling glyphs doesn’t even really make sense outside the context of a font, because fonts are free to make up whatever ligatures they want. Remember “diverse” emoji? Those are ligatures of three to seven characters, completely invented by a font vendor. A programming language can’t reliably count the display length of that, especially when new combining behaviors could be introduced at any time.

Also, it doesn’t matter how you solve this problem, as long as it appears to be solved. I believe Ruby uses bytestrings, for example, but they know their own encoding, so they can be correctly handled as sequences of codepoints. Having a separate non-default type or methods does not count, because everyone will still use the wrong thing first — sorry, Python 2.

Quick test: what’s the length of “💩”? If 1, you have real unencoded strings. If 2, you have UTF-16 strings. If 4, you have UTF-8 strings. If something else, I don’t know what the heck is going on.
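Running the quick test in Python 3:

```python
poop = "\N{PILE OF POO}"  # U+1F4A9, outside the Basic Multilingual Plane

codepoints = len(poop)                            # real strings count codepoints
utf16_units = len(poop.encode("utf-16-le")) // 2  # UTF-16 needs a surrogate pair
utf8_bytes = len(poop.encode("utf-8"))            # UTF-8 needs four bytes
```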

Totally bytes: Lua, Python 2 (separate unicode type).

Comes up short: Java, JavaScript.

One hundred emoji: Python 3, Ruby, Rust.

Special mention: Perl 5 gets the quick test right if you put use utf8; at the top of the file, but Perl 5’s Unicode support is such a confusing clusterfuck that I can’t really give it a 💯.

Autoincrement and autodecrement

I don’t think there are too many compelling reasons to have ++. It means the same as += 1, which is still nice and short. The only difference is that people can do stupid unreadable tricks with ++.

One exception: it is possible to overload ++ in ways that don’t make sense as += 1 — for example, C++ uses ++ to advance iterators, which may do any arbitrary work under the hood.

Double plus ungood:

Double plus good: Python

Special mention: Perl 5 and PHP both allow ++ on strings, in which case it increments letters or something, but I don’t know if much real code has ever used this.

!

A pet peeve. Spot the difference:

if (looks_like_rain()) {
    ...
}
if (!looks_like_rain()) {
    ...
}

That single ! is ridiculously subtle, which seems wrong to me when it makes an expression mean its polar opposite. Surely it should stick out like a sore thumb. The left parenthesis makes it worse, too; it blends in slightly as just noise.

It helps a bit to space after the ! in cases like this:

if (! looks_like_rain()) {
    ...
}

But this seems to be curiously rare. The easy solution is to just spell the operator not. At which point the other two might as well be and and or.

Interestingly enough, C95 specifies and, or, not, and some others as standard alternative spellings, though I’ve never seen them in any C code and I suspect existing projects would prefer I not use them.

Not right: ACS, awk, C#, D, Go, Groovy, Java, JavaScript, Nemerle, PHP, R, Rust, Scala, Swift, Tcl, Vala.

Spelled out: Ada, ALGOL, BASIC, COBOL, Erlang, F#, Fortran, Haskell, Lisps, Lua, Nim, OCaml, Pascal, PostScript, Python, Smalltalk, Standard ML.

Special mention: APL and Julia both use ~, which is at least easier to pick out, which is more than I can say for most of APL. bc and expr, which are really calculators, have no concept of Boolean operations. Forth and Icon, which are not calculators, don’t seem to either. Perl and Ruby have both symbolic and named Boolean operators (Perl 6 has even more), with different precedence (which inside if won’t matter), but I believe the named forms are preferred.

Single return and out parameters

Because C can only return a single value, and that value is often an indication of failure for the sake of an if, “out” parameters are somewhat common.

double x, y;
get_point(&x, &y);

It’s not immediately clear whether x and y are input or output. Sometimes they might function as both. (And of course, in this silly example, you’d be better off returning a single point struct. Or would you use a point out parameter because returning structs is potentially expensive?)

Some languages have doubled down on this by adding syntax to declare “out” parameters, which removes the ambiguity in the function definition, but makes it worse in function calls. In the above example, using & on an argument is at least a decent hint that the function wants to write to those values. If you have implicit out parameters or pass-by-reference or whatever, that would just be get_point(x, y) and you’d have no indication that those arguments are special in any way.

The vast majority of the time, this can be expressed in a more straightforward way by returning multiple values:

x, y = get_point()

That was intended as Python, but technically, Python doesn’t have multiple returns! It seems to, but it’s really a combination of several factors: a tuple type, the ability to make a tuple literal with just commas, and the ability to unpack a tuple via multiple assignment. In the end it works just as well. Also this is a way better use of the comma operator than in C.
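Spelled out, with a stand-in get_point:

```python
def get_point():
    # The bare commas build a tuple; returning it looks like multiple return.
    return 1, 2

x, y = get_point()    # tuple unpacking via multiple assignment
point = get_point()   # skip the unpacking and you keep the whole tuple
```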

But the exact same code could appear in Lua, which has multiple return/assignment as an explicit feature… and no tuples. The difference becomes obvious if you try to assign the return value to a single variable instead:

point = get_point()

In Python, point would be a tuple containing both return values. In Lua, point would be the x value, and y would be silently discarded. I don’t tend to be a fan of silently throwing data away, but I have to admit that Lua makes pretty good use of this in several places for “optional” return values that the caller can completely ignore if desired. An existing function can even be extended to return more values than before — that would break callers in Python, but work just fine in Lua.

(Also, to briefly play devil’s advocate: I once saw Python code that returned 14 values, all with very complicated types and semantics. Maybe don’t do that. I think I cleaned it up to return an object, which simplified the calling code considerably too.)

It’s also possible to half-ass this. ECMAScript 6:

function get_point() {
    return [1, 2];
}

var [x, y] = get_point();

It works, but it doesn’t actually look like multiple return. The trouble is that JavaScript has C’s comma operator and C’s variable declaration syntax, so neither of the above constructs could’ve left off the brackets without significantly changing the syntax:

function get_point() {
    // Whoops!  This uses the comma operator, which evaluates to its last
    // operand, so it just returns 2
    return 1, 2;
}

// Whoops!  This is multiple declaration, where each variable gets its own "=",
// so it assigns nothing to x and the return value to y
var x, y = get_point();
// Now x is undefined and y is 2

This is still better than either out parameters or returning an explicit struct that needs manual unpacking, but it’s not as good as comma-delimited tuples. Note that some languages require parentheses around tuples (and also call them tuples), and I’m arbitrarily counting that as better than brackets.

Single return: Ada, ALGOL, BASIC, C#, COBOL, Fortran, Groovy, Java, Smalltalk.

Half-assed multiple return: C++11, D, ECMAScript 6, Erlang, PHP.

Multiple return via tuples: F#, Go, Haskell, Julia, Nemerle, Nim, OCaml, Perl (just lists really), Python, Ruby, Rust, Scala, Standard ML, Swift, Tcl.

Native multiple return: Common Lisp, Lua.

Special mention: Forth is stack-based, and all return values are simply placed on the stack, so multiple return isn’t a special case. Unix shell functions don’t return values. Visual Basic sets a return value by assigning to the function’s name (?!), so good luck fitting multiple return in there.

Silent errors

Most runtime errors in C are indicated by one of two mechanisms: returning an error code, or segfaulting. Segfaulting is pretty noisy, so that’s okay, except for the exploit potential and all.

Returning an error code kinda sucks. Those tend to be important, but nothing in the language actually reminds you to check them, and of course we silly squishy humans have the habit of assuming everything will succeed at all times. Which is how I segfaulted git two days ago: I found a spot where it didn’t check for a NULL returned as an error.

There are several alternatives here: exceptions, statically forcing the developer to check for an error code, or using something monad-like to statically force the developer to distinguish between an error and a valid return value. Probably some others. In the end I was surprised by how many languages went the exception route.
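For contrast, here’s what the exception route buys you in Python: there’s no error code to forget to check, and if nothing handles the failure, the program halts noisily instead of marching on with garbage.

```python
def read_config(path):
    # open() raises on failure -- there is no error code to ignore,
    # and an unhandled exception takes down the program loudly.
    with open(path) as f:
        return f.read()

try:
    read_config("/no/such/file")
except FileNotFoundError as e:
    print("could not read config:", e)
```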

Quietly wrong: Unix shells. Wow, yeah, I’m having a hard time naming anything else. Good job, us!

Exceptional: Ada, C++, C#, D, Erlang, Forth, Java (exceptions are even part of function signature), JavaScript, Nemerle, Nim, Objective-C, OCaml, Perl 6, Python, Ruby, Smalltalk, Standard ML, Visual Basic.

Monadic: Haskell (Either), Rust (Result).

Special mention: ACS doesn’t really have many operations that can error, and those that do simply halt the script. ALGOL apparently has something called “mending” that I don’t understand. Go tends to use secondary return values, which calling code has to unpack, making them slightly harder to forget about. Lisps have conditions and call/cc, which are different things entirely. Lua and Perl 5 handle errors by taking down the whole program, but offer a construct that can catch that further up the stack, which is clumsy but enough to emulate try..catch. PHP has exceptions, and errors (which are totally different), and a lot of builtin functions that return error codes. Swift has something that looks like exceptions, but it doesn’t involve stack unwinding and does require some light annotation, so I think it’s all sugar for a monadic return value. Visual Basic, and I believe some other BASICs, decided C wasn’t bad enough and introduced the bizarre On Error Resume Next construct which does exactly what it sounds like.

Nulls

The billion dollar mistake.

I think it’s considerably worse in a statically typed language like C, because the whole point is that you can rely on the types. But a double* might be NULL, which is not actually a pointer to a double; it’s a pointer to a segfault. Other kinds of bad pointers are possible, of course, but those are more an issue of memory safety; allowing any reference to be null violates type safety. The root of the problem is treating null as a possible value of any type, when really it’s its own type entirely.

The alternatives tend to be either opt-in nullability or an “optional” generic type (a monad!) which eliminates null as its own value entirely. Notably, Swift does it both ways: optional types are indicated by a trailing ?, but that’s just syntactic sugar for Option<T>.

On the other hand, while it’s annoying to get a None where I didn’t expect one in Python, it’s not like I’m surprised. I occasionally get a string where I expected a number, too. The language explicitly leaves type concerns in my hands. My real objection is to having a static type system that lies. So I’m not going to list every single dynamic language here, because not only is it consistent with the rest of the type system, but they don’t really have any machinery to prevent this anyway.
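That said, a dynamic language can still document nullability. In Python, type hints make “might be None” explicit, and an external checker like mypy will complain if a caller forgets to handle it — though nothing is enforced at runtime, so this is purely advisory:

```python
from typing import Optional

def find_user_id(name: str, table: dict) -> Optional[int]:
    # Explicitly may return None; a static checker forces callers
    # to rule that out before treating the result as an int.
    return table.get(name)

table = {"alice": 1}
uid = find_user_id("bob", table)
if uid is None:
    print("no such user")
else:
    print("user id:", uid)
```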

Nothing doing: C#, D, Go, Java, Nim (non-nullable types are opt in), R.

Nullable types: Swift.

Monads: F# (Option — though technically F# also inherits null from .NET), Haskell (Maybe), Rust (Option), Swift (Optional).

Special mention: awk, Tcl, and Unix shells only have strings, so in a surprising twist, they have no concept of null whatsoever. Java recently introduced an Optional<T> type which explicitly may or may not contain a value, but since it’s still a non-primitive, it could also be null. C++17 doesn’t quite have the same problem with std::optional<T>, since non-reference values can’t be null. Inform 7’s nothing value is an object (the root of half of its type system), which means any object variable might be nothing, but any value of a more specific type cannot be nothing. JavaScript has two null values, null and undefined. Perl 6 is really big on static types, but claims its Nil object doesn’t exist, and I don’t know how to even begin to unpack that.

Assignment as expression

How common a mistake is this:

if (x = 3) {
    ...
}

Well, I don’t know, actually. Maybe not that common, save for among beginners. But I sort of wonder whether allowing this buys us anything. I can only think of two cases where it does. One is with something like iteration:

// Typical linked list
while (p = p->next) {
    ...
}
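For comparison, the same traversal where iteration is first-class — a Python sketch, with a minimal `Node` class invented for illustration:

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def iter_list(head):
    # The pointer-chasing lives in one place, behind the iteration
    # protocol, instead of in every loop condition.
    node = head
    while node is not None:
        yield node.value
        node = node.next

head = Node(1, Node(2, Node(3)))
for value in iter_list(head):
    print(value)
```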

But this is only necessary in C in the first place because it has no first-class notion of iteration. The other is shorthand for checking that a function returned a useful value:

if (ptr = get_pointer()) {
    ...
}

But if a function returns NULL, that’s really an error condition, and presumably you have some other way to handle that too.

What does that leave? The only time I remotely miss this in Python (where it’s illegal) is when testing a regex. You tend to see this a lot instead.

m = re.match('x+y+z+', some_string)
if m:
    ...

re treats failure as an acceptable possibility and returns None, rather than raising an exception. I’m not sure whether this was the right thing to do or not, but off the top of my head I can’t think of too many other Python interfaces that sometimes return None.
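(An aside: Python 3.8 eventually grew a separate assignment expression, spelled `:=`, almost exactly for this regex-testing pattern — while keeping plain `=` a statement, so the `if (x = 3)` typo remains impossible.)

```python
import re

some_string = "xxyyzz"
# The "walrus" operator is visually distinct from ==, so the
# C-style typo can't sneak in, but the shorthand is back.
if m := re.match('x+y+z+', some_string):
    print(m.group())
```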

Some languages go entirely the opposite direction and make everything an expression, including block constructs like if. In those languages, it makes sense for assignment to be an expression, for consistency with everything else.

Assignment’s an expression: ACS, C#, D, Java, JavaScript, Perl, PHP, Swift.

Everything’s an expression: Ruby, Rust.

Assignment’s a statement: Inform 7, Lua, Python, Unix shells.

Special mention: BASIC uses = for both assignment and equality testing — the meaning is determined from context. Functional languages generally don’t have an assignment operator. Rust has a special if let block that explicitly combines assignment with pattern matching, which is way nicer than the C approach.

No hyphens in identifiers

snake_case requires dancing on the shift key (unless you rearrange your keyboard, which is perfectly reasonable). It slows you down slightly and leads to occasional mistakes like snake-Case. The alternative is dromedaryCase, which is objectively wrong and doesn’t actually solve this problem anyway.

Why not just allow hyphens in identifiers, so we can avoid this argument and use kebab-case?

Ah, but then it’s ambiguous whether you mean an identifier or the subtraction operator. No problem: require spaces for subtraction. I don’t think preserving a tiny way to make your code harder to read is really worth giving up this clear advantage.

Low score: ACS, C#, D, Java, JavaScript, OCaml, Pascal, Perl 5, PHP, Python, Ruby, Rust, Swift, Unix shells.

Nicely-named: COBOL, CSS (and thus Sass), Forth, Inform 7, Lisps, Perl 6, XML.

Special mention: Perl has a built-in variable called $-, and Ruby has a few called $-n for various values of “n”, but these are very special cases.

Braces and semicolons

Okay. Hang on. Bear with me.

C code looks like this.

some block header {
    line 1;
    line 2;
    line 3;
}

The block is indicated two different ways here. The braces are for the compiler; the indentation is for humans.

Having two different ways to say the same thing means they can get out of sync. They can disagree. And that can be, as previously mentioned, really bad. This is really just a more general form of the problem of optional block delimiters.

The only solution is to eliminate one of the two. Programming languages exist for the benefit of humans, so we obviously can’t get rid of the indentation. Thus, we should get rid of the braces. QED.

As an added advantage, we reclaim all the vertical space wasted on lines containing only a }, and we can stop squabbling about where to put the {.

If you accept this, you might start to notice that there are also two different ways of indicating where a line ends: with semicolons for the compiler, and with vertical whitespace for humans. So, by the same reasoning, we should lose the semicolons.

Right? Awesome. Glad we’re all on the same page.

Some languages use keywords instead of braces, but the effect is the same. I’m not aware of any languages that use keywords instead of semicolons.

Bracing myself: C#, D, Erlang, Java, Perl, Rust.

Braces, but no semicolons: JavaScript (kinda — see below), Lua, Ruby, Swift.

Free and clear: CoffeeScript, Haskell, Python.

Special mention: Lisp, just, in general. Inform 7 has an indented style, but it still requires semicolons.

Here’s some interesting trivia. JavaScript, Lua, and Python all optionally allow semicolons at the end of a statement, but the way each language determines line continuation is very different.

JavaScript takes an “opt-out” approach: it continues reading lines until it hits a semicolon, or until reading the next line would cause a syntax error. That leaves a few corner cases like starting a new line with a (, which could look like the last thing on the previous line is a function you’re trying to call. Or you could have -foo on its own line, and it would parse as subtraction rather than unary negation. You might wonder why anyone would do that, but using unary + is one way to make function parse as an expression rather than a statement! I’m not so opposed to semicolons that I want to be debugging where the language thinks my lines end, so I just always use semicolons in JavaScript.

Python takes an “opt-in” approach: it assumes, by default, that a statement ends at the end of a line. However, newlines inside parentheses or brackets are ignored, which takes care of 99% of cases — long lines are most frequently caused by function calls (which have parentheses!) with a lot of arguments. If you really need it, you can explicitly escape a newline with \, but this is widely regarded as incredibly ugly.
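Concretely, both forms:

```python
# Implicit continuation: newlines inside parentheses are ignored,
# so long argument lists wrap naturally.
total = sum((
    1,
    2,
    3,
))

# Explicit continuation with a backslash: legal, but widely
# considered ugly.
also_total = 1 + \
    2 + \
    3

print(total, also_total)   # 6 6
```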

Lua avoids the problem almost entirely. I believe Lua’s grammar is designed such that it’s almost always unambiguous where a statement ends, even if you have no newlines at all. This has a few weird side effects: void expressions are syntactically forbidden in Lua, for example, so you just can’t have -foo as its own statement. Also, you can’t have code immediately following a return, because it’ll be interpreted as a return value. The upside is that Lua can treat newlines just like any other whitespace, but still not need semicolons. In fact, semicolons aren’t statement terminators in Lua at all — they’re their own statement, which does nothing. Alas, not for lack of trying, Lua does have the same ( ambiguity as JavaScript (and parses it the same way), but I don’t think any of the others exist.

Oh, and the colons that Python has at the end of its block headers, like if foo:? As far as I can tell, they serve no syntactic purpose whatsoever. Purely aesthetic.

Blaming the programmer

Perhaps one of the worst misfeatures of C is the ease with which responsibility for problems can be shifted to the person who wrote the code. “Oh, you segfaulted? I guess you forgot to check for NULL.” If only I had a computer to take care of such tedium for me!

Clearly, computers can’t be expected to do everything for us. But they can be expected to do quite a bit. Programming languages are built for humans, and they ought to eliminate the sorts of rote work humans are bad at whenever possible. A programmer is already busy thinking about the actual problem they want to solve; it’s no surprise that they’ll sometimes forget some tedious detail the language forces them to worry about.

So if you’re designing a language, don’t just copy C. Don’t just copy C++ or Java. Hell, don’t even just copy Python or Ruby. Consider your target audience, consider the problems they’re trying to solve, and try to get as much else out of the way as possible. If the same “mistake” tends to crop up over and over, look for a way to modify the language to reduce or eliminate it. And be sure to look at a lot of languages for inspiration — even ones you hate, even weird ones no one uses! A lot of clever people have had a lot of other ideas in the last 44 years.


I hope you enjoyed this accidental cross-reference of several dozen languages! I enjoyed looking through them all, though it was incredibly time-consuming. Some of them look pretty interesting; maybe give them a whirl.

Also, dammit, now I’m thinking about language design again.

Developer Preview – EC2 Instances (F1) with Programmable Hardware

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/developer-preview-ec2-instances-f1-with-programmable-hardware/

Have you ever had to decide between a general purpose tool and one built for a very specific purpose? The general purpose tools can be used to solve many different problems, but may not be the best choice for any particular one. Purpose-built tools excel at one task, but you may need to do that particular task infrequently.

Computer engineers face this problem when designing architectures and instruction sets, almost always pursuing an approach that delivers good performance across a very wide range of workloads. From time to time, new types of workloads and working conditions emerge that are best addressed by custom hardware. This requires another balancing act: trading off the potential for incredible performance vs. a development life cycle often measured in quarters or years.

Enter the FPGA
One of the more interesting routes to a custom, hardware-based solution is known as a Field Programmable Gate Array, or FPGA. In contrast to a purpose-built chip which is designed with a single function in mind and then hard-wired to implement it, an FPGA is more flexible. It can be programmed in the field, after it has been plugged in to a socket on a PC board. Each FPGA includes a fixed, finite number of simple logic gates. Programming an FPGA is “simply” a matter of connecting them up to create the desired logical functions (AND, OR, XOR, and so forth) or storage elements (flip-flops and shift registers). Unlike a CPU which is essentially serial (with a few parallel elements) and has fixed-size instructions and data paths (typically 32 or 64 bit), the FPGA can be programmed to perform many operations in parallel, and the operations themselves can be of almost any width, large or small.
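To get a rough feel for “wiring gates into functions,” here’s a toy software model of a 1-bit full adder built from AND, OR, and XOR — on an FPGA, eight of these chained together would compute an 8-bit add, in parallel with everything else on the chip. (This is a conceptual sketch in Python, not anything resembling actual FPGA programming.)

```python
def full_adder(a, b, carry_in):
    # Sum bit: XOR of all three inputs.
    s = a ^ b ^ carry_in
    # Carry out: OR of the pairwise ANDs.
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out

# Exhaustively check the gate network against ordinary addition.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, co = full_adder(a, b, c)
            assert co * 2 + s == a + b + c
print("full adder checks out")
```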

This highly parallelized model is ideal for building custom accelerators to process compute-intensive problems. Properly programmed, an FPGA has the potential to provide a 30x speedup to many types of genomics, seismic analysis, financial risk analysis, big data search, and encryption algorithms and applications.

I hope that this sounds awesome and that you are chomping at the bit to use FPGAs to speed up your own applications! There are a few interesting challenges along the way. First, FPGAs have traditionally been a component of a larger, purpose-built system. You cannot simply buy one and plug it in to your desktop. Instead, the route to FPGA-powered solutions has included hardware prototyping, construction of a hardware appliance, mass production, and a lengthy sales & deployment cycle. The lead time can limit the applicability of FPGAs, and also means that Moore’s Law has time to make CPU-based solutions more cost-effective.

We think we can do better here!

The New F1 Instance
Today we are launching a developer preview of the new F1 instance. In addition to building applications and services for your own use, you will be able to package them up for sale and reuse in AWS Marketplace.  Putting it all together, you will be able to avoid all of the capital-intensive and time-consuming steps that were once a prerequisite to the use of FPGA-powered applications, using a business model that is more akin to that used for every other type of software. We are giving you the ability to design your own logic, simulate and verify it using cloud-based tools, and then get it to market in a matter of days.

Equipped with Intel Broadwell E5 2686 v4 processors (2.3 GHz base speed, 2.7 GHz Turbo mode on all cores, and 3.0 GHz Turbo mode on one core), up to 976 GiB of memory, up to 4 TB of NVMe SSD storage, and one to eight FPGAs, the F1 instances provide you with plenty of resources to complement your core, FPGA-based logic. The FPGAs are dedicated to the instance and are isolated for use in multi-tenant environments.

Here are the specs on the FPGA (remember that there are up to eight of these in a single F1 instance):

  • Xilinx UltraScale+ VU9P  fabricated using a 16 nm process.
  • 64 GiB of ECC-protected memory on a 288-bit wide bus (four DDR4 channels).
  • Dedicated PCIe x16 interface to the CPU.
  • Approximately 2.5 million logic elements.
  • Approximately 6,800 Digital Signal Processing (DSP) engines.
  • Virtual JTAG interface for debugging.

In instances with more than one FPGA,  dedicated PCIe fabric allows the FPGAs to share the same memory address space and to communicate with each other across a PCIe Fabric at up to 12 Gbps in each direction.  The FPGAs within an instance share access to a 400 Gbps bidirectional ring for low-latency, high bandwidth communication (you’ll need to define your own protocol in order to make use of this advanced feature).

The FPGA Development Process

As part of the developer preview we are also making an FPGA developer AMI available. You can launch this AMI on a memory-optimized or compute-optimized instance for development and simulation, and then use an F1 instance for final debugging and testing.

This AMI includes a set of developer tools that you can use in the AWS Cloud at no charge. You write your FPGA code using VHDL or Verilog and then compile, simulate, and verify it using tools from the Xilinx Vivado Design Suite (you can also use third-party simulators, higher-level language compilers, graphical programming tools, and FPGA IP libraries).

Here’s the Verilog code for a simple 8-bit counter:

module up_counter(out, enable, clk, reset);
output [7:0] out;
input enable, clk, reset;
reg [7:0] out;
always @(posedge clk)
if (reset) begin
  out <= 8'b0;
end else if (enable) begin
  out <= out + 1;
end
endmodule
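If Verilog is unfamiliar, a quick software model of the counter’s behavior may help: on each clock edge, reset wins over enable, and the 8-bit register wraps around at 256. (This is a behavioral sketch only — real hardware description is nothing like sequential software.)

```python
def clock_edge(out, enable, reset):
    # Mirrors the always @(posedge clk) block above.
    if reset:
        return 0
    if enable:
        return (out + 1) % 256   # 8-bit wraparound
    return out

out = clock_edge(5, enable=True, reset=True)      # reset wins: 0
for _ in range(300):
    out = clock_edge(out, enable=True, reset=False)
print(out)   # 300 ticks from 0 wraps to 300 % 256 == 44
```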

Although these languages are often described as using C-like syntax (and that’s what I used to stylize the code), this does not mean that you can take existing code and recompile it for use on an FPGA. Instead, you need to start by gaining a strong understanding of the FPGA programming model, learn Boolean algebra, and start to learn about things like propagation delays and clock edges. With that as a foundation, you will be able to start thinking about ways to put FPGAs to use in your environment. If this is too low-level for you, rest assured that you can also use many existing High Level Synthesis tools, including OpenCL, to program the FPGA.

After I launched my instance, I logged in, installed a bunch of packages, and set up the license manager so that I could run the Vivado tools. Then I RDP’ed in to the desktop, opened up a terminal window, and started Vivado in GUI mode:

I opened up the sample project (counter.xpr) and was rewarded with my first look at how FPGAs are designed and programmed:

After a bit of exploration I managed to synthesize my first FPGA (I was doing little more than clicking interesting stuff at this point; I am not even a novice at this stuff):

From here, I would be able to test my design, package it up as an Amazon FPGA Image (AFI), and then use it for my own applications or list it in AWS Marketplace. I hope to be able to show you how to do all of these things within a couple of weeks.

The F1 Hardware Development Kit
After I learned about the F1 instances, one of my first questions had to do with the interfaces between the FPGA(s), the CPU(s), and main memory. The F1 Hardware Development Kit (HDK) includes preconfigured I/O interfaces and sample applications for multiple communication methods including host-to-FPGA, FPGA-to-memory, and FPGA-to-FPGA. It also includes compilation scripts, reference examples, and an in-field debug toolset.

The Net-Net
The bottom line here is that the combination of the F1 instances, the cloud-based development tools, and the ability to sell FPGA-powered applications is unique and powerful. The power and flexibility of the FPGA model is now accessible to all AWS users; I am sure that this will inspire entirely new types of applications and businesses.

Get Started Today
As I mentioned earlier, we are launching in developer preview form today in the US East (Northern Virginia) Region (we will expand to multiple regions when the instances become generally available in early 2017). If you have prior FPGA programming experience and are interested in getting started, you should sign up now.

Jeff;

How to Help Achieve Mobile App Transport Security (ATS) Compliance by Using Amazon CloudFront and AWS Certificate Manager

Post Syndicated from Lee Atkinson original https://aws.amazon.com/blogs/security/how-to-help-achieve-mobile-app-transport-security-compliance-by-using-amazon-cloudfront-and-aws-certificate-manager/

Web and application users and organizations have expressed a growing desire to conduct most of their HTTP communication securely by using HTTPS. At its 2016 Worldwide Developers Conference, Apple announced that starting in January 2017, apps submitted to its App Store will be required to support App Transport Security (ATS). ATS requires all connections to web services to use HTTPS and TLS version 1.2. In addition, Google has announced that starting in January 2017, new versions of its Chrome web browser will mark HTTP websites as being “not secure.”

In this post, I show how you can generate Secure Sockets Layer (SSL) or Transport Layer Security (TLS) certificates by using AWS Certificate Manager (ACM), apply the certificates to your Amazon CloudFront distributions, and deliver your websites and APIs over HTTPS.

Background

Hypertext Transfer Protocol (HTTP) was proposed originally without the need for security measures such as server authentication and transport encryption. As HTTP evolved from covering simple document retrieval to sophisticated web applications and APIs, security concerns emerged. For example, if someone were able to spoof a website’s DNS name (perhaps by altering the DNS resolver’s configuration), they could direct users to another web server. Users would be unaware of this because the URL displayed by the browser would appear just as the user expected. If someone were able to gain access to network traffic between a client and server, that individual could eavesdrop on HTTP communication and either read or modify the content, without the client or server being aware of such malicious activities.

Hypertext Transfer Protocol Secure (HTTPS) was introduced as a secure version of HTTP. It uses either SSL or TLS protocols to create a secure channel through which HTTP communication can be transported. Using SSL/TLS, servers can be authenticated by using digital certificates. These certificates can be digitally signed by one of the certificate authorities (CA) trusted by the web client. Certificates can mitigate website spoofing and these can be later revoked by the CA, providing additional security. These revoked certificates are published by the authority on a certificate revocation list, or their status is made available via an online certificate status protocol (OCSP) responder. The SSL/TLS “handshake” that initiates the secure channel exchanges encryption keys in order to encrypt the data sent over it.

To avoid warnings from client applications regarding untrusted certificates, a CA that is trusted by the application must sign the certificates. The process of obtaining a certificate from a CA begins with generating a key pair and a certificate signing request. The certificate authority uses various methods in order to verify that the certificate requester is the owner of the domain for which the certificate is requested. Many authorities charge for verification and generation of the certificate.

Use ACM and CloudFront to deliver HTTPS websites and APIs

The process of requesting and paying for certificates, storing and transporting them securely, and repeating the process at renewal time can be a burden for website owners. ACM enables you to easily provision, manage, and deploy SSL/TLS certificates for use with AWS services, including CloudFront. ACM removes the time-consuming manual process of purchasing, uploading, and renewing certificates. With ACM, you can quickly request a certificate, deploy it on your CloudFront distributions, and let ACM handle certificate renewals. In addition to requesting SSL/TLS certificates provided by ACM, you can import certificates that you obtained outside of AWS.

CloudFront is a global content delivery network (CDN) service that accelerates the delivery of your websites, APIs, video content, and other web assets. CloudFront’s proportion of traffic delivered via HTTPS continues to increase as more customers use the secure protocol to deliver their websites and APIs.

CloudFront supports Apple’s ATS requirements for TLS 1.2, Perfect Forward Secrecy, server certificates with 2048-bit Rivest-Shamir-Adleman (RSA) keys, and a choice of ciphers. See more details in Supported Protocols and Ciphers.
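On the client side, you can confirm that your own stack enforces the same floor. In Python, for example, the standard `ssl` module lets a client refuse anything older than TLS 1.2 (a sketch of the client configuration only; requires Python 3.7+):

```python
import ssl

# A client context that refuses anything older than TLS 1.2,
# matching the ATS minimum described above.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# Certificate verification is on by default in a default context.
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.minimum_version == ssl.TLSVersion.TLSv1_2
```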

The following diagram illustrates an architecture with ACM, a CloudFront distribution and its origins, and how they integrate to provide HTTPS access to end users and applications.

Solution architecture diagram

  1. ACM automates the creation and renewal of SSL/TLS certificates and deploys them to AWS resources such as CloudFront distributions and Elastic Load Balancing load balancers at your instruction.
  2. Users communicate with CloudFront over HTTPS. CloudFront terminates the SSL/TLS connection at the edge location.
  3. You can configure CloudFront to communicate to the origin over HTTP or HTTPS.

CloudFront enables easy HTTPS adoption. It provides a default *.cloudfront.net wildcard certificate and supports custom certificates, which can be either created by a third-party CA, or created and managed by ACM. ACM automates the process of generating and associating certificates with your CloudFront distribution for the first time and on each renewal. CloudFront supports the Server Name Indication (SNI) TLS extension (enabling efficient use of IP addresses when hosting multiple HTTPS websites) and dedicated-IP SSL/TLS (for older browsers and legacy clients that do not support SNI).

Keeping that background information in mind, I will now show you how you can generate a certificate with ACM and associate it with your CloudFront distribution.

Generate a certificate with ACM and associate it with your CloudFront distribution

In order to help deliver websites and APIs that are compliant with Apple’s ATS requirements, you can generate a certificate in ACM and associate it with your CloudFront distribution.

To generate a certificate with ACM and associate it with your CloudFront distribution:

  1. Go to the ACM console and click Get started.
    ACM "Get started" page
  2. On the next page, type the website’s domain name for your certificate. If applicable, you can enter multiple domains here so that the same certificate can be used for multiple websites. In my case, I type *.leeatk.com to create what is known as a wildcard certificate that can be used for any domain ending in .leeatk.com (that is a domain I own). Click Review and request.
    Request a certificate page
  3. Click Confirm and request. You must now validate that you own the domain. ACM sends an email with a verification link to the domain registrant, technical contact, and administrative contact registered in the Whois record for the domain. ACM also sends the verification link to email addresses commonly associated with an administrator of a domain: administrator, hostmaster, postmaster, and webmaster. ACM sends the same verification email to all these addresses in the expectation that at least one address is monitored by the domain owner. The link in any of the emails can be used to verify the domain.
    List of email addresses to which the email with verification link will be sent
  4. Until the certificate has been validated, the status of the certificate remains Pending validation. When I went through this approval process for *.leeatk.com, I received the verification email shown in the following screenshot. When you receive the verification email, click the link in the email to approve the request.
    Example verification email
  5. After you click I Approve on the landing page, you will then see a page that confirms that you have approved an SSL/TLS certificate for your domain name.
    SSL/TLS certificate confirmation page
  6. Return to the ACM console, and the certificate’s status should become Issued. You may need to refresh the webpage.
    ACM console showing the certificate has been issued
  7. Now that you have created your certificate, go to the CloudFront console and select the distribution with which you want to associate the certificate.
    Screenshot of selecting the CloudFront distribution with which to associate the certificate
  8. Click Edit. Scroll down to SSL Certificate and select Custom SSL certificate. From the drop-down list, select the certificate provided by ACM. Select Only Clients that Support Server Name Indication (SNI). You could select All Clients if you want to support older clients that do not support SNI.
    Screenshot of choosing a custom SSL certificate
  9. Save the configuration by clicking Yes, Edit at the bottom of the page.
  10. Now, when you view the website in a browser (Firefox is shown in the following screenshot), you see a green padlock in the address bar, confirming that this page is secured with a certificate trusted by the browser.
    Screenshot showing green padlock in address bar
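The console steps above can also be sketched with the AWS CLI. This is illustrative only: the domain name and certificate ARN are placeholders, not values from this post, and note that CloudFront only accepts ACM certificates issued in the us-east-1 region.

```shell
# Request a wildcard certificate (placeholder domain).
# ACM sends the same verification emails described in step 3.
aws acm request-certificate \
    --region us-east-1 \
    --domain-name '*.example.com'

# The command prints the new certificate's ARN. Poll its status;
# it remains PENDING_VALIDATION until the email request is approved,
# then becomes ISSUED. (The ARN below is a placeholder.)
aws acm describe-certificate \
    --region us-east-1 \
    --certificate-arn arn:aws:acm:us-east-1:111122223333:certificate/example-id \
    --query 'Certificate.Status'
```

Once the certificate is issued, it appears in the CloudFront console's Custom SSL certificate drop-down list, as in step 8.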

Configure CloudFront to redirect HTTP requests to HTTPS

We encourage you to use HTTPS to help make your websites and APIs more secure. Therefore, we recommend that you configure CloudFront to redirect HTTP requests to HTTPS.

To configure CloudFront to redirect HTTP requests to HTTPS:

  1. Go to the CloudFront console, select the distribution again, and then click Cache Behavior.
    Screenshot showing Cache Behavior button
  2. In my case, I have only one behavior in my distribution, so I select it and click Edit. (If I had more behaviors, I would repeat this process for each behavior for which I wanted HTTP-to-HTTPS redirection.)
  3. Next to Viewer Protocol Policy, choose Redirect HTTP to HTTPS, and click Yes, Edit at the bottom of the page.
    Screenshot of choosing Redirect HTTP to HTTPS
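The same viewer-protocol change can be scripted, roughly, with the AWS CLI. This sketch uses a placeholder distribution ID and assumes jq is installed; update-distribution requires the ETag returned by get-distribution-config.

```shell
# Placeholder distribution ID; substitute your own.
DIST_ID=E1EXAMPLE

# Fetch the current config; the response includes an ETag that
# update-distribution requires as --if-match.
aws cloudfront get-distribution-config --id "$DIST_ID" > dist.json
ETAG=$(jq -r '.ETag' dist.json)

# Flip the default cache behavior to redirect HTTP to HTTPS,
# keeping the rest of the config unchanged.
jq '.DistributionConfig.DefaultCacheBehavior.ViewerProtocolPolicy = "redirect-to-https"
    | .DistributionConfig' dist.json > config.json

aws cloudfront update-distribution \
    --id "$DIST_ID" \
    --if-match "$ETAG" \
    --distribution-config file://config.json
```

A distribution with additional cache behaviors would need the same change applied to each entry under `.DistributionConfig.CacheBehaviors.Items`.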

I could also consider employing an HTTP Strict Transport Security (HSTS) policy on my website. In this case, I would add a Strict-Transport-Security response header at my origin to instruct browsers and other applications to make only HTTPS requests to my website for a period of time specified in the header’s value. This ensures that if a user submits a URL to my website specifying only HTTP, the browser will make an HTTPS request anyway. This is also useful for websites that link to my website using HTTP URLs.
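As an illustration of what that would look like, here is an example HSTS header value (the max-age of one year is a common choice, not a value from this post) and a quick way to check whether a site already sends one. The domain is a placeholder.

```shell
# Example HSTS response header: browsers that see this will make
# only HTTPS requests to the domain (and its subdomains) for a year.
HSTS='Strict-Transport-Security: max-age=31536000; includeSubDomains'
echo "$HSTS"

# To check whether a live site sends the header (placeholder domain):
# curl -sI https://example.com | grep -i '^strict-transport-security'
```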

Summary

CloudFront and ACM enable more secure communication between your users and your websites. CloudFront allows you to adopt HTTPS for your websites and APIs. ACM provides a simple way to request, manage, and renew your SSL/TLS certificates, and to deploy them to AWS services such as CloudFront. Using CloudFront, mobile application developers and API providers can now meet Apple’s ATS requirements more easily, in time for the January 2017 deadline.

If you have comments about this post, submit them in the “Comments” section below. If you have implementation questions, please start a new thread on the CloudFront forum.

– Lee

I entered Ludum Dare 36

Post Syndicated from Eevee original https://eev.ee/blog/2016/08/29/i-entered-ludum-dare-36/

Short story: I made a video game again! This time it was for Ludum Dare, a game jam with some tight rules: solo only, 48 hours to make the game and all its (non-code) assets.

(This is called the “Compo”; there’s also a 72-hour “Jam” which is much more chill, but I did hard mode. Usually there’s a ratings round, but not this time, for reasons.)

I used the PICO-8 again, so you can play it on the web as long as you have a keyboard. It’s also on Ludum Dare, and in splore, and here’s the cartridge too.

Isaac's Descent

But wait! Read on a bit first.

Foreword

I’ve never entered a game jam before, and I slightly regretted that I missed a PICO-8 jam that was happening while I was making Under Construction. I’ve certainly never made a game in 48 hours, so that seemed exciting.

More specifically, I have some trouble with shaking ideas loose. I don’t know a more specific word than “idea” for this, but I mean creative, narrative ideas: worldbuilding, characters, events, gameplay mechanics, and the like. They have a different texture from “how could I solve this technical problem” ideas or “what should I work on today” ideas.

I’ll often have an idea or two, maybe a theme I want to move towards, and then hit a wall. I can’t think of any more concepts; I can’t find any way to connect the handful I have. I end up shelving the idea, sometimes indefinitely. This has been particularly haunting with my interactive fiction game in progress, Runed Awakening, which by its very nature is nothing but narrative ideas.

My true goal for entering Ludum Dare was to jiggle the idea faucet and maybe loosen it a bit. Nothing’s quite as motivating as an extreme time limit. I went in without anything in mind; I didn’t even know it was coming up until two days beforehand. (The start time is softly enforced by the announcement of a theme, anyway.) I knew it would probably resemble a platformer, since I already had the code available to make that work, but that was about it.


I already wrote about the approach to making our last game, so I can’t very well just do that again. Instead, here’s something a little different: I took regular notes on the state of the game (and myself), all weekend. You can see exactly how it came together, almost hour by hour. Is that interesting? I think it’s interesting.

I don’t know if this is a better read if you play the game first or last. Maybe both?

There’s also a surprise at the very end, as a reward for reading through it all! No, wait, stop, you can’t just scroll down, that’s cheating—

Timeline

Thursday

09:00 — Already nervous. Registered for the site yesterday; voted on the themes today; jam actually starts tomorrow. I have no idea if I can do this. What a great start.

Friday

09:00 — Even more nervous. Last night I started getting drowsy around 5pm, I guess because my sleep is still a bit weird. So not only do I only have 48 hours, but by the looks of things, I’ll be spending half that time asleep.

17:00 — I can’t even sit still and do anything for the next hour; I’m too antsy about getting started.

START!! 18:00 — Theme revealed: “Ancient Technology”. I have no ideas.

Well, no, hang on. Shortly before the theme was announced, I had a brief Twitter conversation that shook something loose. I’d mentioned that I rarely seem to have enough ideas to fill a game. Someone accidentally teased out of me that it’s more specific than that: I have trouble coming up with ideas that appeal to me, that satisfy me in the way I really like in games and stories. In retrospect, I probably have a bad habit of rejecting ideas by reflex before I even have a chance to think about them and turn them into something more inspiring.

The same person also asked how I want games to feel, and of course, that’s what I should be keeping front and center, before even worrying about genre or mechanics or anything. How does this feel, and how does it make me feel? I know that’s important, but I’m not in the habit of thinking about it.

With that in mind, how does “ancient technology” make me feel?

It reminds me immediately of two things: Indiana Jones-esque temples, full of centuries-old mechanisms and unseen triggers that somehow still work perfectly; and also Stargate, where a race literally called “Ancients” made preposterously advanced devices with such a sleek and minimalist design that they might as well have been magic.

The common thread is a sense of, hm, “soft wonder”? You’re never quite sure what’s around the next corner, but it won’t be a huge surprise, just a new curiosity. There’s the impression of a coherent set of rules somewhere behind the scenes, but you never get to see it, and it doesn’t matter that much anyway. You catch a glimpse of what’s left behind, and half your astonishment is that it’s still here at all.

Also, I bet I can make a puzzle-platformer out of this.

18:20 — Okay, well! I have a character Isaac (stolen from Glip, ahem) who exists in Runed Awakening but otherwise has never seen any real use. I might as well use them now, which means this game is also set somewhere in Flora.

I’ve drawn a two-frame walking animation and saved it as isaac.p8 for now. It’s enough to get started. I’m gonna copy/paste all the engine gunk from my unfinished game, rainblob — it’s based on what was in Under Construction, with some minor cleanups and enhancements.

19:00 — I’m struggling a little bit here, because Isaac is two tiles tall, and I never got around to writing real support for actors that are bigger than a single tile. Most of the sprite drawing is now wrapped in a little sprite type, so I don’t think this will be too bad — I almost have it working, except that it doesn’t run yet.

19:07 — Success! Apparently I was closer than I thought. The solution is a bit of a hack: instead of a list of tiles (as animation frames), Isaac has a list of lists of tiles, where each outer list is the animation for one grid space. It required some type-checking to keep the common case working (boo), and it blindly assumes any multi-tile actor is a 1×n rectangle. It’s fine. Whatever. I’ll fix it if I really need to.

19:16 — I drew and placed some cave floor tiles. Isaac can no longer walk left or jump. I am not sure why. I really, really hope it’s not another collision bug. The collision function has been such a nightmare. Is it choking on a moving object that’s more than a tile tall?

19:20 — I have been asked to put a new bag in the trash can. This is wildly unjust. I do not have time for such trivialities. But I have to pee anyway, so it’s okay — I’ll batch these two standing-up activities together to save time. Speed strats.

19:28 — The left/jump thing seems to be a bug with the PICO-8; the button presses don’t register at all. Restarting the “console” fixed it. This is ominous; I hope a mysterious heisenbug doesn’t plague me for the next 46½ hours.

19:51 — Isaac is a wizard. Surely, they should be able to cast spells or whatever. Teeny problem: the PICO-8 only has two buttons, and I need one of them for jumping. (Under Construction uses up for jump, but I’ve seen several impassioned pleas against doing that because it makes using a real d-pad very awkward, and after using the pocketCHIP I’m inclined to agree.)

New plan, then: you have an inventory. Up and down scroll through it, and the spare button means “use the selected item”. Accordingly, I’ve put a little “selected item” indicator in the top left of the screen.

Isaac hasn’t seen too much real character development; it’s hard to develop a character without actually putting them in something. Their backstory thus far isn’t really important for this game, but I did have the idea that they travel with a staff that can create a reflective bubble. That’s interesting, because it suggests that Isaac prefers to operate defensively. I made a staff sprite and put it in the starting inventory, but I’m not quite sure what to do with it yet; I don’t know how the bubble idea would work in a tiny game.

20:01 — As a proof of concept, I made the staff shoot out particles when you use it. The particle system is from rainblob, and is pretty neat — they’re just dumb actors that draw themselves as a single pixel.

I bound the X button to “use”. Should jumping be X or O? I’m not sure, hm. My Nintendo instincts tell me the right button is for jumping, but on a keyboard, the “d-pad” and buttons are reversed.

20:04 — I realize I added a sound effect for jumping, then accidentally overwrote the code that plays it. Oops; fixing that. Good thing I didn’t overwrite the sound! This is what I get for trying to edit the assets in the PICO-8 and the code in vim, when it’s all stored in a single file.

20:37 — I have a printat function (from Under Construction) which prints text to the screen with a given horizontal and vertical alignment. It needs to know the width of text to do this, which is easy enough: the PICO-8 font is fixed-width. Alas! The latest PICO-8 release added characters to represent the controller buttons, and I’d really like to use them, but they’re double-wide. Hacking around this is proving a bit awkward, especially since there’s no ord() function available oh my god.

20:50 — Okay, done. The point of that was: I rigged a little hint that tells you what button to press to jump. When you approach the first ledge, Isaac sprouts a tiny thought bubble with the O button symbol in it. PICO-8 games tend not to explain themselves (something that has frustrated me more than once), so I think that’s nice. It’s the kind of tiny detail I love including in my work.

21:04 — I wrote a tiny fragment of music, but I really don’t know what I’m doing here, so… I don’t know.

I had the idea that there’d be runes carved in the back wall of this cave, so I made a sprite for that, though it’s basically unrecognizable at this size. I don’t know what reading them will do, yet.

I also made the staff draw a bubble (in the form of a circle around you) while you’re holding the “use” button down, via a cheap hack. Kinda just throwing stuff at the wall in the hopes that something will stick.

21:07 — I’ve decided to eat these chips while I ponder where to go from here.

21:22 — So, argh. Isaac’s staff is supposed to create a bubble that reflects magical attacks. The immediate problem there is that my collision assumes everything is a rectangle. I really don’t want to be rewriting collision with only a weekend to spend on this. I could make the bubble rectangular, but who’s ever heard of a rectangular magic bubble?

Maybe I could make this work, but it raises more questions: what magical attacks? What attacks you? Are there monsters? Do I have to write monster AI? Can Isaac die? I need to translate these scraps of thematics into game mechanics, somehow.

I try to remember to think about the feel. I want you to feel like you’re exploring an old cavern/temple/something, laden with traps meant to keep you out. I think that means death, and death means save points, and save points mean saving the game state, which I don’t have extant code for. Oof.

22:00 — Not much has changed; I started doodling sprites as a distraction. Still getting this thing where left and up stop working, what the hell.

22:05 — Actually, I’m getting tired; I should deal with the cat litter before it gets too late. Please hold.

22:59 — I wrote some saving, which doesn’t work yet. Almost, maybe. I do have a pretty cool death animation, though it looks a bit wonky in-game, because animations are on a global timer. Whoops! All of them have been really simple so far, so it hasn’t mattered, but this is something that really needs to start at the beginning and play through exactly once.

23:15 — Okay! I have a save, and I have death, and I even have some sound effects for them. The animation is still off, alas (and loops forever), and there’s no way to load after you die, but the basic cycle of this kind of game is coming together. If I can get a little more engine stuff working tomorrow, I should be able to build a little game. Goodnight for now.

Saturday

07:48 — I’m. I’m up.

08:28 — Made the animation start when the player dies and stop after it’s played once. Also made the music stop immediately on death and touched up the sprites a bit. Still no loading, so death pretty much ends the game forever; that’s up next and should be easy enough. First, breakfast.

09:09 — The world is now restored after you die, and I fixed a few bugs as well. Cool beans.

09:14 — So, ah. That’s a decent start mechanically, but I need a little more concept, especially as it relates to the theme. I don’t expect this game to be particularly deep, what with its non-plot of “explore these caverns”, but I do want to explore the theme a bit. I want something that’s interesting to play, too, even if for only five minutes.

Isaac is a clever wizard. Canonically, they might be the cleverest wizard. What does their staff do?

What kind of traps would be in a place like this? Spikes, falling floors, puzzles? Monsters? Pressure plates?

What does Isaac’s staff do?

Hang on, let me approach this a much more sensible way: if I were going to explore a cavern like this, what would I want my staff to do?

09:59 — I’m still struggling with this question. I thought perhaps the cavern would only be the introductory part, and then you’d find a cool teleporter to a dusty sleek place that looked a lot more techy. I tried drawing some sleek bricks, but I can’t figure out how to get the aesthetic I want with the PICO-8’s palette. So I distracted myself by drawing some foreground tiles again. Whoops?

10:01 — I’d tweeted two GIFs of Isaac’s death while working on it, complete with joking melodramatic captions like “death has no power here”. I also lamented that I didn’t know yet what the game was about, to which someone jokingly replied that so far it seemed to be “about death”.

Aha. Maybe the power of Isaac’s staff is to create savepoints, and maybe some puzzles or items or whatever transcend death, sticking around after you respawn. I’ll work with that for a bit and see what falls out of it.

11:12 — Wow, I’ve been busy! The staff now creates savepoints, complete with a post-death menu, a sound effect, a flash (bless you, UC’s scenefader), a thought-bubble hint, and everything. It’s pretty great? And it fits perfectly: if you’re exploring a trap-laden cavern then you’d want some flavor of safety equipment with you, right? What’s safer than outright resurrection?

I can see some interesting puzzles coming out of this: you have to pick your savepoint carefully to interact with mechanisms in the right way, or you have to make sure you can kill yourself if you need to, since that’s the only way to hop back to a savepoint. And it’s a purely defensive ability, just as I wanted. And something impossibly cool and powerful but hilariously impractical seems extremely up Isaac’s alley, from what I know about them so far.

11:59 — Still busy, which is a good sign! I’ve been working on making some objects for Isaac to interact with in the world; so far I’ve focused on the runes on the wall, though I’m not quite sold on them yet. The entire game so far is that you have to make a save point, jump down a pit to use a thing that extends a bridge over the pit, then kill yourself to get back to the save point and cross the bridge. It’s very rough, but it’s finally looking like a game, which is really great to see.

12:28 — I finally got sick enough of left/up breaking that I sat down and tried every distinct action I could think of, one at a time, to figure out the cause. Turns out it was my drawing tablet, which I’d used a couple times to draw sprites? If the pen is close enough to even register as a pointer, left and up break. I know I’ve seen the tablet listed as a “joypad” in other SDL applications, so my best guess is that it’s somehow acting as an axis and confusing PICO-8? I can’t imagine why or how. Super, super weird, but at least now I know what the problem is.

14:28 — Uh, whoops. Somehow I spent two hours yelling on Twitter. I don’t know how that happened.

16:42 — Hey, what’s up. I’ve been working on music (with very mixed results) and fixing bugs. I’m still missing a lot of minor functionality — for example, resetting the room doesn’t actually clear the platforms, because resetting the map only asks actors to reset themselves, and the platforms are new actors who don’t know they should vanish. Oops.

Oh, I also have them appearing on a timer, which is cool. I want their appearance to be animated, too, but that’s tricky with the current approach of just drawing tiles directly on the map. I guess I could turn them into real actors that are always present but can appear and vanish, which would also fix the reset thing.

For now, it’s time to eat and swim, so I’ll get back to this later.

18:22 — I’m so fucked. Everything is a mess. The room still doesn’t reset correctly. The time is half up and I have almost one room so far.

I need to shift gears here: fix the bugs as quickly as I can, then focus on rooms.

20:05 — I fixed a bunch of reset bugs, but I’m getting increasingly agitated by how half-assed this engine is. It’s alright for what it is, I guess, but it clearly wasn’t designed for anything in particular, and I feel like I have to bolt features on haphazardly as I need them.

Anyway, I made progression work, kinda: when you touch the right side of the room, you move on to the next one. When you touch the right side of the final room, you win, and the game celebrates by crashing.

I made a little moving laser eye thing that kills you on contact, creating a cute puzzle where you just resurrect yourself as soon as it’s gone past you. Changed death so time keeps passing while the prompt is up, of course.

Now I have a whopping, what, three world objects? And one item you can use, the one you start with? And I’m not sure how to put these together into any more puzzles.

I made Isaac’s cloak flutter a bit while they walk. Cool.

20:31 — For lack of any better ideas, I added something I’d wanted since the beginning: Isaac’s color scheme is now chosen randomly at startup. They are a newt, you see.

21:07 — Did some cleanup and minor polishing, but still feeling blocked. Going to brainstorm with myself a bit.

What are some “ancient” mechanisms? Pressure plates; blowdarts; secret doors; hidden buttons; …?

Does Isaac get an improved resurrection ability later? Resurrect where you died? I don’t know how that would be especially useful unless you died on a moving platform, and I don’t have anything like that.

Other magical objects you find…?

Puzzle ideas? Set up a way to kill yourself so you can use it later? Currently there’s no way to interact with the world other than to add those platforms, so I don’t see how this would work. I also like “conflict” puzzles where two goals seem to depend on each other, but offhand I can’t think of anything along those lines besides the first room.

21:55 — I’ve built a third puzzle, which is just some slightly aggravating platforming, made a little less so by the ability to save your progress.

22:19 — I started on a large room marking the end of the cave sequence and the entrance to the sleek brick area. I made a few tiles and a sound effect for it, but I’m not quite sure how the puzzle will work. I want a bigger and more elaborate setup with some slight backtracking, and I want to give the player a new toy to play with, but I’m not sure what.

I’ll have to figure it out tomorrow.

Sunday

08:49 — Uggh, I’m awake. Barely. I keep sleeping for only six hours or so, which sucks.

I think I want to start out by making a title screen and some sort of ending. Even if I only have three puzzles, a front and back cover will make it look much more like an actual game.

09:57 — I made a little title screen and wrote a simple ditty for it, which I might even almost like?

11:09 — Made a credits screen as well, which implies that there’s an actual ending. And there is! You get the Flurry, an enchanted rapier I thought of a little while ago. It’s not described in the game or even mentioned outside of the “credits”, in true 8-bit fashion.

Now I have a complete game no matter what, so I can focus on hammering out some levels without worrying too much about time.

I also fixed up the ingame music; it used to have some high notes come in on a separate track, in my clumsy attempts at corralling multiple instruments, but I think they destroyed the mood. Now it’s mostly those low notes and some light “bass”. It works as a loop now, too. Much better in every way.

The awkward-platforming room had a particularly tricky jump that turned out to be trickier than I thought — I suddenly couldn’t do it at all when trying to demo the game for Mel. At their suggestion, I made it a bit less terrible, though hopefully still tricky enough that it might need a second try.

13:05 — Hi! Wow! I’ve been super busy! I came up with a new puzzle involving leaving a save point in midair while dropping down a pit. Then I finally added a new item, mostly inspired by how easy it was to implement: a spellbook that makes you float but doesn’t let you jump, so you can only move back and forth horizontally until you turn it off. I also added a thought bubble for how to cycle through the inventory, some really cute sound effects for when you use the book, and an introductory puzzle for it. It’s coming along pretty nicely!

14:13 — Trying to design a good puzzle for the next area. I made a stone door object which can open and close, though the way it does so is pretty gross, and a wooden wheel that opens it. I really like the wheel; my first thought was to use a different color lever, but I wanted the doors to be reusable whereas the platform lever isn’t, and using the same type of mechanism seemed misleading.

I might be trying to cram too much into the same room at the moment? It introduces the spellbook and the doors/wheel, then makes you solve a puzzle using both. I might split this up and try to introduce both ideas separately.

I think around 16:00, I’m gonna stop making puzzle rooms (unless I still have an amazing idea) and focus on cleaning stuff up, fixing weird bugs, and maybe un-hacking some of these hacks.

15:19 — Someone asked if I streamed my dev process, and I realized that this would’ve been a perfect opportunity to do that, since everything happens within a single small box. Oops. I guess I’ll stream the last few hours, though now no one can watch without getting all the puzzles spoiled.

I made a separate room for getting the spellbook, plus another for introducing the stone doors. The pacing is much much better, and now there are more puzzles overall, which is nice.

15:54 — My puzzles seem to be pretty solid, and I’ve only got space for one more on the map, so I’m thinking about what I’d like it to be.

I want something else that combines mechanics, like, using the platforms to block a door from closing all the way. But a door and a platform can’t coexist on the same tile, so the door has to start out partially open. And… what happens if you summon the platform after closing the door all the way? Hm. I wish my physics were more thorough, but right now none of these objects interact with each other terribly well; the stone door in particular just kinda ignores anything in its way until it hits solid wall.

16:04 — Instead of all that, I fixed the animation on the wheel (it wasn’t playing at all?), gave it a sound effect that I love, and finally added an explicit way to control draw order. The savepoint rune had been drawing over the player since the very beginning, which had been bugging me all weekend. Now the player is always on top. Very glad I had sort lying around.

16:57 — I guess I’m done? I filled that last puzzle room with an interesting timing thing that uses the lever, wheel, runes, and floating, but there are a couple different ways to go about it, and one way is 1-cycle. It bugs me a little that the original setup I wanted (repeat the platforming, then discover it won’t get you all the way to the exit and have to rethink it) doesn’t work, but, there’s no reason you’d think to do it the fastest way the first time, and I think being able to notice that adds an extra “aha”. Gotta resist the urge to railroad!

(Editor’s note: I later fixed a bug that removed the 1-cycle solution.)

I’ll call this done and let people playtest it, once I make it fit within the compressed size limit.

17:08 — God, fuck the compressed size limit. I started at 20538; I deleted all the debug and unused stuff inherited from rainblob and UC, and now I’m at 18491. The limit is 15360. God dammit. I don’t want to have to strip all the comments again.

17:39 — I ended up deleting all the comments again. Oh, well. I ran through it from start to finish once, and all seems good! The game is done and online, and all that’s left is figuring out how to put it on the LD website.

18:46 — Time is up, but this is “submission hour” and the rules allow fixing minor bugs, so I fixed a few things people have pointed out:

  • Two obvious places you could get stuck now have spikes. You can reset the room from the menu, but I’m pretty sure nobody noticed the “enter = menu” on the title screen, and a few people have thought they had to reset the entire game.

  • The last spike pit in the spellbook room required you to walk through spikes, which wasn’t what I intended and looks fatal, even though it’s not. The intention was for it to be an exact replica of the previous pit, except that you have to float across it from a tile higher; this solution now works.

  • One of those half-rock-brick tiles somehow ended up in the first room? Not sure how. It’s gone now.

  • Mel expressed annoyance at having to align a float across the wide penultimate room with no kind of hint, so I added a half-rock-brick tile to the place where you need to stand to use the high-up wheel.

Parting thoughts

I enjoyed making this! It definitely accomplished its ultimate goal of giving me more experience shaking ideas loose. Looking back over those notes, the progression is fascinating: I didn’t even know the core mechanic of resurrecting until 16 hours in (a third of the time), and it was inspired by a joke reply on Twitter. At the 41-hour mark, I still only had three and a half puzzle rooms; the final game has ten. The spellbook seriously only exists because “don’t apply gravity” was so trivial to implement, and the floating effect is something I’d already added for making the Flurry dramatically float above its platform. Half the game only exists because I decided a puzzle was too complicated and tried to split it up.

I almost can’t believe I actually churned all this out in 48 hours. I’ve pretty much never made music before, but I ended up really liking the main theme, and I adore the sound effects. The sprites are great, considering the limitations. I’d never drawn a serious sprite animation before, either, but I love Isaac’s death sequence. The cave texture is great, and a last-minute improvement over my original sprite, which looked more like scratched-up wood. I also drew a scroll sprite that I adored, but I never found an excuse to use it in the game, alas.

Almost everyone who’s played it has made it all the way through without too much trouble, but also seemed to enjoy the puzzles. I take that to mean the game has a good learning curve, which I’m really happy about.

I’m glad I already had a little engine, or I would’ve gotten nowhere.

I have some more ideas that I discarded as impractical due to time or size constraints, so I may port the game to LÖVE and try to expand on it. When I say “may”, I mean I started working on this about two hours after finishing the game.

Oh, and I’m writing a book

Right, yes, about that. I’ve been mumbling about this for ages, but I didn’t want to go on about the idea so much that actually doing it lost its appeal. I think I’ve made enough of a dent now that I’m likely to stick with it.

I’m writing a book about game development — the literal act of game development. I made a list of about a dozen free (well, except PICO-8) and cross-platform game engines spanning a wide range of ease-of-use, creative freedom, and age. I’m going to make a little game in each of them and explain what I’m doing as I do it, give background on totally new things, preserve poor choices and show how I recovered from them, say what inspired an idea or how I got past a creative roadblock, etc. The goal is to write something that someone with no experience in programming or art or storytelling can follow from beginning to end, getting at least an impression of what it looks like to create a game from scratch.

It’s kind of a response to the web’s mountains of tutorials and beginner docs that take you from “here’s what a variable is” all the way to “here’s what a function is”, then abandon you. I hate trying to get into a new thing and only finding slow, dull introductions that don’t tell me how to do anything interesting, or even show what kinds of things are possible. I hope that for anyone who learns the way I do, “here’s how I made a whole game” will be more than enough to hit the ground running.

I have part of an early chapter on MegaZeux written; I wanted to finish it by the end of August, but that’s clearly not happening, oops. I also started on a Godot chapter, which will be a little different since it’s for a game that will hopefully have multiple people working on it.

Isaac’s Descent will be the subject of a PICO-8 chapter — that’s why I took the notes! It’ll expand considerably on what I wrote above, starting with going through all the code I inherited from Under Construction (and recreating how I wrote it in the first place). I also have about 20 snapshots of the game as it developed, which I’m holding onto myself for now.

I want to put rough drafts of each chapter on the $4 Patreon tier as I finish them, so keep an eye out for that, though I don’t have any ETA just yet. I imagine MegaZeux or PICO-8 will be ready within the next couple months.

Collision Attacks Against 64-Bit Block Ciphers

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/08/collision_attac.html

We’ve long known that 64 bits is too small for a block cipher these days. That’s why new block ciphers like AES have 128-bit, or larger, block sizes. The insecurity of the smaller block is nicely illustrated by a new attack called “Sweet32.” It exploits the ability to find block collisions in Internet protocols to decrypt some traffic, even though the attackers never learn the key.

Paper here. Matthew Green has a nice explanation of the attack. And some news articles. Hacker News thread.

Python FAQ: Why should I use Python 3?

Post Syndicated from Eevee original https://eev.ee/blog/2016/07/31/python-faq-why-should-i-use-python-3/

Part of my Python FAQ, which is doomed to never be finished.

The short answer is: because it’s the actively-developed version of the language, and you should use it for the same reason you’d use 2.7 instead of 2.6.

If you’re here, I’m guessing that’s not enough. You need something to sweeten the deal. Well, friend, I have got a whole mess of sugar cubes just for you.

And once you’re convinced, you may enjoy the companion article, how to port to Python 3! It also has some more details on the differences between Python 2 and 3, whereas this article doesn’t focus too much on the features removed in Python 3.

Some background

If you aren’t neck-deep in Python, you might be wondering what the fuss is all about, or why people keep telling you that Python 3 will set your computer on fire. (It won’t.)

Python 2 is a good language, but it comes with some considerable baggage. It has two integer types; it may or may not be built in a way that completely mangles 16/17 of the Unicode space; it has a confusing mix of lazy and eager functional tools; it has a standard library that takes “batteries included” to lengths beyond your wildest imagination; it boasts strong typing, then casually insists that None < 3 < "2"; overall, it’s just full of little dark corners containing weird throwbacks to the days of Python 1.

(If you’re really interested, Nick Coghlan has written an exhaustive treatment of the slightly different question of why Python 3 was created. This post is about why Python 3 is great, so let’s focus on that.)

Fixing these things could break existing code, whereas virtually all code written for 2.0 will still work on 2.7. So Python decided to fix them all at once, producing a not-quite-compatible new version of the language, Python 3.

Nothing like this has really happened with a mainstream programming language before, and it’s been a bit of a bumpy ride since then. Python 3 was (seemingly) designed with the assumption that everyone would just port to Python 3, drop Python 2, and that would be that. Instead, it’s turned out that most libraries want to continue to run on both Python 2 and Python 3, which was considerably difficult to make work at first. Python 2.5 was still in common use at the time, too, and it had none of the helpful backports that showed up in Python 2.6 and 2.7; likewise, Python 3.0 didn’t support u'' strings. Writing code that works on both 2.5 and 3.0 was thus a ridiculous headache.

The porting effort also had a dependency problem: if your library or app depends on library A, which depends on library B, which depends on C, which depends on D… then none of those projects can even think about porting until D’s porting effort is finished. Early days were very slow going.

Now, though, things are looking brighter. Most popular libraries work with Python 3, and those that don’t are working on it. Python 3’s Unicode handling, one of its most contentious changes, has had many of its wrinkles ironed out. Python 2.7 consists largely of backported Python 3 features, making it much simpler to target 2 and 3 with the same code — and both 2.5 and 2.6 are no longer supported.

Don’t get me wrong, Python 2 will still be around for a while. A lot of large applications have been written for Python 2 — think websites like Yelp, YouTube, Reddit, Dropbox — and porting them will take some considerable effort. I happen to know that at least one of those websites was still running 2.6 last year, years after 2.6 had been discontinued, if that tells you anything about the speed of upgrades for big lumbering software.

But if you’re just getting started in Python, or looking to start a new project, there aren’t many reasons not to use Python 3. There are still some, yes — but unless you have one specifically in mind, they probably won’t affect you.

I keep having Python beginners tell me that all they know about Python 3 is that some tutorial tried to ward them away from it for vague reasons. (Which is ridiculous, since especially for beginners, Python 2 and 3 are fundamentally not that different.) Even the #python IRC channel has a few people who react, ah, somewhat passive-aggressively towards mentions of Python 3. Most of the technical hurdles have long since been cleared; it seems like one of the biggest roadblocks now standing in the way of Python 3 adoption is the community’s desire to sabotage itself.

I think that’s a huge shame. Not many people seem to want to stand up for Python 3, either.

Well, here I am, standing up for Python 3. I write all my new code in Python 3 now — because Python 3 is great and you should use it. Here’s why.

Hang on, let’s be real for just a moment

None of this is going to 💥blow your mind💥. It’s just a programming language. I mean, the biggest change to Python 2 in the last decade was probably the addition of the with statement, which is nice, but hardly an earth-shattering innovation. The biggest changes in Python 3 are in the same vein: they should smooth out some points of confusion, help avoid common mistakes, and maybe give you a new toy to play with.

Also, if you’re writing a library that needs to stay compatible with Python 2, you won’t actually be able to use any of this stuff. Sorry. In that case, the best reason to port is so application authors can use this stuff, rather than citing your library as the reason they’re trapped on Python 2 forever. (But hey, if you’re starting a brand new library that will blow everyone’s socks off, do feel free to make it Python 3 exclusive.)

Application authors, on the other hand, can go wild.

Unicode by default

Let’s get the obvious thing out of the way.

In Python 2, there are two string types: str is a sequence of bytes (which I would argue makes it not a string), and unicode is a sequence of Unicode codepoints. A literal string in source code is a str, a bytestring. Reading from a file gives you bytestrings. Source code is assumed ASCII by default. It’s an 8-bit world.

If you happen to be an English speaker, it’s very easy to write Python 2 code that seems to work perfectly, but chokes horribly if fed anything outside of ASCII. The right thing involves carefully specifying encodings everywhere and using u'' for virtually all your literal strings, but that’s very tedious and easily forgotten.

Python 3 reshuffles this to put full Unicode support front and center.

Most obviously, the str type is a real text type, similar to Python 2’s unicode. Literal strings are still str, but now that makes them Unicode strings. All of the “structural” strings — names of types, functions, modules, etc. — are likewise Unicode strings. Accordingly, identifiers are allowed to contain any Unicode “letter” characters. repr() no longer escapes printable Unicode characters, though there’s a new ascii() (and corresponding !a format cast and %a placeholder) that does. Unicode completely pervades the language, for better or worse.

And just for the record: this is way better. It is so much better. It is incredibly better. Do you know how much non-ASCII garbage I type? Every single em dash in this damn post was typed by hand, and Python 2 would merrily choke on them.

Source files are now assumed to be UTF-8 by default, so adding an em dash in a comment will no longer break your production website. (I have seen this happen.) You’re still free to specify another encoding explicitly if you want, using a magic comment.

There is no attempted conversion between bytes and text, as in Python 2; b'a' + 'b' is a TypeError. Some modules require you to know what you’re dealing with: zlib.compress only accepts bytes, because zlib is defined in terms of bytes; json.loads only accepts str, because JSON is defined in terms of Unicode codepoints. Calling str() on some bytes will defer to repr, producing something like "b'hello'". (But see -b and -bb below.) Overall it’s pretty obvious when you’ve mixed bytes with text.
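A quick sketch of this boundary in action; decoding and encoding are always explicit steps:

```python
# str is text; bytes is bytes; converting between them is always explicit.
data = "héllo"                  # str: a sequence of Unicode codepoints
encoded = data.encode("utf-8")  # bytes: requires naming an encoding
assert isinstance(encoded, bytes)
assert encoded.decode("utf-8") == data

# Mixing the two is an immediate TypeError, not silent mojibake:
try:
    b"a" + "b"
except TypeError:
    pass
else:
    raise AssertionError("Python 3 refuses to mix bytes and text")
```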

Oh, and two huge problem children are fixed: both the csv module and urllib.parse (formerly urlparse) can handle text. If you’ve never tried to make those work, trust me, this is miraculous.

I/O does its best to make everything Unicode. On Unix, this is a little hokey, since the filesystem is explicitly bytes with no defined encoding; Python will trust the various locale environment variables, which on most systems will make everything UTF-8. The default encoding of text-mode file I/O is derived the same way and thus usually UTF-8. (If it’s not what you expect, run locale and see what you get.) Files opened in binary mode, with a 'b', will still read and write bytes.

Python used to come in “narrow” and “wide” builds, where “narrow” builds actually stored Unicode as UTF-16, and this distinction could leak through to user code in subtle ways. On a narrow build, unichr(0x1F4A3) raises ValueError, and the length of u'💣' is 2. Surprise! Maybe your code will work on someone else’s machine, or maybe it won’t. Python 3.3 eliminated narrow builds.

I think those are the major points. For the most part, you should be able to write code as though encodings don’t exist, and the right thing will happen more often. And the wrong thing will immediately explode in your face. It’s good for you.

If you work with binary data a lot, you might be frowning at me at this point; it was a bit of a second-class citizen in Python 3.0. I think things have improved, though: a number of APIs support both bytes and text, the bytes-to-bytes codec issue has largely been resolved, we have bytes.hex() and bytes.fromhex(), bytes and bytearray both support % now, and so on. They’re listening!

Refs: Python 3.0 release notes; myriad mentions all over the documentation

Backported features

Python 3.0 was released shortly after Python 2.6, and a number of features were then backported to Python 2.7. You can use these if you’re only targeting Python 2.7, but if you were stuck with 2.6 for a long time, you might not have noticed them.

  • Set literals:

    {1, 2, 3}
    
  • Dict and set comprehensions:

    {word.lower() for word in words}
    {value: key for (key, value) in dict_to_invert.items()}
    
  • Multi-with:

    with open("foo") as f1, open("bar") as f2:
        ...
    
  • print is now a function, with a couple bells and whistles added: you can change the delimiter with the sep argument, you can change the terminator to whatever you want (including nothing) with the end argument, and you can force a flush with the flush argument. In Python 2.6 and 2.7, you still have to opt into this with from __future__ import print_function.

  • The string representation of a float now uses the shortest decimal number that has the same underlying value — for example, repr(1.1) was '1.1000000000000001' in Python 2.6, but is just '1.1' in Python 2.7 and 3.1+, because both are represented the same way in a 64-bit float.

  • collections.OrderedDict is a dict-like type that remembers the order of its keys.

    Note that you cannot do OrderedDict(a=1, b=2), because the constructor still receives its keyword arguments in a regular dict, losing the order. You have to pass in a sequence of 2-tuples or assign keys one at a time.

  • collections.Counter is a dict-like type for counting a set of things. It has some pretty handy operations that allow it to be used like a multiset.

  • The entire argparse module is a backport from 3.2.

  • str.format learned a , formatting specifier for numbers, which always uses commas and groups of three digits. This is wrong for many countries, and the correct solution involves using the locale module, but it’s useful for quick output of large numbers.

  • re.sub, re.subn, and re.split accept a flags argument. Minor, but, thank fucking God.
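To illustrate two of those backports concretely: OrderedDict built from 2-tuples (since keyword arguments would lose the order), and Counter used as a multiset:

```python
from collections import Counter, OrderedDict

# Keyword arguments arrive in a plain dict, so their order is lost;
# pass a sequence of 2-tuples to preserve it.
od = OrderedDict([("b", 2), ("a", 1), ("c", 3)])
assert list(od.keys()) == ["b", "a", "c"]

# Counter supports multiset arithmetic.
a = Counter(apples=3, pears=1)
b = Counter(apples=1, pears=2)
assert a + b == Counter(apples=4, pears=3)
assert a - b == Counter(apples=2)           # zero/negative counts are dropped
assert a & b == Counter(apples=1, pears=1)  # intersection takes the minimum
```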

Ref: Python 2.7 release notes

Iteration improvements

Everything is lazy

Python 2 has a lot of pairs of functions that do the same thing, except one is eager and one is lazy: range and xrange, map and itertools.imap, dict.keys and dict.iterkeys, and so on.

Python 3.0 eliminated all of the lazy variants and instead made the default versions lazy. Iterating over them works exactly the same way, but no longer creates an intermediate list — for example, range(1000000000) won’t eat all your RAM. If you need to index them or store them for later, you can just wrap them in list(...).
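For example, a huge range costs almost nothing until you actually materialize it:

```python
r = range(1000000000)
assert len(r) == 1000000000  # computed, not materialized
assert 123456789 in r        # ranges even do O(1) membership tests

# Wrap in list() if you need to store or index the results.
first_ten = list(range(10))
assert first_ten == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```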

Even better, the dict methods are now “views”. You can keep them around, and they’ll reflect any changes to the underlying dict. They also act like sets, so you can do a.keys() & b.keys() to get the set of keys that exist in both dicts.
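A short sketch of views in action:

```python
a = {"x": 1, "y": 2}
b = {"y": 10, "z": 20}

# Views support set operations directly.
assert a.keys() & b.keys() == {"y"}
assert a.keys() | b.keys() == {"x", "y", "z"}

# Views are live: they reflect later changes to the dict.
keys = a.keys()
a["w"] = 3
assert "w" in keys
```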

Refs: dictionary view docs; Python 3.0 release notes

Unpacking

Unpacking got a huge boost. You could always do stuff like this in Python 2:

a, b, c = range(3)  # a = 0, b = 1, c = 2

Python 3.0 introduces:

a, b, *c = range(5)  # a = 0, b = 1, c = [2, 3, 4]
a, *b, c = range(5)  # a = 0, b = [1, 2, 3], c = 4

Python 3.5 additionally allows use of the * and ** unpacking operators in literals, or multiple times in function calls:

print(*range(3), *range(3))  # 0 1 2 0 1 2

x = [*range(3), *range(3)]  # x = [0, 1, 2, 0, 1, 2]
y = {*range(3), *range(3)}  # y = {0, 1, 2}  (it's a set, remember!)
z = {**dict1, **dict2}  # finally, syntax for dict merging!

Refs: Python 3.0 release notes; PEP 3132; Python 3.5 release notes; PEP 448

yield from

yield from is an extension of yield. Where yield produces a single value, yield from yields an entire sequence.

def flatten(*sequences):
    for seq in sequences:
        yield from seq

list(flatten([1, 2], [3, 4]))  # [1, 2, 3, 4]

Of course, for a simple example like that, you could just do some normal yielding in a for loop. The magic of yield from is that it can also take another generator or other lazy iterable, and it’ll effectively pause the current generator until the given one has been exhausted. It also takes care of passing values back into the generator using .send() or .throw().

def foo():
    a = yield 1
    b = yield from bar(a)
    print("foo got back", b)
    yield 4

def bar(a):
    print("in bar", a)
    x = yield 2
    y = yield 3
    print("leaving bar")
    return x + y

gen = foo()
val = None
while True:
    try:
        newval = gen.send(val)
    except StopIteration:
        break
    print("yielded", newval)
    val = newval * 10

# yielded 1
# in bar 10
# yielded 2
# yielded 3
# leaving bar
# foo got back 50
# yielded 4

Oh yes, and you can now return a value from a generator. The return value becomes the result of a yield from, or if the caller isn’t using yield from, it’s available as the argument to the StopIteration exception.
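A small sketch of a generator returning a value; the caller receives it on the StopIteration exception (the running-average generator here is just an illustration):

```python
def averager():
    total = count = 0
    while True:
        value = yield
        if value is None:
            return total / count  # becomes StopIteration.value
        total += value
        count += 1

gen = averager()
next(gen)      # prime the generator
gen.send(10)
gen.send(20)
try:
    gen.send(None)
except StopIteration as stop:
    assert stop.value == 15.0
```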

A small convenience, perhaps. The real power here isn’t in the use of generators as lazy iterators, but in the use of generators as coroutines.

A coroutine is a function that can “suspend” itself, like yield does, allowing other code to run until the function is resumed. It’s kind of like an alternative to threading, but only one function is actively running at any given time, and that function has to deliberately relinquish control (or end) before anything else can run.

Generators could do this already, more or less, but only one stack frame deep. That is, you can yield from a generator to suspend it, but if the generator calls another function, that other function has no way to suspend the generator. This is still useful, but significantly less powerful than the coroutine functionality in e.g. Lua, which lets any function yield anywhere in the call stack.

With yield from, you can create a whole chain of generators that yield from one another, and as soon as the one on the bottom does a regular yield, the entire chain will be suspended.

This laid the groundwork for making the asyncio module possible. I’ll get to that later.

Refs: docs; Python 3.3 release notes; PEP 380

Syntactic sugar

Keyword-only arguments

Python 3.0 introduces “keyword-only” arguments, which must be given by name. As a corollary, you can now accept a list of args and have more arguments afterwards. The full syntax now looks something like this:

def foo(a, b=None, *args, c=None, d, **kwargs):
    ...

Here, a and d are required, b and c are optional. c and d must be given by name.

foo(1)                      # TypeError: missing d
foo(1, 2)                   # TypeError: missing d
foo(d=4)                    # TypeError: missing a
foo(1, d=4)                 # a = 1, d = 4
foo(1, 2, d=4)              # a = 1, b = 2, d = 4
foo(1, 2, 3, d=4)           # a = 1, b = 2, args = (3,), d = 4
foo(1, 2, c=3, d=4)         # a = 1, b = 2, c = 3, d = 4
foo(1, b=2, c=3, d=4, e=5)  # a = 1, b = 2, c = 3, d = 4, kwargs = {'e': 5}

This is extremely useful for functions with a lot of arguments, functions with boolean arguments, functions that accept *args (or may do so in the future) but also want some options, etc. I use it a lot!

If you want keyword-only arguments, but you don’t want to accept *args, you just leave off the variable name:

def foo(*, arg=None):
    ...

Refs: Python 3.0 release notes; PEP 3102

Format strings

Python 3.6 (not yet out) will finally bring us string interpolation, more or less, using the str.format() syntax:

a = 0x133
b = 0x352
print(f"The answer is {a + b:04x}.")

It’s pretty much the same as str.format(), except that instead of a position or name, you can give an entire expression. The formatting suffixes with : still work, the special built-in conversions like !r still work, and __format__ is still invoked.

Refs: docs; Python 3.6 release notes; PEP 498

async and friends

Right, so, about coroutines.

Python 3.4 introduced the asyncio module, which offers building blocks for asynchronous I/O (and bringing together the myriad third-party modules that do it already).

The design is based around coroutines, which are really generators using yield from. The idea, as I mentioned above, is that you can create a stack of generators that all suspend at once:

@coroutine
def foo():
    # do some stuff
    yield from bar()
    # do more stuff

@coroutine
def bar():
    # do some stuff
    response = yield from get_url("https://eev.ee/")
    # do more stuff

When this code calls get_url() (not actually a real function, but see aiohttp), get_url will send a request off into the æther, and then yield. The entire stack of generators — get_url, bar, and foo — will all suspend, and control will return to whatever first called foo, which with asyncio will be an “event loop”.

The event loop’s entire job is to notice that get_url yielded some kind of “I’m doing a network request” thing, remember it, and resume other coroutines in the meantime. (Or just twiddle its thumbs, if there’s nothing else to do.) When a response comes back, the event loop will resume get_url and send it the response. get_url will do some stuff and return it up to bar, who continues on, none the wiser that anything unusual happened.

The magic of this is that you can call get_url several times, and instead of having to wait for each request to completely finish before the next one can even start, you can do other work while you’re waiting. No threads necessary; this is all one thread, with functions cooperatively yielding control when they’re waiting on some external thing to happen.

Now, notice that you do have to use yield from each time you call another coroutine. This is nice in some ways, since it lets you see exactly when and where your function might be suspended out from under you, which can be important in some situations. There are also arguments about why this is bad, and I don’t care about them.

However, yield from is a really weird phrase to be sprinkling all over network-related code. It’s meant for use with iterables, right? Lists and tuples and things. get_url is only one thing. What are we yielding from it? Also, what’s this @coroutine decorator that doesn’t actually do anything?

Python 3.5 smoothed over this nonsense by introducing explicit syntax for these constructs, using new async and await keywords:

async def foo():
    # do some stuff
    await bar()
    # do more stuff

async def bar():
    # do some stuff
    response = await get_url("https://eev.ee/")
    # do more stuff

async def clearly identifies a coroutine, even one that returns immediately. (Before, you’d have a generator with no yield, which isn’t actually a generator, which causes some problems.) await explains what’s actually happening: you’re just waiting for another function to be done.

async for and async with are also available, replacing some particularly clumsy syntax you’d need to use before. And, handily, you can only use any of these things within an async def.

The new syntax comes with corresponding new special methods like __await__, whereas the previous approach required doing weird things with __iter__, which is what yield from ultimately calls.

I could fill a whole post or three with stuff about asyncio, and can’t possibly give it justice in just a few paragraphs. The short version is: there’s built-in syntax for doing network stuff in parallel without threads, and that’s cool.
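Here’s a tiny self-contained sketch, using asyncio.sleep as a stand-in for a real network call (aiohttp code would look much the same):

```python
import asyncio

async def fetch(name, delay):
    # Pretend this is a network request; await suspends this coroutine
    # and lets the event loop run something else in the meantime.
    await asyncio.sleep(delay)
    return name

async def main():
    # gather runs both coroutines concurrently; the two sleeps overlap,
    # so this takes ~0.2s total, not 0.3s. Results keep argument order.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1))

loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(main())
finally:
    loop.close()
assert result == ["a", "b"]
```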

Refs for asyncio: docs (asyncio); Python 3.4 release notes; PEP 3156

Refs for async and await: docs (await); docs (async); docs (special methods); Python 3.5 release notes; PEP 492

Function annotations

Function arguments and return values can have annotations:

def foo(a: "hey", b: "what's up") -> "whoa":
    ...

The annotations are accessible via the function’s __annotations__ attribute. They have no special meaning to Python, so you’re free to experiment with them.
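For example:

```python
def greet(name: str, excited: bool = False) -> str:
    return name + ("!" if excited else ".")

# Annotations are just stored on the function; Python itself ignores them.
assert greet.__annotations__ == {"name": str, "excited": bool, "return": str}
assert greet("world") == "world."
```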

Well…

You were free to experiment with them, but the addition of the typing module (mentioned below) has hijacked them for type hints. There’s no clear way to attach a type hint and some other value to the same argument, so you’ll have a tough time making function annotations part of your API.

There’s still no hard requirement that annotations be used exclusively for type hints (and it’s not like Python does anything with type hints, either), but the original PEP suggests it would like that to be the case someday. I guess we’ll see.

If you want to see annotations preserved for other uses as well, it would be a really good idea to do some creative and interesting things with them as soon as possible. Just saying.

Refs: docs; Python 3.0 release notes; PEP 3107

Matrix multiplication

Python 3.5 learned a new infix operator for matrix multiplication, spelled @. It doesn’t do anything for any built-in types, but it’s supported in NumPy. You can implement it yourself with the __matmul__ special method and its r and i variants.
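As a sketch, here’s a toy vector type (not any real library’s API) that implements @ as a dot product:

```python
class Vec:
    def __init__(self, *values):
        self.values = values

    # Invoked for `self @ other`; __rmatmul__/__imatmul__ work analogously.
    def __matmul__(self, other):
        return sum(a * b for a, b in zip(self.values, other.values))

assert Vec(1, 2, 3) @ Vec(4, 5, 6) == 32
```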

Shh. Don’t tell anyone, but I suspect there are fairly interesting things you could do with an operator called @ — some of which have nothing to do with matrix multiplication at all!

Refs: Python 3.5 release notes; PEP 465

Ellipsis

... is now valid syntax everywhere. It evaluates to the Ellipsis singleton, which does nothing. (This exists in Python 2, too, but it’s only allowed when slicing.)

It’s not of much practical use, but you can use it to indicate an unfinished stub, in a way that’s clearly not intended to be final but will still parse and run:

class ReallyComplexFiddlyThing:
    # fuck it, do this later
    ...

Refs: docs; Python 3.0 release notes

Enhanced exceptions

A slightly annoying property of Python 2’s exception handling is that if you want to do your own error logging, or otherwise need to get at the traceback, you have to use the slightly funky sys.exc_info() API and carry the traceback around separately. As of Python 3.0, exceptions automatically have a __traceback__ attribute, as well as a .with_traceback() method that sets the traceback and returns the exception itself (so you can use it inline).
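For example:

```python
try:
    1 / 0
except ZeroDivisionError as e:
    exc = e  # keep a reference; `e` itself is unbound after the block

# The traceback rides along on the exception object itself.
assert exc.__traceback__ is not None
# .with_traceback() sets the traceback and returns the exception,
# so it can be used inline in a raise statement.
assert exc.with_traceback(exc.__traceback__) is exc
```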

This makes some APIs a little silly — __exit__ still accepts the exception type and value and traceback, even though all three are readily available from just the exception object itself.

A much more annoying property of Python 2’s exception handling was that custom exception handling would lose track of where the problem actually occurred. Consider the following call stack.

A
B
C
D
E

Now say an exception happens in E, and it’s caught by code like this in C.

try:
    D()
except Exception as e:
    raise CustomError("Failed to call D")

Because this creates and raises a new exception, the traceback will start from this point and not even mention E. The best workaround for this involves manually creating a traceback between C and E, formatting it as a string, and then including that in the error message. Preposterous.

Python 3.0 introduced exception chaining, which allows you to do this:

raise CustomError("Failed to call D") from e

Now, if this exception reaches the top level, Python will format it as:

Traceback (most recent call last):
File C, blah blah
File D, blah blah
File E, blah blah
SomeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File A, blah blah
File B, blah blah
File C, blah blah
CustomError: Failed to call D

The best part is that you don’t need to explicitly say from e at all — if you do a plain raise while there’s already an active exception, Python will automatically chain them together. Even internal Python exceptions will have this behavior, so a broken exception handler won’t lose the original exception. (In the implicit case, the intermediate text becomes “During handling of the above exception, another exception occurred:”.)

The chained exception is stored on the new exception as either __cause__ (if from an explicit raise ... from) or __context__ (if automatic).

If you direly need to hide the original exception, Python 3.3 introduced raise ... from None.
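A short sketch of both attributes (CustomError is just an illustrative name):

```python
class CustomError(Exception):
    pass

def load(config):
    try:
        return config["missing"]
    except KeyError as e:
        raise CustomError("lookup failed") from e

try:
    load({})
except CustomError as e:
    caught = e

assert isinstance(caught.__cause__, KeyError)  # set by "raise ... from"
assert caught.__context__ is caught.__cause__  # the implicit link is kept too
```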

Speaking of exceptions, the error messages for missing arguments have been improved. Python 2 does this:

TypeError: foo() takes exactly 1 argument (0 given)

Python 3 does this:

TypeError: foo() missing 1 required positional argument: 'a'

Refs: docs; Python 3.0 release notes; PEP 3134; PEP 409

Cooler classes

super() with no arguments

You can call super() with no arguments. It Just Works. Hallelujah.

Also, you can call super() with no arguments. That’s so great that I could probably just fill the rest of this article with it and be satisfied.

Did I mention you can call super() with no arguments?
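For example:

```python
class Shape:
    def __init__(self, **kwargs):
        self.extra = kwargs

class Circle(Shape):
    def __init__(self, radius, **kwargs):
        # No more super(Circle, self).__init__(...) boilerplate.
        super().__init__(**kwargs)
        self.radius = radius

c = Circle(2, color="red")
assert c.radius == 2
assert c.extra == {"color": "red"}
```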

Refs: docs; Python 3.0 release notes; PEP 3135

New metaclass syntax and kwargs for classes

Compared to that, everything else in this section is going to sound really weird and obscure.

For example, __metaclass__ is gone. It’s now a keyword-only argument to the class statement.

class Foo(metaclass=FooMeta):
    ...

That doesn’t sound like much, right? Just some needless syntax change that makes porting harder, right?? Right??? Haha nope watch this because it’s amazing but it barely gets any mention at all.

class Foo(metaclass=FooMeta, a=1, b=2, c=3):
    ...

You can include arbitrary keyword arguments in the class statement, and they will be passed along to the metaclass call as keyword arguments. (You have to catch them in both __new__ and __init__, since they always get the same arguments.) (Also, the class statement now has the general syntax of a function call, so you can put *args and **kwargs in it.)
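A minimal sketch of a metaclass catching class keyword arguments (all names hypothetical):

```python
class Meta(type):
    def __new__(meta, name, bases, namespace, **kwargs):
        cls = super().__new__(meta, name, bases, namespace)
        cls.options = kwargs  # stash the class kwargs somewhere useful
        return cls

    def __init__(cls, name, bases, namespace, **kwargs):
        # __init__ receives the same kwargs, so it must accept them too.
        super().__init__(name, bases, namespace)

class Foo(metaclass=Meta, a=1, b=2):
    pass

assert Foo.options == {"a": 1, "b": 2}
```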

This is pretty slick. Consider SQLAlchemy, which uses a metaclass to let you declare a table with a class.

class SomeTable(TableBase):
    __tablename__ = 'some_table'
    id = Column()
    ...

Note that SQLAlchemy has you put the name of the table in the clumsy __tablename__ attribute, which it invented. Why not just name? Well, because then you couldn’t declare a column called name! Any “declarative” metaclass will have the same problem of separating the actual class contents from configuration. Keyword arguments offer an easy way out.

# only hypothetical, alas
class SomeTable(TableBase, name='some_table'):
    id = Column()
    ...

Refs: docs; Python 3.0 release notes; PEP 3115

__prepare__

Another new metaclass feature is the introduction of the __prepare__ method.

You may have noticed that the body of a class is just a regular block, which can contain whatever code you want. Before decorators were a thing, you’d actually declare class methods in two stages:

class Foo:
    def do_the_thing(cls):
        ...
    do_the_thing = classmethod(do_the_thing)

That’s not magical class-only syntax; that’s just regular code assigning to a variable. You can put ifs and fors and whiles and dels inside a class body, too; you just don’t see it very often because there aren’t very many useful reasons to do it.

A class body is a kind of weird pseudo-scope. It can create locals, and it can read values from outer scopes, but methods don’t see the class body as an outer scope. Once the class body reaches its end, any remaining locals are passed to the type constructor and become the new class’s attributes. (This is why, for example, you can’t refer to a class directly within its own body — the class doesn’t and can’t exist until after the body has executed.)

All of this is to say: __prepare__ is a new hook that returns the dict the class body’s locals go into.

Maybe that doesn’t sound particularly interesting, but consider: the value you return doesn’t have to be an actual dict. It can be anything that understands __setitem__. You could, say, use an OrderedDict, and keep track of the order your attributes were declared. That’s useful for declarative metaclasses, where the order of attributes may be important (consider a C struct).
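As a sketch, here’s a hypothetical metaclass that uses an OrderedDict namespace to record declaration order:

```python
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(meta, name, bases):
        # The class body's assignments land in this mapping, in order.
        return OrderedDict()

    def __new__(meta, name, bases, namespace):
        cls = super().__new__(meta, name, bases, dict(namespace))
        # Skip dunders like __module__ and __qualname__.
        cls.field_order = [k for k in namespace if not k.startswith("_")]
        return cls

class Record(metaclass=OrderedMeta):
    first = 1
    second = 2
    third = 3

assert Record.field_order == ["first", "second", "third"]
```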

But you can go further. You might allow more than one attribute of the same name. You might do something special with the attributes as soon as they’re assigned, rather than at the end of the body. You might predeclare some attributes. __prepare__ is passed the class’s kwargs, so you might alter the behavior based on those.

For a nice practical example, consider the new enum module, which I briefly mention later on. One drawback of this module is that you have to specify a value for every variant, since variants are defined as class attributes, which must have a value. There’s an example of automatic numbering, but it still requires assigning a dummy value like (). Clever use of __prepare__ would allow lifting this restriction:

# XXX: Should prefer MutableMapping here, but the ultimate call to type()
# raises a TypeError if you pass a namespace object that doesn't inherit
# from dict!  Boo.
class EnumLocals(dict):
    def __init__(self):
        self.nextval = 1

    def __getitem__(self, key):
        if key not in self and not key.startswith('_') and not key.endswith('_'):
            self[key] = self.nextval
            self.nextval += 1
        return super().__getitem__(key)

class EnumMeta(type):
    @classmethod
    def __prepare__(meta, name, bases):
        return EnumLocals()

class Colors(metaclass=EnumMeta):
    red
    green
    blue

print(Colors.red, Colors.green, Colors.blue)
# 1 2 3

Deciding whether this is a good idea is left as an exercise.

This is an exceptionally obscure feature that gets very little attention — it’s not even mentioned explicitly in the 3.0 release notes — but there’s nothing else like it in the language. Between __prepare__ and keyword arguments, the class statement has transformed into a much more powerful and general tool for creating all kinds of objects. I almost wish it weren’t still called class.

Refs: docs; Python 3.0 release notes; PEP 3115

Attribute definition order

If that’s still too much work, don’t worry: a proposal was just accepted for Python 3.6 that makes this even easier. Now every class will have a __definition_order__ attribute, a tuple listing the names of all the attributes assigned within the class body, in order. (To make this possible, the default return value of __prepare__ will become an OrderedDict, but the __dict__ attribute will remain a regular dict.)

Now you don’t have to do anything at all: you can always check to see what order any class’s attributes were defined in.
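In the meantime, nothing stops you from recording the order yourself with __prepare__. Here's a sketch — OrderedMeta and _definition_order are names of my own invention, not anything standard:

```python
import collections

class OrderedMeta(type):
    @classmethod
    def __prepare__(meta, name, bases):
        # Collect the class body's locals in an ordered mapping
        return collections.OrderedDict()

    def __new__(meta, name, bases, namespace):
        cls = super().__new__(meta, name, bases, dict(namespace))
        # Remember the order in which the body assigned its names
        cls._definition_order = tuple(namespace)
        return cls

class Point(metaclass=OrderedMeta):
    x = 1
    y = 2

    def magnitude(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

print(Point._definition_order[-3:])
# ('x', 'y', 'magnitude') — after the implicit __module__ and __qualname__
```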


Additionally, descriptors can now implement a __set_name__ method. When a class is created, any descriptor implementing the method will have it called with the containing class and the name of the descriptor.

I’m very excited about this, but let me try to back up. A descriptor is a special Python object that can be used to customize how a particular class attribute works. The built-in property decorator is a descriptor.

class MyClass:
    foo = SomeDescriptor()

c = MyClass()
c.foo = 5  # calls SomeDescriptor.__set__!
print(c.foo)  # calls SomeDescriptor.__get__!

This is super cool and can be used for all sorts of DSL-like shenanigans.

Now, most descriptors ultimately want to store a value somewhere, and the obvious place to do that is in the object’s __dict__. Above, SomeDescriptor might want to store its value in c.__dict__['foo'], which is fine since Python will still consult the descriptor first. If that weren’t fine, it could also use the key '_foo', or whatever. It probably wants to use its own name somehow, because otherwise… what would happen if you had two SomeDescriptors in the same class?

Therein lies the problem, and one of my long-running and extremely minor frustrations with Python. Descriptors have no way to know their own name! There are only really two solutions to this:

  1. Require the user to pass the name in as an argument, too: foo = SomeDescriptor('foo'). Blech!

  2. Also have a metaclass (or decorator, or whatever), which can iterate over all the class’s attributes, look for SomeDescriptor objects, and tell them what their names are. Needing a metaclass means you can’t make general-purpose descriptors meant for use in arbitrary classes; a decorator would work, but boy is that clumsy.

Both of these suck and really detract from what could otherwise be very neat-looking syntax trickery.

But now! Now, when MyClass is created, Python will have a look through its attributes. If it sees that the foo object has a __set_name__ method, it’ll call that method automatically, passing it both the owning class and the name 'foo'! Huzzah!

This is so great I am so happy you have no idea.
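To make it concrete, here's a little type-checking descriptor that discovers its own name — a sketch of my own, using nothing but __set_name__:

```python
class Typed:
    """Data descriptor that type-checks and stores under its own name."""
    def __init__(self, kind):
        self.kind = kind

    def __set_name__(self, owner, name):
        # Python calls this automatically when the owning class is created
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.kind):
            raise TypeError("{} must be {}".format(self.name, self.kind.__name__))
        instance.__dict__[self.name] = value

class Vec:
    x = Typed(int)
    y = Typed(int)

v = Vec()
v.x = 3
v.y = 4
print(v.x, v.y)  # 3 4
# v.x = 'oops' would raise TypeError: x must be int
```

No names passed in, no metaclass in sight.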


Lastly, there’s now an __init_subclass__ class method, which is called when the class is subclassed. A great many metaclasses exist just to do a little bit of work for each new subclass; now, you don’t need a metaclass at all in many simple cases. You want a plugin registry? No problem:

class Plugin:
    _known_plugins = {}

    def __init_subclass__(cls, *, name, **kwargs):
        cls._known_plugins[name] = cls
        super().__init_subclass__(**kwargs)

    @classmethod
    def get_plugin(cls, name):
        return cls._known_plugins[name]

    # ...probably some interface stuff...

class FooPlugin(Plugin, name="foo"):
    ...

No metaclass needed at all.

Again, none of this stuff is available yet, but it’s all slated for Python 3.6, due out in mid-December. I am super pumped.

Refs: docs (customizing class creation); docs (descriptors); Python 3.6 release notes; PEP 520 (attribute definition order); PEP 487 (__init_subclass__ and __set_name__)

Math stuff

int and long have been merged, and there is no longer any useful distinction between small and very large integers. I’ve actually run into code that breaks if you give it 1 instead of 1L, so, good riddance. (Python 3.0 release notes; PEP 237)

The / operator always does “true” division, i.e., gives you a float. If you want floor division, use //. Accordingly, the __div__ magic method is gone; it’s split into two parts, __truediv__ and __floordiv__. (Python 3.0 release notes; PEP 238)
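Concretely:

```python
# / is always true division; // floors
print(7 / 2)     # 3.5
print(7 // 2)    # 3
print(-7 // 2)   # -4 — floor division rounds toward negative infinity
print(7.0 // 2)  # 3.0 — and it works on floats, too
```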

decimal.Decimal, fractions.Fraction, and floats now interoperate a little more nicely: numbers of different types hash to the same value; all three types can be compared with one another; and most notably, the Decimal and Fraction constructors can accept floats directly. (docs (decimal); docs (fractions); Python 3.2 release notes)
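A quick demonstration of the interoperation:

```python
from decimal import Decimal
from fractions import Fraction

# The constructors accept floats directly now
print(Fraction(0.25))  # 1/4
print(Decimal(0.1))    # exposes the float's true binary value

# Mixed-type comparison and hashing behave sensibly
print(Fraction(1, 3) < 0.5)               # True
print(hash(Fraction(1, 2)) == hash(0.5))  # True
```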

math.gcd returns the greatest common divisor of two integers. This existed before, but was in the fractions module, where nobody knew about it. (docs; Python 3.5 release notes)

math.inf is the floating-point infinity value. Previously, this was only available by writing float('inf'). There’s also a math.nan, but let’s not? (docs; Python 3.5 release notes)

math.isclose (and the corresponding complex version, cmath.isclose) determines whether two values are “close enough”. Intended to do the right thing when comparing floats. (docs; Python 3.5 release notes; PEP 485)
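For example:

```python
import math

print(0.1 + 0.2 == 0.3)              # False, thanks to float rounding
print(math.isclose(0.1 + 0.2, 0.3))  # True

# The default tolerance is relative, so comparing against zero
# needs an explicit absolute tolerance:
print(math.isclose(1e-12, 0.0))                # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True
```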

More modules

The standard library has seen quite a few improvements. In fact, Python 3.2 was developed with an explicit syntax freeze, so it consists almost entirely of standard library enhancements. There are far more changes across six and a half versions than I can possibly list here; these are the ones that stood out to me.

The module shuffle

Python 2, rather inexplicably, had a number of top-level modules that were named after the single class they contained, CamelCase and all. StringIO and SimpleHTTPServer are two obvious examples. In Python 3, the StringIO class lives in io (along with BytesIO), and SimpleHTTPServer has been renamed to http.server. If you’re anything like me, you’ll find this deeply satisfying.

Wait, wait, there’s a practical upside here. Python 2 had several pairs of modules that did the same thing with the same API, but one was pure Python and one was much faster C: pickle/cPickle, profile/cProfile, and StringIO/cStringIO. I’ve seen code (cough, older versions of Babel, cough) that spent a considerable amount of its startup time reading pickles with the pure Python version, because it did the obvious thing and used the pickle module. Now, most of these pairs have been merged: importing pickle transparently gives you the faster C implementation when it’s available, and BytesIO/StringIO are the fast C implementations in the io module. (profile and cProfile remain separate modules, since their results don’t quite line up, but the docs steer you toward cProfile.)

Refs: docs (sort of); Python 3.0 release notes; PEP 3108 (exhaustive list of removed and renamed modules)

Additions to existing modules

A number of file format modules, like bz2 and gzip, went through some cleanup and modernization in 3.2 through 3.4: some learned a more straightforward open function, some gained better support for the bytes/text split, and several learned to use their file types as context managers (i.e., with with).

collections.ChainMap is a mapping type that consults some number of underlying mappings in order, allowing for a “dict with defaults” without having to merge them together. (docs; Python 3.3 release notes)
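Here's the "dict with defaults" pattern in action:

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
overrides = {'user': 'eevee'}
settings = ChainMap(overrides, defaults)

print(settings['user'])   # eevee — the first mapping wins
print(settings['color'])  # red — falls through to the defaults
settings['color'] = 'blue'  # writes always go to the first mapping
print(defaults['color'])  # red — the underlying defaults are untouched
```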

configparser dropped its ridiculous distinction between ConfigParser and SafeConfigParser; there is now only ConfigParser, which is safe. The parsed data now preserves order by default and can be read or written using normal mapping syntax. Also there’s a fancier alternative interpolation parser. (docs; Python 3.2 release notes)

contextlib.ContextDecorator is some sort of devilry that allows writing a context manager which can also be used as a decorator. It’s used to implement the @contextmanager decorator, so those can be used as decorators as well. (docs; Python 3.2 release notes)

contextlib.ExitStack offers cleaner and more fine-grained handling of multiple context managers, as well as resources that don’t have their own context manager support. (docs; Python 3.3 release notes)

contextlib.suppress is a context manager that quietly swallows a given type of exception. (docs; Python 3.4 release notes)

contextlib.redirect_stdout is a context manager that replaces sys.stdout for the duration of a block. (docs; Python 3.4 release notes)
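Both of those in action:

```python
import contextlib
import io
import os

# suppress: like try/except/pass, but says what it means
with contextlib.suppress(FileNotFoundError):
    os.remove('no-such-file-here.txt')

# redirect_stdout: capture everything printed inside the block
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print('hello')
print(repr(buf.getvalue()))  # 'hello\n'
```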

datetime.timedelta already existed, of course, but now it supports being multiplied and divided by numbers or divided by other timedeltas. The upshot of this is that timedelta finally, finally has a .total_seconds() method which does exactly what it says on the tin. (docs; Python 3.2 release notes)
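The new arithmetic looks like this:

```python
from datetime import timedelta

interval = timedelta(hours=1, minutes=30)
print(interval.total_seconds())          # 5400.0
print(interval / 3)                      # 0:30:00
print(interval * 2)                      # 3:00:00
print(interval / timedelta(minutes=15))  # 6.0 — how many quarter-hours fit?
```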

datetime.timezone is a new concrete type that can represent fixed offsets from UTC. There has long been a datetime.tzinfo, but it was a useless interface, and you were left to write your own actual class yourself. datetime.timezone.utc is a pre-existing instance that represents UTC, an offset of zero. (docs; Python 3.2 release notes)

functools.lru_cache is a decorator that caches the results of a function, keyed on the arguments. It also offers cache usage statistics and a method for emptying the cache. (docs; Python 3.2 release notes)
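The classic demonstration is naive Fibonacci, which goes from exponential to instant:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))          # instantaneous, despite the naive recursion
print(fib.cache_info())  # CacheInfo(hits=..., misses=..., ...)
fib.cache_clear()        # and you can empty it wholesale
```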

functools.partialmethod is like functools.partial, but the resulting object can be used as a descriptor (read: method). (docs; Python 3.4 release notes)

functools.singledispatch allows function overloading, based on the type of the first argument. (docs; Python 3.4 release notes; PEP 443)
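A small sketch of how dispatch works:

```python
from functools import singledispatch

@singledispatch
def describe(value):
    # The original function is the fallback for unregistered types
    return 'something else'

@describe.register(int)
def _(value):
    return 'an integer'

@describe.register(list)
def _(value):
    return 'a list'

print(describe(42))    # an integer
print(describe([]))    # a list
print(describe('hi'))  # something else
```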

functools.total_ordering is a class decorator that allows you to define only __eq__ and __lt__ (or any other) and defines the other comparison methods in terms of them. Note that since Python 3.0, __ne__ is automatically the inverse of __eq__ and doesn’t need defining. Note also that total_ordering doesn’t correctly support NotImplemented until Python 3.4. For an even easier way to do this, consider my classtools.keyed_ordering decorator. (docs; Python 3.2 release notes)

inspect.getattr_static fetches an attribute like getattr but avoids triggering dynamic lookup like @property. (docs; Python 3.2 release notes)

inspect.signature fetches the signature of a function as the new and more featureful Signature object. It also knows to follow the __wrapped__ attribute set by functools.wraps since Python 3.2, so it can see through well-behaved wrapper functions to the “original” signature. (docs; Python 3.3 release notes; PEP 362)

The logging module can, finally, use str.format-style string formatting by passing style='{' to Formatter. (docs; Python 3.2 release notes)

The logging module spits warnings and higher to stderr if logging hasn’t been otherwise configured. This means that if your app doesn’t use logging, but it uses a library that does, you’ll get actual output rather than the completely useless “No handlers could be found for logger ‘foo’”. (docs; Python 3.2 release notes)

os.scandir lists the contents of a directory while avoiding stat calls as much as possible, making it significantly faster. (docs; Python 3.5 release notes; PEP 471)

re.fullmatch checks for a match against the entire input string, not just a substring. (docs; Python 3.4 release notes)
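The difference from plain match:

```python
import re

print(bool(re.match(r'\d+', '123abc')))      # True — match only anchors the start
print(bool(re.fullmatch(r'\d+', '123abc')))  # False — must consume the whole string
print(bool(re.fullmatch(r'\d+', '123')))     # True
```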

reprlib.recursive_repr is a decorator for __repr__ implementations that can detect recursive calls to the same object and replace them with ..., just like the built-in structures. Believe it or not, reprlib is an existing module, though in Python 2 it was called repr. (docs; Python 3.2 release notes)

shutil.disk_usage returns disk space statistics for a given path with no fuss. (docs; Python 3.3 release notes)

shutil.get_terminal_size tries very hard to detect the size of the terminal window. (docs; Python 3.3 release notes)

subprocess.run is a new streamlined function that consolidates several other helpers in the subprocess module. It returns an object that describes the final state of the process, and it accepts arguments for a timeout, requiring that the process return success, and passing data as stdin. This is now the recommended way to run a single subprocess. (docs; Python 3.5 release notes)
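A typical call, capturing output and insisting on success:

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, '-c', 'print("hello")'],
    stdout=subprocess.PIPE,
    check=True,    # raise CalledProcessError on a nonzero exit code
    timeout=10,    # raise TimeoutExpired if it runs too long
)
print(result.returncode)  # 0
print(result.stdout)      # b'hello\n' (on POSIX)
```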

tempfile.TemporaryDirectory is a context manager that creates a temporary directory, then destroys it and its contents at the end of the block. (docs; Python 3.2 release notes)

textwrap.indent can add an arbitrary prefix to every line in a string. (docs; Python 3.3 release notes)

time.monotonic returns the value of a monotonic clock — i.e., it will never go backwards. You should use this for measuring time durations within your program; using time.time() will produce garbage results if the system clock changes due to DST, a leap second, NTP, manual intervention, etc. (docs; Python 3.3 release notes; PEP 418)

time.perf_counter returns the value of the highest-resolution clock available, but is only suitable for measuring a short duration. (docs; Python 3.3 release notes; PEP 418)

time.process_time returns the total system and user CPU time for the process, excluding sleep. Note that the starting time is undefined, so only durations are meaningful. (docs; Python 3.3 release notes; PEP 418)

traceback.walk_stack and traceback.walk_tb are small helper functions that walk back along a stack or traceback, so you can use simple iteration rather than the slightly clumsier linked-list approach. (docs; Python 3.5 release notes)

types.MappingProxyType offers a read-only proxy to a dict. Since it holds a reference to the dict in C, you can return MappingProxyType(some_dict) to effectively create a read-only dict, as the original dict will be inaccessible from Python code. This is the same type used for the __dict__ of an immutable object. Note that this has existed in various forms for a while, but wasn’t publicly exposed or documented; see my module dictproxyhack for something that does its best to work on every Python version. (docs; Python 3.3 release notes)
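A quick illustration of the proxy behavior:

```python
from types import MappingProxyType

data = {'key': 'value'}
proxy = MappingProxyType(data)

print(proxy['key'])  # value
try:
    proxy['other'] = 1
except TypeError:
    print('read-only!')

# But changes to the underlying dict show through immediately
data['other'] = 1
print(proxy['other'])  # 1
```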

types.SimpleNamespace is a blank type for sticking arbitrary unstructured attributes to. Previously, you would have to make a dummy subclass of object to do this. (docs; Python 3.3 release notes)

weakref.finalize allows you to add a finalizer function to an arbitrary (weakrefable) object from the “outside”, without needing to add a __del__. The finalize object will keep itself alive, so there’s no need to hold onto it. (docs; Python 3.4 release notes)

New modules with backports

These are less exciting, since they have backports on PyPI that work in Python 2 just as well. But they came from Python 3 development, so I credit Python 3 for them, just like I credit NASA for inventing the microwave.

asyncio is covered above, but it’s been backported as trollius for 2.6+, with the caveat that Pythons before 3.3 don’t have yield from and you have to use yield From(...) as a workaround. That caveat means that third-party asyncio libraries will almost certainly not work with trollius! For this and other reasons, the maintainer is no longer supporting it. Alas. Guess you’ll have to upgrade to Python 3, then.

enum finally provides an enumeration type, something which has long been desired in Python and solved in myriad ad-hoc ways. The variants become instances of a class, can be compared by identity, can be converted between names and values (but only explicitly), can have custom methods, and can implement special methods as usual. There’s even an IntEnum base class whose values end up as subclasses of int (!), making them perfectly compatible with code expecting integer constants. Enums have a surprising amount of power, far more than any approach I’ve seen before; I heartily recommend that you skim the examples in the documentation. Backported as enum34 for 2.4+. (docs; Python 3.4 release notes; PEP 435)
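A taste of what enums can do:

```python
from enum import Enum, IntEnum

class Color(Enum):
    red = 1
    green = 2
    blue = 3

print(Color.red)               # Color.red
print(Color(2))                # Color.green — convert from a value
print(Color['blue'])           # Color.blue — convert from a name
print(Color.red is Color.red)  # True — variants compare by identity

class Status(IntEnum):
    ok = 200

print(Status.ok == 200)        # True — IntEnums pass for plain ints
```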

ipaddress offers types for representing IPv4 and IPv6 addresses and subnets. They can convert between several representations, perform a few set-like operations on subnets, identify special addresses, and so on. Backported as ipaddress for 2.6+. (There’s also a py2-ipaddress, but its handling of bytestrings differs from Python 3’s built-in module, which is likely to cause confusing compatibility problems.) (docs; Python 3.3 release notes; PEP 3144)

pathlib provides the Path type, representing a filesystem path that you can manipulate with methods rather than the mountain of functions in os.path. It also overloads / so you can do path / 'file.txt', which is kind of cool. PEP 519 intends to further improve interoperability of Paths with classic functions for the not-yet-released Python 3.6. Backported as pathlib2 for 2.6+; there’s also a pathlib, but it’s no longer maintained, and I don’t know what happened there. (docs; Python 3.4 release notes; PEP 428)
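The / overloading, using the pure (no-filesystem) flavor for illustration:

```python
from pathlib import PurePosixPath

# PurePosixPath manipulates the path without touching the filesystem
p = PurePosixPath('/tmp') / 'subdir' / 'file.txt'
print(p)         # /tmp/subdir/file.txt
print(p.name)    # file.txt
print(p.suffix)  # .txt
print(p.parent)  # /tmp/subdir
```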

selectors (created as part of the work on asyncio) attempts to wrap select in a high-level interface that doesn’t make you want to claw your eyes out. A noble pursuit. Backported as selectors34 for 2.6+. (docs; Python 3.4 release notes)

statistics contains a number of high-precision statistical functions. Backported as backports.statistics for 2.6+. (docs; Python 3.4 release notes; PEP 450)

unittest.mock provides multiple ways for creating dummy objects, temporarily (with a context manager or decorator) replacing an object or some of its attributes, and verifying that some sequence of operations was performed on a dummy object. I’m not a huge fan of mocking so much that your tests end up mostly testing that your source code hasn’t changed, but if you have to deal with external resources or global state, some light use of unittest.mock can be very handy — even if you aren’t using the rest of unittest. Backported as mock for 2.6+. (docs; Python 3.3, but no release notes)
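The two halves — recording dummies and temporary patching — look like this:

```python
import os
from unittest import mock

# A Mock records everything done to it, so you can assert on it later
fake = mock.Mock()
fake.fetch('http://example.com', retries=3)
fake.fetch.assert_called_once_with('http://example.com', retries=3)

# patch temporarily swaps out a real attribute, then restores it
with mock.patch.object(os, 'getcwd', return_value='/pretend'):
    print(os.getcwd())  # /pretend
print(os.getcwd() == '/pretend')  # False — back to normal afterwards
```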

New modules without backports

Perhaps more exciting because they’re Python 3 exclusive! Perhaps less exciting because they’re necessarily related to plumbing.

faulthandler

faulthandler is a debugging aid that can dump a Python traceback during a segfault or other fatal signal. It can also be made to hook on an arbitrary signal, and can intervene even when Python code is deadlocked. You can use the default behavior with no effort by passing -X faulthandler on the command line, by setting the PYTHONFAULTHANDLER environment variable, or by using the module API manually.

I think -X itself is new as of Python 3.2, though it’s not mentioned in the release notes. It’s reserved for implementation-specific options; there are a few others defined for CPython, and the options can be retrieved from Python code via sys._xoptions.

Refs: docs; Python 3.3 release notes

importlib

importlib is the culmination of a whole lot of work, performed in multiple phases across numerous Python releases, to extend, formalize, and cleanly reimplement the entire import process.

I can’t possibly describe everything the import system can do and what Python versions support what parts of it. Suffice to say, it can do a lot of things: Python has built-in support for importing from zip files, and I’ve seen third-party import hooks that allow transparently importing modules written in another programming language.

If you want to mess around with writing your own custom importer, importlib has a ton of tools for helping you do that. It’s possible in Python 2, too, using the imp module, but that’s a lot rougher around the edges.

If not, the main thing of interest is the import_module function, which imports a module by name without all the really weird semantics of __import__. Seriously, don’t use __import__. It’s so weird. It probably doesn’t do what you think. importlib.import_module even exists in Python 2.7.
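For example:

```python
import importlib

# The same as `import os.path`, but with a runtime string —
# handy for plugin systems that read module names from config
mod = importlib.import_module('os.path')
print(mod.join('a', 'b'))  # a/b on POSIX
```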

Refs: docs; Python 3.3 release notes; PEP 302?

tracemalloc

tracemalloc is another debugging aid which tracks Python’s memory allocations. It can also compare two snapshots, showing how much memory has been allocated or released between two points in time, and who was responsible. If you have rampant memory use issues, this is probably more helpful than having Python check its own RSS.

Technically, tracemalloc can be used with Python 2.7… but that involves patching and recompiling Python, so I hesitate to call it a backport. Still, if you really need it, give it a whirl.

Refs: docs; Python 3.4 release notes; PEP 454

typing

typing offers a standard way to declare type hints — the expected types of arguments and return values. Type hints are given using the function annotation syntax.

Python itself doesn’t do anything with the annotations, though they’re accessible and inspectable at runtime. An external tool like mypy can perform static type checking ahead of time, using these standard types. mypy is an existing project that predates typing (and works with Python 2), but the previous syntax relied on magic comments; typing formalizes the constructs and puts them in the standard library.

I haven’t actually used either the type hints or mypy myself, so I can’t comment on how helpful or intrusive they are. Give them a shot if they sound useful to you.

Refs: docs; Python 3.5 release notes; PEP 484

venv and ensurepip

I mean, yes, of course, virtualenv and pip are readily available in Python 2. The whole point of these is that they are bundled with Python, so you always have them at your fingertips and never have to worry about installing them yourself.

Installing Python should now give you pipX and pipX.Y commands automatically, corresponding to the latest stable release of pip when that Python version was first released. You’ll also get pyvenv, which is effectively just virtualenv.

There’s also a module interface: python -m ensurepip will install pip (hopefully not necessary), python -m pip runs pip with a specific Python version (a feature of pip and not new to the bundling), and python -m venv runs the bundled copy of virtualenv with a specific Python version.

There was a time where these were completely broken on Debian, because Debian strongly opposes vendoring (the rationale being that it’s easiest to push out updates if there’s only one copy of a library in the Debian package repository), so they just deleted ensurepip and venv? Which completely defeated the point of having them in the first place? I think this has been fixed by now, but it might still bite you if you’re on the Ubuntu 14.04 LTS.

Refs: ensurepip docs; pyvenv docs; Python 3.4 release notes; PEP 453

zipapp

zipapp makes it easy to create executable zip applications, which have been a thing since 2.6 but have languished in obscurity. Well, no longer.

This wasn’t particularly difficult before: you just zip up some code, make sure there’s a __main__.py in the root, and pass it to Python. Optionally, you can set it executable and add a shebang line, since the ZIP format ignores any leading junk in the file. That’s basically all zipapp does. (It does not magically infer your dependencies and bundle them as well; you’re on your own there.)

I can’t find a backport, which is a little odd, since I don’t think this module does anything too special.

Refs: docs; Python 3.5 release notes; PEP 441

Miscellaneous nice enhancements

There were a lot of improvements to language semantics that don’t fit anywhere else above, but make me a little happier.

The interactive interpreter does tab-completion by default. I say “by default” because I’ve been told that it was supported before, but you had to do some kind of goat blood sacrifice to get it to work. Also, command history persists between runs. (docs; Python 3.4 release notes)

The -b command-line option produces a warning when calling str() on a bytes or bytearray, or when comparing text to bytes. -bb produces an error. (docs)

The -I command-line option runs Python in “isolated mode”: it ignores all PYTHON* environment variables and leaves the current directory and user site-packages directories off of sys.path. The idea is to use this when running a system script (or in the shebang line of a system script) to insulate it from any weird user-specific stuff. (docs; Python 3.4 release notes)

Functions and classes learned a __qualname__ attribute, which is a dotted name describing (lexically) where they were defined. For example, a method’s __name__ might be foo, but its __qualname__ would be something like SomeClass.foo. Similarly, a class or function defined within another function will list that containing function in its __qualname__. (docs; Python 3.3 release notes; PEP 3155)

Generators signal their end by raising StopIteration internally, but it was also possible to raise StopIteration directly within a generator — most notably, when calling next() on an exhausted iterator. This would cause the generator to end prematurely and silently. Now, raising StopIteration inside a generator will produce a warning, which will become a RuntimeError in Python 3.7. You can opt into the fatal behavior early with from __future__ import generator_stop. (Python 3.5 release notes; PEP 479)

Implicit namespace packages allow a package to span multiple directories. The most common example is a plugin system, foo.plugins.*, where plugins may come from multiple libraries, but all want to share the foo.plugins namespace. Previously, they would collide, and some sys.path tricks were necessary to make it work; now, support is built in. (This feature also allows you to have a regular package without an __init__.py, but I’d strongly recommend still having one.) (Python 3.3 release notes; PEP 420)

Object finalization behaves in less quirky ways when destroying an isolated reference cycle. Also, modules no longer have their contents changed to None during shutdown, which fixes a long-running type of error when a __del__ method tries to call, say, os.path.join() — if you were unlucky, os.path would have already have had its contents replaced with Nones, and you’d get an extremely confusing TypeError from trying to call a standard library function. (Python 3.4 release notes; PEP 442)

str.format_map is like str.format, but it accepts a mapping object directly (instead of having to flatten it with **kwargs). This allows some fancy things that weren’t previously possible, like passing a fake map that creates values on the fly based on the keys looked up in it. (docs; Python 3.2 release notes)
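Here's the on-the-fly trick, via a dict subclass of my own devising:

```python
class Defaulting(dict):
    # Called for any key the template asks for that we don't have
    def __missing__(self, key):
        return '<' + key + '>'

template = '{name} scored {points} points'
print(template.format_map(Defaulting(name='eevee')))
# eevee scored <points> points
```

This works precisely because format_map doesn't flatten the mapping into a plain dict first.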

When a blocking system call is interrupted by a signal, it returns EINTR, indicating that the calling code should try the same system call again. In Python, this becomes OSError or InterruptedError. I have never in my life seen any C or Python code that actually deals with this correctly. Now, Python will do it for you: all the built-in and standard library functions that make use of system calls will automatically retry themselves when interrupted. (Python 3.5 release notes; PEP 475)

File descriptors created by Python code are now flagged “non-inheritable”, meaning they’re closed automatically when spawning a child process. (docs; Python 3.4 release notes; PEP 446)

A number of standard library functions now accept file descriptors in addition to paths. (docs; Python 3.3 release notes)

Several different OS and I/O exceptions were merged into a single and more fine-grained hierarchy, rooted at OSError. Code can now catch a specific subclass in most cases, rather than examine .errno. (docs; Python 3.3 release notes; PEP 3151)

ResourceWarning is a new kind of warning for issues with resource cleanup. One is produced if a file object is destroyed, but was never closed, which can cause issues on Windows or with garbage-collected Python implementations like PyPy; one is also produced if uncollectable objects still remain when Python shuts down, indicating some severe finalization problems. The warning is ignored by default, but can be enabled with -W default on the command line. (Python 3.2 release notes)

hasattr() only catches (and returns False for) AttributeErrors. Previously, any exception would be considered a sign that the attribute doesn’t exist, even though an unusual exception like an OSError usually means the attribute is computed dynamically, and that code is broken somehow. Now, exceptions other than AttributeError are allowed to propagate to the caller. (docs; Python 3.2 release notes)

Hash randomization is on by default, meaning that dict and set iteration order is different per Python runs. This protects against some DoS attacks, but more importantly, it spitefully forces you not to rely on incidental ordering. (docs; Python 3.3 release notes)

List comprehensions no longer leak their loop variables into the enclosing scope. (Python 3.0 release notes)

nonlocal allows writing to a variable in an enclosing (but non-global) scope. (docs; Python 3.0 release notes; PEP 3104)

Comparing objects of incompatible types now produces a TypeError, rather than using Python 2’s very silly fallback. (Python 3.0 release notes)

!= defaults to returning the opposite of ==. (Python 3.0 release notes)

Accessing a method as a class attribute now gives you a regular function, not an “unbound method” object. (Python 3.0 release notes)

The input builtin no longer performs an eval (!), removing a huge point of confusion for beginners. This is the behavior of raw_input in Python 2. (docs; Python 3.0 release notes; PEP 3111)

Fast and furious

These aren’t necessarily compelling, and they may not even make any appreciable difference for your code, but I think they’re interesting technically.

Objects’ __dict__s can now share their key storage internally. Instances of the same type generally have the same attribute names, so this provides a modest improvement in speed and memory usage for programs that create a lot of user-defined objects. (Python 3.3 release notes; PEP 412)

OrderedDict is now implemented in C, making it “4 to 100” (!) times faster. Note that the backport in the 2.7 standard library is pure Python. So, there’s a carrot. (Python 3.5 release notes)

The GIL was made more predictable. My understanding is that the old behavior was to yield after some number of Python bytecode operations, which could take wildly varying amounts of time; the new behavior yields after a given duration, by default 5ms. (Python 3.2 release notes)

The io library was rewritten in C, making it significantly faster. Again, the Python 2.7 implementation is pure Python. (Python 3.1 release notes)

Tuples and dicts containing only immutable objects — i.e., objects that cannot possibly contain circular references — are ignored by the garbage collector. This was backported to Python 2.7, too, but I thought it was super interesting. (Python 3.1 release notes)

That’s all I’ve got

Huff, puff.

I hope something here appeals to you as a reason to at least experiment with Python 3. It’s fun over here. Give it a try.

Some stuff about color

Post Syndicated from Eevee original https://eev.ee/blog/2016/07/16/some-stuff-about-color/

I’ve been trying to paint more lately, which means I have to actually think about color. Like an artist, I mean. I’m okay at thinking about color as a huge nerd, but I’m still figuring out how to adapt that.

While I work on that, here is some stuff about color from the huge nerd perspective, which may or may not be useful or correct.

Hue

Hues are what we usually think of as “colors”, independent of how light or dim or pale they are: general categories like purple and orange and green.

Strictly speaking, a hue is a specific wavelength of light. I think it’s really weird to think about light as coming in a bunch of wavelengths, so I try not to think about the precise physical mechanism too much. Instead, here’s a rainbow.

rainbow spectrum

These are all the hues the human eye can see. (Well, the ones this image and its colorspace and your screen can express, anyway.) They form a nice spectrum, which wraps around so the two red ends touch.

(And here is the first weird implication of the physical interpretation: purple is not a real color, in the sense that there is no single wavelength of light that we see as purple. The actual spectrum runs from red to blue; when we see red and blue simultaneously, we interpret it as purple.)

The spectrum is divided by three sharp lines: yellow, cyan, and magenta. The areas between those lines are largely dominated by red, green, and blue. These are the two sets of primary colors, those hues from which any others can be mixed.

Red, green, and blue (RGB) make up the additive primary colors, so named because they add light on top of black. LCD screens work exactly this way: each pixel is made up of three small red, green, and blue rectangles. It’s also how the human eye works, which is fascinating but again a bit too physical.

Cyan, magenta, and yellow are the subtractive primary colors, which subtract light from white. This is how ink, paint, and other materials work. When you look at an object, you’re seeing the colors it reflects, which are the colors it doesn’t absorb. A red ink reflects red light, which means it absorbs green and blue light. Cyan ink only absorbs red, and yellow ink only absorbs blue; if you mix them, you’ll get ink that absorbs both red and blue, and thus will appear green. A pure black is often included to make CMYK; mixing all three colors would technically get you black, but it might be a bit muddy and would definitely use three times as much ink.

The great kindergarten lie

Okay, you probably knew all that. What confused me for the longest time was how no one ever mentioned the glaring contradiction with what every kid is taught in grade school art class: that the primary colors are red, blue, and yellow. Where did those come from, and where did they go?

I don’t have a canonical answer for that, but it does make some sense. Here’s a comparison: the first spectrum is a full rainbow, just like the one above. The second is the spectrum you get if you use red, blue, and yellow as primary colors.

a full spectrum of hues, labeled with color names that are roughly evenly distributed
a spectrum of hues made from red, blue, and yellow

The color names come from xkcd’s color survey, which asked a massive number of visitors to give freeform names to a variety of colors. One of the results was a map of names for all the fully-saturated colors, providing a rough consensus for how English speakers refer to them.

The first wheel is what you get if you start with red, green, and blue — but since we’re talking about art class here, it’s really what you get if you start with cyan, magenta, and yellow. The color names are spaced fairly evenly, save for blue and green, which almost entirely consume the bottom half.

The second wheel is what you get if you start with red, blue, and yellow. Red has replaced magenta, and blue has replaced cyan, so neither color appears on the wheel — red and blue are composites in the subtractive model, and you can’t make primary colors like cyan or magenta out of composite colors.

Look what this has done to the distribution of names. Pink and purple have shrunk considerably. Green is half its original size and somewhat duller. Red, orange, and yellow now consume a full half of the wheel.

There’s a really obvious advantage here, if you’re a painter: people are orange.

Yes, yes, we subdivide orange into a lot of more specific colors like “peach” and “brown”, but peach is just pale orange, and brown is just dark orange. Everyone, of every race, is approximately orange. Sunburn makes you redder; fear and sickness make you yellower.

People really like to paint other people, so it makes perfect sense to choose primary colors that easily mix to make people colors.

Meanwhile, cyan and magenta? When will you ever use those? Nothing in nature remotely resembles either of those colors. The true color wheel is incredibly, unnaturally bright. The reduced color wheel is much more subdued, with only one color that stands out as bright: yellow, the color of sunlight.

You may have noticed that I even cheated a little bit. The blue in the second wheel isn’t the same as the blue from the first wheel; it’s halfway between cyan and blue, a tertiary color I like to call azure. True pure blue is just as unnatural as true cyan; azure is closer to the color of the sky, which is reflected as the color of water.

People are orange. Sunlight is yellow. Dirt and rocks and wood are orange. Skies and oceans are blue. Blush and blood and sunburn are red. Sunsets are largely red and orange. Shadows are blue, the opposite of yellow. Plants are green, but in sun or shade they easily skew more blue or yellow.

All of these colors are much easier to mix if you start with red, blue, and yellow. It may not match how color actually works, but it’s a useful approximation for humans. (Anyway, where will you find dyes that are cyan or magenta? Blue is hard enough.)

I’ve actually done some painting since I first thought about this, and would you believe they sell paints in colors other than bright red, blue, and yellow? You can just pick whatever starting colors you want and the whole notion of “primary” goes a bit out the window. So maybe this is all a bit moot.

More on color names

The way we name colors fascinates me.

A “basic color term” is a single, unambiguous, very common name for a group of colors. English has eleven: red, orange, yellow, green, blue, purple, black, white, grey, pink, and brown.

Of these, orange is the only tertiary hue; brown is the only name for a specifically low-saturation color; pink and grey are the only names for specifically light shades. I can understand grey — it’s handy to have a midpoint between black and white — but the other exceptions are quite interesting.

Looking at the first color wheel again, “blue” and “green” together consume almost half of the spectrum. That seems reasonable, since they’re both primary colors, but “red” is relatively small; large chunks of it have been eaten up by its neighbors.

Orange is a tertiary color in either RGB or CMYK: it’s a mix of red and yellow, a primary and secondary color. Yet we ended up with a distinct name for it. I could understand if this were to give white folks’ skin tones their own category, similar to the reasons for the RBY art class model, but we don’t generally refer to white skin as “orange”. So where did this color come from?

Sometimes I imagine a parallel universe where we have common names for other tertiary colors. How much richer would the blue/green side of the color wheel be if “chartreuse” or “azure” were basic color terms? Can you even imagine treating those as distinct colors, not just variants of green or blue? That’s exactly how we treat orange, even though it’s just a variant of red.

I can’t speak to whether our vocabulary truly influences how we perceive or think (and that often-cited BBC report seems to have no real source). But for what it’s worth, I’ve been trying to think of “azure” as distinct for a few years now, and I’ve had a much easier time dealing with blues in art and design. Giving the cyan end of blue a distinct and common name has given me an anchor, something to arrange thoughts around.

Come to think of it, yellow is an interesting case as well. A decent chunk of the spectrum was ultimately called “yellow” in the xkcd map; here’s that chunk zoomed in a bit.

full range of xkcd yellows

How much of this range would you really call yellow, rather than green (or chartreuse!) or orange? Yellow is a remarkably specific color: mixing it even slightly with one of its neighbors loses some of its yellowness, and darkening it moves it swiftly towards brown.

I wonder why this is. When we see a yellowish-orange, are we inclined to think of it as orange because it looks like orange under yellow sunlight? Is it because yellow is between red and green, and the red and green receptors in the human eye pick up on colors that are very close together?


Most human languages develop their color terms in a similar order, with a split between blue and green often coming relatively late in a language’s development. Of particular interest to me is that orange and pink are listed as a common step towards the end — I’m really curious as to whether that happens universally and independently, or it’s just influence from Western color terms.

I’d love to see a list of the basic color terms in various languages, but such a thing is proving elusive. There’s a neat map of how many colors exist in various languages, but it doesn’t mention what the colors are. It’s easy enough to find a list of colors in various languages, like this one, but I have no idea whether they’re basic in each language. Note also that this chart only has columns for English’s eleven basic colors, even though Russian and several other languages have a twelfth basic term for azure. The page even mentions this, but doesn’t include a column for it, which seems ludicrous in an “omniglot” table.

The only language I know many color words in is Japanese, so I went delving into some of its color history. It turns out to be a fascinating example, because you can see how the color names developed right in the spelling of the words.

See, Japanese has a couple different types of words that function like adjectives. Many of the most common ones end in -i, like kawaii, and can be used like verbs — we would translate kawaii as “cute”, but it can function just as well as “to be cute”. I’m under the impression that -i adjectives trace back to Old Japanese, and new ones aren’t created any more.

That’s really interesting, because to my knowledge, only five Japanese color names are in this form: kuroi (black), shiroi (white), akai (red), aoi (blue), and kiiroi (yellow). So these are, necessarily, the first colors the language could describe. If you compare to the chart showing progression of color terms, this is the bottom cell in column IV: white, red, yellow, green/blue, and black.

A great many color names are compounds with iro, “color” — for example, chairo (brown) is cha (tea) + iro. Of the five basic terms above, kiiroi is almost of that form, but unusually still has the -i suffix. (You might think that shiroi contains iro, but shi is a single character distinct from i. kiiroi is actually written with the kanji for iro.) It’s possible, then, that yellow was the latest of these five words — and that would give Old Japanese words for white, red/yellow, green/blue, and black, matching the most common progression.

Skipping ahead some centuries, I was surprised to learn that midori, the word for green, was only promoted to a basic color fairly recently. It’s existed for a long time and originally referred to “greenery”, but it was considered to be a shade of blue (ao) until the Allied occupation after World War II, when teaching guidelines started to mention a blue/green distinction. (I would love to read more details about this, if you have any; the West’s coming in and adding a new color is a fascinating phenomenon, and I wonder what other substantial changes were made to education.)

Japanese still has a number of compound words that use ao (blue!) to mean what we would consider green: aoshingou is a green traffic light, aoao means “lush” in a natural sense, aonisai is a greenhorn (presumably from the color of unripe fruit), aojiru is a drink made from leafy vegetables, and so on.

This brings us to at least six basic colors, the fairly universal ones: black, white, red, yellow, blue, and green. What others does Japanese have?

From here, it’s a little harder to tell. I’m not exactly fluent and definitely not a native speaker, and resources aimed at native English speakers are more likely to list colors familiar to English speakers. (I mean, until this week, I never knew just how common it was for aoi to mean green, even though midori as a basic color is only about as old as my parents.)

I do know two curious standouts: pinku (pink) and orenji (orange), both English loanwords. I can’t be sure that they’re truly basic color terms, but they sure do come up a lot. The thing is, Japanese already has names for these colors: momoiro (the color of peach — flowers, not the fruit!) and daidaiiro (the color of, um, an orange). Why adopt loanwords for concepts that already exist?

I strongly suspect, but cannot remotely qualify, that pink and orange weren’t basic colors until Western culture introduced the idea that they could be — and so the language adopted the idea and the words simultaneously. (A similar thing happened with grey, natively haiiro and borrowed as guree, but in my limited experience even the loanword doesn’t seem to be very common.)

Based on the shape of the words and my own unqualified guesses of what counts as “basic”, the progression of basic colors in Japanese seems to be:

  1. black, white, red (+ yellow), blue (+ green) — Old Japanese
  2. yellow — later Old Japanese
  3. brown — sometime in the past millennium
  4. green — after WWII
  5. pink, orange — last few decades?

And in an effort to put a teeny bit more actual research into this, I searched the Leeds Japanese word frequency list (drawn from websites, so modern Japanese) for some color words. Here’s the rank of each. Word frequency is generally such that the actual frequency of a word is inversely proportional to its rank — so a word in rank 100 is twice as common as a word in rank 200. The five -i colors are split into both noun and adjective forms, so I’ve included an adjusted rank that you would see if they were counted as a single word, using ab / (a + b).

  • white: 1010 ≈ 1959 (as a noun) + 2083 (as an adjective)
  • red: 1198 ≈ 2101 (n) + 2790 (adj)
  • black: 1253 ≈ 2017 (n) + 3313 (adj)
  • blue: 1619 ≈ 2846 (n) + 3757 (adj)
  • green: 2710
  • yellow: 3316 ≈ 6088 (n) + 7284 (adj)
  • orange: 4732 (orenji), n/a (daidaiiro)
  • pink: 4887 (pinku), n/a (momoiro)
  • purple: 6502 (murasaki)
  • grey: 8472 (guree), 10848 (haiiro)
  • brown: 10622 (chairo)
  • gold: 12818 (kin’iro)
  • silver: n/a (gin’iro)
  • navy: n/a (kon)

“n/a” doesn’t mean the word is never used, only that it wasn’t in the top 15,000.
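The adjusted-rank formula from above is easy to sanity-check with a few lines of Python (a quick sketch; the ranks are the ones from the list):

```python
def combined_rank(a, b):
    """Combine the ranks of two forms of a word into one adjusted rank.

    If frequency is inversely proportional to rank, the two forms have
    frequencies 1/a and 1/b; their combined frequency 1/a + 1/b
    corresponds to a rank of ab / (a + b).
    """
    return a * b / (a + b)

# "white" appears as a noun (rank 1959) and an adjective (rank 2083):
white = combined_rank(1959, 2083)  # ≈ 1010
red = combined_rank(2101, 2790)    # ≈ 1198
```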

I’m not sure where the cutoff is for “basic” color terms, but it’s interesting to see where the gaps lie. I’m especially surprised that yellow is so far down, and that purple (which I hadn’t even mentioned here) is as high as it is. Also, green is above yellow, despite having been a basic color for less than a century! Go, green.

For comparison, in American English:

  • black: 254
  • white: 302
  • red: 598
  • blue: 845
  • green: 893
  • yellow: 1675
  • brown: 1782
  • golden: 1835
  • grey: 1949
  • pink: 2512
  • orange: 3171
  • purple: 3931
  • silver: n/a
  • navy: n/a

Don’t read too much into the actual ranks; the languages and corpuses are both very different.

Color models

There are numerous ways to arrange and identify colors, much as there are numerous ways to identify points in 3D space. There are also benefits and drawbacks to each model, but I’m often most interested in how much sense the model makes to me as a squishy human.

RGB is the most familiar to anyone who does things with computers — it splits a color into its red, green, and blue channels, and measures the amount of each from “none” to “maximum”. (HTML sets this range as 0 to 255, but you could just as well call it 0 to 1, or -4 to 7600.)

RGB has a couple of interesting problems. Most notably, it’s kind of difficult to read and write by hand. You can sort of get used to how it works, though I’m still not particularly great at it. I keep in mind these rules:

  1. The largest channel is roughly how bright the color is.

    This follows pretty easily from the definition of RGB: it’s colored light added on top of black. The maximum amount of every color makes white, so less than the maximum must be darker, and of course none of any color stays black.

  2. The smallest channel is how pale (desaturated) the color is.

    Mixing equal amounts of red, green, and blue will produce grey. So if the smallest channel is green, you can imagine “splitting” the color between a grey (green, green, green), and the leftovers (red – green, 0, blue – green). Mixing grey with a color will of course make it paler — less saturated, closer to grey — so the bigger the smallest channel, the greyer the color.

  3. Whatever’s left over tells you the hue.

It might be time for an illustration. Consider the color (50%, 62.5%, 75%). The brightness is “capped” at 75%, the largest channel; the desaturation is 50%, the smallest channel. Here’s what that looks like.

illustration of the color (50%, 62.5%, 75%) split into three chunks of 50%, 25%, and 25%

Cutting out the grey and the darkness leaves a chunk in the middle of actual differences between the colors. Note that I’ve normalized it to (0%, 50%, 100%), which is the percentage of that small middle range. Removing the smallest and largest channels will always leave you with a middle chunk where at least one channel is 0% and at least one channel is 100%. (Or it’s grey, and there is no middle chunk.)

The odd one out is green at 50%, so the hue of this color is halfway between cyan (green + blue) and blue. That hue is… azure! So this color is a slightly darkened and fairly dull azure. (The actual amount of “greyness” is the smallest relative to the largest, so in this case it’s about ⅔ grey, or about ⅓ saturated.) Here’s that color.

a slightly darkened, fairly dull azure
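Those three rules are mechanical enough to write down in Python. This is a hypothetical helper, not any library’s API, but it follows the decomposition above exactly:

```python
def describe_rgb(r, g, b):
    """Decompose an RGB color (channels 0-1) using the three rules above."""
    brightness = max(r, g, b)         # rule 1: the largest channel
    greyness = min(r, g, b)           # rule 2: the smallest channel
    if brightness == greyness:
        return brightness, 0.0, None  # a pure grey has no hue
    # rule 3: normalize the leftover middle chunk to read off the hue
    hue = tuple((c - greyness) / (brightness - greyness) for c in (r, g, b))
    saturation = 1 - greyness / brightness
    return brightness, saturation, hue

# The example color (50%, 62.5%, 75%):
azure = describe_rgb(0.50, 0.625, 0.75)
# -> (0.75, ~1/3, (0.0, 0.5, 1.0)): a darkened, ⅓-saturated azure
```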

This is a bit of a pain to do in your head all the time, so why not do it directly?

HSV is what you get when you directly represent colors as hue, saturation, and value. It’s often depicted as a cylinder, with hue represented as an angle around the color wheel: 0° for red, 120° for green, and 240° for blue. Saturation ranges from grey to a fully-saturated color, and value ranges from black to, er, the color. The azure above is (210°, ⅓, ¾) in HSV — 210° is halfway between 180° (cyan) and 240° (blue), ⅓ is the saturation measurement mentioned before, and ¾ is the largest channel.

It’s that hand-waved value bit that gives me trouble. I don’t really know how to intuitively explain what value is, which makes it hard to modify value to make the changes I want. I feel like I should have a better grasp of this after a year and a half of drawing, but alas.

I prefer HSL, which uses hue, saturation, and lightness. Lightness ranges from black to white, with the unperturbed color in the middle. Here’s lightness versus value for the azure color. (Its lightness is ⅝, the average of the smallest and largest channels.)

comparison of lightness and value for the azure color

The lightness just makes more sense to me. I can understand shifting a color towards white or black, and the color in the middle of that bar feels related to the azure I started with. Value looks almost arbitrary; I don’t know where the color at the far end comes from, and it just doesn’t seem to have anything to do with the original azure.
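Python’s standard library can do both conversions, which makes the comparison concrete. (Watch out: `colorsys` calls the HSL model “HLS” and orders the components differently.)

```python
import colorsys

r, g, b = 0.50, 0.625, 0.75   # the azure from earlier

h, s, v = colorsys.rgb_to_hsv(r, g, b)
# h * 360 ≈ 210°, s ≈ 1/3, v = 0.75 -- matching the HSV triplet above

h2, l, s2 = colorsys.rgb_to_hls(r, g, b)   # note: "HLS", not "HSL"!
# same hue; l = 0.625, the average of the smallest and largest channels.
# (s2 also happens to come out as 1/3 for this color, even though
# saturation is defined differently in the two models.)
```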

I’d hoped Wikipedia could clarify this for me. It tells me value is the same thing as brightness, but the mathematical definition on that page matches the definition of intensity from the little-used HSI model. I looked up lightness instead, and the first sentence says it’s also known as value. So lightness is value is brightness is intensity, but also they’re all completely different.

Wikipedia also says that HSV is sometimes known as HSB (where the “B” is for “brightness”), but I swear I’ve only ever seen HSB used as a synonym for HSL. I don’t know anything any more.

Oh, and in case you weren’t confused enough, the definition of “saturation” is different in HSV and HSL. Good luck!

Wikipedia does have some very nice illustrations of HSV and HSL, though, including depictions of them as a cone and double cone.

(Incidentally, you can use HSL directly in CSS now — there are hsl() and hsla() CSS3 functions which evaluate as colors. Combining these with Sass’s scale-color() function makes it fairly easy to come up with decent colors by hand, without having to go back and forth with an image editor. And I can even sort of read them later!)

An annoying problem with all of these models is that the idea of “lightness” is never quite consistent. Even in HSL, a yellow will appear much brighter than a blue with the same saturation and lightness. You may even have noticed in the RGB split diagram that I used dark red and green text, but light blue — the pure blue is so dark that a darker blue on top is hard to read! Yet all three colors have the same lightness in HSL, and the same value in HSV.

Clearly neither of these definitions of lightness or brightness or whatever is really working. There’s a thing called luminance, which is a weighted sum of the red, green, and blue channels that puts green as a whopping ten times brighter than blue. It tends to reflect how bright colors actually appear.
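For instance, here are the weights used for sRGB (the Rec. 709 coefficients), applied to linear channel values — note the enormous gap between green and blue:

```python
def luminance(r, g, b):
    """Relative luminance of a *linear* RGB color, with Rec. 709 weights."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

luminance(0, 1, 0)   # pure green: 0.7152
luminance(0, 0, 1)   # pure blue:  0.0722 -- about a tenth as bright
luminance(1, 1, 0)   # yellow outshines everything but white
```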

Unfortunately, luminance and related values are only used in fairly obscure color models, like YUV and Lab. I don’t mean “obscure” in the sense that nobody uses them, but rather that they’re very specialized and not often seen outside their particular niches: YUV is very common in video encoding, and Lab is useful for serious photo editing.

Lab is pretty interesting, since it’s intended to resemble how human vision works. It’s designed around the opponent process theory, which states that humans see color in three pairs of opposites: black/white, red/green, and yellow/blue. The idea is that we perceive color as somewhere along these axes, so a redder color necessarily appears less green — put another way, while it’s possible to see “yellowish green”, there’s no such thing as a “yellowish blue”.

(I wonder if that explains our affection for orange: we effectively perceive yellow as a fourth distinct primary color.)

Lab runs with this idea, making its three channels be lightness (but not the HSL lightness!), a (green to red), and b (blue to yellow). The neutral points for a and b are at zero, with green/blue extending in the negative direction and red/yellow extending in the positive direction.

Lab can express a whole bunch of colors beyond RGB, meaning they can’t be shown on a monitor, or even represented in most image formats. And you now have four primary colors in opposing pairs. That all makes it pretty weird, and I’ve actually never used it myself, but I vaguely aspire to do so someday.

I think those are all of the major ones. There’s also XYZ, which I think is some kind of master color model. Of course there’s CMYK, which is used for printing, but it’s effectively just the inverse of RGB.

With that out of the way, now we can get to the hard part!

Colorspaces

I called RGB a color model: a way to break colors into component parts.

Unfortunately, RGB alone can’t actually describe a color. You can tell me you have a color (0%, 50%, 100%), but what does that mean? 100% of what? What is “the most blue”? More importantly, how do you build a monitor that can display “the most blue” the same way as other monitors? Without some kind of absolute reference point, this is meaningless.

A color space is a color model plus enough information to map the model to absolute real-world colors. There are a lot of these. I’m looking at Krita’s list of built-in colorspaces and there are at least a hundred, most of them RGB.

I admit I’m bad at colorspaces and have basically done my best to not ever have to think about them, because they’re a big tangled mess and hard to reason about.

For example! The effective default RGB colorspace that almost everything will assume you’re using by default is sRGB, specifically designed to be this kind of global default. Okay, great.

Now, sRGB has gamma built in. Gamma correction means slapping an exponent on color values to skew them towards or away from black. The color is assumed to be in the range 0–1, so any positive power will produce output from 0–1 as well. An exponent greater than 1 will skew towards black (because you’re multiplying a number less than 1 by itself), whereas an exponent less than 1 will skew away from black.

What this means is that halfway between black and white in sRGB isn’t (50%, 50%, 50%), but around (73%, 73%, 73%). Here’s a great example, borrowed from this post (with numbers out of 255):

alternating black and white lines alongside gray squares of 128 and 187

Which one looks more like the alternating bands of black and white lines? Surely the one you pick is the color that’s actually halfway between black and white.

And yet, in most software that displays or edits images, interpolating white and black will give you a 50% gray — much darker than the original looked. A quick test is to scale that image down by half and see whether the result looks closer to the top square or the bottom square. (Firefox, Chrome, and GIMP get it wrong; Krita gets it right.)

The right thing to do here is convert an image to a linear colorspace before modifying it, then convert it back for display. In a linear colorspace, halfway between white and black is still 50%, but it looks like the 73% grey. This is great fun: it involves a piecewise function and an exponent of 2.4.
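Here’s that piecewise function sketched in Python, with the constants from the sRGB specification:

```python
def srgb_to_linear(c):
    """Decode one sRGB channel value (0-1) to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    """Encode one linear channel value (0-1) back to sRGB."""
    if c <= 0.0031308:
        return c * 12.92
    return 1.055 * c ** (1 / 2.4) - 0.055

linear_to_srgb(0.5)        # ≈ 0.735 -- the 73% grey halfway point
srgb_to_linear(128 / 255)  # ≈ 0.216 -- "50% grey" is much darker than half
```

So a correct image-scaling pipeline is: decode each channel with `srgb_to_linear`, average the now-linear values, then re-encode with `linear_to_srgb`.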

It’s really difficult to reason about this, for much the same reason that it’s hard to grasp text encoding problems in languages with only one string type. Ultimately you still have an RGB triplet at every stage, and it’s very easy to lose track of what kind of RGB that is. Then there’s the fact that most images don’t specify a colorspace in the first place so you can’t be entirely sure whether it’s sRGB, linear sRGB, or something entirely; monitors can have their own color profiles; you may or may not be using a program that respects an embedded color profile; and so on. How can you ever tell what you’re actually looking at and whether it’s correct? I can barely keep track of what I mean by “50% grey”.

And then… what about transparency? Should a 50% transparent white atop solid black look like 50% grey, or 73% grey? Krita seems to leave it to the colorspace: sRGB gives the former, but linear sRGB gives the latter. Does this mean I should paint in a linear colorspace? I don’t know! (Maybe I’ll give it a try and see what happens.)

Something I genuinely can’t answer is what effect this has on HSV and HSL, which are defined in terms of RGB. Is there such a thing as linear HSL? Does anyone ever talk about this? Would it make lightness more sensible?

There is a good reason for this, at least: the human eye is better at distinguishing dark colors than light ones. I was surprised to learn that, but of course, it’s been hidden from me by sRGB, which is deliberately skewed to dedicate more space to darker colors. In a linear colorspace, a gradient from white to black would have a lot of indistinguishable light colors, but appear to have severe banding among the darks.

several different black to white gradients

All three of these are regular black-to-white gradients drawn in 8-bit color (i.e., channels range from 0 to 255). The top one is the naïve result if you draw such a gradient in sRGB: the midpoint is the too-dark 50% grey. The middle one is that same gradient, but drawn in a linear colorspace. Obviously, a lot of dark colors are “missing”, in the sense that we could see them but there’s no way to express them in linear color. The bottom gradient makes this more clear: it’s a gradient of all the greys expressible in linear sRGB.

This is the first time I’ve ever delved so deeply into exactly how sRGB works, and I admit it’s kind of blowing my mind a bit. Straightforward linear color is so much lighter, and this huge bias gives us a lot more to work with. Also, 73% being the midpoint certainly explains a few things about my problems with understanding brightness of colors.

There are other RGB colorspaces, of course, and I suppose they all make for an equivalent CMYK colorspace. YUV and Lab are families of colorspaces, though I think most people talking about Lab specifically mean CIELAB (or “L*a*b*”), and there aren’t really any competitors. HSL and HSV are defined in terms of RGB, and image data is rarely stored directly as either, so there aren’t really HSL or HSV colorspaces.

I think that exhausts all the things I know.

Real world color is also a lie

Just in case you thought these problems were somehow unique to computers. Surprise! Modelling color is hard because color is hard.

I’m sure you’ve seen the checker shadow illusion, possibly one of the most effective optical illusions, where the presence of a shadow makes a gray square look radically different than a nearby square of the same color.

Our eyes are very good at stripping away ambient light effects to tell what color something “really” is. Have you ever been outside in bright summer weather for a while, then come inside and everything is starkly blue? That’s lingering compensation for the yellow sunlight: your eyes keep shifting everything towards blue, the opposite of yellow.

Or, here, I like this. I’m sure there are more drastic examples floating around, but this is the best I could come up with. Here are some Pikachu I found via GIS.

photo of Pikachu plushes on a shelf

My question for you is: what color is Pikachu?

Would you believe… orange?

photo of Pikachu plushes on a shelf, overlaid with color swatches; the Pikachu in the background are orange

In each box, the bottom color is what I color-dropped, and the top color is the same hue with 100% saturation and 50% lightness. It’s the same spot, on the same plush, right next to each other — but the one in the background is orange, not yellow. At best, it’s brown.

What we see as “yellow in shadow” and interpret to be “yellow, but darker” turns out to be another color entirely. (The grey whistles are, likewise, slightly blue.)

Did you know that mirrors are green? You can see it in a mirror tunnel: the image gets slightly greener as it goes through the mirror over and over.

Distant mountains and other objects, of course, look bluer.

This all makes painting rather complicated, since it’s not actually about painting things the color that they “are”, but painting them in such a way that a human viewer will interpret them appropriately.

I, er, don’t know enough to really get very deep here. I really should, seeing as I keep trying to paint things, but I don’t have a great handle on it yet. I’ll have to defer to Mel’s color tutorial. (warning: big)

Blending modes

You know, those things in Photoshop.

I’ve always found these remarkably unintuitive. Most of them have names that don’t remotely describe what they do, the math doesn’t necessarily translate to useful understanding, and they’re incredibly poorly-documented. So I went hunting for some precise definitions, even if I had to read GIMP’s or Krita’s source code.

In the following, A is a starting image, and B is something being drawn on top with the given blending mode. (In the case of layers, B is the layer with the mode, and A is everything underneath.) Generally, the same operation is done on each of the RGB channels independently. Everything is scaled to 0–1, and results are generally clamped to that range.

I believe all of these treat layer alpha the same way: linear interpolation between A and the combination of A and B. If B has alpha t, and the blending mode is a function f, then the result is t × f(A, B) + (1 - t) × A.

If A and B themselves have alpha, the result is a little more complicated, and probably not that interesting. It tends to work how you’d expect. (If you’re really curious, look at the definition of BLEND() in GIMP’s developer docs.)
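As a concrete illustration, that opacity rule can be sketched in a few lines of Python. This is my own paraphrase of the formula above, not GIMP's actual code; channel values are floats in 0–1.

```python
def composite(a, b, t, f):
    """Blend channel value b over a with opacity t, using blend mode f.

    a and b are channel values in 0-1, t is the alpha of B, and f is
    the blend-mode function.  The result is clamped to 0-1.
    """
    result = t * f(a, b) + (1 - t) * a
    return max(0.0, min(1.0, result))

# "Normal" mode is just f(A, B) = B, so at 50% opacity we get the average:
print(composite(0.2, 0.8, 0.5, lambda a, b: b))  # → 0.5
```

At t = 1 the mode applies at full strength; at t = 0 you get A back untouched, which matches how opacity sliders behave.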

  • Normal: B. No blending is done; new pixels replace old pixels.

  • Multiply: A × B. As the name suggests, the channels are multiplied together. This is very common in digital painting for slapping on a basic shadow or tinting a whole image.

    I think the name has always thrown me off just a bit because “Multiply” sounds like it should make things bigger and thus brighter — but because we’re dealing with values from 0 to 1, Multiply can only ever make colors darker.

    Multiplying with black produces black. Multiplying with white leaves the other color unchanged. Multiplying with a gray is equivalent to blending with black. Multiplying a color with itself squares the color, which is similar to applying gamma correction.

    Multiply is commutative — if you swap A and B, you get the same result.

  • Screen: 1 - (1 - A)(1 - B). This is sort of an inverse of Multiply; it multiplies darkness rather than lightness. It’s defined as inverting both colors, multiplying, and inverting the result. Accordingly, Screen can only make colors lighter, and is also commutative. All the properties of Multiply apply to Screen, just inverted.

  • Hard Light: Equivalent to Multiply if B is dark (i.e., less than 0.5), or Screen if B is light. There’s an additional factor of 2 included to compensate for how the range of B is split in half: Hard Light with B = 0.4 is equivalent to Multiply with B = 0.8, since 0.4 is 0.8 of the way to 0.5. Right.

    This seems like a possibly useful way to apply basic highlights and shadows with a single layer? I may give it a try.

    The math is commutative, but since B is checked and A is not, Hard Light is itself not commutative.

  • Soft Light: Like Hard Light, but softer. No, really. There are several different versions of this, and they’re all a bit of a mess, not very helpful for understanding what’s going on.

    If you graphed the effect various values of B had on a color, you’d have a straight line from 0 up to 1 (at B = 0.5), and then it would abruptly change to a straight line back down to 0. Soft Light just seeks to get rid of that crease. Here’s Hard Light compared with GIMP’s Soft Light, where A is a black to white gradient from bottom to top, and B is a black to white gradient from left to right.

    graphs of combinations of all grays with Hard Light versus Soft Light

    You can clearly see the crease in the middle of Hard Light, where B = 0.5 and it transitions from Multiply to Screen.

  • Overlay: Equivalent to either Hard Light or Soft Light, depending on who you ask. In GIMP, it’s Soft Light; in Krita, it’s Hard Light except the check is done on A rather than B. Given the ambiguity, I think I’d rather just stick with Hard Light or Soft Light explicitly.

  • Difference: abs(A - B). Does what it says on the tin. I don’t know why you would use this? Difference with black causes no change; Difference with white inverts the colors. Commutative.

  • Addition and Subtract: A + B and A - B. I didn’t think much of these until I discovered that Krita has a built-in brush that uses Addition mode. It’s essentially just a soft spraypaint brush, but because it uses Addition, painting over the same area with a dark color will gradually turn the center white, while the fainter edges remain dark. The result is a fiery glow effect, which is pretty cool. I used it manually as a layer mode for a similar effect, to make a field of sparkles. I don’t know if there are more general applications.

    Addition is commutative, of course, but Subtract is not.

  • Divide: A ÷ B. Apparently this is the same as changing the white point to B. Accordingly, the result will blow out towards white very quickly as B gets darker.

  • Dodge and Burn: A ÷ (1 - B) and 1 - (1 - A) ÷ B. Inverses in the same way as Multiply and Screen. Similar to Divide, but with B inverted — so Dodge changes the white point to 1 - B, with similar caveats as Divide. I’ve never seen either of these effects not look horrendously gaudy, but I think photographers manage to use them, somehow.

  • Darken Only and Lighten Only: min(A, B) and max(A, B). Commutative.

  • Linear Light: A + (2 × B) - 1. I think this is the same as Sai’s “Lumi and Shade” mode, which is very popular, at least in this house. It works very well for simple lighting effects, and shares the Soft/Hard Light property that darker colors darken and lighter colors lighten, but I don’t have a great grasp of it yet and don’t know quite how to explain what it does. So I made another graph:

    graph of Linear Light, with a diagonal band of shading going from upper left to bottom right

    Super weird! Half the graph is solid black or white; you have to stay in that sweet zone in the middle to get reasonable results.

    This is actually a combination of two other modes, Linear Dodge and Linear Burn, combined in much the same way as Hard Light. I’ve never encountered them used on their own, though.

  • Hue, Saturation, Value: Work like you might expect: convert A to HSV and replace either its hue, saturation, or value with B’s.

  • Color: Uses HSL, unlike the above three. Combines B’s hue and saturation with A’s lightness.

  • Grain Extract and Grain Merge: A - B + 0.5 and A + B - 0.5. These are clearly related to film grain, somehow, but their exact use eludes me.

    I did find this example post where someone combines a photo with a blurred copy using Grain Extract and Grain Merge. Grain Extract picked out areas of sharp contrast, and Grain Merge emphasized them, which seems relevant enough to film grain. I might give these a try sometime.

Those are all the modes in GIMP (except Dissolve, which isn’t a real blend mode; also, GIMP doesn’t have Linear Light). Photoshop has a handful more. Krita has a preposterous number of other modes, no, really, it is absolutely ridiculous, you cannot even imagine.
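To make the formulas concrete, here’s a quick sketch of a few of these modes as plain Python functions. This is my own paraphrase of the math above, not GIMP’s or Krita’s actual implementation; channels are floats in 0–1, and clamping is left to the caller.

```python
def multiply(a, b):
    return a * b

def screen(a, b):
    # Invert both, multiply, invert the result.
    return 1 - (1 - a) * (1 - b)

def hard_light(a, b):
    # Multiply for dark B, Screen for light B, with B rescaled by 2.
    if b < 0.5:
        return multiply(a, 2 * b)
    return screen(a, 2 * (b - 0.5))

def difference(a, b):
    return abs(a - b)

# Multiply only ever darkens; Screen only ever lightens:
print(multiply(0.5, 0.5))  # → 0.25
print(screen(0.5, 0.5))    # → 0.75

# Hard Light with B = 0.4 matches Multiply with B = 0.8:
print(hard_light(0.6, 0.4) == multiply(0.6, 0.8))  # → True
```

Swapping the arguments to `multiply`, `screen`, or `difference` changes nothing (they’re commutative), but `hard_light` branches on B alone, which is exactly why it isn’t.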

I may be out of things

There’s plenty more to say about color, both technically and design-wise — contrast and harmony, color blindness, relativity, dithering, etc. I don’t know if I can say any of it with any particular confidence, though, so perhaps it’s best I stop here.

I hope some of this was instructive, or at least interesting!

The Scratch Olympics

Post Syndicated from Rik Cross original https://www.raspberrypi.org/blog/the-scratch-olympics-2/

Since the Raspberry Pi Foundation merged with Code Club, the newly enlarged Education Team has been working hard to put the power of digital making into the hands of people all over the world.

Among the work we’ve been doing, we’ve created a set of Scratch projects to celebrate the 2016 Olympic Games in Rio.

The initial inspiration for these projects was the games that we used to love as children, commonly referred to as ‘button mashers’. There was little skill required in these games: all you needed was the ability to smash two keys as fast as humanly possible. Examples of this genre include such classics as Geoff Capes Strongman and Daley Thompson’s Decathlon.

With the 2016 Olympics fast approaching, we began to reminisce about these old sports-themed games, and realised what fun today’s kids are missing out on. With that, the Scratch Olympics were born!

There are two resources available on the resources section of this site, the first of which is the Olympic Weightlifter project. With graphics conceived by Sam Alder and produced by Alex Carter, the project helps you create a button-masher masterpiece, producing your very own 1980s-style keyboard-killer that’s guaranteed to strike fear into the hearts of parents all over the world. Physical buttons are an optional extra for the faint of heart.

A pixellated weightlifter blows steam from his ears as he lifts a barbell above his head in an animated gif

The second game in the series is Olympics Hurdles, where you will make a hurdling game which requires the player to hit the keyboard rapidly to make the hurdler run, and use expert timing to make them jump over the hurdles at the right time.

Pixellated athletes approach, leap and clear a hurdle on an athletics track

You’ll also find three new projects over on the Code Club projects website. The first of these is Synchronised Swimming, where you’ll learn how to code a synchronised swimming routine for Scratch the cat, by using loops and creating clones.

Six copies of the Scratch cat against an aqua blue background form a hexagonal synchronised swimming formation

There’s also an Archery project, where you must overcome an archer’s shaky arm to shoot arrows as close to the bullseye as you can, and Sprint!, which uses a 3D perspective to make the player feel as though they’re running towards a finish line. This project can even be coded to work with a homemade running mat! These two projects are only available to registered Code Clubs, and require an ID and PIN to access.

An archery target overlaid with a crosshair
A straight running track converges towards a flat horizon, with a "FINISH" ribbon and "TIME" and "DISTANCE" counters

Creating new Olympics projects is just one of the ways in which the Raspberry Pi Foundation and Code Club are working together to create awesome new resources, and there’s much more to come!

The post The Scratch Olympics appeared first on Raspberry Pi.

On journeys

Post Syndicated from Michal Zalewski original http://lcamtuf.blogspot.com/2015/03/on-journeys.html

– 1 –

Poland is an ancient country whose history is deeply intertwined with that of the western civilization. In its glory days, the Polish-Lithuanian Commonwealth sprawled across vast expanses of land in central Europe, from Black Sea to Baltic Sea. But over the past two centuries, it suffered a series of military defeats and political partitions at the hands of its closest neighbors: Russia, Austria, Prussia, and – later – Germany.

After more than a hundred years of foreign rule, Poland re-emerged as an independent state in 1918, only to face the armies of Nazi Germany at the onset of World War II. With Poland’s European allies reneging on their earlier military guarantees, the fierce fighting left the country in ruins. Some six million people died within its borders – more than ten times the death toll in France or in the UK. Warsaw was reduced to a sea of rubble, with perhaps one in ten buildings still standing by the end of the war.

With the collapse of the Third Reich, Franklin D. Roosevelt, Winston Churchill, and Joseph Stalin held a meeting in Yalta to decide the new order for war-torn Europe. At Stalin’s behest, Poland and its neighboring countries were placed under Soviet political and military control, forming what has become known as the Eastern Bloc.

Over the next several decades, the Soviet satellite states experienced widespread repression and economic decline. But weakened by the expense of the Cold War, the communist chokehold on the region eventually began to wane. In Poland, even the introduction of martial law in 1981 could not put an end to sweeping labor unrest. Narrowly dodging the specter of Soviet intervention, the country regained its independence in 1989 and elected its first democratic government; many other Eastern Bloc countries soon followed suit.

Ever since then, Poland has enjoyed a period of unprecedented growth and has emerged as one of the more robust capitalist democracies in the region. In just two decades, it shed many of its backward, state-run heavy industries and adopted a modern, service-oriented economy. But the effects of the devastating war and the lost decades under communist rule still linger on – whether you look at the country’s infrastructure, at its socialist-realist cityscapes, at its political traditions, or at the depressingly low median wage.

When thinking about the American involvement in the Cold War, people around the world may recall Vietnam, Bay of Pigs, or the proxy wars fought in the Middle East. But in Poland and many of its neighboring states, the picture you remember the most is the fall of the Berlin Wall.

– 2 –

I was born in Warsaw in the winter of 1981, at the onset of martial law, with armored vehicles rolling onto Polish streets. My mother, like many of her generation, moved to the capital in the sixties as a part of an effort to rebuild and repopulate the war-torn city. My grandma would tell eerie stories of Germans and Soviets marching through their home village somewhere in the west. I liked listening to the stories; almost every family in Poland had some to tell.

I did not get to know my father. I knew his name; he was a noted cinematographer who worked on big-ticket productions back in the day. He left my mother when I was very young and never showed interest in staying in touch. He had a wife and other children, so it might have been that.

Compared to him, mom hasn’t done well for herself. We ended up in social housing in one of the worst parts of the city, on the right bank of the Vistula river. My early memories from school are that of classmates sniffing glue from crumpled grocery bags. I remember my family waiting in lines for rationed toilet paper and meat. As a kid, you don’t think about it much.

The fall of communism came suddenly. I have a memory of grandma listening to broadcasts from Radio Free Europe, but I did not understand what they were all about. I remember my family cheering one afternoon, transfixed to a black-and-white TV screen. I recall my Russian language class morphing into English; I had my first taste of bananas and grapefruits. There is the image of the monument of Feliks Dzierżyński coming down. I remember being able to go to a better school on the other side of Warsaw – and getting mugged many times on the way.

The transformation brought great wealth to some, but many others have struggled to find their place in the fledgling and sometimes ruthless capitalist economy. Well-educated and well read, my mom ended up in the latter pack, at times barely making ends meet. I think she was in part a victim of circumstance, and in part a slave to a way of thinking that did not permit the possibility of taking chances or pursuing happiness.

– 3 –

Mother always frowned upon popular culture, seeing it as unworthy of an educated mind. For a time, she insisted that I only listen to classical music. She angrily shunned video games, comic books, and cartoons. I think she perceived technology as trivia; the only field of science she held in high regard was abstract mathematics, perhaps for its detachment from the mundane world. She hoped that I would learn Latin, a language she could read and write; that I would practice drawing and painting; or that I would read more of the classics of modernist literature.

Of course, I did almost none of that. I hid my grunge rock tapes between Tchaikovsky, listened to the radio under the sheets, and watched the reruns of The A-Team while waiting for her to come back from work. I liked electronics and chemistry a lot more than math. And when I laid my hands on my first computer – an 8-bit relic of British engineering from 1982 – I soon knew that these machines, in their incredible complexity and flexibility, were what I wanted to spend my time on.

I suspected I could become a competent programmer, but never had enough faith in my skill. Yet, in learning about computers, I realized that I had a knack for understanding complex systems and poking holes in how they work. With a couple of friends, we joined the nascent information security community in Europe, comparing notes on mailing lists. Before long, we were taking on serious consulting projects for banks and the government – usually on weekends and after school, but sometimes skipping a class or two. Well, sometimes more than that.

All of a sudden, I was facing an odd choice. I could stop, stay in school and try to get a degree – going back every night to a cramped apartment, my mom sleeping on a folding bed in the kitchen, my personal space limited to a bare futon and a tiny desk. Or, I could seize the moment and try to make it on my own, without hoping that one day, my family would be able to give me a head start.

I moved out, dropped out of school, and took on a full-time job. It paid somewhere around $12,000 a year – a pittance anywhere west of the border, but a solid wage in Poland even today. Not much later, I was making two times as much, about the upper end of what one could hope for in this line of work. I promised myself to keep taking courses after hours, but I wasn’t good at sticking to the plan. I moved in with my girlfriend, and at the age of 19, I felt for the first time that things were going to be all right.

– 4 –

Growing up in Europe, you get used to the barrage of low-brow swipes taken at the United States. Your local news will never pass up the opportunity to snicker about the advances of creationism somewhere in Kentucky. You can stay tuned for a panel of experts telling you about the vastly inferior schools, the medieval justice system, and the striking social inequality on the other side of the pond. You don’t doubt their words – but deep down inside, no matter how smug the critics are, or how seemingly convincing their arguments, the American culture still draws you in.

My moment of truth came in the summer of 2000. A company from Boston asked me if I’d like to talk about a position on their research team; I looked at the five-digit figure and could not believe my luck. Moving to the US was an unreasonable risk for a kid who could barely speak English and had no safety net to fall back on. But that did not matter: I knew I had no prospects of financial independence in Poland – and besides, I simply needed to experience the New World through my own eyes.

Of course, even with a job offer in hand, getting into the United States is not an easy task. An engineering degree and a willing employer open up a straightforward path; it is simple enough that some companies would abuse the process to source cheap labor for menial, low-level jobs. With a visa tied to the petitioning company, such captive employees could not seek better wages or more rewarding work.

But without a degree, the options shrink drastically. For me, the only route would be a seldom-granted visa reserved for extraordinary skill – meant for the recipients of the Nobel Prize and other folks who truly stand out in their field of expertise. The attorneys looked over my publication record, citations, and the supporting letters from other well-known people in the field. Especially given my age, they thought we had a good shot. A few stressful months later, it turned out that they were right.

On the week of my twentieth birthday, I packed two suitcases and boarded a plane to Boston. My girlfriend joined me, miraculously securing a scholarship at a local university to continue her physics degree; her father helped her with some of the costs. We had no idea what we were doing; we had perhaps a few hundred bucks on us, enough to get us through the first couple of days. Four thousand miles away from our place of birth, we were starting a brand new life.

– 5 –

The cultural shock gets you, but not in the sense you imagine. You expect big contrasts, a single eye-opening day to remember for the rest of your life. But driving down a highway in the middle of a New England winter, I couldn’t believe how ordinary the world looked: just trees, boxy buildings, and pavements blanketed with dirty snow.

Instead of a moment of awe, you drown in a sea of small, inconsequential things, draining your energy and making you feel helpless and lost. It’s how you turn on the shower; it’s where you can find a grocery store; it’s what they meant by that incessant “paper or plastic” question at the checkout line. It’s how you get a mailbox key, how you make international calls, it’s how you pay your bills with a check. It’s the rules at the roundabout, it’s your social security number, it’s picking the right toll lane, it’s getting your laundry done. It’s setting up a dial-up account and finding the food you like in the sea of unfamiliar brands. It’s doing all this without Google Maps or a Facebook group to connect with other expats nearby.

The other thing you don’t expect is losing touch with your old friends; you can call or e-mail them every day, but your social frames of reference begin to drift apart, leaving less and less to talk about. The acquaintances you make in the office will probably never replace the folks you grew up with. We managed, but we weren’t prepared for that.

– 6 –

In the summer, we had friends from Poland staying over for a couple of weeks. By the end of their trip, they asked to visit New York City one more time; we liked the Big Apple, so we took them on a familiar ride down I-95. One of them went to see the top of the World Trade Center; the rest of us just walked around, grabbing something to eat before we all headed back. A few days later, we were all standing in front of a TV, watching September 11 unfold in real time.

We felt horror and outrage. But when we roamed the unsettlingly quiet streets of Boston, greeted by flags and cardboard signs urging American drivers to honk, we understood that we were strangers a long way from home – and that our future in this country hung in the balance more than we would have thought.

Permanent residency is a status that gives a foreigner the right to live in the US and do almost anything they please – change jobs, start a business, or live off one’s savings all the same. For many immigrants, the pursuit of this privilege can take a decade or more; for some others, it stays forever out of reach, forcing them to abandon the country in a matter of days as their visas expire or companies fold. With my O-1 visa, I always counted myself among the lucky ones. Sure, it tied me to an employer, but I figured that sorting it out wouldn’t be a big deal.

That proved to be a mistake. In the wake of 9/11, an agency known as Immigration and Naturalization Services was being dismantled and replaced by a division within the Department of Homeland Security. My own seemingly straightforward immigration petition ended up somewhere in the bureaucratic vacuum that formed in between the two administrative bodies. I waited patiently, watching the deepening market slump, and seeing my employer’s prospects get dimmer and dimmer every month. I was ready for the inevitable, with other offers in hand, prepared to make my move perhaps the very first moment I could. But the paperwork just would not come through. With the Boston office finally shutting down, we packed our bags and booked flights. We faced the painful admission that for three years, we chased nothing but a pipe dream. The only thing we had to show for it were two adopted cats, now sitting frightened somewhere in the cargo hold.

The now-worthless approval came through two months later; the lawyers, cheerful as ever, were happy to send me a scan. The hollowed-out remnants of my former employer were eventually bought by Symantec – the very place from where I had my backup offer in hand.

– 7 –

In a way, Europe’s obsession with America’s flaws made it easier to come home without ever explaining how the adventure really played out. When asked, I could just wing it: a mention of the death penalty or permissive gun laws would always get you a knowing nod, allowing the conversation to move on.

Playing to other people’s preconceptions takes little effort; lying to yourself calls for more skill. It doesn’t help that when you come back after three years away from home, you notice all the small annoyances that you used to simply tune out. Back then, Warsaw still had a run-down vibe: the dilapidated road from the airport; the drab buildings on the other side of the river; the uneven pavements littered with dog poop; the dirty walls at my mother’s place, with barely any space to turn. You can live with it, of course – but it’s a reminder that you settled for less, and it’s a sensation that follows you every step of the way.

But more than the sights, I couldn’t forgive myself something else: that I was coming back home with just loose change in my pocket. There are some things that a failed communist state won’t teach you, and personal finance is one of them; I always looked at money just as a reward for work, something you get to spend to brighten your day. The indulgences were never extravagant: perhaps I would take the cab more often, or have take-out every day. But no matter how much I made, I kept living paycheck-to-paycheck – the only way I knew, the way our family always did.

– 8 –

With a three-year stint in the US on your resume, you don’t have a hard time finding a job in Poland. You face the music in a different way. I ended up with a salary around a fourth of what I used to make in Massachusetts, but I simply decided not to think about it much. I wanted to settle down, work on interesting projects, marry my girlfriend, have a child. I started doing consulting work whenever I could, setting almost all the proceeds aside.

After four years with T-Mobile in Poland, I had enough saved to get us through a year or so – and in a way, it changed the way I looked at my work. Being able to take on ambitious challenges and learn new things started to matter more than jumping ship for a modest salary bump. Burned by the folly of pursuing riches in a foreign land, I put a premium on boring professional growth.

Comically, all this introspection made me realize that from where I stood, I had almost nowhere left to go. Sure, Poland had telcos, refineries, banks – but they all consumed the technologies developed elsewhere, shipped here in a shrink-wrapped box; as far as their IT went, you could hardly tell the companies apart. To be a part of the cutting edge, you had to pack your bags, book a flight, and take a jump into the unknown. I sure as heck wasn’t ready for that again.

And then, out of the blue, Google swooped in with an offer to work for them from the comfort of my home, dialing in for a videoconference every now and then. The starting pay was about the same, but I had no second thoughts. I didn’t say it out loud, but deep down inside, I already knew what needed to happen next.

– 9 –

We moved back to the US in 2009, two years after taking the job, already on the hook for a good chunk of Google’s product security and with the comfort of knowing where we stood. In a sense, my motive was petty: you could call it a desire to vindicate a failed adolescent dream. But in many other ways, I have grown fond of the country that shunned us once before; and I wanted our children to grow up without ever having to face the tough choices and the uncertain prospects I had to deal with in my earlier years.

This time, we knew exactly what to do: a quick stop at a grocery store on the way from the airport, followed by an e-mail to our immigration folks to get the green card paperwork out the door. A bit more than half a decade later, we were standing in a theater in Campbell, reciting the Oath of Allegiance and clinging on to our new certificates of US citizenship.

The ceremony closed a long and interesting chapter in my life. But more importantly, standing in that hall with people from all over the globe made me realize that my story is not extraordinary; many of them had lived through experiences far more harrowing and captivating than mine. If anything, my tale is hard to tell apart from that of countless other immigrants from the former Eastern Bloc. By some estimates, in the US alone, the Polish diaspora is about 9 million strong.

I know that the Poland of today is not the Poland I grew up in. It’s not even the Poland I came back to in 2003; the gap to Western Europe is shrinking every single year. But I am grateful to now live in a country that welcomes more immigrants than any other place on Earth – and at the end of their journey, makes many of them feel at home. It also makes me realize how small and misguided must be the conversations we are having about immigration – not just here, but all over the developed world.

To explore other articles in this short series about Poland, click here. You can also directly proceed to the next entry here.