Tag Archives: emoji

Amazon Translate Now Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-translate-now-generally-available/


Today we’re excited to make Amazon Translate generally available. Late last year at AWS re:Invent my colleague Tara Walker wrote about a preview of a new AI service, Amazon Translate. Starting today you can access Amazon Translate in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) with a 2 million character monthly free tier for the first 12 months and $15 per million characters after that. There are a number of new features available in GA: automatic source language inference, Amazon CloudWatch support, and up to 5000 characters in a single TranslateText call. Let’s take a quick look at the service in general availability.

Amazon Translate New Features

Since Tara’s post already covered the basics of the service I want to point out some of the new features of the service released today. Let’s start with a code sample:

import boto3
translate = boto3.client("translate")
resp = translate.translate_text(
    Text="🇫🇷Je suis très excité pour Amazon Traduire🇫🇷",
    SourceLanguageCode="auto",
    TargetLanguageCode="en"
)
print(resp['TranslatedText'])

Since I have specified my source language as auto, Amazon Translate will call Amazon Comprehend on my behalf to determine the source language used in this text. If you couldn’t guess it, we’re writing some French and the output is 🇫🇷I'm very excited about Amazon Translate 🇫🇷. You’ll notice that our emojis are preserved in the output text which is definitely a bonus feature for Millennials like me.

The Translate console is a great way to get started and see some sample response.

Translate is extremely easy to use in AWS Lambda functions which allows you to use it with almost any AWS service. There are a number of examples in the Translate documentation showing how to do everything from translate a web page to a Amazon DynamoDB table. Paired with other ML services like Amazon Comprehend and [transcribe] you can build everything from closed captioning to real-time chat translation to a robust text analysis pipeline for call centers transcriptions and other textual data.

New Languages Coming Soon

Today, Amazon Translate allows you to translate text to or from English, to any of the following languages: Arabic, Chinese (Simplified), French, German, Portuguese, and Spanish. We’ve announced support for additional languages coming soon: Japanese (go JAWSUG), Russian, Italian, Chinese (Traditional), Turkish, and Czech.

Amazon Translate can also be used to increase professional translator efficiency, and reduce costs and turnaround times for their clients. We’ve already partnered with a number of Language Service Providers (LSPs) to offer their customers end-to-end translation services at a lower cost by allowing Amazon Translate to produce a high-quality draft translation that’s then edited by the LSP for a guaranteed human quality result.

I’m excited to see what applications our customers are able to build with high quality machine translation just one API call away.

Randall

LED cubes and how to map them

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/led-panel-cubes-and-how-to-build-them/

Taking inspiration from a cube he had filmed at the 34th Chaos Communication Congress in Leipzig, Germany, polyfloyd gathered friends Sebastius and Boekenwuurm together to create their own.

The build

As polyfloyd’s blog post for the project notes, Sebastius led the way with the hardware portion of the build. The cube is made from six LED panels driven by a Raspberry Pi, and uses a breakout board to support the panels, which are connected in pairs:

The displays are connected in 3 chains, the maximum number of parallel chains the board supports, of 2 panels each. Having a higher degree of parallelization increases the refresh rate which in turn improves the overall image quality.

The first two chains make up the 4 sides. The remaining chain makes up the top and bottom of the cube.

Sebastius removed the plastic frames that come as standard on the panels, in order to allow them to fit together snugly as a cube. He designed and laser-cut a custom frame from plywood to support the panels instead.

Raspberry Pi LED Cube

Software

The team used hzeller’s software to drive the panels, and polyfloyd wrote their own program to “shove the pixels around”. polyfloyd used Ledcat, software they had made to drive previous LED projects, and adapted this interface so programs written for Ledcat would also work with hzeller’s library.

The full code for the project can be found on polyfloyd’s GitHub profile. It includes the ability to render animations to gzipped files, and to stream animations in real time via SSH.

Mapping 2D and spherical images with shaders

“One of the programs that could work with my LED-panels through [Unix] pipes was Shady,” observes polyfloyd, explaining the use of shaders with the cube. “The program works by rendering OpenGL fragment shaders to an RGB24 format which could then be piped to wherever needed. These shaders are small programs that can render an image by calculating the color for each pixel on the screen individually.”

The team programmed a shader to map the two-dimensional position of pixels in an image to the three-dimensional space of the cube. This then allowed the team to apply the mapping to spherical images, such as the globe in the video below:

The team has interesting plans for the cube moving forward, including the addition of an accelerometer and batteries. Follow their progress on the polyfloyd blog.

Fun with LED panels

The internet is full of amazing Raspberry Pi projects that use LED panels. This recent project available on Instructables shows how to assemble and set up a particle generator, while this one, featured on this blog last year, tracks emojis used on the Chelsea Handler: Gotta Go! app.

The post LED cubes and how to map them appeared first on Raspberry Pi.

timeShift(GrafanaBuzz, 1w) Issue 30

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/01/19/timeshiftgrafanabuzz-1w-issue-30/

Welcome to TimeShift

We’re only 6 weeks away from the next GrafanaCon and here at Grafana Labs we’re buzzing with excitement. We have some great talks lined up that you won’t want to miss.

This week’s TimeShift covers Grafana’s annotation functionality, monitoring with Prometheus, integrating Grafana with NetFlow and a peek inside Stream’s monitoring stack. Enjoy!


Latest Stable Release

Grafana 4.6.3 is now available. Latest bugfixes include:

  • Gzip: Fixes bug Gravatar images when gzip was enabled #5952
  • Alert list: Now shows alert state changes even after adding manual annotations on dashboard #99513
  • Alerting: Fixes bug where rules evaluated as firing when all conditions was false and using OR operator. #93183
  • Cloudwatch: CloudWatch no longer display metrics’ default alias #101514, thx @mtanda

Download Grafana 4.6.3 Now


From the Blogosphere

Walkthrough: Watch your Ansible deployments in Grafana!: Your graphs start spiking and your platform begins behaving abnormally. Did the config change in a deployment, causing the problem? This article covers Grafana’s new annotation functionality, and specifically, how to create deployment annotations via Ansible playbooks.

Application Monitoring in OpenShift with Prometheus and Grafana: There are many article describing how to monitor OpenShift with Prometheus running in the same cluster, but what if you don’t have admin permissions to the cluster you need to monitor?

Spring Boot Metrics Monitoring Using Prometheus & Grafana: As the title suggests, this post walks you through how to configure Prometheus and Grafana to monitor you Spring Boot application metrics.

How to Integrate Grafana with NetFlow: Learn how to monitor NetFlow from Scrutinizer using Grafana’s SimpleJSON data source.

Stream & Go: News Feeds for Over 300 Million End Users: Stream lets you build scalable newsfeeds and activity streams via their API, which is used by more than 300 million end users. In this article, they discuss their monitoring stack and why they chose particular components and technologies.


GrafanaCon EU Tickets are Going Fast!

We’re six weeks from kicking off GrafanaCon EU! Join us for talks from Google, Bloomberg, Tinder, eBay and more! You won’t want to miss two great days of open source monitoring talks and fun in Amsterdam. Get your tickets before they sell out!

Get Your Ticket Now


Grafana Plugins

We have a couple of plugin updates to share this week that add some new features and improvements. Updating your plugins is easy. For on-prem Grafana, use the Grafana-cli tool, or update with 1 click on your Hosted Grafana.

UPDATED PLUGIN

Druid Data Source – This new update is packed with new features. Notable enhancement include:

  • Post Aggregation feature
  • Support for thetaSketch
  • Improvements to the Query editor

Update Now

UPDATED PLUGIN

Breadcrumb Panel – The Breadcrumb Panel is a small panel you can include in your dashboard that tracks other dashboards you have visited – making it easy to navigate back to a previously visited dashboard. The latest release adds support for dashboards loaded from a file.

Update Now


Upcoming Events

In between code pushes we like to speak at, sponsor and attend all kinds of conferences and meetups. We also like to make sure we mention other Grafana-related events happening all over the world. If you’re putting on just such an event, let us know and we’ll list it here.

SnowCamp 2018: Yves Brissaud – Application metrics with Prometheus and Grafana | Grenoble, France – Jan 24, 2018:
We’ll take a look at how Prometheus, Grafana and a bit of code make it possible to obtain temporal data to visualize the state of our applications as well as to help with development and debugging.

Register Now

Women Who Go Berlin: Go Workshop – Monitoring and Troubleshooting using Prometheus and Grafana | Berlin, Germany – Jan 31, 2018: In this workshop we will learn about one of the most important topics in making apps production ready: Monitoring. We will learn how to use tools you’ve probably heard a lot about – Prometheus and Grafana, and using what we learn we will troubleshoot a particularly buggy Go app.

Register Now

FOSDEM | Brussels, Belgium – Feb 3-4, 2018: FOSDEM is a free developer conference where thousands of developers of free and open source software gather to share ideas and technology. There is no need to register; all are welcome.

Jfokus | Stockholm, Sweden – Feb 5-7, 2018:
Carl Bergquist – Quickie: Monitoring? Not OPS Problem

Why should we monitor our system? Why can’t we just rely on the operations team anymore? They use to be able to do that. What’s currently changing? Presentation content: – Why do we monitor our system – How did it use to work? – Whats changing – Why do we need to shift focus – Everyone should be on call. – Resilience is the goal (Best way of having someone care about quality is to make them responsible).

Register Now

Jfokus | Stockholm, Sweden – Feb 5-7, 2018:
Leonard Gram – Presentation: DevOps Deconstructed

What’s a Site Reliability Engineer and how’s that role different from the DevOps engineer my boss wants to hire? I really don’t want to be on call, should I? Is Docker the right place for my code or am I better of just going straight to Serverless? And why should I care about any of it? I’ll try to answer some of these questions while looking at what DevOps really is about and how commodisation of servers through “the cloud” ties into it all. This session will be an opinionated piece from a developer who’s been on-call for the past 6 years and would like to convince you to do the same, at least once.

Register Now

Stockholm Metrics and Monitoring | Stockholm, Sweden – Feb 7, 2018:
Observability 3 ways – Logging, Metrics and Distributed Tracing

Let’s talk about often confused telemetry tools: Logging, Metrics and Distributed Tracing. We’ll show how you capture latency using each of the tools and how they work differently. Through examples and discussion, we’ll note edge cases where certain tools have advantages over others. By the end of this talk, we’ll better understand how each of Logging, Metrics and Distributed Tracing aids us in different ways to understand our applications.

Register Now

OpenNMS – Introduction to “Grafana” | Webinar – Feb 21, 2018:
IT monitoring helps detect emerging hardware damage and performance bottlenecks in the enterprise network before any consequential damage or disruption to business processes occurs. The powerful open-source OpenNMS software monitors a network, including all connected devices, and provides logging of a variety of data that can be used for analysis and planning purposes. In our next OpenNMS webinar on February 21, 2018, we introduce “Grafana” – a web-based tool for creating and displaying dashboards from various data sources, which can be perfectly combined with OpenNMS.

Register Now


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

As we say with pie charts, use emojis wisely 😉


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


How are we doing?

That wraps up our 30th issue of TimeShift. What do you think? Are there other types of content you’d like to see here? Submit a comment on this issue below, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Fedora 27 released

Post Syndicated from corbet original https://lwn.net/Articles/738966/rss

The Fedora 27
release
is now available. “The Workstation edition of Fedora 27 features GNOME 3.26. In the new release, both the Display and Network configuration panels have been updated, along with the overall Settings panel appearance improvement. The system search now shows more results at once, including the system actions.

GNOME 3.26 also features color emoji support, folder sharing in Boxes, and
numerous improvements in the Builder IDE tool.”

Weekly roundup: Pedal to the medal

Post Syndicated from Eevee original https://eev.ee/dev/2017/11/09/weekly-roundup-pedal-to-the-medal/

Hi! Sorry. I’m a bit late. I’ve actually been up to my eyeballs in doing stuff for a few days, which has been pretty cool.

  • fox flux: Definitely been ramping up how much I’m working on this game. Finished another landing animation blah blah player sprites. Some more work on visual effects, this time a cool silhouette stencil effect thing.

  • art: Drew a pic celebrating 1000 followers on my nsfw art Twitter, wow!

  • blog: Wrote half of another cross-cutting programming languages post, for October. Then forgot about it for, uhhh, ten days. Whoops! Will definitely get back to that, um, soon.

  • writing: Actually made some “good ass legit progress” (according to my notes) on the little Flora twine I’m writing, now including some actual prose instead of just JavaScript wankery.

  • bots: I added a bunch more patterns to my Perlin noise Twitter bot and finally implemented a little “masking” thing that will let me make more complex patterns while still making it obvious what they’re supposed to be.

    Alas, while Twitter recently bumped the character limit to 280, that doesn’t mean the bot’s output can now be twice as big — emoji now count as two characters. (No, not because of UTF-16; Twitter is deliberately restricting CJK to 140. It’s super weird.)

  • cc: I got undo working with this accursèd sprite animation UI, and I fixed just a whole mess of bugs.

This week has been even more busy, which I think bodes well. I’m up to a lot of stuff, hope you’re looking forward to it!

JavaScript got better while I wasn’t looking

Post Syndicated from Eevee original https://eev.ee/blog/2017/10/07/javascript-got-better-while-i-wasnt-looking/

IndustrialRobot has generously donated in order to inquire:

In the last few years there seems to have been a lot of activity with adding emojis to Unicode. Has there been an equal effort to add ‘real’ languages/glyph systems/etc?

And as always, if you don’t have anything to say on that topic, feel free to choose your own. :p

Yes.

I mean, each release of Unicode lists major new additions right at the top — Unicode 10, Unicode 9, Unicode 8, etc. They also keep fastidious notes, so you can also dig into how and why these new scripts came from, by reading e.g. the proposal for the addition of Zanabazar Square. I don’t think I have much to add here; I’m not a real linguist, I only play one on TV.

So with that out of the way, here’s something completely different!

A brief history of JavaScript

JavaScript was created in seven days, about eight thousand years ago. It was pretty rough, and it stayed rough for most of its life. But that was fine, because no one used it for anything besides having a trail of sparkles follow your mouse on their Xanga profile.

Then people discovered you could actually do a handful of useful things with JavaScript, and it saw a sharp uptick in usage. Alas, it stayed pretty rough. So we came up with polyfills and jQuerys and all kinds of miscellaneous things that tried to smooth over the rough parts, to varying degrees of success.

And… that’s it. That’s pretty much how things stayed for a while.


I have complicated feelings about JavaScript. I don’t hate it… but I certainly don’t enjoy it, either. It has some pretty neat ideas, like prototypical inheritance and “everything is a value”, but it buries them under a pile of annoying quirks and a woefully inadequate standard library. The DOM APIs don’t make things much better — they seem to be designed as though the target language were Java, rarely taking advantage of any interesting JavaScript features. And the places where the APIs overlap with the language are a hilarious mess: I have to check documentation every single time I use any API that returns a set of things, because there are at least three totally different conventions for handling that and I can’t keep them straight.

The funny thing is that I’ve been fairly happy to work with Lua, even though it shares most of the same obvious quirks as JavaScript. Both languages are weakly typed; both treat nonexistent variables and keys as simply false values, rather than errors; both have a single data structure that doubles as both a list and a map; both use 64-bit floating-point as their only numeric type (though Lua added integers very recently); both lack a standard object model; both have very tiny standard libraries. Hell, Lua doesn’t even have exceptions, not really — you have to fake them in much the same style as Perl.

And yet none of this bothers me nearly as much in Lua. The differences between the languages are very subtle, but combined they make a huge impact.

  • Lua has separate operators for addition and concatenation, so + is never ambiguous. It also has printf-style string formatting in the standard library.

  • Lua’s method calls are syntactic sugar: foo:bar() just means foo.bar(foo). Lua doesn’t even have a special this or self value; the invocant just becomes the first argument. In contrast, JavaScript invokes some hand-waved magic to set its contextual this variable, which has led to no end of confusion.

  • Lua has an iteration protocol, as well as built-in iterators for dealing with list-style or map-style data. JavaScript has a special dedicated Array type and clumsy built-in iteration syntax.

  • Lua has operator overloading and (surprisingly flexible) module importing.

  • Lua allows the keys of a map to be any value (though non-scalars are always compared by identity). JavaScript implicitly converts keys to strings — and since there’s no operator overloading, there’s no way to natively fix this.

These are fairly minor differences, in the grand scheme of language design. And almost every feature in Lua is implemented in a ridiculously simple way; in fact the entire language is described in complete detail in a single web page. So writing JavaScript is always frustrating for me: the language is so close to being much more ergonomic, and yet, it isn’t.

Or, so I thought. As it turns out, while I’ve been off doing other stuff for a few years, browser vendors have been implementing all this pie-in-the-sky stuff from “ES5” and “ES6”, whatever those are. People even upgrade their browsers now. Lo and behold, the last time I went to write JavaScript, I found out that a number of papercuts had actually been solved, and the solutions were sufficiently widely available that I could actually use them in web code.

The weird thing is that I do hear a lot about JavaScript, but the feature I’ve seen raved the most about by far is probably… built-in types for working with arrays of bytes? That’s cool and all, but not exactly the most pressing concern for me.

Anyway, if you also haven’t been keeping tabs on the world of JavaScript, here are some things we missed.

let

MDN docs — supported in Firefox 44, Chrome 41, IE 11, Safari 10

I’m pretty sure I first saw let over a decade ago. Firefox has supported it for ages, but you actually had to opt in by specifying JavaScript version 1.7. Remember JavaScript versions? You know, from back in the days when people actually suggested you write stuff like this:

1
<SCRIPT LANGUAGE="JavaScript1.2" TYPE="text/javascript">

Yikes.

Anyway, so, let declares a variable — but scoped to the immediately containing block, unlike var, which scopes to the innermost function. The trouble with var was that it was very easy to make misleading:

1
2
3
4
5
6
// foo exists here
while (true) {
    var foo = ...;
    ...
}
// foo exists here too

If you reused the same temporary variable name in a different block, or if you expected to be shadowing an outer foo, or if you were trying to do something with creating closures in a loop, this would cause you some trouble.

But no more, because let actually scopes the way it looks like it should, the way variable declarations do in C and friends. As an added bonus, if you refer to a variable declared with let outside of where it’s valid, you’ll get a ReferenceError instead of a silent undefined value. Hooray!

There’s one other interesting quirk to let that I can’t find explicitly documented. Consider:

1
2
3
4
5
6
7
let closures = [];
for (let i = 0; i < 4; i++) {
    closures.push(function() { console.log(i); });
}
for (let j = 0; j < closures.length; j++) {
    closures[j]();
}

If this code had used var i, then it would print 4 four times, because the function-scoped var i means each closure is sharing the same i, whose final value is 4. With let, the output is 0 1 2 3, as you might expect, because each run through the loop gets its own i.

But wait, hang on.

The semantics of a C-style for are that the first expression is only evaluated once, at the very beginning. So there’s only one let i. In fact, it makes no sense for each run through the loop to have a distinct i, because the whole idea of the loop is to modify i each time with i++.

I assume this is simply a special case, since it’s what everyone expects. We expect it so much that I can’t find anyone pointing out that the usual explanation for why it works makes no sense. It has the interesting side effect that for no longer de-sugars perfectly to a while, since this will print all 4s:

1
2
3
4
5
6
7
8
9
closures = [];
let i = 0;
while (i < 4) {
    closures.push(function() { console.log(i); });
    i++;
}
for (let j = 0; j < closures.length; j++) {
    closures[j]();
}

This isn’t a problem — I’m glad let works this way! — it just stands out to me as interesting. Lua doesn’t need a special case here, since it uses an iterator protocol that produces values rather than mutating a visible state variable, so there’s no problem with having the loop variable be truly distinct on each run through the loop.

Classes

MDN docs — supported in Firefox 45, Chrome 42, Safari 9, Edge 13

Prototypical inheritance is pretty cool. The way JavaScript presents it is a little bit opaque, unfortunately, which seems to confuse a lot of people. JavaScript gives you enough functionality to make it work, and even makes it sound like a first-class feature with a property outright called prototype… but to actually use it, you have to do a bunch of weird stuff that doesn’t much look like constructing an object or type.

The funny thing is, people with almost any background get along with Python just fine, and Python uses prototypical inheritance! Nobody ever seems to notice this, because Python tucks it neatly behind a class block that works enough like a Java-style class. (Python also handles inheritance without using the prototype, so it’s a little different… but I digress. Maybe in another post.)

The point is, there’s nothing fundamentally wrong with how JavaScript handles objects; the ergonomics are just terrible.

Lo! They finally added a class keyword. Or, rather, they finally made the class keyword do something; it’s been reserved this entire time.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
class Vector {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }

    get magnitude() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    }

    dot(other) {
        return this.x * other.x + this.y * other.y;
    }
}

This is all just sugar for existing features: creating a Vector function to act as the constructor, assigning a function to Vector.prototype.dot, and whatever it is you do to make a property. (Oh, there are properties. I’ll get to that in a bit.)

The class block can be used as an expression, with or without a name. It also supports prototypical inheritance with an extends clause and has a super pseudo-value for superclass calls.

It’s a little weird that the inside of the class block has its own special syntax, with function omitted and whatnot, but honestly you’d have a hard time making a class block without special syntax.

One severe omission here is that you can’t declare values inside the block, i.e. you can’t just drop a bar = 3; in there if you want all your objects to share a default attribute. The workaround is to just do this.bar = 3; inside the constructor, but I find that unsatisfying, since it defeats half the point of using prototypes.

Properties

MDN docs — supported in Firefox 4, Chrome 5, IE 9, Safari 5.1

JavaScript historically didn’t have a way to intercept attribute access, which is a travesty. And by “intercept attribute access”, I mean that you couldn’t design a value foo such that evaluating foo.bar runs some code you wrote.

Exciting news: now it does. Or, rather, you can intercept specific attributes, like in the class example above. The above magnitude definition is equivalent to:

1
2
3
4
5
6
7
Object.defineProperty(Vector.prototype, 'magnitude', {
    configurable: true,
    enumerable: true,
    get: function() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    },
});

Beautiful.

And what even are these configurable and enumerable things? It seems that every single key on every single object now has its own set of three Boolean twiddles:

  • configurable means the property itself can be reconfigured with another call to Object.defineProperty.
  • enumerable means the property appears in for..in or Object.keys().
  • writable means the property value can be changed, which only applies to properties with real values rather than accessor functions.

The incredibly wild thing is that for properties defined by Object.defineProperty, configurable and enumerable default to false, meaning that by default accessor properties are immutable and invisible. Super weird.

Nice to have, though. And luckily, it turns out the same syntax as in class also works in object literals.

1
2
3
4
5
6
Vector.prototype = {
    get magnitude() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    },
    ...
};

Alas, I’m not aware of a way to intercept arbitrary attribute access.

Another feature along the same lines is Object.seal(), which marks all of an object’s properties as non-configurable and prevents any new properties from being added to the object. The object is still mutable, but its “shape” can’t be changed. And of course you can just make the object completely immutable if you want, via setting all its properties non-writable, or just using Object.freeze().

I have mixed feelings about the ability to irrevocably change something about a dynamic runtime. It would certainly solve some gripes of former Haskell-minded colleagues, and I don’t have any compelling argument against it, but it feels like it violates some unwritten contract about dynamic languages — surely any structural change made by user code should also be able to be undone by user code?

Slurpy arguments

MDN docs — supported in Firefox 15, Chrome 47, Edge 12, Safari 10

Officially this feature is called “rest parameters”, but that’s a terrible name, no one cares about “arguments” vs “parameters”, and “slurpy” is a good word. Bless you, Perl.

1
2
3
function foo(a, b, ...args) {
    // ...
}

Now you can call foo with as many arguments as you want, and every argument after the second will be collected in args as a regular array.

You can also do the reverse with the spread operator:

1
2
3
4
5
let args = [];
args.push(1);
args.push(2);
args.push(3);
foo(...args);

It even works in array literals, even multiple times:

1
2
let args2 = [...args, ...args];
console.log(args2);  // [1, 2, 3, 1, 2, 3]

Apparently there’s also a proposal for allowing the same thing with objects inside object literals.

Default arguments

MDN docs — supported in Firefox 15, Chrome 49, Edge 14, Safari 10

Yes, arguments can have defaults now. It’s more like Sass than Python — default expressions are evaluated once per call, and later default expressions can refer to earlier arguments. I don’t know how I feel about that but whatever.

1
2
3
function foo(n = 1, m = n + 1, list = []) {
    ...
}

Also, unlike Python, you can have an argument with a default and follow it with an argument without a default, since the default default (!) is and always has been defined as undefined. Er, let me just write it out.

1
2
3
function bar(a = 5, b) {
    ...
}

Arrow functions

MDN docs — supported in Firefox 22, Chrome 45, Edge 12, Safari 10

Perhaps the most humble improvement is the arrow function. It’s a slightly shorter way to write an anonymous function.

1
2
3
(a, b, c) => { ... }
a => { ... }
() => { ... }

An arrow function does not set this or some other magical values, so you can safely use an arrow function as a quick closure inside a method without having to rebind this. Hooray!

Otherwise, arrow functions act pretty much like regular functions; you can even use all the features of regular function signatures.

Arrow functions are particularly nice in combination with all the combinator-style array functions that were added a while ago, like Array.forEach.

1
2
3
[7, 8, 9].forEach(value => {
    console.log(value);
});

Symbol

MDN docs — supported in Firefox 36, Chrome 38, Edge 12, Safari 9

This isn’t quite what I’d call an exciting feature, but it’s necessary for explaining the next one. It’s actually… extremely weird.

symbol is a new kind of primitive (like number and string), not an object (like, er, Number and String). A symbol is created with Symbol('foo'). No, not new Symbol('foo'); that throws a TypeError, for, uh, some reason.

The only point of a symbol is as a unique key. You see, symbols have one very special property: they can be used as object keys, and will not be stringified. Remember, only strings can be keys in JavaScript — even the indices of an array are, semantically speaking, still strings. Symbols are a new exception to this rule.

Also, like other objects, two symbols don’t compare equal to each other: Symbol('foo') != Symbol('foo').

The result is that symbols solve one of the problems that plauges most object systems, something I’ve talked about before: interfaces. Since an interface might be implemented by any arbitrary type, and any arbitrary type might want to implement any number of arbitrary interfaces, all the method names on an interface are effectively part of a single global namespace.

I think I need to take a moment to justify that. If you have IFoo and IBar, both with a method called method, and you want to implement both on the same type… you have a problem. Because most object systems consider “interface” to mean “I have a method called method, with no way to say which interface’s method you mean. This is a hard problem to avoid, because IFoo and IBar might not even come from the same library. Occasionally languages offer a clumsy way to “rename” one method or the other, but the most common approach seems to be for interface designers to avoid names that sound “too common”. You end up with redundant mouthfuls like IFoo.foo_method.

This incredibly sucks, and the only languages I’m aware of that avoid the problem are the ML family and Rust. In Rust, you define all the methods for a particular trait (interface) in a separate block, away from the type’s “own” methods. It’s pretty slick. You can still do obj.method(), and as long as there’s only one method among all the available traits, you’ll get that one. If not, there’s syntax for explicitly saying which trait you mean, which I can’t remember because I’ve never had to use it.

Symbols are JavaScript’s answer to this problem. If you want to define some interface, you can name its methods with symbols, which are guaranteed to be unique. You just have to make sure you keep the symbol around somewhere accessible so other people can actually use it. (Or… not?)

The interesting thing is that JavaScript now has several of its own symbols built in, allowing user objects to implement features that were previously reserved for built-in types. For example, you can use the Symbol.hasInstance symbol — which is simply where the language is storing an existing symbol and is not the same as Symbol('hasInstance')! — to override instanceof:

1
2
3
4
5
6
7
8
// oh my god don't do this though
class EvenNumber {
    static [Symbol.hasInstance](obj) {
        return obj % 2 == 0;
    }
}
console.log(2 instanceof EvenNumber);  // true
console.log(3 instanceof EvenNumber);  // false

Oh, and those brackets around Symbol.hasInstance are a sort of reverse-quoting — they indicate an expression to use where the language would normally expect a literal identifier. I think they work as object keys, too, and maybe some other places.

The equivalent in Python is to implement a method called __instancecheck__, a name which is not special in any way except that Python has reserved all method names of the form __foo__. That’s great for Python, but doesn’t really help user code. JavaScript has actually outclassed (ho ho) Python here.

Of course, obj[BobNamespace.some_method]() is not the prettiest way to call an interface method, so it’s not perfect. I imagine this would be best implemented in user code by exposing a polymorphic function, similar to how Python’s len(obj) pretty much just calls obj.__len__().

I only bring this up because it’s the plumbing behind one of the most incredible things in JavaScript that I didn’t even know about until I started writing this post. I’m so excited oh my gosh. Are you ready? It’s:

Iteration protocol

MDN docs — supported in Firefox 27, Chrome 39, Safari 10; still experimental in Edge

Yes! Amazing! JavaScript has first-class support for iteration! I can’t even believe this.

It works pretty much how you’d expect, or at least, how I’d expect. You give your object a method called Symbol.iterator, and that returns an iterator.

What’s an iterator? It’s an object with a next() method that returns the next value and whether the iterator is exhausted.

Wait, wait, wait a second. Hang on. The method is called next? Really? You didn’t go for Symbol.next? Python 2 did exactly the same thing, then realized its mistake and changed it to __next__ in Python 3. Why did you do this?

Well, anyway. My go-to test of an iterator protocol is how hard it is to write an equivalent to Python’s enumerate(), which takes a list and iterates over its values and their indices. In Python it looks like this:

1
2
3
4
5
for i, value in enumerate(['one', 'two', 'three']):
    print(i, value)
# 0 one
# 1 two
# 2 three

It’s super nice to have, and I’m always amazed when languages with “strong” “support” for iteration don’t have it. Like, C# doesn’t. So if you want to iterate over a list but also need indices, you need to fall back to a C-style for loop. And if you want to iterate over a lazy or arbitrary iterable but also need indices, you need to track it yourself with a counter. Ridiculous.

Here’s my attempt at building it in JavaScript.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
function enumerate(iterable) {
    // Return a new iter*able* object with a Symbol.iterator method that
    // returns an iterator.
    return {
        [Symbol.iterator]: function() {
            let iterator = iterable[Symbol.iterator]();
            let i = 0;

            return {
                next: function() {
                    let nextval = iterator.next();
                    if (! nextval.done) {
                        nextval.value = [i, nextval.value];
                        i++;
                    }
                    return nextval;
                },
            };
        },
    };
}
for (let [i, value] of enumerate(['one', 'two', 'three'])) {
    console.log(i, value);
}
// 0 one
// 1 two
// 2 three

Incidentally, for..of (which iterates over a sequence, unlike for..in which iterates over keys — obviously) is finally supported in Edge 12. Hallelujah.

Oh, and let [i, value] is destructuring assignment, which is also a thing now and works with objects as well. You can even use the splat operator with it! Like Python! (And you can use it in function signatures! Like Python! Wait, no, Python decided that was terrible and removed it in 3…)

1
let [x, y, ...others] = ['apple', 'orange', 'cherry', 'banana'];

It’s a Halloween miracle. 🎃

Generators

MDN docs — supported in Firefox 26, Chrome 39, Edge 13, Safari 10

That’s right, JavaScript has goddamn generators now. It’s basically just copying Python and adding a lot of superfluous punctuation everywhere. Not that I’m complaining.

Also, generators are themselves iterable, so I’m going to cut to the chase and rewrite my enumerate() with a generator.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
function enumerate(iterable) {
    return {
        [Symbol.iterator]: function*() {
            let i = 0;
            for (let value of iterable) {
                yield [i, value];
                i++;
            }
        },
    };
}
for (let [i, value] of enumerate(['one', 'two', 'three'])) {
    console.log(i, value);
}
// 0 one
// 1 two
// 2 three

Amazing. function* is a pretty strange choice of syntax, but whatever? I guess it also lets them make yield only act as a keyword inside a generator, for ultimate backwards compatibility.

JavaScript generators support everything Python generators do: yield* yields every item from a subsequence, like Python’s yield from; generators can return final values; you can pass values back into the generator if you iterate it by hand. No, really, I wasn’t kidding, it’s basically just copying Python. It’s great. You could now built asyncio in JavaScript!

In fact, they did that! JavaScript now has async and await. An async function returns a Promise, which is also a built-in type now. Amazing.

Sets and maps

MDN docs for MapMDN docs for Set — supported in Firefox 13, Chrome 38, IE 11, Safari 7.1

I did not save the best for last. This is much less exciting than generators. But still exciting.

The only data structure in JavaScript is the object, a map where the strings are keys. (Or now, also symbols, I guess.) That means you can’t readily use custom values as keys, nor simulate a set of arbitrary objects. And you have to worry about people mucking with Object.prototype, yikes.

But now, there’s Map and Set! Wow.

Unfortunately, because JavaScript, Map couldn’t use the indexing operators without losing the ability to have methods, so you have to use a boring old method-based API. But Map has convenient methods that plain objects don’t, like entries() to iterate over pairs of keys and values. In fact, you can use a map with for..of to get key/value pairs. So that’s nice.

Perhaps more interesting, there’s also now a WeakMap and WeakSet, where the keys are weak references. I don’t think JavaScript had any way to do weak references before this, so that’s pretty slick. There’s no obvious way to hold a weak value, but I guess you could substitute a WeakSet with only one item.

Template literals

MDN docs — supported in Firefox 34, Chrome 41, Edge 12, Safari 9

Template literals are JavaScript’s answer to string interpolation, which has historically been a huge pain in the ass because it doesn’t even have string formatting in the standard library.

They’re just strings delimited by backticks instead of quotes. They can span multiple lines and contain expressions.

1
2
console.log(`one plus
two is ${1 + 2}`);

Someone decided it would be a good idea to allow nesting more sets of backticks inside a ${} expression, so, good luck to syntax highlighters.

However, someone also had the most incredible idea ever, which was to add syntax allowing user code to do the interpolation — so you can do custom escaping, when absolutely necessary, which is virtually never, because “escaping” means you’re building a structured format by slopping strings together willy-nilly instead of using some API that works with the structure.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// OF COURSE, YOU SHOULDN'T BE DOING THIS ANYWAY; YOU SHOULD BUILD HTML WITH
// THE DOM API AND USE .textContent FOR LITERAL TEXT.  BUT AS AN EXAMPLE:
function html(literals, ...values) {
    let ret = [];
    literals.forEach((literal, i) => {
        if (i > 0) {
            // Is there seriously still not a built-in function for doing this?
            // Well, probably because you SHOULDN'T BE DOING IT
            ret.push(values[i - 1]
                .replace(/&/g, '&amp;')
                .replace(/</g, '&lt;')
                .replace(/>/g, '&gt;')
                .replace(/"/g, '&quot;')
                .replace(/'/g, '&apos;'));
        }
        ret.push(literal);
    });
    return ret.join('');
}
let username = 'Bob<script>';
let result = html`<b>Hello, ${username}!</b>`;
console.log(result);
// <b>Hello, Bob&lt;script&gt;!</b>

It’s a shame this feature is in JavaScript, the language where you are least likely to need it.

Trailing commas

Remember how you couldn’t do this for ages, because ass-old IE considered it a syntax error and would reject the entire script?

1
2
3
4
5
{
    a: 'one',
    b: 'two',
    c: 'three',  // <- THIS GUY RIGHT HERE
}

Well now it’s part of the goddamn spec and if there’s anything in this post you can rely on, it’s this. In fact you can use AS MANY GODDAMN TRAILING COMMAS AS YOU WANT. But only in arrays.

1
[1, 2, 3,,,,,,,,,,,,,,,,,,,,,,,,,]

Apparently that has the bizarre side effect of reserving extra space at the end of the array, without putting values there.

And more, probably

Like strict mode, which makes a few silent “errors” be actual errors, forces you to declare variables (no implicit globals!), and forbids the completely bozotic with block.

Or String.trim(), which trims whitespace off of strings.

Or… Math.sign()? That’s new? Seriously? Well, okay.

Or the Proxy type, which lets you customize indexing and assignment and calling. Oh. I guess that is possible, though this is a pretty weird way to do it; why not just use symbol-named methods?

You can write Unicode escapes for astral plane characters in strings (or identifiers!), as \u{XXXXXXXX}.

There’s a const now? I extremely don’t care, just name it in all caps and don’t reassign it, come on.

There’s also a mountain of other minor things, which you can peruse at your leisure via MDN or the ECMAScript compatibility tables (note the links at the top, too).

That’s all I’ve got. I still wouldn’t say I’m a big fan of JavaScript, but it’s definitely making an effort to clean up some goofy inconsistencies and solve common problems. I think I could even write some without yelling on Twitter about it now.

On the other hand, if you’re still stuck supporting IE 10 for some reason… well, er, my condolences.

Weekly roundup: Apocalypse

Post Syndicated from Eevee original https://eev.ee/dev/2017/10/02/weekly-roundup-apocalypse/

Uh, hey. What’s up. Been a while. My computer died? Linux abruptly put the primary hard drive in read-only mode, which seemed Really Bad, but then it refused to boot up entirely. I suspect the motherboard was on its last legs (though the drive itself was getting pretty worn out too), so long story short, I lost a week to ordering/building an entirely new machine and rearranging/zeroing hard drives. The old one was six years old, so it was about time anyway.

I also had some… internet stuff… to deal with, so overall I’ve had a rollercoaster of a week. Oh, and now my keyboard is finally starting to break.

  • fox flux: I’m at the point where the protagonists are almost all done and I’ve started touching up particular poses (times ten). So that’s cool. If I hadn’t lost the last week I might’ve been done with it by now!

  • devops: Well, there was that whole computer thing. Also I suddenly have support for colored fonts (read: emoji) in all GTK apps (except Chromium), and that led me to spend at least half a day trying to find a way to get Twemoji into a font using Google’s font extensions. Alas, no dice, so I’m currently stuck with a fairly outdated copy of the Android emoji, which I don’t want to upgrade because Google makes them worse with every revision.

  • blog: I started on a post. I didn’t get very far. I still owe two for September. Oops.

  • book: Did some editing, worked on some illustrations. I figured out how to get math sections to (mostly) use the same font as body text, so inline math doesn’t look quite so comically out of place any more.

  • cc: Fixed some stuff I broke, as usual, and worked some more on a Unity GUI for defining and editing sprite animations.

I’m now way behind and have completely lost all my trains of thought, though I guess having my computer break is a pretty good excuse. Trying to get back up to speed as quickly as possible.

Oh, and happy October. 🎃

GNOME 3.26 released

Post Syndicated from ris original https://lwn.net/Articles/733587/rss

The GNOME Project has announced the release of GNOME 3.26 “Manchester”.
This release brings refinements to the system search, animations for
maximizing and unmaximizing windows and support for color Emoji.

Improvements to core GNOME applications include a redesigned Settings
application, a new display settings panel, Firefox sync in the Web browser,
and many more.” There are openSUSE nightly live images that include
GNOME 3.26.

Grafana 4.3 Release

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/05/23/grafana-4.3-release/

Grafana v4.3 is now available for download.

Release Highlights

  • New Heatmap Panel
  • Graph Panel Histogram Mode
  • Elasticsearch Histogram Aggregation
  • Prometheus Table data format
  • New MySQL Data Source (alpha version to get some early feedback)
  • Dashed lines in the Graph Panel
  • 60+ small fixes and improvements, most of them contributed by our fantastic community!

Check out the New Features in v4.3 Dashboard on the Grafana Play site for
a showcase of these new features.

Histogram Support

A Histogram is a kind of bar chart that groups numbers into ranges, often called buckets or bins. Taller bars show that more data falls in that range.

The Graph Panel now supports Histograms.

Histogram Aggregation Support for Elasticsearch

Elasticsearch is the only supported data source that can return pre-bucketed data (data that is already grouped into ranges). With other data sources there is a risk of returning inaccurate data in a histogram due to using already aggregated data rather than raw data. This release adds support for Elasticsearch pre-bucketed data that can be visualized with the new Heatmap Panel.

Heatmap Panel

The Histogram support in the Graph Panel does not show changes over time – it aggregates all the data together for the chosen time range. To visualize a histogram over time, we have built a new Heatmap Panel.

Every column in a Heatmap is a histogram snapshot. Instead of visualizing higher values with higher bars, a heatmap visualizes higher values with color. The histogram shown above is equivalent to one column in the heatmap shown below.

The Heatmap panel also works with Elasticsearch Histogram Aggregations for more accurate server side bucketing.

MySQL Data Source (alpha)

This release includes a new core data source for MySQL. You can write any possible MySQL query and format it as either Time Series or Table Data allowing it be used with the Graph Panel, Table Panel and SingleStat Panel.

We are still working on the MySQL data source. As it’s missing some important features, like templating and macros and future changes could be breaking, we are
labeling the state of the data source as Alpha. Instead of holding up the release of v4.3 we are including it in its current shape to get some early feedback. So please try it out and let us know what you think on twitter or on our community forum. Is this a feature that you would use? How can we make it better?

The query editor can show the generated and interpolated SQL that is sent to the MySQL server.

The query editor will also show any errors that resulted from running the query (very useful when you have a syntax error!).

Dashed Lines in the Graph Panel

A new Dashes option has been added to Series overrides in the Graph Panel.

Health Check Endpoint

Now you can monitor the monitoring with the Health Check Endpoint! The new /api/health endpoint returns HTTP 200 OK if everything is up and HTTP 503 Error if the Grafana database cannot be pinged.

Lazy Load Panels

Grafana now delays loading panels until they become visible (scrolled into view). This means panels out of view are not sending requests thereby reducing the load on your time series database.

Prometheus – Table Data (column per label)

The Prometheus data source now supports the Table Data format by automatically assigning a column to a label. This makes it really easy to browse data in the table panel.

Improved Alerting Annotation

When an alert is fired, the annotation on the graph now shows information about execution errors or if it was fired due to no data.

Other Highlights From The Changelog

Changes:

  • Table: Support to change column header text #3551
  • InfluxDB: influxdb query builder support for ORDER BY and LIMIT (allows TOPN queries) #6065 Support influxdb’s SLIMIT Feature #7232 thx @thuck
  • Graph: Support auto grid min/max when using log scale #3090, thx @bigbenhur
  • Prometheus: Make Prometheus query field a textarea #7663, thx @hagen1778
  • Server: Support listening on a UNIX socket #4030, thx @mitjaziv

Fixes:

  • MySQL: 4-byte UTF8 not supported when using MySQL database (allows Emojis in Dashboard Names) #7958
  • Dashboard: Description tooltip is not fully displayed #7970

Lots more enhancements and fixes can be found in the Changelog.

Download

Head to the v4.3 download page for download links & instructions.

Thanks

A big thanks to all the Grafana users who contribute by submitting PRs, bug reports, helping out on our community site and providing feedback!

Grafana 4.3 Beta Release

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/05/12/grafana-4.3-beta-release/

Grafana v4.3 Beta is now available for download.

Release Highlights

  • New Heatmap Panel
  • Graph Panel Histogram Mode
  • Elasticsearch Histogram Aggregation
  • Prometheus Table data format
  • New MySQL Data Source (alpha version to get some early feedback)
  • 60+ small fixes and improvements, most of them contributed by our fantastic community!

Check out the New Features in v4.3 Dashboard on the Grafana Play site for
a showcase of these new features.

Histogram Support

A Histogram is a kind of bar chart that groups numbers into ranges, often called buckets or bins. Taller bars show that more data falls in that range.

The Graph Panel now supports Histograms.

Histogram Aggregation Support for Elasticsearch

Elasticsearch is the only supported data source that can return pre-bucketed data (data that is already grouped into ranges). With other data sources there is a risk of returning inaccurate data in a histogram due to using already aggregated data rather than raw data. This release adds support for Elasticsearch pre-bucketed data that can be visualized with the new Heatmap Panel.

Heatmap Panel

The Histogram support in the Graph Panel does not show changes over time – it aggregates all the data together for the chosen time range. To visualize a histogram over time, we have built a new Heatmap Panel.

Every column in a Heatmap is a histogram snapshot. Instead of visualizing higher values with higher bars, a heatmap visualizes higher values with color. The histogram shown above is equivalent to one column in the heatmap shown below.

The Heatmap panel also works with Elasticsearch Histogram Aggregations for more accurate server side bucketing.

MySQL Data Source (alpha)

This release includes a new core data source for MySQL. You can write any possible MySQL query and format it as either Time Series or Table Data allowing it be used with the Graph Panel, Table Panel and SingleStat Panel.

We are still working on the MySQL data source. As it’s missing some important features, like templating and macros and future changes could be breaking, we are
labeling the state of the data source as Alpha. Instead of holding up the release of v4.3 we are including it in its current shape to get some early feedback. So please try it out and let us know what you think on twitter or on our community forum. Is this a feature that you would use? How can we make it better?

The query editor can show the generated and interpolated SQL that is sent to the MySQL server.

The query editor will also show any errors that resulted from running the query (very useful when you have a syntax error!).

Health Check Endpoint

Now you can monitor the monitoring with the Health Check Endpoint! The new /api/health endpoint returns HTTP 200 OK if everything is up and HTTP 503 Error if the Grafana database cannot be pinged.

Lazy Load Panels

Grafana now delays loading panels until they become visible (scrolled into view). This means panels out of view are not sending requests thereby reducing the load on your time series database.

Prometheus – Table Data (column per label)

The Prometheus data source now supports the Table Data format by automatically assigning a column to a label. This makes it really easy to browse data in the table panel.

Other Highlights From The Changelog

Changes:

  • Table: Support to change column header text #3551
  • InfluxDB: influxdb query builder support for ORDER BY and LIMIT (allows TOPN queries) #6065 Support influxdb’s SLIMIT Feature #7232 thx @thuck
  • Graph: Support auto grid min/max when using log scale #3090, thx @bigbenhur
  • Prometheus: Make Prometheus query field a textarea #7663, thx @hagen1778
  • Server: Support listening on a UNIX socket #4030, thx @mitjaziv

Fixes:

  • MySQL: 4-byte UTF8 not supported when using MySQL database (allows Emojis in Dashboard Names) #7958
  • Dashboard: Description tooltip is not fully displayed #7970

Lots more enhancements and fixes can be found in the Changelog.

Download

Head to the v4.3 download page for download links & instructions.

Thanks

A big thanks to all the Grafana users who contribute by submitting PRs, bug reports, helping out on our community site and providing feedback!

Skillz: editing a web page

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/02/skillz-editing-web-page.html

So one of the skillz you ought to have in cybersec is messing with web-pages client-side using Chrome’s Developer Tools. Web-servers give you a bunch of HTML and JavaScript code which, once it reaches your browser, is yours to change and play with. You can do a lot with web-sites that they don’t intend by changing that code.

Let me give you an example. It’s only an example — touching briefly on steps to give you an impression what’s going on. It’s not a ground up explanation of everything, which you may find off-putting. Click on the images to expand them so you can see fully what’s going on.

Today is the American holiday called “Presidents Day”. It’s actually not a federal holiday, but a holiday in all 50 states. Originally it was just Washington’s birthday (February 22), but some states choose to honor other presidents as well, hence “Presidents Day”.
Thus of us who donated to Donald Trump’s campaign (note: I donated to all candidates campaigns back in 2015) received an email today suggesting that to honor Presidents Day, we should “sign a card” for Trump. It’s a gross dis-honoring of the Presidents the day is supposed to commemorate, but whatever, it’s the 21st century.
Okay, let’s say we want to honor the current President with a bunch of 🖕🖕🖕🖕 in order to point out his crassness of exploiting this holiday, and clicked on the URL [*], and filled it in as such (with multiple skin tones for the middle finger, just so he knows its from all of us):
Okay, now we hit the submit button “Add My Name” in order to send this to his campaign. The only problem is, the web page rejects us, telling us “Please enter a valid name” (note, I’m changing font sizes in these screen shots so you can see the message):
This is obviously client side validation of the field. It’s at this point that we go into Developer Tools in order to turn it off. One way is to [right-click] on that button, and from the popup menu, select “Inspect”, which gets you this screen (yes, the original page is squashed to the left-hand side):
We can edit the HTML right there and add the “novalidate” flag, as shown below, then hit the “Add My Name” button again:
This doesn’t work. The scripts on the webpage aren’t honoring the HTML5 “novalidate” flag. Therefore, we’ll have to go edit those scripts. We do that by clicking on the Sources tab, then press [ctrl-shift-f] to open the ‘find’ window in the sources, and type “Please enter a valid name”, and you’ll find the JavaScript source file (validation.js) where the validation function is located:
If at this point you find all these windows bewildering, then yes, you are on the right track. We typed in the search there near the bottom next to the classic search icon 🔍. Then right below that we got the search results. We clicked on the search results, then up above popped up the source file (validation.js) among all the possible source files with the line selected that contains our search term. Remember: when you pull down a single HTML page, like the one from donaldtrump.com, it can pull in a zillion JavaScript files as well.
Unlike the HTML, we can’t change the JavaScript on the fly (at least, I don’t know how to). Instead, we have to run more JavaScript. Specifically, we need to run a script that registers a new validation function. If you look in the original source, it contains a function that validates the input by making sure it matches a regular expression:
  1. jQuery.validator.addMethod(“isname”, function(value, element) {
  2.     return this.optional(element) || (/^[a-zA-Z]+[ ]+(([‘,. -][a-zA-Z ])?[a-zA-Z]*)+.?$/.test(value.trim()));
  3. }, “Please enter a valid name”);
From the console, we are going to call the addMethod function ourselves to register a different validation function for isname, specifically a validation function that always returns true, meaning the input is valid. This will override the previously registered function. As the Founders of our country say, the solution to bad JavaScript is not to censor it, but to add more JavaScript.
  1. jQuery.validator.addMethod(“isname”, function () {
  2.     return true});
We just type that in the Console as shown below (in the bottom window where Search used to be) and hit [enter]. It gives us the response “undefined”, but that’s OK. (Note: in the screenshot I misspelled it as isName, it should instead be all lowercase isname).
Now we can close Developer Tools and press the “Add My Name” button, and we get the following response:
Darn, foiled again. But at least this time, our request went to the server. It was on the server side that the request was rejected. We successfully turned off client-side checking. Had the server accepted our Unicode emoji, we would’ve reached the next step, where it asks for donations. (By the way, the entire purpose of “sign this card” is to get users to donate, nothing else).

Conclusion

So we didn’t actually succeed at doing anything here, but I thought I’d write it up anyway. Editing the web-page client-side, or mucking around with JavaScript client-side, is a skill that every cybersec professional should have. Hopefully, this is an amusing enough example that people will follow the steps to see how this is done.

Eevee gained 2611 experience points

Post Syndicated from Eevee original https://eev.ee/blog/2017/01/21/eevee-gained-2611-experience-points/

Eevee grew to level 30!

Wow! What an incredible fucking year. I’ve never done so much in my life.

I made a video game! Twice! Thrice, even, if you count the one I just finished, which was done on the week surrounding my birthday. Mel and I made Under Construction on the PICO-8 in a few weeks, then I used its engine to make Isaac’s Descent on my own for Ludum Dare, and we just published the more substantial LÖVE-based NEON PHASE for my own week-long game jam. I’ve also got a couple other things in progress, and Mel is at this very second working on NEON PHASE 2.

And we made these from scratch. I wrote a little collision engine! It even mostly works! That’s amazing and I’m so happy about it. It’s also great that Mel and I have something we can actually work on together; before, the most collaborating we really did was on web design.

My interest in Doom waxes and wanes, but I did find time to finally make a real map and later make some speedmaps (which I realize I never fully published, oops!). I even put some effort into embedding Lua in ZDoom, which is an interesting side project but probably not something I’m going to actively champion.

I wrote a few Twitter bots! @perlin_noise puts my post on Perlin noise to use by tweeting images, GIFs, and emoji grids generated from various kinds of Perlin noise. @unicodeveryword tweets every word in the English language that’s also in my wordlist, slightly scrambled by Unicode shenanigans. @calloutbot is mostly an injoke.

I wrote 45 posts comprising 195k words, which is about four novels, and that’s not counting the couple of articles I was paid to write a few months ago. I have Patreon to thank for getting me to write more often, though I recently dropped the writing obligation so I can spend more time on games and a book and other stuff.

Oh, yes, I actually started writing a book, in the sense that I have (quite a bit) more than a few pages written and a concept that I like and am sticking to.

I dipped a toe into music, however briefly, and managed to produce a simple song that’s not too bad. Promising!

I feel like I’ve been slacking a lot on art lately, but looking back, this year saw my first animation (and several more after that), a mountain of ink-on-paper work for Inktober, and of course a lot of exploration of pixel art for games. I haven’t done as much digital illustrating as I’d like in the last few months, perhaps, but I’m definitely creating stuff I’d never have imagined I could do two years ago.

Veekun? Ah, hm. I did write some gen 1 dumping stuff, much of which can be reused for gen 2, so that’s good. Some of the code got modernized a bit, and I made some good progress on gen 6/7. I also wrote that model viewer, which is pretty dang cool. So, still making progress, just slowly. It’s obviously not something that holds my attention as strongly nowadays, but I’d still like to get the site redone — hopefully that’ll make it easier to understand and work on, both for me and other people.

Even after all this, I’m surely forgetting a few more minor things! I’m really happy about how far I’ve come in a year, and I can’t wait to see what this coming year has in store.

Eevee gained 2611 experience points

Post Syndicated from Eevee original https://eev.ee/blog/2017/01/21/eevee-gained-2611-experience-points/

Eevee grew to level 30!

Wow! What an incredible fucking year. I’ve never done so much in my life.

I made a video game! Twice! Thrice, even, if you count the one I just finished, which was done on the week surrounding my birthday. Mel and I made Under Construction on the PICO-8 in a few weeks, then I used its engine to make Isaac’s Descent on my own for Ludum Dare, and we just published the more substantial LÖVE-based NEON PHASE for my own week-long game jam. I’ve also got a couple other things in progress, and Mel is at this very second working on NEON PHASE 2.

And we made these from scratch. I wrote a little collision engine! It even mostly works! That’s amazing and I’m so happy about it. It’s also great that Mel and I have something we can actually work on together; before, the most collaborating we really did was on web design.

My interest in Doom waxes and wanes, but I did find time to finally make a real map and later make some speedmaps (which I realize I never fully published, oops!). I even put some effort into embedding Lua in ZDoom, which is an interesting side project but probably not something I’m going to actively champion.

I wrote a few Twitter bots! @perlin_noise puts my post on Perlin noise to use by tweeting images, GIFs, and emoji grids generated from various kinds of Perlin noise. @unicodeveryword tweets every word in the English language that’s also in my wordlist, slightly scrambled by Unicode shenanigans. @calloutbot is mostly an injoke.

I wrote 45 posts comprising 195k words, which is about four novels, and that’s not counting the couple of articles I was paid to write a few months ago. I have Patreon to thank for getting me to write more often, though I recently dropped the writing obligation so I can spend more time on games and a book and other stuff.

Oh, yes, I actually started writing a book, in the sense that I have (quite a bit) more than a few pages written and a concept that I like and am sticking to.

I dipped a toe into music, however briefly, and managed to produce a simple song that’s not too bad. Promising!

I feel like I’ve been slacking a lot on art lately, but looking back, this year saw my first animation (and several more after that), a mountain of ink-on-paper work for Inktober, and of course a lot of exploration of pixel art for games. I haven’t done as much digital illustrating as I’d like in the last few months, perhaps, but I’m definitely creating stuff I’d never have imagined I could do two years ago.

Veekun? Ah, hm. I did write some gen 1 dumping stuff, much of which can be reused for gen 2, so that’s good. Some of the code got modernized a bit, and I made some good progress on gen 6/7. I also wrote that model viewer, which is pretty dang cool. So, still making progress, just slowly. It’s obviously not something that holds my attention as strongly nowadays, but I’d still like to get the site redone — hopefully that’ll make it easier to understand and work on, both for me and other people.

Even after all this, I’m surely forgetting a few more minor things! I’m really happy about how far I’ve come in a year, and I can’t wait to see what this coming year has in store.

Let’s stop copying C

Post Syndicated from Eevee original https://eev.ee/blog/2016/12/01/lets-stop-copying-c/

Ah, C. The best lingua franca language we have… because we have no other lingua franca languages.

C is fairly old — 44 years, now! — and comes from a time when there were possibly more architectures than programming languages. It works well for what it is, and what it is is a relatively simple layer of indirection atop assembly.

Alas, the popularity of C has led to a number of programming languages’ taking significant cues from its design, and parts of its design are… slightly questionable. I’ve gone through some common features that probably should’ve stayed in C and my justification for saying so. The features are listed in rough order from (I hope) least to most controversial. The idea is that C fans will give up when I complain about argument order and not even get to the part where I rag on braces. Wait, crap, I gave it away.

I’ve listed some languages that do or don’t take the same approach as C. Plenty of the listed languages have no relation to C, and some even predate it — this is meant as a cross-reference of the landscape (and perhaps a list of prior art), not a genealogy. The language selections are arbitrary and based on what I could cobble together from documentation, experiments, Wikipedia, and attempts to make sense of Rosetta Code. I don’t know everything about all of them, so I might be missing some interesting quirks. Things are especially complicated for very old languages like COBOL or Fortran, which by now have numerous different versions and variants and de facto standard extensions.

Bash” generally means zsh and ksh and other derivatives as well, and when referring to expressions, means the $(( ... )) syntax; “Unix shells” means Bourne and thus almost certainly everything else as well. I didn’t look too closely into, say, fish. Unqualified “Python” means both 2 and 3; likewise, unqualified “Perl” means both 5 and 6. Also some of the puns are perhaps a little too obtuse, but the first group listed is always C-like.

Textual inclusion

#include is not a great basis for a module system. It’s not even a module system. You can’t ever quite tell what symbols came from which files, or indeed whether particular files are necessary at all. And in languages with C-like header files, most headers include other headers include more headers, so who knows how any particular declaration is actually ending up in your code? Oh, and there’s the whole include guards thing.

It’s a little tricky to pick on individual languages here, because ultimately even the greatest module system in the world boils down to “execute this other file, and maybe do some other stuff”. I think the true differentiating feature is whether including/importing/whatevering a file creates a new namespace. If a file gets dumped into the caller’s namespace, that looks an awful lot like textual inclusion; if a file gets its own namespace, that’s a good sign of something more structured happening behind the scenes.

This tends to go hand-in-hand with how much the language relies on a global namespace. One surprising exception is Lua, which can compartmentalize required files quite well, but dumps everything into a single global namespace by default.

Quick test: if you create a new namespace and import another file within that namespace, do its contents end up in that namespace?

Included: ACS, awk, COBOL, Erlang, Forth, Fortran, most older Lisps, Perl 5 (despite that required files must return true), PHP, Ruby, Unix shells.

Excluded: Ada, Clojure, D, Haskell, Julia, Lua (the file’s return value is returned from require), Nim, Node (similar to Lua), Perl 6, Python, Rust.

Special mention: ALGOL appears to have been designed with the assumption that you could include other code by adding its punch cards to your stack. C#, Java, OCaml, and Swift all have some concept of “all possible code that will be in this program”, sort of like C with inferred headers, so imports are largely unnecessary; Java’s import really just does aliasing. Inform 7 has no namespacing, but does have a first-class concept of external libraries, but doesn’t have a way to split a single project up between multiple files.

Optional block delimiters

Old and busted and responsible for gotofail:

1
2
if (condition)
    thing;

New hotness, which reduces the amount of punctuation overall and eliminates this easy kind of error:

1
2
3
if condition {
    thing;
}

To be fair, and unlike most of these complaints, the original idea was a sort of clever consistency: the actual syntax was merely if (expr) stmt, and also, a single statement could always be replaced by a block of statements. Unfortunately, the cuteness doesn’t make up for the ease with which errors sneak in. If you’re stuck with a language like this, I advise you always use braces, possibly excepting the most trivial cases like immediately returning if some argument is NULL. Definitely do not do this nonsense, which I saw in actual code not 24 hours ago.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
for (x = ...)
    for (y = ...) {
        ...
    }

    // more code

    for (x = ...)
        for (y = ...)
            buffer[y][x] = ...

The only real argument for omitting the braces is that the braces take up a lot of vertical space, but that’s mostly a problem if you put each { on its own line, and you could just not do that.

Some languages use keywords instead of braces, and in such cases it’s vanishingly rare to make the keywords optional.

Blockheads: ACS, awk, C#, D, Erlang (kinda?), Java, JavaScript.

New kids on the block: Go, Perl 6, Rust, Swift.

Had their braces removed: Ada, ALGOL, BASIC, COBOL, CoffeeScript, Forth, Fortran (but still requires parens), Haskell, Lua, Ruby.

Special mention: Inform 7 has several ways to delimit blocks, none of them vulnerable to this problem. Perl 5 requires both the parentheses and the braces… but it lets you leave off the semicolon on the last statement. Python just uses indentation to delimit blocks in the first place, so you can’t have a block look wrong. Lisps exist on a higher plane of existence where the very question makes no sense.

Bitwise operator precedence

For ease of transition from B, in C, the bitwise operators & | ^ have lower precedence than the comparison operators == and friends. That means they happen later. For binary math operators, this is nonsense.

1
2
3
1 + 2 == 3  // (1 + 2) == 3
1 * 2 == 3  // (1 * 2) == 3
1 | 2 == 3  // 1 | (2 == 3)

Many other languages have copied C’s entire set of operators and their precedence, including this. Because a new language is easier to learn if its rules are familiar, you see. Which is why we still, today, have extremely popular languages maintaining compatibility with a language from 1969 — so old that it probably couldn’t get a programming job.

Honestly, if your language is any higher-level than C, I’m not sure bit operators deserve to be operators at all. Free those characters up to do something else. Consider having a first-class bitfield type; then 99% of the use of bit operations would go away.

Quick test: 1 & 2 == 2 evaluates to 1 with C precedence, false otherwise. Or just look at a precedence table: if equality appears between bitwise ops and other math ops, that’s C style.

A bit wrong: C#, D, expr, JavaScript, Perl 5, PHP.

Wisened up: Bash, F# (ops are &&& ||| ^^^), Go, Julia, Lua (bitwise ops are new in 5.3), Perl 6 (ops are ?& ?| ?^), Python, Ruby, SQL, Swift.

Special mention: Java has C’s precedence, but forbids using bitwise operators on booleans, so the quick test is a compile-time error. Lisp-likes have no operator precedence.

Negative modulo

The modulo operator, %, finds the remainder after division. Thus you might think that this always holds:

1
0 <= a % b < abs(b)

But no — if a is negative, C will produce a negative value. This is so a / b * b + a % b is always equal to a. Truncating integer division rounds towards zero, so the sign of a % b always needs to be away from zero.

I’ve never found this behavior (or the above equivalence) useful. An easy example is that checking for odd numbers with x % 2 == 1 will fail for negative numbers, which produce -1. But the opposite behavior can be pretty handy.

Consider the problem of having n items that you want to arrange into rows with c columns. A calendar, say; you want to include enough empty cells to fill out the last row. n % c gives you the number of items on the last row, so c - n % c seems like it will give you the number of empty spaces. But if the last row is already full, then n % c is zero, and c - n % c equals c! You’ll have either a double-width row or a spare row of empty cells. Fixing this requires treating n % c == 0 as a special case, which is unsatisfying.

Ah, but if we have positive %, the answer is simply… -n % c! Consider this number line for n = 5 and c = 3:

1
2
-6      -3       0       3       6
 | - x x | x x x | x x x | x x - |

a % b tells you how far to count down to find a multiple of b. For positive a, that means “backtracking” over a itself and finding a smaller number. For negative a, that means continuing further away from zero. If you look at negative numbers as the mirror image of positive numbers, then % on a positive number tells you how to much to file off to get a multiple, whereas % on a negative number tells you how much further to go to get a multiple. 5 % 3 is 2, but -5 % 3 is 1. And of course, -6 % 3 is still zero, so that’s not a special case.

Positive % effectively lets you choose whether to round up or down. It doesn’t come up often, but when it’s handy, it’s really handy.

(I have no strong opinion on what 5 % -3 should be; I don’t think I’ve ever tried to use % with a negative divisor. Python makes it negative; Pascal makes it positive. Wikipedia has a whole big chart.)

Quick test: -5 % 3 is -2 with C semantics, 1 with “positive” semantics.

Leftovers: Bash, C#, D, expr, Go, Java, JavaScript, OCaml, PowerShell, PHP, Rust, Scala, SQL, Swift, VimL, Visual Basic. Notably, some of these languages don’t even have integer division.

Paying dividends: Dart, MUMPS (#), Perl, Python, R (%%), Ruby, Smalltalk (\\\\), Standard ML, Tcl.

Special mention: Ada, Haskell, Julia, many Lisps, MATLAB, VHDL, and others have separate mod (Python-like) and rem (C-like) operators. CoffeeScript has separate % (C-like) and %% (Python-like) operators.

Leading zero for octal

Octal notation like 0777 has three uses.

One: to make a file mask to pass to chmod().

Two: to confuse people when they write 013 and it comes out as 11.

Three: to confuse people when they write 018 and get a syntax error.

If you absolutely must have octal (?!) in your language, it’s fine to use 0o777. Really. No one will mind. Or you can go the whole distance and allow literals written in any base, as several languages do.

Gets a zero: awk (gawk only), Bash, Clojure, Go, Groovy, Java, JavaScript, m4, Perl 5, PHP, Python 2, Scala.

G0od: ECMAScript 6, Eiffel (0c — cute!), F#, Haskell, Julia, Nemerle, Nim, OCaml, Perl 6, Python 3, Racket (#o), Ruby, Scheme (#o), Swift, Tcl.

Based literals: Ada (8#777#), Bash (8#777), Erlang (8#777), Icon (8r777), J (8b777), Perl 6 (:8<777>), PostScript (8#777), Smalltalk (8r777).

Special mention: BASIC uses &O and &H prefixes for octal and hex. bc and Forth allow the base used to interpret literals to be changed on the fly, via ibase and BASE respectively. C#, D, expr, Lua, and Standard ML have no octal literals at all. Some COBOL extensions use O# and H#/X# prefixes for octal and hex. Fortran uses the slightly odd O'777' syntax.

No power operator

Perhaps this makes sense in C, since it doesn’t correspond to an actual instruction on most CPUs, but in JavaScript? If you can make + work for strings, I think you can add a **.

If you’re willing to ditch the bitwise operators (or lessen their importance a bit), you can even use ^, as most people would write in regular ASCII text.

Powerless: ACS, C#, Eiffel, Erlang, expr, Forth, Go.

Two out of two stars: Ada, ALGOL ( works too), Bash, COBOL, CoffeeScript, Fortran, F#, Groovy, OCaml, Perl, PHP, Python, Ruby.

I tip my hat: awk, BASIC, bc, COBOL, fish, Lua.

Otherwise powerful: APL (), D (^^).

Special mention: Lisps tend to have a named function rather than a dedicated operator (e.g. Math/pow in Clojure, expt in Common Lisp), but since operators are regular functions, this doesn’t stand out nearly so much. Haskell uses all three of ^, ^^, and ** for typing reasons.

C-style for loops

This construct is bad. It very rarely matches what a human actually wants to do, which 90% of the time is “go through this list of stuff” or “count from 1 to 10”. A C-style for obscures those wishes. The syntax is downright goofy, too: nothing else in the language uses ; as a delimiter and repeatedly executes only part of a line. It’s like a tuple of statements.

I said in my previous post about iteration that having an iteration protocol requires either objects or closures, but I realize that’s not true. I even disproved it in the same post. Lua’s own iteration protocol can be implemented without closures — the semantics of for involve keeping a persistent state value and passing it to the iterator function every time. It could even be implemented in C! Awkwardly. And with a bunch of macros. Which aren’t hygenic in C. Hmm, well.

Loopy: ACS, bc, Fortran.

Cool and collected: C#, Clojure, D, Delphi (recent), Eiffel (recent), Go, Groovy, Icon, Inform 7, Java, Julia, Logo, Lua, Nemerle, Nim, Objective-C, Perl, PHP, PostScript, Prolog, Python, R, Rust, Scala, Smalltalk, Swift, Tcl, Unix shells, Visual Basic.

Special mention: Functional languages and Lisps are laughing at the rest of us here. awk has for...in, but it doesn’t iterate arrays in order which makes it rather less useful. JavaScript has both for...in and for...of, but both are differently broken, so you usually end up using C-style for or external iteration. BASIC has an ergonomic numeric loop, but no iteration loop. Ruby mostly uses external iteration, and its for block is actually expressed in those terms.

Switch with default fallthrough

We’ve been through this before. Wanting completely separate code per case is, by far, the most common thing to want to do. It makes no sense to have to explicitly opt out of the more obvious behavior.

Breaks my heart: C#, Java, JavaScript.

Follows through: Ada, BASIC, CoffeeScript, Go (has a fallthrough statement), Lisps, Swift (has a fallthrough statement), Unix shells.

Special mention: D requires break, but requires something one way or the other — implicit fallthrough is disallowed except for empty cases. Perl 5 historically had no switch block built in, but it comes with a Switch module, and the last seven releases have had an experimental given block which I stress is still experimental. Python has no switch block. Erlang, Haskell, and Rust have pattern-matching instead (which doesn’t allow fallthrough at all).

Type first

1
int foo;

In C, this isn’t too bad. You get into problems when you remember that it’s common for type names to be all lowercase.

1
foo * bar;

Is that a useless expression, or a declaration? It depends entirely on whether foo is a variable or a type.

It gets a little weirder when you consider that there are type names with spaces in them. And storage classes. And qualifiers. And sometimes part of the type comes after the name.

1
extern const volatile _Atomic unsigned long long int * restrict foo[];

That’s not even getting into the syntax for types of function pointers, which might have arbitrary amounts of stuff after the variable name.

And then C++ came along with generics, which means a type name might also have other type names nested arbitrarily deep.

1
extern const volatile std::unordered_map<unsigned long long int, std::unordered_map<const long double * const, const std::vector<std::basic_string<char>>::const_iterator>> foo;

And that’s just a declaration! Imagine if there were an assignment in there too.

The great thing about static typing is that I know the types of all the variables, but that advantage is somewhat lessened if I can’t tell what the variables are.

Between type-first, function pointer syntax, Turing-complete duck-typed templates, and C++’s initialization syntax, there are several ways where parsing C++ is ambiguous or even undecidable! “Undecidable” here means that there exist C++ programs which cannot even be parsed into a syntax tree, because the same syntax means two different things depending on whether some expression is a value or a type, and that question can depend on an endlessly recursive template instantiation. (This is also a great example of ambiguity, where x * y(z) could be either an expression or a declaration.)

Contrast with, say, Rust:

1
let x: ... = ...;

This is easy to parse, both for a human and a computer. The thing before the colon must be a variable name, and it stands out immediately; the thing after the colon must be a type name. Even better, Rust has pretty good type inference, so the type is probably unnecessary anyway.

Of course, languages with no type declarations whatsoever are immune to this problem.

Most vexing: Java, Perl 6

Looks Lovely: Python 3 (annotation syntax and the typing module), Rust, Swift, TypeScript

Weak typing

Please note: this is not the opposite of static typing. Weak typing is more about the runtime behavior of values — if I try to use a value of type T as though it were of type U, will it be implicitly converted?

C lets you assign pointers to int variables and then take square roots of them, which seems like a bad idea to me. C++ agreed and nixed this, but also introduced the ability to make your own custom types implicitly convertible to as many other types you want.

This one is pretty clearly a spectrum, and I don’t have a clear line. For example, I don’t fault Python for implicitly converting between int and float, because int is infinite-precision and float is 64-bit, so it’s usually fine. But I’m a lot more suspicious of C, which lets you assign an int to a char without complaint. (Well, okay. Literal integers in C are ints, which poses a slight problem.)

I do count a combined addition/concatenation operator that accepts different types of arguments as a form of weak typing.

Weak: JavaScript (+), Unix shells (everything’s a string, but even arrays/scalars are somewhat interchangeable)

Strong: Rust (even numeric upcasts must be explicit).

Special mention: Perl 5 is weak, but it avoids most of the ambiguity by having entirely separate sets of operators for string vs numeric operations. Python 2 is mostly strong, but that whole interchangeable bytes/text thing sure caused some ruckus.

Integer division

Hey, new programmers!” you may find yourself saying. “Don’t worry, it’s just like math, see? Here’s how to use $LANGUAGE as a calculator.”

Oh boy!” says your protégé. “Let’s see what 7 ÷ 2 is! Oh, it’s 3. I think the computer is broken.”

They’re right! It is broken. I have genuinely seen a non-trivial number of people come into #python thinking division is “broken” because of this.

To be fair, C is pretty consistent about making math operations always produce a value whose type matches one of the arguments. It’s also unclear whether such division should produce a float or a double. Inferring from context would make sense, but that’s not something C is really big on.

Quick test: 7 / 2 is 3½, not 3.

Integrous: Bash, bc, C#, D, expr, F#, Fortran, Go, OCaml, Python 2, Ruby, Rust (hard to avoid).

Afloat: awk (no integers), Clojure (produces a rational!), Groovy, JavaScript (no integers), Lua (no integers until 5.3), Nim, Perl 5 (no integers), Perl 6, PHP, Python 3.

Special mention: Haskell disallows / on integers. Nim, Perl 6, Python, and probably others have separate integral division operators: div, div, and //, respectively.

Bytestrings

Strings” in C are arrays of 8-bit characters. They aren’t really strings at all, since they can’t hold the vast majority of characters without some further form of encoding. Exactly what the encoding is and how to handle it is left entirely up to the programmer. This is a pain in the ass.

Some languages caught wind of this Unicode thing in the 90s and decided to solve this problem once and for all by making “wide” strings with 16-bit characters. (Even C95 has this, in the form of wchar_t* and L"..." literals.) Unicode, you see, would never have more than 65,536 characters.

Whoops, so much for that. Now we have strings encoded as UTF-16 rather than UTF-8, so we’re paying extra storage cost and we still need to write extra code to do basic operations right. Or we forget, and then later we have to track down a bunch of wonky bugs because someone typed a 💩.

Note that handling characters/codepoints is very different from handling glyphs, i.e. the distinct shapes you see on screen. Handling glyphs doesn’t even really make sense outside the context of a font, because fonts are free to make up whatever ligatures they want. Remember “diverse” emoji? Those are ligatures of three to seven characters, completely invented by a font vendor. A programming language can’t reliably count the display length of that, especially when new combining behaviors could be introduced at any time.

Also, it doesn’t matter how you solve this problem, as long as it appears to be solved. I believe Ruby uses bytestrings, for example, but they know their own encoding, so they can be correctly handled as sequences of codepoints. Having a separate non-default type or methods does not count, because everyone will still use the wrong thing first — sorry, Python 2.

Quick test: what’s the length of “💩”? If 1, you have real unencoded strings. If 2, you have UTF-16 strings. If 4, you have UTF-8 strings. If something else, I don’t know what the heck is going on.

Totally bytes: Lua, Python 2 (separate unicode type).

Comes up short: Java, JavaScript.

One hundred emoji: Python 3, Ruby, Rust.

Special mention: Perl 5 gets the quick test right if you put use utf8; at the top of the file, but Perl 5’s Unicode support is such a confusing clusterfuck that I can’t really give it a 💯.

Autoincrement and autodecrement

I don’t think there are too many compelling reasons to have ++. It means the same as += 1, which is still nice and short. The only difference is that people can do stupid unreadable tricks with ++.

One exception: it is possible to overload ++ in ways that don’t make sense as += 1 — for example, C++ uses ++ to advance iterators, which may do any arbitrary work under the hood.

Double plus ungood:

Double plus good: Python

Special mention: Perl 5 and PHP both allow ++ on strings, in which case it increments letters or something, but I don’t know if much real code has ever used this.

!

A pet peeve. Spot the difference:

1
2
3
4
5
6
if (looks_like_rain()) {
    ...
}
if (!looks_like_rain()) {
    ...
}

That single ! is ridiculously subtle, which seems wrong to me when it makes an expression mean its polar opposite. Surely it should stick out like a sore thumb. The left parenthesis makes it worse, too; it blends in slightly as just noise.

It helps a bit to space after the ! in cases like this:

1
2
3
if (! looks_like_rain()) {
    ...
}

But this seems to be curiously rare. The easy solution is to just spell the operator not. At which point the other two might as well be and and or.

Interestingly enough, C95 specifies and, or, not, and some others as standard alternative spellings, though I’ve never seen them in any C code and I suspect existing projects would prefer I not use them.

Not right: ACS, awk, C#, D, Go, Groovy, Java, JavaScript, Nemerle, PHP, R, Rust, Scala, Swift, Tcl, Vala.

Spelled out: Ada, ALGOL, BASIC, COBOL, Erlang, F#, Fortran, Haskell, Lisps, Lua, Nim, OCaml, Pascal, PostScript, Python, Smalltalk, Standard ML.

Special mention: APL and Julia both use ~, which is at least easier to pick out, which is more than I can say for most of APL. bc and expr, which are really calculators, have no concept of Boolean operations. Forth and Icon, which are not calculators, don’t seem to either. Perl and Ruby have both symbolic and named Boolean operators (Perl 6 has even more), with different precedence (which inside if won’t matter), but I believe the named forms are preferred.

Single return and out parameters

Because C can only return a single value, and that value is often an indication of failure for the sake of an if, “out” parameters are somewhat common.

1
2
double x, y;
get_point(&x, &y);

It’s not immediately clear whether x and y are input or output. Sometimes they might function as both. (And of course, in this silly example, you’d be better off returning a single point struct. Or would you use a point out parameter because returning structs is potentially expensive?)

Some languages have doubled down on this by adding syntax to declare “out” parameters, which removes the ambiguity in the function definition, but makes it worse in function calls. In the above example, using & on an argument is at least a decent hint that the function wants to write to those values. If you have implicit out parameters or pass-by-reference or whatever, that would just be get_point(x, y) and you’d have no indication that those arguments are special in any way.

The vast majority of the time, this can be expressed in a more straightforward way by returning multiple values:

1
x, y = get_point()

That was intended as Python, but technically, Python doesn’t have multiple returns! It seems to, but it’s really a combination of several factors: a tuple type, the ability to make a tuple literal with just commas, and the ability to unpack a tuple via multiple assignment. In the end it works just as well. Also this is a way better use of the comma operator than in C.

But the exact same code could appear in Lua, which has multiple return/assignment as an explicit feature… and no tuples. The difference becomes obvious if you try to assign the return value to a single variable instead:

1
point = get_point()

In Python, point would be a tuple containing both return values. In Lua, point would be the x value, and y would be silently discarded. I don’t tend to be a fan of silently throwing data away, but I have to admit that Lua makes pretty good use of this in several places for “optional” return values that the caller can completely ignore if desired. An existing function can even be extended to return more values than before — that would break callers in Python, but work just fine in Lua.

(Also, to briefly play devil’s advocate: I once saw Python code that returned 14 values all with very complicated values, types, and semantics. Maybe don’t do that. I think I cleaned it up to return an object, which simplified the calling code considerably too.)

It’s also possible to half-ass this. ECMAScript 6::

1
2
3
4
5
function get_point() {
    return [1, 2];
}

var [x, y] = get_point();

It works, but it doesn’t actually look like multiple return. The trouble is that JavaScript has C’s comma operator and C’s variable declaration syntax, so neither of the above constructs could’ve left off the brackets without significantly changing the syntax:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
function get_point() {
    // Whoops!  This uses the comma operator, which evaluates to its last
    // operand, so it just returns 2
    return 1, 2;
}

// Whoops!  This is multiple declaration, where each variable gets its own "=",
// so it assigns nothing to x and the return value to y
var x, y = get_point();
// Now x is undefined and y is 2

This is still better than either out parameters or returning an explicit struct that needs manual unpacking, but it’s not as good as comma-delimited tuples. Note that some languages require parentheses around tuples (and also call them tuples), and I’m arbitrarily counting that as better than bracket.

Single return: Ada, ALGOL, BASIC, C#, COBOL, Fortran, Groovy, Java, Smalltalk.

Half-assed multiple return: C++11, D, ECMAScript 6, Erlang, PHP.

Multiple return via tuples: F#, Go, Haskell, Julia, Nemerle, Nim, OCaml, Perl (just lists really), Python, Ruby, Rust, Scala, Standard ML, Swift, Tcl.

Native multiple return: Common Lisp, Lua.

Special mention: Forth is stack-based, and all return values are simply placed on the stack, so multiple return isn’t a special case. Unix shell functions don’t return values. Visual Basic sets a return value by assigning to the function’s name (?!), so good luck fitting multiple return in there.

Silent errors

Most runtime errors in C are indicated by one of two mechanisms: returning an error code, or segfaulting. Segfaulting is pretty noisy, so that’s okay, except for the exploit potential and all.

Returning an error code kinda sucks. Those tend to be important, but nothing in the language actually reminds you to check them, and of course we silly squishy humans have the habit of assuming everything will succeed at all times. Which is how I segfaulted git two days ago: I found a spot where it didn’t check for a NULL returned as an error.

There are several alternatives here: exceptions, statically forcing the developer to check for an error code, or using something monad-like to statically force the developer to distinguish between an error and a valid return value. Probably some others. In the end I was surprised by how many languages went the exception route.

Quietly wrong: Unix shells. Wow, yeah, I’m having a hard time naming anything else. Good job, us!

Exceptional: Ada, C++, C#, D, Erlang, Forth, Java (exceptions are even part of function signature), JavaScript, Nemerle, Nim, Objective-C, OCaml, Perl 6, Python, Ruby, Smalltalk, Standard ML, Visual Basic.

Monadic: Haskell (Either), Rust (Result).

Special mention: ACS doesn’t really have many operations that can error, and those that do simply halt the script. ALGOL apparently has something called “mending” that I don’t understand. Go tends to use secondary return values, which calling code has to unpack, making them slightly harder to forget about. Lisps have conditions and call/cc, which are different things entirely. Lua and Perl 5 handle errors by taking down the whole program, but offer a construct that can catch that further up the stack, which is clumsy but enough to emulate try..catch. PHP has exceptions, and errors (which are totally different), and a lot of builtin functions that return error codes. Swift has something that looks like exceptions, but it doesn’t involve stack unwinding and does require some light annotation, so I think it’s all sugar for a monadic return value. Visual Basic, and I believe some other BASICs, decided C wasn’t bad enough and introduced the bizarre On Error Resume Next construct which does exactly what it sounds like.

Nulls

The billion dollar mistake.

I think it’s considerably worse in a statically typed language like C, because the whole point is that you can rely on the types. But a double* might be NULL, which is not actually a pointer to a double; it’s a pointer to a segfault. Other kinds of bad pointers are possible, of course, but those are more an issue of memory safety; allowing any reference to be null violates type safety. The root of the problem is treating null as a possible value of any type, when really it’s its own type entirely.

The alternatives tend to be either opt-in nullability or an “optional” generic type (a monad!) which eliminates null as its own value entirely. Notably, Swift does it both ways: optional types are indicated by a trailing ?, but that’s just syntactic sugar for Option<T>.

On the other hand, while it’s annoying to get a None where I didn’t expect one in Python, it’s not like I’m surprised. I occasionally get a string where I expected a number, too. The language explicitly leaves type concerns in my hands. My real objection is to having a static type system that lies. So I’m not going to list every single dynamic language here, because not only is it consistent with the rest of the type system, but they don’t really have any machinery to prevent this anyway.

Nothing doing: C#, D, Go, Java, Nim (non-nullable types are opt in), R.

Nullable types: Swift.

Monads: F# (Option — though technically F# also inherits null from .NET), Haskell (Maybe), Rust (Option), Swift (Optional).

Special mention: awk, Tcl, and Unix shells only have strings, so in a surprising twist, they have no concept of null whatsoever. Java recently introduced an Optional<T> type which explicitly may or may not contain a value, but since it’s still a non-primitive, it could also be null. C++17 doesn’t quite have the same problem with std::optional<T>, since non-reference values can’t be null. Inform 7’s nothing value is an object (the root of half of its type system), which means any object variable might be nothing, but any value of a more specific type cannot be nothing. JavaScript has two null values, null and undefined. Perl 6 is really big on static types, but claims its Nil object doesn’t exist, and I don’t know how to even begin to unpack that.

Assignment as expression

How common a mistake is this:

1
2
3
if (x = 3) {
    ...
}

Well, I don’t know, actually. Maybe not that common, save for among beginners. But I sort of wonder whether allowing this buys us anything. I can only think of two cases where it does. One is with something like iteration:

1
2
3
4
// Typical linked list
while (p = p->next) {
    ...
}

But this is only necessary in C in the first place because it has no first-class notion of iteration. The other is shorthand for checking that a function returned a useful value:

1
2
3
if (ptr = get_pointer()) {
    ...
}

But if a function returns NULL, that’s really an error condition, and presumably you have some other way to handle that too.

What does that leave? The only time I remotely miss this in Python (where it’s illegal) is when testing a regex. You tend to see this a lot instead.

1
2
3
m = re.match('x+y+z+', some_string)
if m:
    ...

re treats failure as an acceptable possibility and returns None, rather than raising an exception. I’m not sure whether this was the right thing to do or not, but off the top of my head I can’t think of too many other Python interfaces that sometimes return None.

Some languages go entirely the opposite direction and make everything an expression, including block constructs like if. In those languages, it makes sense for assignment to be an expression, for consistency with everything else.

Assignment’s an expression: ACS, C#, D, Java, JavaScript, Perl, PHP, Swift.

Everything’s an expression: Ruby, Rust.

Assignment’s a statement: Inform 7, Lua, Python, Unix shells.

Special mention: BASIC uses = for both assignment and equality testing — the meaning is determined from context. Functional languages generally don’t have an assignment operator. Rust has a special if let block that explicitly combines assignment with pattern matching, which is way nicer than the C approach.

No hyphens in identifiers

snake_case requires dancing on the shift key (unless you rearrange your keyboard, which is perfectly reasonable). It slows you down slightly and leads to occasional mistakes like snake-Case. The alternative is dromedaryCase, which is objectively wrong and doesn’t actually solve this problem anyway.

Why not just allow hyphens in identifiers, so we can avoid this argument and use kebab-case?

Ah, but then it’s ambiguous whether you mean an identifier or the subtraction operator. No problem: require spaces for subtraction. I don’t think a tiny way you’re allowed to make your code harder to read is really worth this clear advantage.

Low score: ACS, C#, D, Java, JavaScript, OCaml, Pascal, Perl 5, PHP, Python, Ruby, Rust, Swift, Unix shells.

Nicely-named: COBOL, CSS (and thus Sass), Forth, Inform 7, Lisps, Perl 6, XML.

Special mention: Perl has a built-in variable called $-, and Ruby has a few called $-n for various values of “n”, but these are very special cases.

Braces and semicolons

Okay. Hang on. Bear with me.

C code looks like this.

1
2
3
4
5
some block header {
    line 1;
    line 2;
    line 3;
}

The block is indicated two different ways here. The braces are for the compiler; the indentation is for humans.

Having two different ways to say the same thing means they can get out of sync. They can disagree. And that can be, as previously mentioned, really bad. This is really just a more general form of the problem of optional block delimiters.

The only solution is to eliminate one of the two. Programming languages exist for the benefit of humans, so we obviously can’t get rid of the indentation. Thus, we should get rid of the braces. QED.

As an added advantage, we reclaim all the vertical space wasted on lines containing only a }, and we can stop squabbling about where to put the {.

If you accept this, you might start to notice that there are also two different ways of indicating where a line ends: with semicolons for the compiler, and with vertical whitespace for humans. So, by the same reasoning, we should lose the semicolons.

Right? Awesome. Glad we’re all on the same page.

Some languages use keywords instead of braces, but the effect is the same. I’m not aware of any languages that use keywords instead of semicolons.

Bracing myself: C#, D, Erlang, Java, Perl, Rust.

Braces, but no semicolons: JavaScript (kinda — see below), Lua, Ruby, Swift.

Free and clear: CoffeeScript, Haskell, Python.

Special mention: Lisp, just, in general. Inform 7 has an indented style, but it still requires semicolons.

Here’s some interesting trivia. JavaScript, Lua, and Python all optionally allow semicolons at the end of a statement, but the way each language determines line continuation is very different.

JavaScript takes an “opt-out” approach: it continues reading lines until it hits a semicolon, or until reading the next line would cause a syntax error. That leaves a few corner cases like starting a new line with a (, which could look like the last thing on the previous line is a function you’re trying to call. Or you could have -foo on its own line, and it would parse as subtraction rather than unary negation. You might wonder why anyone would do that, but using unary + is one way to make function parse as an expression rather than a statement! I’m not so opposed to semicolons that I want to be debugging where the language thinks my lines end, so I just always use semicolons in JavaScript.

Python takes an “opt-in” approach: it assumes, by default, that a statement ends at the end of a line. However, newlines inside parentheses or brackets are ignored, which takes care of 99% of cases — long lines are most frequently caused by function calls (which have parentheses!) with a lot of arguments. If you really need it, you can explicitly escape a newline with \\, but this is widely regarded as incredibly ugly.

Lua avoids the problem almost entirely. I believe Lua’s grammar is designed such that it’s almost always unambiguous where a statement ends, even if you have no newlines at all. This has a few weird side effects: void expressions are syntactically forbidden in Lua, for example, so you just can’t have -foo as its own statement. Also, you can’t have code immediately following a return, because it’ll be interpreted as a return value. The upside is that Lua can treat newlines just like any other whitespace, but still not need semicolons. In fact, semicolons aren’t statement terminators in Lua at all — they’re their own statement, which does nothing. Alas, not for lack of trying, Lua does have the same ( ambiguity as JavaScript (and parses it the same way), but I don’t think any of the others exist.

Oh, and the colons that Python has at the end of its block headers, like if foo:? As far as I can tell, they serve no syntactic purpose whatsoever. Purely aesthetic.

Blaming the programmer

Perhaps one of the worst misfeatures of C is the ease with which responsibility for problems can be shifted to the person who wrote the code. “Oh, you segfaulted? I guess you forgot to check for NULL.” If only I had a computer to take care of such tedium for me!

Clearly, computers can’t be expected to do everything for us. But they can be expected to do quite a bit. Programming languages are built for humans, and they ought to eliminate the sorts of rote work humans are bad at whenever possible. A programmer is already busy thinking about the actual problem they want to solve; it’s no surprise that they’ll sometimes forget some tedious detail the language forces them to worry about.

So if you’re designing a language, don’t just copy C. Don’t just copy C++ or Java. Hell, don’t even just copy Python or Ruby. Consider your target audience, consider the problems they’re trying to solve, and try to get as much else out of the way as possible. If the same “mistake” tends to crop up over and over, look for a way to modify the language to reduce or eliminate it. And be sure to look at a lot of languages for inspiration — even ones you hate, even weird ones no one uses! A lot of clever people have had a lot of other ideas in the last 44 years.


I hope you enjoyed this accidental cross-reference of several dozen languages! I enjoyed looking through them all, though it was incredibly time-consuming. Some of them look pretty interesting; maybe give them a whirl.

Also, dammit, now I’m thinking about language design again.

Let’s stop copying C

Post Syndicated from Eevee original https://eev.ee/blog/2016/12/01/lets-stop-copying-c/

Ah, C. The best lingua franca we have… because we have no other lingua francas. Linguae franca. Surgeons general?

C is fairly old — 44 years, now! — and comes from a time when there were possibly more architectures than programming languages. It works well for what it is, and what it is is a relatively simple layer of indirection atop assembly.

Alas, the popularity of C has led to a number of programming languages’ taking significant cues from its design, and parts of its design are… slightly questionable. I’ve gone through some common features that probably should’ve stayed in C and my justification for saying so. The features are listed in rough order from (I hope) least to most controversial. The idea is that C fans will give up when I call it “weakly typed” and not even get to the part where I rag on braces. Wait, crap, I gave it away.

I’ve listed some languages that do or don’t take the same approach as C. Plenty of the listed languages have no relation to C, and some even predate it — this is meant as a cross-reference of the landscape (and perhaps a list of prior art), not a genealogy. The language selections are arbitrary and based on what I could cobble together from documentation, experiments, Wikipedia, and attempts to make sense of Rosetta Code. I don’t know everything about all of them, so I might be missing some interesting quirks. Things are especially complicated for very old languages like COBOL or Fortran, which by now have numerous different versions and variants and de facto standard extensions.

Unix shells” means some handwaved combination that probably includes bash and its descendants; for expressions, it means the (( ... )) syntax. I didn’t look too closely into, say, fish. Unqualified “Python” means both 2 and 3; likewise, unqualified “Perl” means both 5 and 6. Also some of the puns are perhaps a little too obtuse, but the first group listed is always C-like.

Textual inclusion

#include is not a great basis for a module system. It’s not even a module system. You can’t ever quite tell what symbols came from which files, or indeed whether particular files are necessary at all. And in languages with C-like header files, most headers include other headers include more headers, so who knows how any particular declaration is actually ending up in your code? Oh, and there’s the whole include guards thing.

It’s a little tricky to pick on individual languages here, because ultimately even the greatest module system in the world boils down to “execute this other file, and maybe do some other stuff”. I think the true differentiating feature is whether including/importing/whatevering a file creates a new namespace. If a file gets dumped into the caller’s namespace, that looks an awful lot like textual inclusion; if a file gets its own namespace, that’s a good sign of something more structured happening behind the scenes.

This tends to go hand-in-hand with how much the language relies on a global namespace. One surprising exception is Lua, which can compartmentalize required files quite well, but dumps everything into a single global namespace by default.

Quick test: if you create a new namespace and import another file within that namespace, do its contents end up in that namespace?

Included: ACS, awk, COBOL, Erlang, Forth, Fortran, most older Lisps, Perl 5 (despite that required files must return true), PHP, Unix shells.

Excluded: Ada, Clojure, D, Haskell, Julia, Lua (the file’s return value is returned from require), Nim, Node (similar to Lua), Perl 6, Python, Rust.

Special mention: ALGOL appears to have been designed with the assumption that you could include other code by adding its punch cards to your stack. C#, Java, OCaml, and Swift all have some concept of “all possible code that will be in this program”, sort of like C with inferred headers, so imports are largely unnecessary; Java’s import really just does aliasing. Inform 7 has no namespacing, but does have a first-class concept of external libraries, but doesn’t have a way to split a single project up between multiple files. Ruby doesn’t automatically give required files their own namespace, but doesn’t evaluate them in the caller’s namespace either.

Optional block delimiters

Old and busted and responsible for gotofail:

1
2
if (condition)
    thing;

New hotness, which reduces the amount of punctuation overall and eliminates this easy kind of error:

1
2
3
if condition {
    thing;
}

To be fair, and unlike most of these complaints, the original idea was a sort of clever consistency: the actual syntax was merely if (expr) stmt, and also, a single statement could always be replaced by a block of statements. Unfortunately, the cuteness doesn’t make up for the ease with which errors sneak in. If you’re stuck with a language like this, I advise you always use braces, possibly excepting the most trivial cases like immediately returning if some argument is NULL. Definitely do not do this nonsense, which I saw in actual code not 24 hours ago.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
for (x = ...)
    for (y = ...) {
        ...
    }

    // more code

    for (x = ...)
        for (y = ...)
            buffer[y][x] = ...

The only real argument for omitting the braces is that the braces take up a lot of vertical space, but that’s mostly a problem if you put each { on its own line, and you could just not do that.

Some languages use keywords instead of braces, and in such cases it’s vanishingly rare to make the keywords optional.

Blockheads: ACS, awk, C#, D, Erlang (kinda?), Java, JavaScript.

New kids on the block: Go, Perl 6, Rust, Swift.

Had their braces removed: Ada, ALGOL, BASIC, COBOL, CoffeeScript, Forth, Fortran (but still requires parens), Haskell, Lua, Ruby.

Special mention: Inform 7 has several ways to delimit blocks, none of them vulnerable to this problem. Perl 5 requires both the parentheses and the braces… but it lets you leave off the semicolon on the last statement. Python just uses indentation to delimit blocks in the first place, so you can’t have a block look wrong. Lisps exist on a higher plane of existence where the very question makes no sense.

Bitwise operator precedence

For ease of transition from B, in C, the bitwise operators & | ^ have lower precedence than the comparison operators == and friends. That means they happen later. For binary math operators, this is nonsense.

1
2
3
1 + 2 == 3  // (1 + 2) == 3
1 * 2 == 3  // (1 * 2) == 3
1 | 2 == 3  // 1 | (2 == 3)

Many other languages have copied C’s entire set of operators and their precedence, including this. Because a new language is easier to learn if its rules are familiar, you see. Which is why we still, today, have extremely popular languages maintaining compatibility with a language from 1969 — so old that it probably couldn’t get a programming job.

Honestly, if your language is any higher-level than C, I’m not sure bit operators deserve to be operators at all. Free those characters up to do something else. Consider having a first-class bitfield type; then 99% of the use of bit operations would go away.

Quick test: 1 & 2 == 2 evaluates to 1 with C precedence, false otherwise. Or just look at a precedence table: if equality appears between bitwise ops and other math ops, that’s C style.

A bit wrong: D, expr, JavaScript, Perl 5, PHP.

Wisened up: F# (ops are &&& ||| ^^^), Go, Julia, Lua (bitwise ops are new in 5.3), Perl 6 (ops are +& +| +^), Python, Ruby, Rust, SQL, Swift, Unix shells.

Special mention: C# and Java have C’s precedence, but forbid using bitwise operators on booleans, so the quick test is a compile-time error. Lisp-likes have no operator precedence.

Negative modulo

The modulo operator, %, finds the remainder after division. Thus you might think that this always holds:

1
0 <= a % b < abs(b)

But no — if a is negative, C will produce a negative value. (Well, since C99; before that it was unspecified, which is probably worse.) This is so a / b * b + a % b is always equal to a. Truncating integer division rounds towards zero, so the sign of a % b always needs to be away from zero.

I’ve never found this behavior (or the above equivalence) useful. An easy example is that checking for odd numbers with x % 2 == 1 will fail for negative numbers, which produce -1. But the opposite behavior can be pretty handy.

Consider the problem of having n items that you want to arrange into rows with c columns. A calendar, say; you want to include enough empty cells to fill out the last row. n % c gives you the number of items on the last row, so c - n % c seems like it will give you the number of empty spaces. But if the last row is already full, then n % c is zero, and c - n % c equals c! You’ll have either a double-width row or a spare row of empty cells. Fixing this requires treating n % c == 0 as a special case, which is unsatisfying.

Ah, but if we have positive %, the answer is simply… -n % c! Consider this number line for n = 5 and c = 3:

1
2
-6      -3       0       3       6
 | - x x | x x x | x x x | x x - |

a % b tells you how far to count down to find a multiple of b. For positive a, that means “backtracking” over a itself and finding a smaller number. For negative a, that means continuing further away from zero. If you look at negative numbers as the mirror image of positive numbers, then % on a positive number tells you how to much to file off to get a multiple, whereas % on a negative number tells you how much further to go to get a multiple. 5 % 3 is 2, but -5 % 3 is 1. And of course, -6 % 3 is still zero, so that’s not a special case.

Positive % effectively lets you choose whether to round up or down. It doesn’t come up often, but when it’s handy, it’s really handy.

(I have no strong opinion on what 5 % -3 should be; I don’t think I’ve ever tried to use % with a negative divisor. Python makes it negative; Pascal makes it positive. Wikipedia has a whole big chart.)

Quick test: -5 % 3 is -2 with C semantics, 1 with “positive” semantics.

Leftovers: C#, D, expr, Go, Java, JavaScript, OCaml, PowerShell, PHP, Rust, Scala, SQL, Swift, Unix shells, VimL, Visual Basic. Notably, some of these languages don’t even have integer division.

Paying dividends: Dart, MUMPS (#), Perl, Python, R (%%), Ruby, Smalltalk (\\\\), Standard ML, Tcl.

Special mention: Ada, Haskell, Julia, many Lisps, MATLAB, VHDL, and others have separate mod (Python-like) and rem (C-like) operators. CoffeeScript has separate % (C-like) and %% (Python-like) operators.

Leading zero for octal

Octal notation like 0777 has three uses.

One: to make a file mask to pass to chmod().

Two: to confuse people when they write 013 and it comes out as 11.

Three: to confuse people when they write 018 and get a syntax error.

If you absolutely must have octal (?!) in your language, it’s fine to use 0o777. Really. No one will mind. Or you can go the whole distance and allow literals written in any base, as several languages do.

Gets a zero: awk (gawk only), Clojure, Go, Groovy, Java, JavaScript, m4, Perl 5, PHP, Python 2, Unix shells.

G0od: ECMAScript 6, Eiffel (0c — cute!), F#, Haskell, Julia, Nemerle, Nim, OCaml, Perl 6, Python 3, Ruby, Rust, Scheme (#o), Swift, Tcl.

Based literals: Ada (8#777#), Bash (8#777), Erlang (8#777), Icon (8r777), J (8b777), Perl 6 (:8<777>), PostScript (8#777), Smalltalk (8r777).

Special mention: BASIC uses &O and &H prefixes for octal and hex. bc and Forth allow the base used to interpret literals to be changed on the fly, via ibase and BASE respectively. C#, D, expr, Lua, Scala, and Standard ML have no octal literals at all. Some COBOL extensions use O# and H#/X# prefixes for octal and hex. Fortran uses the slightly odd O'777' syntax.

No power operator

Perhaps this makes sense in C, since it doesn’t correspond to an actual instruction on most CPUs, but in JavaScript? If you can make + work for strings, I think you can add a **.

If you’re willing to ditch the bitwise operators (or lessen their importance a bit), you can even use ^, as most people would write in regular ASCII text.

Powerless: ACS, C#, Eiffel, Erlang, expr, Forth, Go.

Two out of two stars: Ada, ALGOL ( works too), COBOL, CoffeeScript, ECMAScript 7, Fortran, F#, Groovy, OCaml, Perl, PHP, Python, Ruby, Unix shells.

I tip my hat: awk, BASIC, bc, COBOL, fish, Lua.

Otherwise powerful: APL (), D (^^).

Special mention: Lisps tend to have a named function rather than a dedicated operator (e.g. Math/pow in Clojure, expt in Common Lisp), but since operators are regular functions, this doesn’t stand out nearly so much. Haskell uses all three of ^, ^^, and ** for typing reasons.

C-style for loops

This construct is bad. It very rarely matches what a human actually wants to do, which 90% of the time is “go through this list of stuff” or “count from 1 to 10”. A C-style for obscures those wishes. The syntax is downright goofy, too: nothing else in the language uses ; as a delimiter and repeatedly executes only part of a line. It’s like a tuple of statements.

I said in my previous post about iteration that having an iteration protocol requires either objects or closures, but I realize that’s not true. I even disproved it in the same post. Lua’s own iteration protocol can be implemented without closures — the semantics of for involve keeping a persistent state value and passing it to the iterator function every time. It could even be implemented in C! Awkwardly. And with a bunch of macros. Which aren’t hygenic in C. Hmm, well.

Loopy: ACS, bc, Fortran.

Cool and collected: C#, Clojure, D, Delphi (recent), ECMAScript 6, Eiffel (recent), Go, Groovy, Icon, Inform 7, Java, Julia, Logo, Lua, Nemerle, Nim, Objective-C, Perl, PHP, PostScript, Prolog, Python, R, Rust, Scala, Smalltalk, Swift, Tcl, Unix shells, Visual Basic.

Special mention: Functional languages and Lisps are laughing at the rest of us here. awk has for...in, but it doesn’t iterate arrays in order which makes it rather less useful. JavaScript (pre ES6) has both for...in and for each...in, but both are differently broken, so you usually end up using C-style for or external iteration. BASIC has an ergonomic numeric loop, but no iteration loop. Ruby mostly uses external iteration, and its for block is actually expressed in those terms.

Switch with default fallthrough

We’ve been through this before. Wanting completely separate code per case is, by far, the most common thing to want to do. It makes no sense to have to explicitly opt out of the more obvious behavior.

Breaks my heart: Java, JavaScript.

Follows through: Ada, BASIC, CoffeeScript, Go (has a fallthrough statement), Lisps, Ruby, Swift (has a fallthrough statement), Unix shells.

Special mention: C# and D require break, but require something one way or the other — implicit fallthrough is disallowed except for empty cases. Perl 5 historically had no switch block built in, but it comes with a Switch module, and the last seven releases have had an experimental given block which I stress is still experimental. Python has no switch block. Erlang, Haskell, and Rust have pattern-matching instead (which doesn’t allow fallthrough at all).

Type first

1
int foo;

In C, this isn’t too bad. You get into problems when you remember that it’s common for type names to be all lowercase.

1
foo * bar;

Is that a useless expression, or a declaration? It depends entirely on whether foo is a variable or a type.

It gets a little weirder when you consider that there are type names with spaces in them. And storage classes. And qualifiers. And sometimes part of the type comes after the name.

1
extern const volatile _Atomic unsigned long long int * restrict foo[];

That’s not even getting into the syntax for types of function pointers, which might have arbitrary amounts of stuff after the variable name.

And then C++ came along with generics, which means a type name might also have other type names nested arbitrarily deep.

1
extern const volatile std::unordered_map<unsigned long long int, std::unordered_map<const long double * const, const std::vector<std::basic_string<char>>::const_iterator>> foo;

And that’s just a declaration! Imagine if there were an assignment in there too.

The great thing about static typing is that I know the types of all the variables, but that advantage is somewhat lessened if I can’t tell what the variables are.

Between type-first, function pointer syntax, Turing-complete duck-typed templates, and C++’s initialization syntax, there are several ways where parsing C++ is ambiguous or even undecidable! “Undecidable” here means that there exist C++ programs which cannot even be parsed into a syntax tree, because the same syntax means two different things depending on whether some expression is a value or a type, and that question can depend on an endlessly recursive template instantiation. (This is also a great example of ambiguity, where x * y(z) could be either an expression or a declaration.)

Contrast with, say, Rust:

1
let x: ... = ...;

This is easy to parse, both for a human and a computer. The thing before the colon must be a variable name, and it stands out immediately; the thing after the colon must be a type name. Even better, Rust has pretty good type inference, so the type is probably unnecessary anyway.

Of course, languages with no type declarations whatsoever are immune to this problem.

Most vexing: ACS, ALGOL, C#, D (though [] goes on the type), Fortran, Java, Perl 6.

Looks Lovely: Ada, Boo, F#, Go, Python 3 (via annotation syntax and the typing module), Rust, Swift, TypeScript.

Special mention: BASIC uses trailing type sigils to indicate scalar types.

Weak typing

Please note: this is not the opposite of static typing. Weak typing is more about the runtime behavior of values — if I try to use a value of type T as though it were of type U, will it be implicitly converted?

C lets you assign pointers to int variables and then take square roots of them, which seems like a bad idea to me. C++ agreed and nixed this, but also introduced the ability to make your own custom types implicitly convertible to as many other types you want.

This one is pretty clearly a spectrum, and I don’t have a clear line. For example, I don’t fault Python for implicitly converting between int and float, because int is infinite-precision and float is 64-bit, so it’s usually fine. But I’m a lot more suspicious of C, which lets you assign an int to a char without complaint. (Well, okay. Literal integers in C are ints, which poses a slight problem.)

I do count a combined addition/concatenation operator that accepts different types of arguments as a form of weak typing.

Weak: JavaScript (+), PHP, Unix shells (almost everything’s a string, but even arrays/scalars are somewhat interchangeable).

Strong: F#, Go (explicit numeric casts), Haskell, Python, Rust (explicit numeric casts).

Special mention: ACS only has integers; even fixed-point values are stored in integers, and the compiler has no notion of a fixed-point type, making it the weakest language imaginable. C++ and Scala both allow defining implicit conversions, for better or worse. Perl 5 is weak, but it avoids most of the ambiguity by having entirely separate sets of operators for string vs numeric operations. Python 2 is mostly strong, but that whole interchangeable bytes/text thing sure caused some ruckus. Tcl only has strings.

Integer division

Hey, new programmers!” you may find yourself saying. “Don’t worry, it’s just like math, see? Here’s how to use $LANGUAGE as a calculator.”

Oh boy!” says your protégé. “Let’s see what 7 ÷ 2 is! Oh, it’s 3. I think the computer is broken.”

They’re right! It is broken. I have genuinely seen a non-trivial number of people come into #python thinking division is “broken” because of this.

To be fair, C is pretty consistent about making math operations always produce a value whose type matches one of the arguments. It’s also unclear whether such division should produce a float or a double. Inferring from context would make sense, but that’s not something C is really big on.

Quick test: 7 / 2 is 3½, not 3.

Integrous: bc, C#, D, expr, F#, Fortran, Go, OCaml, Python 2, Ruby, Rust (hard to avoid), Unix shells.

Afloat: awk (no integers), Clojure (produces a rational!), Groovy, JavaScript (no integers), Lua (no integers until 5.3), Nim, Perl 5 (no integers), Perl 6, PHP, Python 3.

Special mention: Haskell disallows / on integers. Nim, Haskell, Perl 6, Python, and probably others have separate integral division operators: div, div, div, and //, respectively.

Bytestrings

Strings” in C are arrays of 8-bit characters. They aren’t really strings at all, since they can’t hold the vast majority of characters without some further form of encoding. Exactly what the encoding is and how to handle it is left entirely up to the programmer. This is a pain in the ass.

Some languages caught wind of this Unicode thing in the 90s and decided to solve this problem once and for all by making “wide” strings with 16-bit characters. (Even C95 has this, in the form of wchar_t* and L"..." literals.) Unicode, you see, would never have more than 65,536 characters.

Whoops, so much for that. Now we have strings encoded as UTF-16 rather than UTF-8, so we’re paying extra storage cost and we still need to write extra code to do basic operations right. Or we forget, and then later we have to track down a bunch of wonky bugs because someone typed a 💩.

Note that handling characters/codepoints is very different from handling glyphs, i.e. the distinct shapes you see on screen. Handling glyphs doesn’t even really make sense outside the context of a font, because fonts are free to make up whatever ligatures they want. Remember “diverse” emoji? Those are ligatures of three to seven characters, completely invented by a font vendor. A programming language can’t reliably count the display length of that, especially when new combining behaviors could be introduced at any time.

Also, it doesn’t matter how you solve this problem, as long as it appears to be solved. I believe Ruby uses bytestrings, for example, but they know their own encoding, so they can be correctly handled as sequences of codepoints. Having a separate non-default type or methods does not count, because everyone will still use the wrong thing first — sorry, Python 2.

Quick test: what’s the length of “💩”? If 1, you have real unencoded strings. If 2, you have UTF-16 strings. If 4, you have UTF-8 strings. If something else, I don’t know what the heck is going on.

Totally bytes: Go, Lua, Python 2 (separate unicode type).

Comes up short: Java, JavaScript.

One hundred emoji: Python 3, Ruby, Rust, Swift (even gets combining characters right!).

Special mention: Go’s strings are explicitly arbitrary byte sequences, but iterating over a string with for..range decodes UTF-8 code points. Perl 5 gets the quick test right if you put use utf8; at the top of the file, but Perl 5’s Unicode support is such a confusing clusterfuck that I can’t really give it a 💯.

Hmm. This one is kind of hard to track down for sure without either knowing a lot about internals or installing fifty different interpreters/compilers.

Increment and decrement

I don’t think there are too many compelling reasons to have ++. It means the same as += 1, which is still nice and short. The only difference is that people can do stupid unreadable tricks with ++.

One exception: it is possible to overload ++ in ways that don’t make sense as += 1 — for example, C++ uses ++ to advance iterators, which may do any arbitrary work under the hood.

Double plus ungood: ACS, awk, C#, D, Go, Java, JavaScript, Perl, Unix shells, Vala.

Double plus good: Lua (which doesn’t have += either), Python, Ruby, Rust, Swift (removed in v3).

Special mention: Perl 5 and PHP both allow ++ on strings, in which case it increments letters or something, but I don’t know if much real code has ever used this.

!

A pet peeve. Spot the difference:

1
2
3
4
5
6
if (looks_like_rain()) {
    ...
}
if (!looks_like_rain()) {
    ...
}

That single ! is ridiculously subtle, which seems wrong to me when it makes an expression mean its polar opposite. Surely it should stick out like a sore thumb. The left parenthesis makes it worse, too; it blends in slightly as just noise.

It helps a bit to space after the ! in cases like this:

1
2
3
if (! looks_like_rain()) {
    ...
}

But this seems to be curiously rare. The easy solution is to just spell the operator not. At which point the other two might as well be and and or.

Interestingly enough, C95 specifies and, or, not, and some others as standard alternative spellings, though I’ve never seen them in any C code and I suspect existing projects would prefer I not use them.

Not right: ACS, awk, C#, D, Go, Groovy, Java, JavaScript, Nemerle, PHP, R, Rust, Scala, Swift, Tcl, Vala.

Spelled out: Ada, ALGOL, BASIC, COBOL, Erlang, F#, Fortran, Haskell, Inform 7, Lisps, Lua, Nim, OCaml, Pascal, PostScript, Python, Smalltalk, Standard ML.

Special mention: APL and Julia both use ~, which is at least easier to pick out, which is more than I can say for most of APL. bc and expr, which are really calculators, have no concept of Boolean operations. Forth and Icon, which are not calculators, don’t seem to either. Inform 7 often blends the negation into the verb, e.g. if the player does not have.... Perl and Ruby have both symbolic and named Boolean operators (Perl 6 has even more), with different precedence (which inside if won’t matter); I believe Perl 5 prefers the words and Ruby prefers the symbols. Perl and Ruby also both have a separate unless block, with the opposite meaning to if. Python has is not and not in operators.

Single return and out parameters

Because C can only return a single value, and that value is often an indication of failure for the sake of an if, “out” parameters are somewhat common.

1
2
double x, y;
get_point(&x, &y);

It’s not immediately clear whether x and y are input or output. Sometimes they might function as both. (And of course, in this silly example, you’d be better off returning a single point struct. Or would you use a point out parameter because returning structs is potentially expensive?)

Some languages have doubled down on this by adding syntax to declare “out” parameters, which removes the ambiguity in the function definition, but makes it worse in function calls. In the above example, using & on an argument is at least a decent hint that the function wants to write to those values. If you have implicit out parameters or pass-by-reference or whatever, that would just be get_point(x, y) and you’d have no indication that those arguments are special in any way.

The vast majority of the time, this can be expressed in a more straightforward way by returning multiple values:

1
x, y = get_point()

That was intended as Python, but technically, Python doesn’t have multiple returns! It seems to, but it’s really a combination of several factors: a tuple type, the ability to make a tuple literal with just commas, and the ability to unpack a tuple via multiple assignment. In the end it works just as well. Also this is a way better use of the comma operator than in C.

But the exact same code could appear in Lua, which has multiple return/assignment as an explicit feature… and no tuples. The difference becomes obvious if you try to assign the return value to a single variable instead:

1
point = get_point()

In Python, point would be a tuple containing both return values. In Lua, point would be the x value, and y would be silently discarded. I don’t tend to be a fan of silently throwing data away, but I have to admit that Lua makes pretty good use of this in several places for “optional” return values that the caller can completely ignore if desired. An existing function can even be extended to return more values than before — that would break callers in Python, but work just fine in Lua.

(Also, to briefly play devil’s advocate: I once saw Python code that returned 14 values all with very complicated values, types, and semantics. Maybe don’t do that. I think I cleaned it up to return an object, which simplified the calling code considerably too.)

It’s also possible to half-ass this. ECMAScript 6::

1
2
3
4
5
function get_point() {
    return [1, 2];
}

var [x, y] = get_point();

It works, but it doesn’t actually look like multiple return. The trouble is that JavaScript has C’s comma operator and C’s variable declaration syntax, so neither of the above constructs could’ve left off the brackets without significantly changing the syntax:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
function get_point() {
    // Whoops!  This uses the comma operator, which evaluates to its last
    // operand, so it just returns 2
    return 1, 2;
}

// Whoops!  This is multiple declaration, where each variable gets its own "=",
// so it assigns nothing to x and the return value to y
var x, y = get_point();
// Now x is undefined and y is 2

This is still better than either out parameters or returning an explicit struct that needs manual unpacking, but it’s not as good as comma-delimited tuples. Note that some languages require parentheses around tuples (and also call them tuples), and I’m arbitrarily counting that as better than bracket.

Single return: Ada, ALGOL, BASIC, C#, COBOL, Fortran, Groovy, Java, Smalltalk.

Half-assed multiple return: C++11, D, ECMAScript 6, Erlang, PHP.

Multiple return via tuples: F#, Haskell, Julia, Nemerle, Nim, OCaml, Perl (just lists really), Python, Ruby, Rust, Scala, Standard ML, Swift, Tcl.

Native multiple return: Common Lisp, Go, Lua.

Special mention: C# has explicit syntax for out parameters, but it’s a compile-time error to not assign to all of them, which is slightly better than C. Forth is stack-based, and all return values are simply placed on the stack, so multiple return isn’t a special case. Unix shell functions don’t return values. Visual Basic sets a return value by assigning to the function’s name (?!), so good luck fitting multiple return in there.

Silent errors

Most runtime errors in C are indicated by one of two mechanisms: returning an error code, or segfaulting. Segfaulting is pretty noisy, so that’s okay, except for the exploit potential and all.

Returning an error code kinda sucks. Those tend to be important, but nothing in the language actually reminds you to check them, and of course we silly squishy humans have the habit of assuming everything will succeed at all times. Which is how I segfaulted git two days ago: I found a spot where it didn’t check for a NULL returned as an error.

There are several alternatives here: exceptions, statically forcing the developer to check for an error code, or using something monad-like to statically force the developer to distinguish between an error and a valid return value. Probably some others. In the end I was surprised by how many languages went the exception route.

Quietly wrong: Unix shells. Wow, yeah, I’m having a hard time naming anything else. Good job, us! And even Unix shells have set -e; it’s just opt-in.

Exceptional: Ada, C++, C#, D, Erlang, Forth, Java (exceptions are even part of function signature), JavaScript, Nemerle, Nim, Objective-C, OCaml, Perl 6, Python, Ruby, Smalltalk, Standard ML, Visual Basic.

Monadic: Haskell (Either), Rust (Result).

Special mention: ACS doesn’t really have many operations that can error, and those that do simply halt the script. ALGOL apparently has something called “mending” that I don’t understand. Go tends to use secondary return values, which calling code has to unpack, making them slightly harder to forget about; it also allows both the assignment and the error check together in the header of an if. Lisps have conditions and call/cc, which are different things entirely. Lua and Perl 5 handle errors by taking down the whole program, but offer a construct that can catch that further up the stack, which is clumsy but enough to emulate try..catch. PHP has exceptions, and errors (which are totally different), and a lot of builtin functions that return error codes. Swift has something that looks like exceptions, but it doesn’t involve stack unwinding and does require some light annotation — apparently sugar for an “out” parameter holding an error. Visual Basic, and I believe some other BASICs, decided C wasn’t bad enough and introduced the bizarre On Error Resume Next construct which does exactly what it sounds like.

Nulls

The billion dollar mistake.

I think it’s considerably worse in a statically typed language like C, because the whole point is that you can rely on the types. But a double* might be NULL, which is not actually a pointer to a double; it’s a pointer to a segfault. Other kinds of bad pointers are possible, of course, but those are more an issue of memory safety; allowing any reference to be null violates type safety. The root of the problem is treating null as a possible value of any type, when really it’s its own type entirely.

The alternatives tend to be either opt-in nullability or an “optional” generic type (a monad!) which eliminates null as its own value entirely. Notably, Swift does it both ways: optional types are indicated by a trailing ?, but that’s just syntactic sugar for Option<T>.

On the other hand, while it’s annoying to get a None where I didn’t expect one in Python, it’s not like I’m surprised. I occasionally get a string where I expected a number, too. The language explicitly leaves type concerns in my hands. My real objection is to having a static type system that lies. So I’m not going to list every single dynamic language here, because not only is it consistent with the rest of the type system, but they don’t really have any machinery to prevent this anyway.

Nothing doing: C#, D, Go, Java, Nim (non-nullable types are opt in).

Nullable types: Swift (sugar for a monad).

Monads: F# (Option — though technically F# also inherits null from .NET), Haskell (Maybe), Rust (Option), Swift (Optional).

Special mention: awk, Tcl, and Unix shells only have strings, so in a surprising twist, they have no concept of null whatsoever. Java recently introduced an Optional<T> type which explicitly may or may not contain a value, but since it’s still a non-primitive, it could also be null. C++17 doesn’t quite have the same problem with std::optional<T>, since non-reference values can’t be null. Inform 7’s nothing value is an object (the root of half of its type system), which means any object variable might be nothing, but any value of a more specific type cannot be nothing. JavaScript has two null values, null and undefined. Perl 6 is really big on static types, but claims its Nil object doesn’t exist, and I don’t know how to even begin to unpack that. R and SQL have a more mathematical kind of NULL, which tends to e.g. vanish from lists.

Assignment as expression

How common a mistake is this:

1
2
3
if (x = 3) {
    ...
}

Well, I don’t know, actually. Maybe not that common, save for among beginners. But I sort of wonder whether allowing this buys us anything. I can only think of two cases where it does. One is with something like iteration:

1
2
3
4
// Typical linked list
while (p = p->next) {
    ...
}

But this is only necessary in C in the first place because it has no first-class notion of iteration. The other is shorthand for checking that a function returned a useful value:

1
2
3
if (ptr = get_pointer()) {
    ...
}

But if a function returns NULL, that’s really an error condition, and presumably you have some other way to handle that too.

What does that leave? The only time I remotely miss this in Python (where it’s illegal) is when testing a regex. You tend to see this a lot instead.

1
2
3
m = re.match('x+y+z+', some_string)
if m:
    ...

re treats failure as an acceptable possibility and returns None, rather than raising an exception. I’m not sure whether this was the right thing to do or not, but off the top of my head I can’t think of too many other Python interfaces that sometimes return None.

Freedom of expression: ACS, C#, Java, JavaScript, Perl, PHP, Swift.

Makes a statement: Inform 7, Lua, Python, Unix shells.

Special mention: BASIC uses = for both assignment and equality testing — the meaning is determined from context. D allows variable declaration as an expression, so if (int x = 3) is allowed, but regular assignment is not. Functional languages generally don’t have an assignment operator. Go disallows assignment as an expression, but assignment and a test can appear together in an if condition, and this is an idiomatic way to check success. Ruby makes everything an expression, so assignment might as well be too. Rust makes everything an expression, but assignment evaluates to the useless () value (due to ownership rules), so it’s not actually useful. Rust and Swift both have a special if let block that explicitly combines assignment with pattern matching, which is way nicer than the C approach.

No hyphens in identifiers

snake_case requires dancing on the shift key (unless you rearrange your keyboard, which is perfectly reasonable). It slows you down slightly and leads to occasional mistakes like snake-Case. The alternative is dromedaryCase, which is objectively wrong and doesn’t actually solve this problem anyway.

Why not just allow hyphens in identifiers, so we can avoid this argument and use kebab-case?

Ah, but then it’s ambiguous whether you mean an identifier or the subtraction operator. No problem: require spaces for subtraction. I don’t think a tiny way you’re allowed to make your code harder to read is really worth this clear advantage.

Low scoring: ACS, C#, D, Java, JavaScript, OCaml, Pascal, Perl 5, PHP, Python, Ruby, Rust, Swift, Unix shells.

Nicely-designed: COBOL, CSS (and thus Sass), Forth, Inform 7, Lisps, Perl 6, XML.

Special mention: Perl has a built-in variable called $-, and Ruby has a few called $-n for various values of “n”, but these are very special cases.

Braces and semicolons

Okay. Hang on. Bear with me.

C code looks like this.

1
2
3
4
5
some block header {
    line 1;
    line 2;
    line 3;
}

The block is indicated two different ways here. The braces are for the compiler; the indentation is for humans.

Having two different ways to say the same thing means they can get out of sync. They can disagree. And that can be, as previously mentioned, really bad. This is really just a more general form of the problem of optional block delimiters.

The only solution is to eliminate one of the two. Programming languages exist for the benefit of humans, so we obviously can’t get rid of the indentation. Thus, we should get rid of the braces. QED.

As an added advantage, we reclaim all the vertical space wasted on lines containing only a }, and we can stop squabbling about where to put the {.

If you accept this, you might start to notice that there are also two different ways of indicating where a line ends: with semicolons for the compiler, and with vertical whitespace for humans. So, by the same reasoning, we should lose the semicolons.

Right? Awesome. Glad we’re all on the same page.

Some languages use keywords instead of braces, but the effect is the same. I’m not aware of any languages that use keywords instead of semicolons.

Bracing myself: C#, D, Erlang, Java, Perl, Rust.

Braces, but no semicolons: Go (ASI), JavaScript (ASI — see below), Lua, Ruby, Swift.

Free and clear: CoffeeScript, Haskell, Python.

Special mention: Lisp, just, in general. Inform 7 has an indented style, but it still requires semicolons. MUMPS doesn’t support nesting at all, but I believe there are extensions that use dots to indicate it.

Here’s some interesting trivia. JavaScript, Lua, and Python all optionally allow semicolons at the end of a statement, but the way each language determines line continuation is very different.

JavaScript takes an “opt-out” approach: it continues reading lines until it hits a semicolon, or until reading the next line would cause a syntax error. (This approach is called automatic semicolon insertion.) That leaves a few corner cases like starting a new line with a (, which could look like the last thing on the previous line is a function you’re trying to call. Or you could have -foo on its own line, and it would parse as subtraction rather than unary negation. You might wonder why anyone would do that, but using unary + is one way to make function parse as an expression rather than a statement! I’m not so opposed to semicolons that I want to be debugging where the language thinks my lines end, so I just always use semicolons in JavaScript.

Python takes an “opt-in” approach: it assumes, by default, that a statement ends at the end of a line. However, newlines inside parentheses or brackets are ignored, which takes care of 99% of cases — long lines are most frequently caused by function calls (which have parentheses!) with a lot of arguments. If you really need it, you can explicitly escape a newline with \\, but this is widely regarded as incredibly ugly.

Lua avoids the problem almost entirely. I believe Lua’s grammar is designed such that it’s almost always unambiguous where a statement ends, even if you have no newlines at all. This has a few weird side effects: void expressions are syntactically forbidden in Lua, for example, so you just can’t have -foo as its own statement. Also, you can’t have code immediately following a return, because it’ll be interpreted as a return value. The upside is that Lua can treat newlines just like any other whitespace, but still not need semicolons. In fact, semicolons aren’t statement terminators in Lua at all — they’re their own statement, which does nothing. Alas, not for lack of trying, Lua does have the same ( ambiguity as JavaScript (and parses it the same way), but I don’t think any of the others exist.

Oh, and the colons that Python has at the end of its block headers, like if foo:? As far as I can tell, they serve no syntactic purpose whatsoever. Purely aesthetic.

Blaming the programmer

Perhaps one of the worst misfeatures of C is the ease with which responsibility for problems can be shifted to the person who wrote the code. “Oh, you segfaulted? I guess you forgot to check for NULL.” If only I had a computer to take care of such tedium for me!

Clearly, computers can’t be expected to do everything for us. But they can be expected to do quite a bit. Programming languages are built for humans, and they ought to eliminate the sorts of rote work humans are bad at whenever possible. A programmer is already busy thinking about the actual problem they want to solve; it’s no surprise that they’ll sometimes forget some tedious detail the language forces them to worry about.

So if you’re designing a language, don’t just copy C. Don’t just copy C++ or Java. Hell, don’t even just copy Python or Ruby. Consider your target audience, consider the problems they’re trying to solve, and try to get as much else out of the way as possible. If the same “mistake” tends to crop up over and over, look for a way to modify the language to reduce or eliminate it. And be sure to look at a lot of languages for inspiration — even ones you hate, even weird ones no one uses! A lot of clever people have had a lot of other ideas in the last 44 years.


I hope you enjoyed this accidental cross-reference of several dozen languages! I enjoyed looking through them all, though it was incredibly time-consuming. Some of them look pretty interesting; maybe give them a whirl.

Also, dammit, now I’m thinking about language design again.

2016-08-17 java, unicode, emoji

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3316

Чудех се дали да кръстя това “fuck you, Java”.

От някакво време гледаме проблеми с пращането на emoticon-и (ако не знаете, unicode стана пълен с всякакви лайна) – в някакви случаи се намазват, не пристигат вярно и т.н.. При нас през JNI стринговете се пращат до един C/C++ lib, от който всъщност излизат през мрежата от там, като по пътя има малко debug, който да каже какво излиза…

Седя аз и гледам как за намазващото се emoji по някаква причина получавам 6 байта от java-та, вместо 4 (което очаквам). 6-те байта ми изглеждат странно, не наподобяват UTF-8 (поне това, което аз знам), и след някакво четене откривам, че педе^Wпрекрасните хора от java под UTF-8 разбират modified UTF-8, или някакви неща като по-големите unicode символи се кодират така, че нищо друго не може да ги схване, освен друга java. Това обърква SMS центровете по пътя и всякакви други реализации и води до странни за дебъгване проблеми.

Разбира се, това се случва само с определени много много големи emoji-та, които ги има само на определени телефони в стандартните клавиатури, което па води до много фалшиви следи, като например “това са го строшили в android 5 при samsung”, “сървъра ги яде”, “космически излъчвания” и т.н..
(“клавиатурите” в android са софтуерни компоненти, дето явно всеки малоумен производител си пише сам)

Решението поне в нашия случай е в JNI-то да се превежда техния utf8 до нормалния. Ровейки се из интернета, намерих още подобни оплаквания, но явно не всички успяват да се ударят в това, понеже преди това се оказва, че например mysql-ската им база не ползва верния storage type, node.js-а им има само UCS2, който па хептен не може да ги събере и т.н..

Много хора просто казват “не ползвайте тия работи”. Сериозно ми се иска и за мен да беше опция…