Tag Archives: uae

In the Works – AWS Region in the Middle East

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/in-the-works-aws-region-in-the-middle-east/

Last year we launched new AWS Regions in Canada, India, Korea, the UK, and the United States, and announced that new regions are coming to China, France, Hong Kong, Sweden, and a second GovCloud Region in the US throughout 2017 and 2018.

Middle East Region by Early 2019
Today, I am happy to announce that we will be opening an AWS Region in the Middle East by early 2019. The new Region will be based in Bahrain, will be comprised of three Availability Zones at launch, and will give AWS customers and partners the ability to run their workloads and store their data in the Middle East.

AWS customers are already making use of 44 Availability Zones across 16 geographic regions. Today’s announcement brings the total number of global regions (operational and in the works) up to 22.

UAE Edge Location in 2018
We also plan to open an edge location in the UAE in the first quarter of 2018. This will bring Amazon CloudFront, Amazon Route 53, AWS Shield, and AWS WAF to the region, adding to our existing set of 78 points of presence world-wide.

These announcements add to our continued investment in the Middle East. Earlier this year we announced the opening of AWS offices in Dubai, UAE and Manama, Bahrain. Prior to this we have supported the growth of technology education in the region with AWS Educate and have supported the growth of new businesses through AWS Activate for many years.

The addition of AWS infrastructure in the Middle East will help countries across the region to innovate, grow their economies, and pursue their vision plans (Saudi Vision 2030, UAE Vision 2021, Bahrain Vision 2030, and so forth).

Talk to Us
As always, we are looking forward to serving new and existing customers in the Middle East and working with partners across the region. Of course, the new Region will also be open to existing AWS customers who would like to serve users in the Middle East.

To learn more about the AWS Middle East Region, feel free to contact our team at [email protected].

If you are interested in joining the team and would like to learn more about AWS positions in the Middle East, take a look at the Amazon Jobs site.

Jeff;

Surveillance and our Insecure Infrastructure

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/04/surveillance_an_2.html

Since Edward Snowden revealed to the world the extent of the NSA’s global surveillance network, there has been a vigorous debate in the technological community about what its limits should be.

Less discussed is how many of these same surveillance techniques are used by other — smaller and poorer — more totalitarian countries to spy on political opponents, dissidents, and human rights defenders; the Citizen Lab at the University of Toronto has documented some of the many abuses, by countries like Ethiopia, the UAE, Iran, Syria, Kazakhstan, Sudan, Ecuador, Malaysia, and China.

That these countries can use network surveillance technologies to violate human rights is a shame on the world, and there’s a lot of blame to go around.

We can point to the governments that are using surveillance against their own citizens.

We can certainly blame the cyberweapons arms manufacturers that are selling those systems, and the countries — mostly European — that allow those arms manufacturers to sell those systems.

There’s a lot more the global Internet community could do to limit the availability of sophisticated Internet and telephony surveillance equipment to totalitarian governments. But I want to focus on another contributing cause to this problem: the fundamental insecurity of our digital systems that makes this a problem in the first place.

IMSI catchers are fake mobile phone towers. They allow someone to impersonate a cell network and collect information about phones in the vicinity of the device, and they’re used to create lists of people who were at a particular event or near a particular location.

Fundamentally, the technology works because the phone in your pocket automatically trusts any cell tower to which it connects. There’s no security in the connection protocols between the phones and the towers.

IP intercept systems are used to eavesdrop on what people do on the Internet. Unlike the surveillance that happens at the sites you visit, by companies like Facebook and Google, this surveillance happens at the point where your computer connects to the Internet. Here, someone can eavesdrop on everything you do.

This system also exploits existing vulnerabilities in the underlying Internet communications protocols. Most of the traffic between your computer and the Internet is unencrypted, and what is encrypted is often vulnerable to man-in-the-middle attacks because of insecurities in both the Internet protocols and the encryption protocols that protect it.

There are many other examples. What they all have in common is that they are vulnerabilities in our underlying digital communications systems that allow someone — whether it’s a country’s secret police, a rival national intelligence organization, or a criminal group — to break or bypass what security there is and spy on the users of these systems.

These insecurities exist for two reasons. First, they were designed in an era where computer hardware was expensive and inaccessibility was a reasonable proxy for security. When the mobile phone network was designed, faking a cell tower was an incredibly difficult technical exercise, and it was reasonable to assume that only legitimate cell providers would go to the effort of creating such towers.

At the same time, computers were less powerful and software was much slower, so adding security into the system seemed like a waste of resources. Fast forward to today: computers are cheap and software is fast, and what was impossible only a few decades ago is now easy.

The second reason is that governments use these surveillance capabilities for their own purposes. The FBI has used IMSI-catchers for years to investigate crimes. The NSA uses IP interception systems to collect foreign intelligence. Both of these agencies, as well as their counterparts in other countries, have put pressure on the standards bodies that create these systems to not implement strong security.

Of course, technology isn’t static. With time, things become cheaper and easier. What was once a secret NSA interception program or a secret FBI investigative tool becomes usable by less-capable governments and cybercriminals.

Man-in-the-middle attacks against Internet connections are a common criminal tool to steal credentials from users and hack their accounts.

IMSI-catchers are used by criminals, too. Right now, you can go onto Alibaba.com and buy your own IMSI catcher for under $2,000.

Despite their uses by democratic governments for legitimate purposes, our security would be much better served by fixing these vulnerabilities in our infrastructures.

These systems are not only used by dissidents in totalitarian countries; they’re also used by legislators, corporate executives, critical infrastructure providers, and many others in the US and elsewhere.

That we allow people to remain insecure and vulnerable is both wrongheaded and dangerous.

Earlier this month, two American legislators — Senator Ron Wyden and Rep. Ted Lieu — sent a letter to the chairman of the Federal Communications Commission, demanding that he do something about the country’s insecure telecommunications infrastructure.

They pointed out that not only are insecurities rampant in the underlying protocols and systems of the telecommunications infrastructure, but also that the FCC knows about these vulnerabilities and isn’t doing anything to force the telcos to fix them.

Wyden and Lieu make the point that fixing these vulnerabilities is a matter of US national security, but it’s also a matter of international human rights. All modern communications technologies are global, and anything the US does to improve its own security will also improve security worldwide.

Yes, it means that the FBI and the NSA will have a harder job spying, but it also means that the world will be a safer and more secure place.

This essay previously appeared on AlJazeera.com.

Hacker Leaks Cellebrite’s Phone-Hacking Tools

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/02/hacker_leaks_ce.html

In January we learned that a hacker broke into Cellebrite’s network and stole 900GB of data. Now the hacker has dumped some of Cellebrite’s phone-hacking tools on the Internet.

In their README, the hacker notes that much of the iOS-related code is very similar to that used in the jailbreaking scene, a community of iPhone hackers that typically breaks into iOS devices and releases its code publicly for free.

Jonathan Zdziarski, a forensic scientist, agreed that some of the iOS files were nearly identical to tools created and used by the jailbreaking community, including patched versions of Apple’s firmware designed to break security mechanisms on older iPhones. A number of the configuration files also reference “limera1n,” the name of a piece of jailbreaking software created by infamous iPhone hacker Geohot. He said he wouldn’t call the released files “exploits,” however.

Zdziarski also said that other parts of the code were similar to a jailbreaking project called QuickPwn, but that the code had seemingly been adapted for forensic purposes. For example, some of the code in the dump was designed to brute-force PINs, which may be unusual for normal jailbreaking software.

“If, and it’s a big if, they used this in UFED or other products, it would indicate they ripped off software verbatim from the jailbreak community and used forensically unsound and experimental software in their supposedly scientific and forensically validated products,” Zdziarski continued.

If you remember, Cellebrite was the company that supposedly helped the FBI break into the San Bernardino terrorist iPhone. (I say “supposedly,” because the evidence is unclear.) We do know that they provide this sort of forensic assistance to countries like Russia, Turkey, and the UAE — as well as to many US jurisdictions.

As Cory Doctorow points out:

…suppressing disclosure of security vulnerabilities in commonly used tools does not prevent those vulnerabilities from being independently discovered and weaponized — it just means that users, white-hat hackers and customers are kept in the dark about lurking vulnerabilities, even as they are exploited in the wild, which only end up coming to light when they are revealed by extraordinary incidents like this week’s dump.

We are all safer when vulnerabilities are reported and fixed, not when they are hoarded and used in secret.

Slashdot thread.

Attributing the DNC Hacks to Russia

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/01/attributing_the_1.html

President Barack Obama’s public accusation of Russia as the source of the hacks in the US presidential election and the leaking of sensitive e-mails through WikiLeaks and other sources has opened up a debate on what constitutes sufficient evidence to attribute an attack in cyberspace. The answer is both complicated and inherently tied up in political considerations.

The administration is balancing political considerations and the inherent secrecy of electronic espionage with the need to justify its actions to the public. These issues will continue to plague us as more international conflict plays out in cyberspace.

It’s true that it’s easy for an attacker to hide who he is in cyberspace. We cannot positively identify particular pieces of hardware and software around the world. We can’t verify the identity of someone sitting in front of a keyboard through computer data alone. Internet data packets don’t come with return addresses, and it’s easy for attackers to disguise their origins. For decades, hackers have used techniques such as jump hosts, VPNs, Tor and open relays to obscure their origin, and in many cases they work. I’m sure that many national intelligence agencies route their attacks through China, simply because everyone knows lots of attacks come from China.

On the other hand, there are techniques that can identify attackers with varying degrees of precision. It’s rarely just one thing, and you’ll often hear the term “constellation of evidence” to describe how a particular attacker is identified. It’s analogous to traditional detective work. Investigators collect clues and piece them together with known modes of operation. They look for elements that resemble other attacks and elements that are anomalies. The clues might involve ones and zeros, but the techniques go back to Sir Arthur Conan Doyle.

The University of Toronto-based organization Citizen Lab routinely attributes attacks against the computers of activists and dissidents to particular Third World governments. It took months to identify China as the source of the 2012 attacks against the New York Times. While it was uncontroversial to say that Russia was the source of a cyberattack against Estonia in 2007, no one knew if those attacks were authorized by the Russian government — until the attackers explained themselves. And it was the Internet security company CrowdStrike that first attributed the attacks against the Democratic National Committee to Russian intelligence agencies in June, based on multiple pieces of evidence gathered from its forensic investigation.

Attribution is easier if you are monitoring broad swaths of the Internet. This gives the National Security Agency a singular advantage in the attribution game. The problem, of course, is that the NSA doesn’t want to publish what it knows.

Regardless of what the government knows and how it knows it, the decision of whether to make attribution evidence public is another matter. When Sony was attacked, many security experts — myself included — were skeptical of both the government’s attribution claims and the flimsy evidence associated with them. I only became convinced when the New York Times ran a story about the government’s attribution, which talked about both secret evidence inside the NSA and human intelligence assets inside North Korea. In contrast, when the Office of Personnel Management was breached in 2015, the US government decided not to accuse China publicly, either because it didn’t want to escalate the political situation or because it didn’t want to reveal any secret evidence.

The Obama administration has been more public about its evidence in the DNC case, but it has not been entirely public.

It’s one thing for the government to know who attacked it. It’s quite another for it to convince the public who attacked it. As attribution increasingly relies on secret evidence — as it did with North Korea’s attack on Sony in 2014 and almost certainly does regarding Russia and the previous election — the government is going to have to face the choice of making previously secret evidence public and burning sources and methods, or keeping it secret and facing perfectly reasonable skepticism.

If the government is going to take public action against a cyberattack, it needs to make its evidence public. But releasing secret evidence might get people killed, and it would make any future confidentiality assurances we make to human sources completely non-credible. This problem isn’t going away; secrecy helps the intelligence community, but it wounds our democracy.

The constellation of evidence attributing the attacks against the DNC, and subsequent release of information, is comprehensive. It’s possible that there was more than one attack. It’s possible that someone not associated with Russia leaked the information to WikiLeaks, although we have no idea where that someone else would have obtained the information. We know that the Russian actors who hacked the DNC — both the FSB, Russia’s principal security agency, and the GRU, Russia’s military intelligence unit — are also attacking other political networks around the world.

In the end, though, attribution comes down to whom you believe. When Citizen Lab writes a report outlining how a United Arab Emirates human rights defender was targeted with a cyberattack, we have no trouble believing that it was the UAE government. When Google identifies China as the source of attacks against Gmail users, we believe it just as easily.

Obama decided not to make the accusation public before the election so as not to be seen as influencing the election. Now, afterward, there are political implications in accepting that Russia hacked the DNC in an attempt to influence the US presidential election. But no amount of evidence can convince the unconvinceable.

The most important thing we can do right now is deter any country from trying this sort of thing in the future, and the political nature of the issue makes that harder. Right now, we’ve told the world that others can get away with manipulating our election process as long as they can keep their efforts secret until after one side wins. Obama has promised both secret retaliations and public ones. We need to hope they’re enough.

This essay previously appeared on CNN.com.

EDITED TO ADD: The ODNI released a declassified report on the Russian attacks. Here’s a New York Times article on the report.

And last week there were Senate hearings on this issue.

EDITED TO ADD: A Washington Post article talks about some of the intelligence behind the assessment.

EDITED TO ADD (1/10): The UK connection.

How Signal Is Evading Censorship

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/12/how_signal_is_e.html

Signal, the encrypted messaging app I prefer, is being blocked in both Egypt and the UAE. Recently, the Signal team developed a workaround: domain fronting.

Signal’s new anti-censorship feature uses a trick called “domain fronting,” Marlinspike explains. A country like Egypt, with only a few small internet service providers tightly controlled by the government, can block any direct request to a service on its blacklist. But clever services can circumvent that censorship by hiding their traffic inside of encrypted connections to a major internet service, like the content delivery networks (CDNs) that host content closer to users to speed up their online experience — or in Signal’s case, Google’s App Engine platform, designed to host apps on Google’s servers.

“Now when people in Egypt or the United Arab Emirates send a Signal message, it’ll look identical to something like a Google search,” Marlinspike says. “The idea is that using Signal will look like using Google; if you want to block Signal you’ll have to block Google.”

The trick works because Google’s App Engine allows developers to redirect traffic from Google.com to their own domain. Google’s use of TLS encryption means that contents of the traffic, including that redirect request, are hidden, and the internet service provider can see only that someone has connected to Google.com. That essentially turns Google into a proxy for Signal, bouncing its traffic and fooling the censors.
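As a rough sketch of the mechanism (not Signal’s actual code; the backend hostname below is invented), the outer TLS connection names the big provider while the Host header inside the encrypted tunnel names the real service:

import requests  # assumed third-party HTTP library

# The censor sees a TLS connection to www.google.com (the "front").
# The Host header, hidden inside the encrypted tunnel, names the real
# backend -- a hypothetical App Engine app in this sketch.
response = requests.get(
    "https://www.google.com/",
    headers={"Host": "hidden-service.example.appspot.com"},
)
print(response.status_code)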

This isn’t a new trick (Tor uses it too, for example), but it does work.

Opening Soon – AWS Office in Dubai to Support Cloud Growth in UAE

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/opening-soon-aws-office-in-dubai-to-support-cloud-growth-in-uae/

The AWS office in Dubai, UAE will open on January 1, 2017.

We’ve been working with the Dubai Investment Development Agency (Dubai FDI) to launch the office, and plan to support startups, government institutions, and some of the Middle East’s historic and most established enterprises as they make the transition to the AWS Cloud.

Resources in Dubai
The office will be staffed with account managers, solutions architects, partner managers, professional services consultants, and support staff to allow customers to interact with AWS in their local setting and language.

In addition to access to the AWS team, customers in the Middle East have access to several important AWS programs, including AWS Activate, AWS Educate, and AWS Training and Certification:

  • AWS Activate is designed to provide startups with resources that will help them to get started on AWS, including up to $100,000 (USD) in AWS promotional credits.
  • AWS Educate is a global initiative designed to provide students and educators with the resources needed to accelerate cloud-based learning endeavors.
  • AWS Training and Certification helps technologists to develop the skills to design, deploy, and operate infrastructure in the AWS Cloud.

We are also planning to host AWSome Days and other large-scale training events in the region.

Customers in the Middle East
Middle Eastern organizations were among the earliest adopters of cloud services when AWS launched in 2006. Customers based in the region are using AWS to run everything from development and test environments to Big Data analytics, from mobile, web and social applications to enterprise business applications and mission critical workloads.

AWS counts some of the UAE’s most well-known and fastest growing businesses as customers, including PayFort and Careem, as well as government institutions and some of the largest companies in the Middle East, such as flydubai and Middle East Broadcasting Center.

Careem is the leading ride-booking app in the Middle East and North Africa. Launched in 2012, Careem runs totally on AWS and over the past three years has grown 10x in size every year. This is growth that would not have been possible without AWS. After starting with one city, Dubai, Careem now serves millions of commuters in 43 cities across the Middle East, North Africa, and Asia. Careem uses over 500 EC2 instances along with a number of other services, including Amazon S3, Amazon DynamoDB, and AWS Elastic Beanstalk.

PayFort is a startup based in the United Arab Emirates that provides payment solutions to customers across the Middle East through its payments gateway, FORT. The platform enables organizations to accept online payments via debit and credit cards. PayFort counts Etihad Airways, Ferrari World, and Souq.com among its customers. PayFort chose to run FORT entirely on AWS technologies and as a result is saving 32% over their on-premises costs. Although cost was key for PayFort, it turns out that they chose AWS due to the high level of security that they could achieve with the platform. Compliance with the Payment Card Industry Data Security Standard (PCI DSS) and International Organization for Standardization (ISO) 27001 is central to PayFort’s payment services, and both are available with AWS (we were actually the first cloud provider to reach compliance with version 3.1 of PCI DSS).

flydubai is the leading low-cost airline in the Middle East, with over 90 destinations, and was launched by the government of Dubai in 2009. flydubai chose to build its online check-in platform on AWS and went from design to production in four months; the platform is now used by thousands of passengers a day, a timeline that would not have been possible without the cloud. Given the seasonal fluctuations in demand for flights, flydubai also needs an IT provider that allows it to cope with spikes in demand. Using AWS allows it to do this, and lead times for new infrastructure services have been reduced from up to 10 weeks to a matter of hours.

Partners
The AWS Partner Network of consulting and technology partners in the region helps our customers to get the most from the cloud. The network includes global members like CSC as well as prominent regional members such as Redington.

Redington is an AWS Consulting Partner and the Master Value Added Distributor for AWS in the Middle East and North Africa. They are also an Authorized Commercial Reseller of AWS cloud technologies. Redington is helping organizations in the MEA region with cloud assessment, cloud readiness, design, implementation, migration, deployment, and optimization of cloud resources. They also have an ecosystem of partners, including ISVs with experienced and certified AWS engineers who have cross-domain experience.

Join Us
This announcement is part of our continued expansion across Europe, the Middle East, and Asia. As part of our investment in these areas, we created over 10,000 new jobs in 2015. If you are interested in joining our team in Dubai or in any other location around the world, check out the Amazon Jobs site.

Jeff;

 

Let’s stop copying C

Post Syndicated from Eevee original https://eev.ee/blog/2016/12/01/lets-stop-copying-c/

Ah, C. The best lingua franca we have… because we have no other lingua francas. Linguae franca. Surgeons general?

C is fairly old — 44 years, now! — and comes from a time when there were possibly more architectures than programming languages. It works well for what it is, and what it is is a relatively simple layer of indirection atop assembly.

Alas, the popularity of C has led to a number of programming languages’ taking significant cues from its design, and parts of its design are… slightly questionable. I’ve gone through some common features that probably should’ve stayed in C and my justification for saying so. The features are listed in rough order from (I hope) least to most controversial. The idea is that C fans will give up when I call it “weakly typed” and not even get to the part where I rag on braces. Wait, crap, I gave it away.

I’ve listed some languages that do or don’t take the same approach as C. Plenty of the listed languages have no relation to C, and some even predate it — this is meant as a cross-reference of the landscape (and perhaps a list of prior art), not a genealogy. The language selections are arbitrary and based on what I could cobble together from documentation, experiments, Wikipedia, and attempts to make sense of Rosetta Code. I don’t know everything about all of them, so I might be missing some interesting quirks. Things are especially complicated for very old languages like COBOL or Fortran, which by now have numerous different versions and variants and de facto standard extensions.

“Unix shells” means some handwaved combination that probably includes bash and its descendants; for expressions, it means the (( ... )) syntax. I didn’t look too closely into, say, fish. Unqualified “Python” means both 2 and 3; likewise, unqualified “Perl” means both 5 and 6. Also, some of the puns are perhaps a little too obtuse, but the first group listed is always C-like.

Textual inclusion

#include is not a great basis for a module system. It’s not even a module system. You can’t ever quite tell what symbols came from which files, or indeed whether particular files are necessary at all. And in languages with C-like header files, most headers include other headers include more headers, so who knows how any particular declaration is actually ending up in your code? Oh, and there’s the whole include guards thing.

It’s a little tricky to pick on individual languages here, because ultimately even the greatest module system in the world boils down to “execute this other file, and maybe do some other stuff”. I think the true differentiating feature is whether including/importing/whatevering a file creates a new namespace. If a file gets dumped into the caller’s namespace, that looks an awful lot like textual inclusion; if a file gets its own namespace, that’s a good sign of something more structured happening behind the scenes.

This tends to go hand-in-hand with how much the language relies on a global namespace. One surprising exception is Lua, which can compartmentalize required files quite well, but dumps everything into a single global namespace by default.

Quick test: if you create a new namespace and import another file within that namespace, do its contents end up in that namespace?
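A minimal two-file Python sketch (hypothetical file names) passes that test: the imported file’s contents stay in their own namespace.

# helpers.py (hypothetical)
def greet():
    return "hello"

# main.py
import helpers

print(helpers.greet())   # the import created a namespace; names must be qualified
# print(greet())         # NameError -- nothing was dumped into this file's namespace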

Included: ACS, awk, COBOL, Erlang, Forth, Fortran, most older Lisps, Perl 5 (despite that required files must return true), PHP, Unix shells.

Excluded: Ada, Clojure, D, Haskell, Julia, Lua (the file’s return value is returned from require), Nim, Node (similar to Lua), Perl 6, Python, Rust.

Special mention: ALGOL appears to have been designed with the assumption that you could include other code by adding its punch cards to your stack. C#, Java, OCaml, and Swift all have some concept of “all possible code that will be in this program”, sort of like C with inferred headers, so imports are largely unnecessary; Java’s import really just does aliasing. Inform 7 has no namespacing, but does have a first-class concept of external libraries, but doesn’t have a way to split a single project up between multiple files. Ruby doesn’t automatically give required files their own namespace, but doesn’t evaluate them in the caller’s namespace either.

Optional block delimiters

Old and busted and responsible for gotofail:

if (condition)
    thing;

New hotness, which reduces the amount of punctuation overall and eliminates this easy kind of error:

if condition {
    thing;
}

To be fair, and unlike most of these complaints, the original idea was a sort of clever consistency: the actual syntax was merely if (expr) stmt, and also, a single statement could always be replaced by a block of statements. Unfortunately, the cuteness doesn’t make up for the ease with which errors sneak in. If you’re stuck with a language like this, I advise you to always use braces, possibly excepting the most trivial cases like immediately returning if some argument is NULL. Definitely do not do this nonsense, which I saw in actual code not 24 hours ago.

for (x = ...)
    for (y = ...) {
        ...
    }

    // more code

    for (x = ...)
        for (y = ...)
            buffer[y][x] = ...

The only real argument for omitting the braces is that the braces take up a lot of vertical space, but that’s mostly a problem if you put each { on its own line, and you could just not do that.

Some languages use keywords instead of braces, and in such cases it’s vanishingly rare to make the keywords optional.

Blockheads: ACS, awk, C#, D, Erlang (kinda?), Java, JavaScript.

New kids on the block: Go, Perl 6, Rust, Swift.

Had their braces removed: Ada, ALGOL, BASIC, COBOL, CoffeeScript, Forth, Fortran (but still requires parens), Haskell, Lua, Ruby.

Special mention: Inform 7 has several ways to delimit blocks, none of them vulnerable to this problem. Perl 5 requires both the parentheses and the braces… but it lets you leave off the semicolon on the last statement. Python just uses indentation to delimit blocks in the first place, so you can’t have a block look wrong. Lisps exist on a higher plane of existence where the very question makes no sense.

Bitwise operator precedence

For ease of transition from B, in C, the bitwise operators & | ^ have lower precedence than the comparison operators == and friends. That means they happen later. For binary math operators, this is nonsense.

1 + 2 == 3  // (1 + 2) == 3
1 * 2 == 3  // (1 * 2) == 3
1 | 2 == 3  // 1 | (2 == 3)

Many other languages have copied C’s entire set of operators and their precedence, including this. Because a new language is easier to learn if its rules are familiar, you see. Which is why we still, today, have extremely popular languages maintaining compatibility with a language from 1969 — so old that it probably couldn’t get a programming job.

Honestly, if your language is any higher-level than C, I’m not sure bit operators deserve to be operators at all. Free those characters up to do something else. Consider having a first-class bitfield type; then 99% of the use of bit operations would go away.

Quick test: 1 & 2 == 2 evaluates to 1 with C precedence, false otherwise. Or just look at a precedence table: if equality appears between bitwise ops and other math ops, that’s C style.
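Python, for one, puts the bitwise operators above the comparisons, so the quick test comes out false:

print(1 & 2 == 2)    # parsed as (1 & 2) == 2  ->  0 == 2  ->  False
print(1 | 2 == 3)    # parsed as (1 | 2) == 3  ->  3 == 3  ->  True
# In C, 1 & 2 == 2 parses as 1 & (2 == 2), which is 1 & 1, i.e. 1.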

A bit wrong: D, expr, JavaScript, Perl 5, PHP.

Wisened up: F# (ops are &&& ||| ^^^), Go, Julia, Lua (bitwise ops are new in 5.3), Perl 6 (ops are +& +| +^), Python, Ruby, Rust, SQL, Swift, Unix shells.

Special mention: C# and Java have C’s precedence, but forbid using bitwise operators on booleans, so the quick test is a compile-time error. Lisp-likes have no operator precedence.

Negative modulo

The modulo operator, %, finds the remainder after division. Thus you might think that this always holds:

0 <= a % b < abs(b)

But no — if a is negative, C will produce a negative value. (Well, since C99; before that it was unspecified, which is probably worse.) This is so a / b * b + a % b is always equal to a. Truncating integer division rounds towards zero, so a % b has to take the sign of a.

I’ve never found this behavior (or the above equivalence) useful. An easy example is that checking for odd numbers with x % 2 == 1 will fail for negative numbers, which produce -1. But the opposite behavior can be pretty handy.

Consider the problem of having n items that you want to arrange into rows with c columns. A calendar, say; you want to include enough empty cells to fill out the last row. n % c gives you the number of items on the last row, so c - n % c seems like it will give you the number of empty spaces. But if the last row is already full, then n % c is zero, and c - n % c equals c! You’ll have either a double-width row or a spare row of empty cells. Fixing this requires treating n % c == 0 as a special case, which is unsatisfying.

Ah, but if we have positive %, the answer is simply… -n % c! Consider this number line for n = 5 and c = 3:

-6      -3       0       3       6
 | - x x | x x x | x x x | x x - |

a % b tells you how far to count down to find a multiple of b. For positive a, that means “backtracking” over a itself and finding a smaller number. For negative a, that means continuing further away from zero. If you look at negative numbers as the mirror image of positive numbers, then % on a positive number tells you how much to file off to get a multiple, whereas % on a negative number tells you how much further to go to get a multiple. 5 % 3 is 2, but -5 % 3 is 1. And of course, -6 % 3 is still zero, so that’s not a special case.

Positive % effectively lets you choose whether to round up or down. It doesn’t come up often, but when it’s handy, it’s really handy.

(I have no strong opinion on what 5 % -3 should be; I don’t think I’ve ever tried to use % with a negative divisor. Python makes it negative; Pascal makes it positive. Wikipedia has a whole big chart.)

Quick test: -5 % 3 is -2 with C semantics, 1 with “positive” semantics.
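A quick check in Python, which uses the positive semantics, with a C-style truncating remainder via math.fmod for comparison:

import math

print(-5 % 3)            # 1    -- the result always lands in [0, 3)
print(5 % 3)             # 2
print(-6 % 3)            # 0    -- exact multiples still aren't a special case
print(math.fmod(-5, 3))  # -2.0 -- truncating, C-style remainder

n, c = 5, 3
print(-n % c)            # 1 empty calendar cell, no special case for full rows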

Leftovers: C#, D, expr, Go, Java, JavaScript, OCaml, PowerShell, PHP, Rust, Scala, SQL, Swift, Unix shells, VimL, Visual Basic. Notably, some of these languages don’t even have integer division.

Paying dividends: Dart, MUMPS (#), Perl, Python, R (%%), Ruby, Smalltalk (\\), Standard ML, Tcl.

Special mention: Ada, Haskell, Julia, many Lisps, MATLAB, VHDL, and others have separate mod (Python-like) and rem (C-like) operators. CoffeeScript has separate % (C-like) and %% (Python-like) operators.

Leading zero for octal

Octal notation like 0777 has three uses.

One: to make a file mask to pass to chmod().

Two: to confuse people when they write 013 and it comes out as 11.

Three: to confuse people when they write 018 and get a syntax error.

If you absolutely must have octal (?!) in your language, it’s fine to use 0o777. Really. No one will mind. Or you can go the whole distance and allow literals written in any base, as several languages do.
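Python 3 takes exactly this route; for instance:

mask = 0o777               # explicit octal prefix
print(mask)                # 511
print(0x1ff, 0b111111111)  # 511 511 -- hex and binary get the same treatment
print(int("777", 8))       # 511 -- arbitrary bases at runtime
# 013 is a SyntaxError in Python 3, so the 018 confusion can't happen.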

Gets a zero: awk (gawk only), Clojure, Go, Groovy, Java, JavaScript, m4, Perl 5, PHP, Python 2, Unix shells.

G0od: ECMAScript 6, Eiffel (0c — cute!), F#, Haskell, Julia, Nemerle, Nim, OCaml, Perl 6, Python 3, Ruby, Rust, Scheme (#o), Swift, Tcl.

Based literals: Ada (8#777#), Bash (8#777), Erlang (8#777), Icon (8r777), J (8b777), Perl 6 (:8<777>), PostScript (8#777), Smalltalk (8r777).

Special mention: BASIC uses &O and &H prefixes for octal and hex. bc and Forth allow the base used to interpret literals to be changed on the fly, via ibase and BASE respectively. C#, D, expr, Lua, Scala, and Standard ML have no octal literals at all. Some COBOL extensions use O# and H#/X# prefixes for octal and hex. Fortran uses the slightly odd O'777' syntax.

No power operator

Perhaps this makes sense in C, since it doesn’t correspond to an actual instruction on most CPUs, but in JavaScript? If you can make + work for strings, I think you can add a **.

If you’re willing to ditch the bitwise operators (or lessen their importance a bit), you can even use ^, as most people would write in regular ASCII text.
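Python went the other way: ^ stays XOR, but there is still a real power operator. A quick demonstration:

print(2 ** 10)        # 1024 -- exponentiation
print(2 ^ 10)         # 8    -- ^ is XOR, a classic trap for calculator refugees
print(pow(2, 10, 7))  # 2    -- bonus: modular exponentiation via the builtin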

Powerless: ACS, C#, Eiffel, Erlang, expr, Forth, Go.

Two out of two stars: Ada, ALGOL (↑ works too), COBOL, CoffeeScript, ECMAScript 7, Fortran, F#, Groovy, OCaml, Perl, PHP, Python, Ruby, Unix shells.

I tip my hat: awk, BASIC, bc, COBOL, fish, Lua.

Otherwise powerful: APL (*), D (^^).

Special mention: Lisps tend to have a named function rather than a dedicated operator (e.g. Math/pow in Clojure, expt in Common Lisp), but since operators are regular functions, this doesn’t stand out nearly so much. Haskell uses all three of ^, ^^, and ** for typing reasons.

C-style for loops

This construct is bad. It very rarely matches what a human actually wants to do, which 90% of the time is “go through this list of stuff” or “count from 1 to 10”. A C-style for obscures those wishes. The syntax is downright goofy, too: nothing else in the language uses ; as a delimiter and repeatedly executes only part of a line. It’s like a tuple of statements.
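For contrast, here is what those two common intents look like when the language can say them directly (Python here, though most of the languages listed below read about the same):

for item in ["red", "green", "blue"]:   # "go through this list of stuff"
    print(item)

for i in range(1, 11):                  # "count from 1 to 10"
    print(i)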

I said in my previous post about iteration that having an iteration protocol requires either objects or closures, but I realize that’s not true. I even disproved it in the same post. Lua’s own iteration protocol can be implemented without closures — the semantics of for involve keeping a persistent state value and passing it to the iterator function every time. It could even be implemented in C! Awkwardly. And with a bunch of macros. Which aren’t hygienic in C. Hmm, well.

Loopy: ACS, bc, Fortran.

Cool and collected: C#, Clojure, D, Delphi (recent), ECMAScript 6, Eiffel (recent), Go, Groovy, Icon, Inform 7, Java, Julia, Logo, Lua, Nemerle, Nim, Objective-C, Perl, PHP, PostScript, Prolog, Python, R, Rust, Scala, Smalltalk, Swift, Tcl, Unix shells, Visual Basic.

Special mention: Functional languages and Lisps are laughing at the rest of us here. awk has for...in, but it doesn’t iterate arrays in order which makes it rather less useful. JavaScript (pre ES6) has both for...in and for each...in, but both are differently broken, so you usually end up using C-style for or external iteration. BASIC has an ergonomic numeric loop, but no iteration loop. Ruby mostly uses external iteration, and its for block is actually expressed in those terms.

Switch with default fallthrough

We’ve been through this before. Wanting completely separate code per case is, by far, the most common thing to want to do. It makes no sense to have to explicitly opt out of the more obvious behavior.

Breaks my heart: Java, JavaScript.

Follows through: Ada, BASIC, CoffeeScript, Go (has a fallthrough statement), Lisps, Ruby, Swift (has a fallthrough statement), Unix shells.

Special mention: C# and D require break, but require something one way or the other — implicit fallthrough is disallowed except for empty cases. Perl 5 historically had no switch block built in, but it comes with a Switch module, and the last seven releases have had an experimental given block which I stress is still experimental. Python has no switch block. Erlang, Haskell, and Rust have pattern-matching instead (which doesn’t allow fallthrough at all).

Type first

int foo;

In C, this isn’t too bad. You get into problems when you remember that it’s common for type names to be all lowercase.

foo * bar;

Is that a useless expression, or a declaration? It depends entirely on whether foo is a variable or a type.

It gets a little weirder when you consider that there are type names with spaces in them. And storage classes. And qualifiers. And sometimes part of the type comes after the name.

extern const volatile _Atomic unsigned long long int * restrict foo[];

That’s not even getting into the syntax for types of function pointers, which might have arbitrary amounts of stuff after the variable name.

And then C++ came along with generics, which means a type name might also have other type names nested arbitrarily deep.

extern const volatile std::unordered_map<unsigned long long int, std::unordered_map<const long double * const, const std::vector<std::basic_string<char>>::const_iterator>> foo;

And that’s just a declaration! Imagine if there were an assignment in there too.

The great thing about static typing is that I know the types of all the variables, but that advantage is somewhat lessened if I can’t tell what the variables are.

Between type-first, function pointer syntax, Turing-complete duck-typed templates, and C++’s initialization syntax, there are several ways where parsing C++ is ambiguous or even undecidable! “Undecidable” here means that there exist C++ programs which cannot even be parsed into a syntax tree, because the same syntax means two different things depending on whether some expression is a value or a type, and that question can depend on an endlessly recursive template instantiation. (This is also a great example of ambiguity, where x * y(z) could be either an expression or a declaration.)

Contrast with, say, Rust:

let x: ... = ...;

This is easy to parse, both for a human and a computer. The thing before the colon must be a variable name, and it stands out immediately; the thing after the colon must be a type name. Even better, Rust has pretty good type inference, so the type is probably unnecessary anyway.
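Python 3’s optional annotations (via the typing module, as noted in the list below) keep the same name-first shape, even when the type gets nested:

from typing import Dict, List

foo: int = 0
table: Dict[int, List[str]] = {}   # the variable names still lead the line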

Of course, languages with no type declarations whatsoever are immune to this problem.

Most vexing: ACS, ALGOL, C#, D (though [] goes on the type), Fortran, Java, Perl 6.

Looks Lovely: Ada, Boo, F#, Go, Python 3 (via annotation syntax and the typing module), Rust, Swift, TypeScript.

Special mention: BASIC uses trailing type sigils to indicate scalar types.

Weak typing

Please note: this is not the opposite of static typing. Weak typing is more about the runtime behavior of values — if I try to use a value of type T as though it were of type U, will it be implicitly converted?

C lets you assign pointers to int variables and then take square roots of them, which seems like a bad idea to me. C++ agreed and nixed this, but also introduced the ability to make your own custom types implicitly convertible to as many other types you want.

This one is pretty clearly a spectrum, and I don’t have a clear line. For example, I don’t fault Python for implicitly converting between int and float, because int is infinite-precision and float is 64-bit, so it’s usually fine. But I’m a lot more suspicious of C, which lets you assign an int to a char without complaint. (Well, okay. Literal integers in C are ints, which poses a slight problem.)

I do count a combined addition/concatenation operator that accepts different types of arguments as a form of weak typing.
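As a point of comparison, a strongly typed language refuses to guess. A quick Python check:

try:
    "1" + 2              # no silent coercion between str and int
except TypeError as err:
    print(err)           # explains that str and int can't be added

print(1 + 2.5)           # 3.5 -- the int-to-float promotion discussed above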

Weak: JavaScript (+), PHP, Unix shells (almost everything’s a string, but even arrays/scalars are somewhat interchangeable).

Strong: F#, Go (explicit numeric casts), Haskell, Python, Rust (explicit numeric casts).

Special mention: ACS only has integers; even fixed-point values are stored in integers, and the compiler has no notion of a fixed-point type, making it the weakest language imaginable. C++ and Scala both allow defining implicit conversions, for better or worse. Perl 5 is weak, but it avoids most of the ambiguity by having entirely separate sets of operators for string vs numeric operations. Python 2 is mostly strong, but that whole interchangeable bytes/text thing sure caused some ruckus. Tcl only has strings.

Integer division

“Hey, new programmers!” you may find yourself saying. “Don’t worry, it’s just like math, see? Here’s how to use $LANGUAGE as a calculator.”

“Oh boy!” says your protégé. “Let’s see what 7 ÷ 2 is! Oh, it’s 3. I think the computer is broken.”

They’re right! It is broken. I have genuinely seen a non-trivial number of people come into #python thinking division is “broken” because of this.

To be fair, C is pretty consistent about making math operations always produce a value whose type matches one of the arguments. It’s also unclear whether such division should produce a float or a double. Inferring from context would make sense, but that’s not something C is really big on.

Quick test: 7 / 2 is 3½, not 3.
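Python 3’s answer to the quick test, with the explicit floor-division operator alongside:

print(7 / 2)    # 3.5 -- true division by default
print(7 // 2)   # 3   -- ask for integer (floor) division explicitly
print(-7 // 2)  # -4  -- note: floors, rather than truncating like C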

Integrous: bc, C#, D, expr, F#, Fortran, Go, OCaml, Python 2, Ruby, Rust (hard to avoid), Unix shells.

Afloat: awk (no integers), Clojure (produces a rational!), Groovy, JavaScript (no integers), Lua (no integers until 5.3), Nim, Perl 5 (no integers), Perl 6, PHP, Python 3.

Special mention: Haskell disallows / on integers. Nim, Haskell, Perl 6, Python, and probably others have separate integral division operators: div, div, div, and //, respectively.

Bytestrings

“Strings” in C are arrays of 8-bit characters. They aren’t really strings at all, since they can’t hold the vast majority of characters without some further form of encoding. Exactly what the encoding is and how to handle it is left entirely up to the programmer. This is a pain in the ass.

Some languages caught wind of this Unicode thing in the 90s and decided to solve this problem once and for all by making “wide” strings with 16-bit characters. (Even C95 has this, in the form of wchar_t* and L"..." literals.) Unicode, you see, would never have more than 65,536 characters.

Whoops, so much for that. Now we have strings encoded as UTF-16 rather than UTF-8, so we’re paying extra storage cost and we still need to write extra code to do basic operations right. Or we forget, and then later we have to track down a bunch of wonky bugs because someone typed a 💩.

Note that handling characters/codepoints is very different from handling glyphs, i.e. the distinct shapes you see on screen. Handling glyphs doesn’t even really make sense outside the context of a font, because fonts are free to make up whatever ligatures they want. Remember “diverse” emoji? Those are ligatures of three to seven characters, completely invented by a font vendor. A programming language can’t reliably count the display length of that, especially when new combining behaviors could be introduced at any time.

Also, it doesn’t matter how you solve this problem, as long as it appears to be solved. I believe Ruby uses bytestrings, for example, but they know their own encoding, so they can be correctly handled as sequences of codepoints. Having a separate non-default type or methods does not count, because everyone will still use the wrong thing first — sorry, Python 2.

Quick test: what’s the length of “💩”? If 1, you have real unencoded strings. If 2, you have UTF-16 strings. If 4, you have UTF-8 strings. If something else, I don’t know what the heck is going on.
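For example, running the quick test in Python 3, with the byte and UTF-16 lengths alongside for comparison:

s = "💩"
print(len(s))                           # 1 -- real code points
print(len(s.encode("utf-8")))           # 4 -- bytes in UTF-8
print(len(s.encode("utf-16-le")) // 2)  # 2 -- UTF-16 code units (a surrogate pair)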

Totally bytes: Go, Lua, Python 2 (separate unicode type).

Comes up short: Java, JavaScript.

One hundred emoji: Python 3, Ruby, Rust, Swift (even gets combining characters right!).

Special mention: Go’s strings are explicitly arbitrary byte sequences, but iterating over a string with for..range decodes UTF-8 code points. Perl 5 gets the quick test right if you put use utf8; at the top of the file, but Perl 5’s Unicode support is such a confusing clusterfuck that I can’t really give it a 💯.

Hmm. This one is kind of hard to track down for sure without either knowing a lot about internals or installing fifty different interpreters/compilers.

Increment and decrement

I don’t think there are too many compelling reasons to have ++. It means the same as += 1, which is still nice and short. The only difference is that people can do stupid unreadable tricks with ++.

One exception: it is possible to overload ++ in ways that don’t make sense as += 1 — for example, C++ uses ++ to advance iterators, which may do any arbitrary work under the hood.

Double plus ungood: ACS, awk, C#, D, Go, Java, JavaScript, Perl, Unix shells, Vala.

Double plus good: Lua (which doesn’t have += either), Python, Ruby, Rust, Swift (removed in v3).

Special mention: Perl 5 and PHP both allow ++ on strings, in which case it increments letters or something, but I don’t know if much real code has ever used this.

!

A pet peeve. Spot the difference:

if (looks_like_rain()) {
    ...
}
if (!looks_like_rain()) {
    ...
}

That single ! is ridiculously subtle, which seems wrong to me when it makes an expression mean its polar opposite. Surely it should stick out like a sore thumb. The left parenthesis makes it worse, too; it blends in slightly as just noise.

It helps a bit to space after the ! in cases like this:

if (! looks_like_rain()) {
    ...
}

But this seems to be curiously rare. The easy solution is to just spell the operator not. At which point the other two might as well be and and or.

Interestingly enough, C95 specifies and, or, not, and some others as standard alternative spellings, though I’ve never seen them in any C code and I suspect existing projects would prefer I not use them.

Not right: ACS, awk, C#, D, Go, Groovy, Java, JavaScript, Nemerle, PHP, R, Rust, Scala, Swift, Tcl, Vala.

Spelled out: Ada, ALGOL, BASIC, COBOL, Erlang, F#, Fortran, Haskell, Inform 7, Lisps, Lua, Nim, OCaml, Pascal, PostScript, Python, Smalltalk, Standard ML.

Special mention: APL and Julia both use ~, which is at least easier to pick out, which is more than I can say for most of APL. bc and expr, which are really calculators, have no concept of Boolean operations. Forth and Icon, which are not calculators, don’t seem to either. Inform 7 often blends the negation into the verb, e.g. if the player does not have.... Perl and Ruby have both symbolic and named Boolean operators (Perl 6 has even more), with different precedence (which inside if won’t matter); I believe Perl 5 prefers the words and Ruby prefers the symbols. Perl and Ruby also both have a separate unless block, with the opposite meaning to if. Python has is not and not in operators.

Single return and out parameters

Because C can only return a single value, and that value is often an indication of failure for the sake of an if, “out” parameters are somewhat common.

double x, y;
get_point(&x, &y);

It’s not immediately clear whether x and y are input or output. Sometimes they might function as both. (And of course, in this silly example, you’d be better off returning a single point struct. Or would you use a point out parameter because returning structs is potentially expensive?)

Some languages have doubled down on this by adding syntax to declare “out” parameters, which removes the ambiguity in the function definition, but makes it worse in function calls. In the above example, using & on an argument is at least a decent hint that the function wants to write to those values. If you have implicit out parameters or pass-by-reference or whatever, that would just be get_point(x, y) and you’d have no indication that those arguments are special in any way.

The vast majority of the time, this can be expressed in a more straightforward way by returning multiple values:

x, y = get_point()

That was intended as Python, but technically, Python doesn’t have multiple returns! It seems to, but it’s really a combination of several factors: a tuple type, the ability to make a tuple literal with just commas, and the ability to unpack a tuple via multiple assignment. In the end it works just as well. Also this is a way better use of the comma operator than in C.

But the exact same code could appear in Lua, which has multiple return/assignment as an explicit feature… and no tuples. The difference becomes obvious if you try to assign the return value to a single variable instead:

point = get_point()

In Python, point would be a tuple containing both return values. In Lua, point would be the x value, and y would be silently discarded. I don’t tend to be a fan of silently throwing data away, but I have to admit that Lua makes pretty good use of this in several places for “optional” return values that the caller can completely ignore if desired. An existing function can even be extended to return more values than before — that would break callers in Python, but work just fine in Lua.

(Also, to briefly play devil’s advocate: I once saw Python code that returned 14 values, all with very complicated types and semantics. Maybe don’t do that. I think I cleaned it up to return an object, which simplified the calling code considerably too.)

It’s also possible to half-ass this. ECMAScript 6:

function get_point() {
    return [1, 2];
}

var [x, y] = get_point();

It works, but it doesn’t actually look like multiple return. The trouble is that JavaScript has C’s comma operator and C’s variable declaration syntax, so neither of the above constructs could’ve left off the brackets without significantly changing the syntax:

function get_point() {
    // Whoops!  This uses the comma operator, which evaluates to its last
    // operand, so it just returns 2
    return 1, 2;
}

// Whoops!  This is multiple declaration, where each variable gets its own "=",
// so it assigns nothing to x and the return value to y
var x, y = get_point();
// Now x is undefined and y is 2

This is still better than either out parameters or returning an explicit struct that needs manual unpacking, but it’s not as good as comma-delimited tuples. Note that some languages require parentheses around tuples (and also call them tuples), and I’m arbitrarily counting that as better than brackets.

Single return: Ada, ALGOL, BASIC, C#, COBOL, Fortran, Groovy, Java, Smalltalk.

Half-assed multiple return: C++11, D, ECMAScript 6, Erlang, PHP.

Multiple return via tuples: F#, Haskell, Julia, Nemerle, Nim, OCaml, Perl (just lists really), Python, Ruby, Rust, Scala, Standard ML, Swift, Tcl.

Native multiple return: Common Lisp, Go, Lua.

Special mention: C# has explicit syntax for out parameters, but it’s a compile-time error to not assign to all of them, which is slightly better than C. Forth is stack-based, and all return values are simply placed on the stack, so multiple return isn’t a special case. Unix shell functions don’t return values. Visual Basic sets a return value by assigning to the function’s name (?!), so good luck fitting multiple return in there.

Silent errors

Most runtime errors in C are indicated by one of two mechanisms: returning an error code, or segfaulting. Segfaulting is pretty noisy, so that’s okay, except for the exploit potential and all.

Returning an error code kinda sucks. Those tend to be important, but nothing in the language actually reminds you to check them, and of course we silly squishy humans have the habit of assuming everything will succeed at all times. Which is how I segfaulted git two days ago: I found a spot where it didn’t check for a NULL returned as an error.

There are several alternatives here: exceptions, statically forcing the developer to check for an error code, or using something monad-like to statically force the developer to distinguish between an error and a valid return value. Probably some others. In the end I was surprised by how many languages went the exception route.
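As a sketch of the exception route in Python (the file path below is made up), failure is loud unless you deliberately catch it:

def read_config(path):
    with open(path) as f:        # raises FileNotFoundError instead of returning NULL
        return f.read()

try:
    config = read_config("/no/such/file.conf")
except FileNotFoundError as err:
    print("falling back to defaults:", err)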

Quietly wrong: Unix shells. Wow, yeah, I’m having a hard time naming anything else. Good job, us! And even Unix shells have set -e; it’s just opt-in.

Exceptional: Ada, C++, C#, D, Erlang, Forth, Java (exceptions are even part of function signature), JavaScript, Nemerle, Nim, Objective-C, OCaml, Perl 6, Python, Ruby, Smalltalk, Standard ML, Visual Basic.

Monadic: Haskell (Either), Rust (Result).

Special mention: ACS doesn’t really have many operations that can error, and those that do simply halt the script. ALGOL apparently has something called “mending” that I don’t understand. Go tends to use secondary return values, which calling code has to unpack, making them slightly harder to forget about; it also allows both the assignment and the error check together in the header of an if. Lisps have conditions and call/cc, which are different things entirely. Lua and Perl 5 handle errors by taking down the whole program, but offer a construct that can catch that further up the stack, which is clumsy but enough to emulate try..catch. PHP has exceptions, and errors (which are totally different), and a lot of builtin functions that return error codes. Swift has something that looks like exceptions, but it doesn’t involve stack unwinding and does require some light annotation — apparently sugar for an “out” parameter holding an error. Visual Basic, and I believe some other BASICs, decided C wasn’t bad enough and introduced the bizarre On Error Resume Next construct which does exactly what it sounds like.

Nulls

The billion dollar mistake.

I think it’s considerably worse in a statically typed language like C, because the whole point is that you can rely on the types. But a double* might be NULL, which is not actually a pointer to a double; it’s a pointer to a segfault. Other kinds of bad pointers are possible, of course, but those are more an issue of memory safety; allowing any reference to be null violates type safety. The root of the problem is treating null as a possible value of any type, when really it’s its own type entirely.

The alternatives tend to be either opt-in nullability or an “optional” generic type (a monad!) which eliminates null as its own value entirely. Notably, Swift does it both ways: optional types are indicated by a trailing ?, but that’s just syntactic sugar for Optional&lt;T&gt;.
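
Here’s a minimal sketch of the “optional” approach in Rust, with a hypothetical find_port lookup. A plain u16 can never be null; the “might be missing” case is a distinct type, and the compiler won’t let you use the value without handling None.

// Hypothetical lookup that may or may not find a value.
fn find_port(name: &str) -> Option<u16> {
    if name == "http" { Some(80) } else { None }
}

fn main() {
    // No way to touch the u16 without saying what happens on None.
    match find_port("http") {
        Some(port) => println!("port {}", port),
        None => println!("unknown service"),
    }
}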

On the other hand, while it’s annoying to get a None where I didn’t expect one in Python, it’s not like I’m surprised. I occasionally get a string where I expected a number, too. The language explicitly leaves type concerns in my hands. My real objection is to having a static type system that lies. So I’m not going to list every single dynamic language here, because not only is it consistent with the rest of the type system, but they don’t really have any machinery to prevent this anyway.

Nothing doing: C#, D, Go, Java, Nim (non-nullable types are opt in).

Nullable types: Swift (sugar for a monad).

Monads: F# (Option — though technically F# also inherits null from .NET), Haskell (Maybe), Rust (Option), Swift (Optional).

Special mention: awk, Tcl, and Unix shells only have strings, so in a surprising twist, they have no concept of null whatsoever. Java recently introduced an Optional<T> type which explicitly may or may not contain a value, but since it’s still a non-primitive, it could also be null. C++17 doesn’t quite have the same problem with std::optional<T>, since non-reference values can’t be null. Inform 7’s nothing value is an object (the root of half of its type system), which means any object variable might be nothing, but any value of a more specific type cannot be nothing. JavaScript has two null values, null and undefined. Perl 6 is really big on static types, but claims its Nil object doesn’t exist, and I don’t know how to even begin to unpack that. R and SQL have a more mathematical kind of NULL, which tends to e.g. vanish from lists.

Assignment as expression

How common a mistake is this:

if (x = 3) {
    ...
}

Well, I don’t know, actually. Maybe not that common, save for among beginners. But I sort of wonder whether allowing this buys us anything. I can only think of two cases where it does. One is with something like iteration:

// Typical linked list
while (p = p->next) {
    ...
}

But this is only necessary in C in the first place because it has no first-class notion of iteration. The other is shorthand for checking that a function returned a useful value:

if (ptr = get_pointer()) {
    ...
}

But if a function returns NULL, that’s really an error condition, and presumably you have some other way to handle that too.

What does that leave? The only time I remotely miss this in Python (where it’s illegal) is when testing a regex. You tend to see this a lot instead.

m = re.match('x+y+z+', some_string)
if m:
    ...

re treats failure as an acceptable possibility and returns None, rather than raising an exception. I’m not sure whether this was the right thing to do or not, but off the top of my head I can’t think of too many other Python interfaces that sometimes return None.

Freedom of expression: ACS, C#, Java, JavaScript, Perl, PHP, Swift.

Makes a statement: Inform 7, Lua, Python, Unix shells.

Special mention: BASIC uses = for both assignment and equality testing — the meaning is determined from context. D allows variable declaration as an expression, so if (int x = 3) is allowed, but regular assignment is not. Functional languages generally don’t have an assignment operator. Go disallows assignment as an expression, but assignment and a test can appear together in an if condition, and this is an idiomatic way to check success. Ruby makes everything an expression, so assignment might as well be too. Rust makes everything an expression, but assignment evaluates to the useless () value (due to ownership rules), so it’s not actually useful. Rust and Swift both have a special if let block that explicitly combines assignment with pattern matching, which is way nicer than the C approach.
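
As a rough sketch of that last idea in Rust (get_config is hypothetical, standing in for the get_pointer example earlier): if let combines the assignment and the success test, but the binding only exists inside the block, and only when the lookup actually produced something, so it can’t be confused with an equality check.

// Hypothetical lookup returning an optional value.
fn get_config(key: &str) -> Option<String> {
    if key == "name" { Some("example".to_string()) } else { None }
}

fn main() {
    // Roughly the `if (ptr = get_pointer())` idiom, minus the footgun.
    if let Some(value) = get_config("name") {
        println!("found {}", value);
    }
}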

No hyphens in identifiers

snake_case requires dancing on the shift key (unless you rearrange your keyboard, which is perfectly reasonable). It slows you down slightly and leads to occasional mistakes like snake-Case. The alternative is dromedaryCase, which is objectively wrong and doesn’t actually solve this problem anyway.

Why not just allow hyphens in identifiers, so we can avoid this argument and use kebab-case?

Ah, but then it’s ambiguous whether you mean an identifier or the subtraction operator. No problem: require spaces for subtraction. I don’t think a tiny way you’re allowed to make your code harder to read is really worth this clear advantage.

Low scoring: ACS, C#, D, Java, JavaScript, OCaml, Pascal, Perl 5, PHP, Python, Ruby, Rust, Swift, Unix shells.

Nicely-designed: COBOL, CSS (and thus Sass), Forth, Inform 7, Lisps, Perl 6, XML.

Special mention: Perl has a built-in variable called $-, and Ruby has a few called $-n for various values of “n”, but these are very special cases.

Braces and semicolons

Okay. Hang on. Bear with me.

C code looks like this.

some block header {
    line 1;
    line 2;
    line 3;
}

The block is indicated two different ways here. The braces are for the compiler; the indentation is for humans.

Having two different ways to say the same thing means they can get out of sync. They can disagree. And that can be, as previously mentioned, really bad. This is really just a more general form of the problem of optional block delimiters.

The only solution is to eliminate one of the two. Programming languages exist for the benefit of humans, so we obviously can’t get rid of the indentation. Thus, we should get rid of the braces. QED.

As an added advantage, we reclaim all the vertical space wasted on lines containing only a }, and we can stop squabbling about where to put the {.

If you accept this, you might start to notice that there are also two different ways of indicating where a line ends: with semicolons for the compiler, and with vertical whitespace for humans. So, by the same reasoning, we should lose the semicolons.

Right? Awesome. Glad we’re all on the same page.

Some languages use keywords instead of braces, but the effect is the same. I’m not aware of any languages that use keywords instead of semicolons.

Bracing myself: C#, D, Erlang, Java, Perl, Rust.

Braces, but no semicolons: Go (ASI), JavaScript (ASI — see below), Lua, Ruby, Swift.

Free and clear: CoffeeScript, Haskell, Python.

Special mention: Lisp, just, in general. Inform 7 has an indented style, but it still requires semicolons. MUMPS doesn’t support nesting at all, but I believe there are extensions that use dots to indicate it.

Here’s some interesting trivia. JavaScript, Lua, and Python all optionally allow semicolons at the end of a statement, but the way each language determines line continuation is very different.

JavaScript takes an “opt-out” approach: it continues reading lines until it hits a semicolon, or until reading the next line would cause a syntax error. (This approach is called automatic semicolon insertion.) That leaves a few corner cases, like starting a new line with a (, which can make the last thing on the previous line look like a function you’re trying to call. Or you could have -foo on its own line, and it would parse as subtraction rather than unary negation. You might wonder why anyone would do that, but prefixing with unary + is one way to make a function definition parse as an expression rather than a statement! I’m not so opposed to semicolons that I want to be debugging where the language thinks my lines end, so I just always use semicolons in JavaScript.

Python takes an “opt-in” approach: it assumes, by default, that a statement ends at the end of a line. However, newlines inside parentheses or brackets are ignored, which takes care of 99% of cases — long lines are most frequently caused by function calls (which have parentheses!) with a lot of arguments. If you really need it, you can explicitly escape a newline with \, but this is widely regarded as incredibly ugly.

Lua avoids the problem almost entirely. I believe Lua’s grammar is designed such that it’s almost always unambiguous where a statement ends, even if you have no newlines at all. This has a few weird side effects: void expressions are syntactically forbidden in Lua, for example, so you just can’t have -foo as its own statement. Also, you can’t have code immediately following a return, because it’ll be interpreted as a return value. The upside is that Lua can treat newlines just like any other whitespace, but still not need semicolons. In fact, semicolons aren’t statement terminators in Lua at all — they’re their own statement, which does nothing. Alas, not for lack of trying, Lua does have the same ( ambiguity as JavaScript (and parses it the same way), but I don’t think any of the others exist.

Oh, and the colons that Python has at the end of its block headers, like if foo:? As far as I can tell, they serve no syntactic purpose whatsoever. Purely aesthetic.

Blaming the programmer

Perhaps one of the worst misfeatures of C is the ease with which responsibility for problems can be shifted to the person who wrote the code. “Oh, you segfaulted? I guess you forgot to check for NULL.” If only I had a computer to take care of such tedium for me!

Clearly, computers can’t be expected to do everything for us. But they can be expected to do quite a bit. Programming languages are built for humans, and they ought to eliminate the sorts of rote work humans are bad at whenever possible. A programmer is already busy thinking about the actual problem they want to solve; it’s no surprise that they’ll sometimes forget some tedious detail the language forces them to worry about.

So if you’re designing a language, don’t just copy C. Don’t just copy C++ or Java. Hell, don’t even just copy Python or Ruby. Consider your target audience, consider the problems they’re trying to solve, and try to get as much else out of the way as possible. If the same “mistake” tends to crop up over and over, look for a way to modify the language to reduce or eliminate it. And be sure to look at a lot of languages for inspiration — even ones you hate, even weird ones no one uses! A lot of clever people have had a lot of other ideas in the last 44 years.


I hope you enjoyed this accidental cross-reference of several dozen languages! I enjoyed looking through them all, though it was incredibly time-consuming. Some of them look pretty interesting; maybe give them a whirl.

Also, dammit, now I’m thinking about language design again.

NSO Group

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/08/nso_group.html

We’re starting to see some information on the Israeli cyberweapons arms manufacturer that sold the iPhone zero-day exploit to the United Arab Emirates so they could spy on human rights defenders.

EDITED TO ADD (9/1): There is criticism in the comments about me calling NSO Group an Israeli company. I was just repeating the news articles, but further research indicates that it is Israeli-founded and Israeli-based, but 100% owned by an American private equity firm.

iPhone Zero-Day Used by UAE Government

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/08/iphone_zero-day.html

Last week, Apple issued a critical security patch for the iPhone: iOS 9.3.5. The incredible story is that this patch is the result of investigative work by Citizen Lab, which uncovered a zero-day exploit being used by the UAE government against a human rights defender. The UAE spyware was provided by the Israeli cyberweapons arms manufacturer NSO Group.

This is a big deal. iOS vulnerabilities are expensive, and can sell for over $1M. That we can find one used in the wild and patch it, rendering it valueless, is a major win and puts a huge dent in the vulnerabilities market. The more we can do this, the less valuable these zero-days will be to both criminals and governments — and to criminal governments.

Citizen Lab blog post and report. New York Times article. More news articles.