Tag Archives: ISPs

Goodbye, net neutrality—Ajit Pai’s FCC votes to allow blocking and throttling (Ars Technica)

Post Syndicated from jake original https://lwn.net/Articles/741482/rss

In a vote that was not any kind of surprise, the US Federal Communications Commission (FCC) voted to end the “net neutrality” rules that stop internet service providers (ISPs) and others from blocking or throttling certain kinds of traffic to try to force consumers and content providers to pay more for “fast lanes”. Ars Technica covers the vote and the reaction to it, including the fact that the fight is not yet over: “Plenty of organizations might appeal, said consumer advocate Gigi Sohn, who was a top counselor to then-FCC Chairman Tom Wheeler when the commission imposed its rules.

‘I think you’ll see public interest groups, trade associations, and small and mid-sized tech companies filing the petitions for review,’ Sohn told Ars. One or two ‘big companies’ could also challenge the repeal, she thinks.

Lawsuit filers can challenge the repeal on numerous respects, she said. They can argue that the public record doesn’t support the FCC’s claim that broadband isn’t a telecommunications service, that ‘throwing away all protections for consumers and innovators for the first time since this issue has been debated is arbitrary and capricious,’ and that the FCC cannot preempt state net neutrality laws, she said.”

Libertarians are against net neutrality

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/12/libertarians-are-against-net-neutrality.html

This post claims to be by a libertarian in support of net neutrality. As a libertarian, I need to debunk this. “Net neutrality” is a case of one-hand clapping, you rarely hear the competing side, and thus, that side may sound attractive. This post is about the other side, from a libertarian point of view.

That post just repeats the common, and wrong, left-wing talking points. I mean, there might be a libertarian case for some broadband regulation, but this isn’t it.

This thing they call “net neutrality” is just left-wing politics masquerading as some sort of principle. It’s no different than how people claim to be “pro-choice”, yet demand forced vaccinations. Or, it’s no different than how people claim to believe in “traditional marriage” even while they are on their third “traditional marriage”.

Properly defined, “net neutrality” means no discrimination of network traffic. But nobody wants that. A classic example is how most internet connections have faster download speeds than uploads. This discriminates against upload traffic, harming innovation in upload-centric applications like DropBox’s cloud backup or BitTorrent’s peer-to-peer file transfer. Yet activists never mention this, or other types of network traffic discrimination, because they no more care about “net neutrality” than Trump or Gingrich care about “traditional marriage”.

Instead, when people say “net neutrality”, they mean “government regulation”. It’s the same old debate between who is the best steward of consumer interest: the free-market or government.

Specifically, in the current debate, they are referring to the Obama-era FCC “Open Internet” order and reclassification of broadband under “Title II” so they can regulate it. Trump’s FCC is putting broadband back to “Title I”, which means the FCC can’t regulate most of its “Open Internet” order.

Don’t be tricked into thinking the “Open Internet” order is anything but intensely politically. The premise behind the order is the Democrat’s firm believe that it’s government who created the Internet, and all innovation, advances, and investment ultimately come from the government. It sees ISPs as inherently deceitful entities who will only serve their own interests, at the expense of consumers, unless the FCC protects consumers.

It says so right in the order itself. It starts with the premise that broadband ISPs are evil, using illegitimate “tactics” to hurt consumers, and continues with similar language throughout the order.

A good contrast to this can be seen in Tim Wu’s non-political original paper in 2003 that coined the term “net neutrality”. Whereas the FCC sees broadband ISPs as enemies of consumers, Wu saw them as allies. His concern was not that ISPs would do evil things, but that they would do stupid things, such as favoring short-term interests over long-term innovation (such as having faster downloads than uploads).

The political depravity of the FCC’s order can be seen in this comment from one of the commissioners who voted for those rules:

FCC Commissioner Jessica Rosenworcel wants to increase the minimum broadband standards far past the new 25Mbps download threshold, up to 100Mbps. “We invented the internet. We can do audacious things if we set big goals, and I think our new threshold, frankly, should be 100Mbps. I think anything short of that shortchanges our children, our future, and our new digital economy,” Commissioner Rosenworcel said.

This is indistinguishable from communist rhetoric that credits the Party for everything, as this booklet from North Korea will explain to you.

But what about monopolies? After all, while the free-market may work when there’s competition, it breaks down where there are fewer competitors, oligopolies, and monopolies.

There is some truth to this, in individual cities, there’s often only only a single credible high-speed broadband provider. But this isn’t the issue at stake here. The FCC isn’t proposing light-handed regulation to keep monopolies in check, but heavy-handed regulation that regulates every last decision.

Advocates of FCC regulation keep pointing how broadband monopolies can exploit their renting-seeking positions in order to screw the customer. They keep coming up with ever more bizarre and unlikely scenarios what monopoly power grants the ISPs.

But the never mention the most simplest: that broadband monopolies can just charge customers more money. They imagine instead that these companies will pursue a string of outrageous, evil, and less profitable behaviors to exploit their monopoly position.

The FCC’s reclassification of broadband under Title II gives it full power to regulate ISPs as utilities, including setting prices. The FCC has stepped back from this, promising it won’t go so far as to set prices, that it’s only regulating these evil conspiracy theories. This is kind of bizarre: either broadband ISPs are evilly exploiting their monopoly power or they aren’t. Why stop at regulating only half the evil?

The answer is that the claim “monopoly” power is a deception. It starts with overstating how many monopolies there are to begin with. When it issued its 2015 “Open Internet” order the FCC simultaneously redefined what they meant by “broadband”, upping the speed from 5-mbps to 25-mbps. That’s because while most consumers have multiple choices at 5-mbps, fewer consumers have multiple choices at 25-mbps. It’s a dirty political trick to convince you there is more of a problem than there is.

In any case, their rules still apply to the slower broadband providers, and equally apply to the mobile (cell phone) providers. The US has four mobile phone providers (AT&T, Verizon, T-Mobile, and Sprint) and plenty of competition between them. That it’s monopolistic power that the FCC cares about here is a lie. As their Open Internet order clearly shows, the fundamental principle that animates the document is that all corporations, monopolies or not, are treacherous and must be regulated.

“But corporations are indeed evil”, people argue, “see here’s a list of evil things they have done in the past!”

No, those things weren’t evil. They were done because they benefited the customers, not as some sort of secret rent seeking behavior.

For example, one of the more common “net neutrality abuses” that people mention is AT&T’s blocking of FaceTime. I’ve debunked this elsewhere on this blog, but the summary is this: there was no network blocking involved (not a “net neutrality” issue), and the FCC analyzed it and decided it was in the best interests of the consumer. It’s disingenuous to claim it’s an evil that justifies FCC actions when the FCC itself declared it not evil and took no action. It’s disingenuous to cite the “net neutrality” principle that all network traffic must be treated when, in fact, the network did treat all the traffic equally.

Another frequently cited abuse is Comcast’s throttling of BitTorrent.Comcast did this because Netflix users were complaining. Like all streaming video, Netflix backs off to slower speed (and poorer quality) when it experiences congestion. BitTorrent, uniquely among applications, never backs off. As most applications become slower and slower, BitTorrent just speeds up, consuming all available bandwidth. This is especially problematic when there’s limited upload bandwidth available. Thus, Comcast throttled BitTorrent during prime time TV viewing hours when the network was already overloaded by Netflix and other streams. BitTorrent users wouldn’t mind this throttling, because it often took days to download a big file anyway.

When the FCC took action, Comcast stopped the throttling and imposed bandwidth caps instead. This was a worse solution for everyone. It penalized heavy Netflix viewers, and prevented BitTorrent users from large downloads. Even though BitTorrent users were seen as the victims of this throttling, they’d vastly prefer the throttling over the bandwidth caps.

In both the FaceTime and BitTorrent cases, the issue was “network management”. AT&T had no competing video calling service, Comcast had no competing download service. They were only reacting to the fact their networks were overloaded, and did appropriate things to solve the problem.

Mobile carriers still struggle with the “network management” issue. While their networks are fast, they are still of low capacity, and quickly degrade under heavy use. They are looking for tricks in order to reduce usage while giving consumers maximum utility.

The biggest concern is video. It’s problematic because it’s designed to consume as much bandwidth as it can, throttling itself only when it experiences congestion. This is what you probably want when watching Netflix at the highest possible quality, but it’s bad when confronted with mobile bandwidth caps.

With small mobile devices, you don’t want as much quality anyway. You want the video degraded to lower quality, and lower bandwidth, all the time.

That’s the reasoning behind T-Mobile’s offerings. They offer an unlimited video plan in conjunction with the biggest video providers (Netflix, YouTube, etc.). The catch is that when congestion occurs, they’ll throttle it to lower quality. In other words, they give their bandwidth to all the other phones in your area first, then give you as much of the leftover bandwidth as you want for video.

While it sounds like T-Mobile is doing something evil, “zero-rating” certain video providers and degrading video quality, the FCC allows this, because they recognize it’s in the customer interest.

Mobile providers especially have great interest in more innovation in this area, in order to conserve precious bandwidth, but they are finding it costly. They can’t just innovate, but must ask the FCC permission first. And with the new heavy handed FCC rules, they’ve become hostile to this innovation. This attitude is highlighted by the statement from the “Open Internet” order:

And consumers must be protected, for example from mobile commercial practices masquerading as “reasonable network management.”

This is a clear declaration that free-market doesn’t work and won’t correct abuses, and that that mobile companies are treacherous and will do evil things without FCC oversight.

Conclusion

Ignoring the rhetoric for the moment, the debate comes down to simple left-wing authoritarianism and libertarian principles. The Obama administration created a regulatory regime under clear Democrat principles, and the Trump administration is rolling it back to more free-market principles. There is no principle at stake here, certainly nothing to do with a technical definition of “net neutrality”.

The 2015 “Open Internet” order is not about “treating network traffic neutrally”, because it doesn’t do that. Instead, it’s purely a left-wing document that claims corporations cannot be trusted, must be regulated, and that innovation and prosperity comes from the regulators and not the free market.

It’s not about monopolistic power. The primary targets of regulation are the mobile broadband providers, where there is plenty of competition, and who have the most “network management” issues. Even if it were just about wired broadband (like Comcast), it’s still ignoring the primary ways monopolies profit (raising prices) and instead focuses on bizarre and unlikely ways of rent seeking.

If you are a libertarian who nonetheless believes in this “net neutrality” slogan, you’ve got to do better than mindlessly repeating the arguments of the left-wing. The term itself, “net neutrality”, is just a slogan, varying from person to person, from moment to moment. You have to be more specific. If you truly believe in the “net neutrality” technical principle that all traffic should be treated equally, then you’ll want a rewrite of the “Open Internet” order.

In the end, while libertarians may still support some form of broadband regulation, it’s impossible to reconcile libertarianism with the 2015 “Open Internet”, or the vague things people mean by the slogan “net neutrality”.

NetNeutrality vs. Verizon censoring Naral

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/11/netneutrality-vs-verizon-censoring-naral.html

People keep retweeting this ACLU graphic in support of net neutrality. It’s wrong. In this post, I debunk the second item. I debunk other items in other posts [1] [4].

Firstly, it’s not a NetNeutrality issue (which applies only to the Internet), but an issue with text-messages. In other words, it’s something that will continue to happen even with NetNeutrality rules. People relate this to NetNeutrality as an analogy, not because it actually is such an issue.

Secondly, it’s an edge/content issue, not a transit issue. The details in this case is that Verizon provides a program for sending bulk messages to its customers from the edge of the network. Verizon isn’t censoring text messages in transit, but from the edge. You can send a text message to your friend on the Verizon network, and it won’t be censored. Thus the analogy is incorrect — the correct analogy would be with content providers like Twitter and Facebook, not ISPs like Comcast.

Like all cell phone vendors, Verizon polices this content, canceling accounts that abuse the system, like spammers. We all agree such censorship is a good thing, and that such censorship of content providers is not remotely a NetNeutrality issue. Content providers do this not because they disapprove of the content of spam such much as the distaste their customers have for spam.
Content providers that are political, rather than neutral to politics is indeed worrisome. It’s not a NetNeutrality issue per se, but it is a general “neutrality” issue. We free-speech activists want all content providers (Twitter, Facebook, Verizon mass-texting programs) to be free of political censorship — though we don’t want government to mandate such neutrality.
But even here, Verizon may be off the hook. They appear not be to be censoring one political view over another, but the controversial/unsavory way Naral expresses its views. Presumably, Verizon would be okay with less controversial political content.

In other words, as Verizon expresses it’s principles, it wants to block content that drivers away customers, but is otherwise neutral to the content. While this may unfairly target controversial political content, it’s at least basically neutral.

So in conclusion, while activists portray this as a NetNeutrality issue, it isn’t. It’s not even close.

NetNeutrality vs. AT&T censoring Pearl Jam

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/11/netneutrality-vs-at-censoring-pearl-jam.html

People keep retweeting this ACLU graphic in response to the FCC’s net neutrality decision. In this post, I debunk the first item on the list. In other posts [2] [4] I debunk other items.

First of all, this obviously isn’t a Net Neutrality case. The case isn’t about AT&T acting as an ISP transiting network traffic. Instead, this was about AT&T being a content provider, through their “Blue Room” subsidiary, whose content traveled across other ISPs. Such things will continue to happen regardless of the most stringent enforcement of NetNeutrality rules, since the FCC doesn’t regulate content providers.
Second of all, it wasn’t AT&T who censored the traffic. It wasn’t their Blue Room subsidiary who censored the traffic. It was a third party company they hired to bleep things like swear words and nipple slips. You are blaming AT&T for a decision by a third party that went against AT&T’s wishes. It was an accident, not AT&T policy.
Thirdly, and this is the funny bit, Tim Wu, the guy who defined the term “net neutrality”, recently wrote an op-ed claiming that while ISPs shouldn’t censor traffic, that content providers should. In other words, he argues that companies AT&T’s Blue Room should censor political content.
What activists like ACLU say about NetNeutrality have as little relationship to the truth as Trump’s tweets. Both pick “facts” that agree with them only so long as you don’t look into them.

The FCC has never defended Net Neutrality

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/11/the-fcc-has-never-defended-net.html

This op-ed by a “net neutrality expert” claims the FCC has always defended “net neutrality”. It’s garbage.

This wrong on its face. It imagines decades ago that the FCC inshrined some plaque on the wall stating principles that subsequent FCC commissioners have diligently followed. The opposite is true. FCC commissioners are a chaotic bunch, with different interests, influenced (i.e. “lobbied” or “bribed”) by different telecommunications/Internet companies. Rather than following a principle, their Internet regulatory actions have been ad hoc and arbitrary — for decades.

Sure, you can cherry pick some of those regulatory actions as fitting a “net neutrality” narrative, but most actions don’t fit that narrative, and there have been gross net neutrality violations that the FCC has ignored.

There are gross violations going on right now that the FCC is allowing. Most egregiously is the “zero-rating” of video traffic on T-Mobile. This is a clear violation of the principles of net neutrality, yet the FCC is allowing it — despite official “net neutrality” rules in place.

The op-ed above claims that “this [net neutrality] principle was built into the architecture of the Internet”. The opposite is true. Traffic discrimination was built into the architecture since the beginning. If you don’t believe me, read RFC 791 and the “precedence” field.

More concretely, from the beginning of the Internet as we know it (the 1990s), CDNs (content delivery networks) have provided a fast-lane for customers willing to pay for it. These CDNs are so important that the Internet wouldn’t work without them.

I just traced the route of my CNN live stream. It comes from a server 5 miles away, instead of CNN’s headquarters 2500 miles away. That server is located inside Comcast’s network, because CNN pays Comcast a lot of money to get a fast-lane to Comcast’s customers.

The reason these egregious net net violations exist is because it’s in the interests of customers. Moving content closer to customers helps. Re-prioritizing (and charging less for) high-bandwidth video over cell networks helps customers.

You might say it’s okay that the FCC bends net neutrality rules when it benefits consumers, but that’s garbage. Net neutrality claims these principles are sacred and should never be violated. Obviously, that’s not true — they should be violated when it benefits consumers. This means what net neutrality is really saying is that ISPs can’t be trusted to allows act to benefit consumers, and therefore need government oversight. Well, if that’s your principle, then what you are really saying is that you are a left-winger, not that you believe in net neutrality.

Anyway, my point is that the above op-ed cherry picks a few data points in order to build a narrative that the FCC has always regulated net neutrality. A larger view is that the FCC has never defended this on principle, and is indeed, not defending it right now, even with “net neutrality” rules officially in place.

Defending anti-netneutrality arguments

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/07/defending-anti-netneutrality-arguments.html

Last week, activists proclaimed a “NetNeutrality Day”, trying to convince the FCC to regulate NetNeutrality. As a libertarian, I tweeted many reasons why NetNeutrality is stupid. NetNeutrality is exactly the sort of government regulation Libertarians hate most. Somebody tweeted the following challenge, which I thought I’d address here.

The links point to two separate cases.

  • the Comcast BitTorrent throttling case
  • a lawsuit against Time Warning for poor service
The tone of the tweet suggests that my anti-NetNeutrality stance cannot be defended in light of these cases. But of course this is wrong. The short answers are:

  • the Comcast BitTorrent throttling benefits customers
  • poor service has nothing to do with NetNeutrality

The long answers are below.

The Comcast BitTorrent Throttling

The presumption is that any sort of packet-filtering is automatically evil, and against the customer’s interests. That’s not true.
Take GoGoInflight’s internet service for airplanes. They block access to video sites like NetFlix. That’s because they often have as little as 1-mbps for the entire plane, which is enough to support many people checking email and browsing Facebook, but a single person trying to watch video will overload the internet connection for everyone. Therefore, their Internet service won’t work unless they filter video sites.
GoGoInflight breaks a lot of other NetNeutrality rules, such as providing free access to Amazon.com or promotion deals where users of a particular phone get free Internet access that everyone else pays for. And all this is allowed by FCC, allowing GoGoInflight to break NetNeutrality rules because it’s clearly in the customer interest.
Comcast’s throttling of BitTorrent is likewise clearly in the customer interest. Until the FCC stopped them, BitTorrent users were allowed unlimited downloads. Afterwards, Comcast imposed a 300-gigabyte/month bandwidth cap.
Internet access is a series of tradeoffs. BitTorrent causes congestion during prime time (6pm to 10pm). Comcast has to solve it somehow — not solving it wasn’t an option. Their options were:
  • Charge all customers more, so that the 99% not using BitTorrent subsidizes the 1% who do.
  • Impose a bandwidth cap, preventing heavy BitTorrent usage.
  • Throttle BitTorrent packets during prime-time hours when the network is congested.
Option 3 is clearly the best. BitTorrent downloads take hours, days, and sometimes weeks. BitTorrent users don’t mind throttling during prime-time congested hours. That’s preferable to the other option, bandwidth caps.
I’m a BitTorrent user, and a heavy downloader (I scan the Internet on a regular basis from cloud machines, then download the results to home, which can often be 100-gigabytes in size for a single scan). I want prime-time BitTorrent throttling rather than bandwidth caps. The EFF/FCC’s action that prevented BitTorrent throttling forced me to move to Comcast Business Class which doesn’t have bandwidth caps, charging me $100 more a month. It’s why I don’t contribute the EFF — if they had not agitated for this, taking such choices away from customers, I’d have $1200 more per year to donate to worthy causes.
Ask any user of BitTorrent which they prefer: 300gig monthly bandwidth cap or BitTorrent throttling during prime-time congested hours (6pm to 10pm). The FCC’s action did not help Comcast’s customers, it hurt them. Packet-filtering would’ve been a good thing, not a bad thing.

The Time-Warner Case
First of all, no matter how you define the case, it has nothing to do with NetNeutrality. NetNeutrality is about filtering packets, giving some priority over others. This case is about providing slow service for everyone.
Secondly, it’s not true. Time Warner provided the same access speeds as everyone else. Just because they promise 10mbps download speeds doesn’t mean you get 10mbps to NetFlix. That’s not how the Internet works — that’s not how any of this works.
To prove this, look at NetFlix’s connection speed graphis. It shows Time Warner Cable is average for the industry. It had the same congestion problems most ISPs had in 2014, and it has the same inability to provide more than 3mbps during prime-time (6pm-10pm) that all ISPs have today.

The YouTube video quality diagnostic pages show Time Warner Cable to similar to other providers around the country. It also shows the prime-time bump between 6pm and 10pm.
Congestion is an essential part of the Internet design. When an ISP like Time Warner promises you 10mbps bandwidth, that’s only “best effort”. There’s no way they can promise 10mbps stream to everybody on the Internet, especially not to a site like NetFlix that gets overloaded during prime-time.
Indeed, it’s the defining feature of the Internet compared to the old “telecommunications” network. The old phone system guaranteed you a steady 64-kbps stream between any time points in the phone network, but it cost a lot of money. Today’s Internet provide a free multi-megabit stream for free video calls (Skype, Facetime) around the world — but with the occasional dropped packets because of congestion.
Whatever lawsuit money-hungry lawyers come up with isn’t about how an ISP like Time Warner works. It’s only about how they describe the technology. They work no different than every ISP — no different than how anything is possible.
Conclusion

The short answer to the above questions is this: Comcast’s BitTorrent throttling benefits customers, and the Time Warner issue has nothing to do with NetNeutrality at all.

The tweet demonstrates that NetNeutrality really means. It has nothing to do with the facts of any case, especially the frequency that people point to ISP ills that have nothing actually to do with NetNeutrality. Instead, what NetNeutrality really about is socialism. People are convinced corporations are evil and want the government to run the Internet. The Comcast/BitTorrent case is a prime example of why this is a bad idea: government definitions of what customers want is actually far different than what customers actually want.

Creating a Daily Dashboard to Track Bounces and Complaints

Post Syndicated from Rubem De Lima Savordelli original https://aws.amazon.com/blogs/ses/creating-a-daily-dashboard-to-track-bounces-and-complaints/

Bounce and complaint rates can have a negative impact on your sender reputation, and a bad sender reputation makes it less likely that the emails you send will reach your recipients’ inboxes. Further, if your bounce or complaint rate is too high, we may have to suspend your Amazon SES account to protect other users. For these reasons, it is very important that you have a process in place to remove email addresses that have bounced or complained from your recipient list.

This article includes background information about bounces and complaints. It also discusses a sample solution that you can use to keep track of the bounce and complaint notifications that you receive.

What is a Bounce?

A bounce occurs when a message cannot be delivered to the intended recipient. There are two types of bounces:

  • A hard bounce occurs when a persistent issue prevents the message from being delivered. Hard bounces can occur when the recipient’s email address does not exist or the receiving domain does not exist. When an email hard bounces, it means that the recipient did not receive the message, and Amazon SES will no longer attempt to deliver the message.
  • A soft bounce occurs when a temporary issue prevents a message from being delivered. Soft bounces can occur when the recipient’s mailbox is full, when the connection to the receiving email server times out, or when there are too many simultaneous connections to the receiving mail server. When an email soft bounces, Amazon will attempt to redeliver it. If the issue persists, Amazon SES will stop trying to deliver the message, and the soft bounce will be converted to a hard bounce.

To learn more about bounces, see the Amazon SES Bounce FAQ in the Amazon SES Developer Guide.

What is a Complaint?

When an email recipient clicks the Mark as Spam (or similar) button in his or her email client, the ISP records the event as a complaint. If the emails that you send generate too many of these complaint events, the ISP may conclude that you’re sending spam. Many ISPs provide feedback loops, in which the ISP provides you with information about the message that generated the complaint event.

For more information about complaints, see the Amazon SES Complaint FAQ in the Amazon SES Developer Guide.

Building a Daily Dashboard

We recently added a section to the Amazon SES Developer Guide that documents the process of creating a daily bounce and complaint tracking dashboard. You can find the procedures for creating this daily dashboard at http://docs.aws.amazon.com/ses/latest/DeveloperGuide/bouncecomplaintdashboard.html.

This solution uses several AWS components—including Simple Notification Service (SNS), Simple Queue Service (SQS), Identity and Access Management (IAM), Simple Storage Service (S3), Lambda, and CloudWatch—to create a dashboard that is emailed to you every day. The daily dashboard, illustrated in the following image, contains a list of the messages that generated bounces and complaints over the past 24 hours.

This solution uses SNS to track bounce and complaint notifications. Those notifications are then collected in an SQS queue. A CloudWatch trigger initiates a Lambda function, which collects the notification events from SQS, processes them, publishes a dashboard to an S3 bucket, and sends you an email when the dashboard is ready to view. The following image illustrates the architecture of this solution.

When you receive the daily dashboard, you should use it to remove the addresses that hard bounced or complained from your recipient list. This measure will help protect your deliverability and inbox placement rates.

This solution is just one method of tracking the bounces and complaints that you receive when sending email using Amazon SES. We hope you find this sample solution useful. If you have any questions about this solution, please leave a comment below, or start a discussion in the Amazon SES forum.

APT10 and Cloud Hopper

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/04/apt10_and_cloud.html

There’s a new report of a nation-state attack, presumed to be from China, on a series of managed ISPs. From the executive summary:

Since late 2016, PwC UK and BAE Systems have been assisting victims of a new cyber espionage campaign conducted by a China-based threat actor. We assess this threat actor to almost certainly be the same as the threat actor widely known within the security community as ‘APT10’. The campaign, which we refer to as Operation Cloud Hopper, has targeted managed IT service providers (MSPs), allowing APT10 unprecedented potential access to the intellectual property and sensitive data of those MSPs and their clients globally. A number of Japanese organisations have also been directly targeted in a separate, simultaneous campaign by the same actor.

We have identified a number of key findings that are detailed below.

APT10 has recently unleashed a sustained campaign against MSPs. The compromise of MSP networks has provided broad and unprecedented access to MSP customer networks.

  • Multiple MSPs were almost certainly being targeted from 2016 onwards, and it is likely that APT10 had already begun to do so from as early as 2014.
  • MSP infrastructure has been used as part of a complex web of exfiltration routes spanning multiple victim networks.

[…]

APT10 focuses on espionage activity, targeting intellectual property and other sensitive data.

  • APT10 is known to have exfiltrated a high volume of data from multiple victims, exploiting compromised MSP networks, and those of their customers, to stealthily move this data around the world.
  • The targeted nature of the exfiltration we have observed, along with the volume of the data, is reminiscent of the previous era of APT campaigns pre-2013.

PwC UK and BAE Systems assess APT10 as highly likely to be a China-based threat actor.

  • It is a widely held view within the cyber security community that APT10 is a China-based threat actor.
  • Our analysis of the compile times of malware binaries, the registration times of domains attributed to APT10, and the majority of its intrusion activity indicates a pattern of work in line with China Standard Time (UTC+8).

  • The threat actor’s targeting of diplomatic and political organisations in response to geopolitical tensions, as well as the targeting of specific commercial enterprises, is closely aligned with strategic Chinese interests.

I know nothing more than what’s in this report, but it looks like a big one.

Press release.

Congress Removes FCC Privacy Protections on Your Internet Usage

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/03/congress_remove.html

Think about all of the websites you visit every day. Now imagine if the likes of Time Warner, AT&T, and Verizon collected all of your browsing history and sold it on to the highest bidder. That’s what will probably happen if Congress has its way.

This week, lawmakers voted to allow Internet service providers to violate your privacy for their own profit. Not only have they voted to repeal a rule that protects your privacy, they are also trying to make it illegal for the Federal Communications Commission to enact other rules to protect your privacy online.

That this is not provoking greater outcry illustrates how much we’ve ceded any willingness to shape our technological future to for-profit companies and are allowing them to do it for us.

There are a lot of reasons to be worried about this. Because your Internet service provider controls your connection to the Internet, it is in a position to see everything you do on the Internet. Unlike a search engine or social networking platform or news site, you can’t easily switch to a competitor. And there’s not a lot of competition in the market, either. If you have a choice between two high-speed providers in the US, consider yourself lucky.

What can telecom companies do with this newly granted power to spy on everything you’re doing? Of course they can sell your data to marketers — and the inevitable criminals and foreign governments who also line up to buy it. But they can do more creepy things as well.

They can snoop through your traffic and insert their own ads. They can deploy systems that remove encryption so they can better eavesdrop. They can redirect your searches to other sites. They can install surveillance software on your computers and phones. None of these are hypothetical.

They’re all things Internet service providers have done before, and they are some of the reasons the FCC tried to protect your privacy in the first place. And now they’ll be able to do all of these things in secret, without your knowledge or consent. And, of course, governments worldwide will have access to these powers. And all of that data will be at risk of hacking, either by criminals and other governments.

Telecom companies have argued that other Internet players already have these creepy powers — although they didn’t use the word “creepy” — so why should they not have them as well? It’s a valid point.

Surveillance is already the business model of the Internet, and literally hundreds of companies spy on your Internet activity against your interests and for their own profit.

Your e-mail provider already knows everything you write to your family, friends, and colleagues. Google already knows our hopes, fears, and interests, because that’s what we search for.

Your cellular provider already tracks your physical location at all times: it knows where you live, where you work, when you go to sleep at night, when you wake up in the morning, and — because everyone has a smartphone — who you spend time with and who you sleep with.

And some of the things these companies do with that power is no less creepy. Facebook has run experiments in manipulating your mood by changing what you see on your news feed. Uber used its ride data to identify one-night stands. Even Sony once installed spyware on customers’ computers to try and detect if they copied music files.

Aside from spying for profit, companies can spy for other purposes. Uber has already considered using data it collects to intimidate a journalist. Imagine what an Internet service provider can do with the data it collects: against politicians, against the media, against rivals.

Of course the telecom companies want a piece of the surveillance capitalism pie. Despite dwindling revenues, increasing use of ad blockers, and increases in clickfraud, violating our privacy is still a profitable business — especially if it’s done in secret.

The bigger question is: why do we allow for-profit corporations to create our technological future in ways that are optimized for their profits and anathema to our own interests?

When markets work well, different companies compete on price and features, and society collectively rewards better products by purchasing them. This mechanism fails if there is no competition, or if rival companies choose not to compete on a particular feature. It fails when customers are unable to switch to competitors. And it fails when what companies do remains secret.

Unlike service providers like Google and Facebook, telecom companies are infrastructure that requires government involvement and regulation. The practical impossibility of consumers learning the extent of surveillance by their Internet service providers, combined with the difficulty of switching them, means that the decision about whether to be spied on should be with the consumer and not a telecom giant. That this new bill reverses that is both wrong and harmful.

Today, technology is changing the fabric of our society faster than at any other time in history. We have big questions that we need to tackle: not just privacy, but questions of freedom, fairness, and liberty. Algorithms are making decisions about policing, healthcare.

Driverless vehicles are making decisions about traffic and safety. Warfare is increasingly being fought remotely and autonomously. Censorship is on the rise globally. Propaganda is being promulgated more efficiently than ever. These problems won’t go away. If anything, the Internet of things and the computerization of every aspect of our lives will make it worse.

In today’s political climate, it seems impossible that Congress would legislate these things to our benefit. Right now, regulatory agencies such as the FTC and FCC are our best hope to protect our privacy and security against rampant corporate power. That Congress has decided to reduce that power leaves us at enormous risk.

It’s too late to do anything about this bill — Trump will certainly sign it — but we need to be alert to future bills that reduce our privacy and security.

This post previously appeared on the Guardian.

EDITED TO ADD: Former FCC Commissioner Tom Wheeler wrote a good op-ed on the subject. And here’s an essay laying out what this all means to the average Internet user.

Amazon SES Can Now Automatically Warm Up Your Dedicated IP Addresses

Post Syndicated from Cristian Smochina original https://aws.amazon.com/blogs/ses/amazon-ses-can-now-automatic-warm-up-your-dedicated-ip-addresses/

The SES team is pleased to announce that, starting today, Amazon SES can automatically warm up your new dedicated IP addresses. Before the automatic warm-up feature was available, Amazon SES customers who leased dedicated IPs implemented their own warm-up mechanisms. Customers gradually increased email sending through a new dedicated IP before using the dedicated IP to its full capacity. This blog post explains how Amazon SES warms up your dedicated IPs, and how to enable the warm-up feature.

Why do I have to warm up my dedicated IPs?

You must warm up your dedicated IPs before you send a high volume of emails. Many receiving ISPs do not accept emails from an IP that suddenly sends a large volume of email. ISPs perceive this behavior as an indicator of abuse and a possible source of spam. To avoid emails getting dropped or having your sending severely throttled, warm up your IPs by gradually increasing the volume of emails you send through a new IP address. You can find more guidance about the warm up process in the developer guide.

How does Amazon SES warm up my dedicated IPs?

After you enable automatic warm-up, Amazon SES limits the maximum number of emails that you send daily through your new dedicated IP addresses according to a predefined warm-up plan. This automated warm-up process takes up to 45 days. The process ensures that traffic through the newly leased dedicated IP address is gradually increased to establish a positive reputation with receiving ISPs. The maximum daily amount of mail increases from the first day until a maximum of 50,000 emails can be sent from an IP.

When do I have to enable the automatic warm-up?

By default, automatic warm-up is enabled for your account. All newly leased dedicated IP addresses are placed in the automatic warm-up plan. You can disable the automatic warm-up from the Dedicated IPs page in the Amazon SES console. If you are already using dedicated IPs to send emails, go to the Amazon SES console to turn this feature on to take advantage of automatic warm-up.

PrintScreen

Note: disabling automatic warm-up stops the warm-up process. All of your IP addresses will be considered fully warmed up. Re-enabling automatic warm-up does not start the warm-up for the dedicated IPs already allocated to your Amazon SES account.

What happens with the emails sent beyond the daily maximum limit from the warm-up plan?

If you enabled automatic warm-up and you are leasing dedicated IPs for the first time, then all emails that you send beyond the pre-planned daily warm-up plan are sent through shared IPs instead. This means that during the warm-up period, Amazon SES uses your dedicated and shared IPs from the Amazon SES IP pools to send your emails. After the warm-up is complete, Amazon SES sends emails only through your dedicated IPs, and the maximum number of emails you can send is limited by your daily email sending quota. For more information, see Managing Your Amazon SES Sending Limits in the Amazon SES Developer Guide.

If you are an existing dedicated IP customer requesting additional dedicated IPs, emails beyond the daily maximum limit per dedicated IP in the warm-up plan are sent only through dedicated IPs already allocated to your account.

Does automatic warm-up incur extra cost?

No. See the Amazon SES pricing page for dedicated IP pricing information.

We hope you find this feature useful! If you have any questions or comments, let us know in the SES Forum or in the comment section of the blog.

China To Outlaw All Unapproved VPN Services

Post Syndicated from Darknet original http://feedproxy.google.com/~r/darknethackers/~3/0J0Igtp77Z0/

So the latest news from behind the Great Firewall of China is that they plan to crack down on all unapproved VPN services. This means all VPN providers, cloud service providers and ISPs will have to seek an annually renewed licence to operate a VPN Service. Really, not very surprising coming out of China and […]

The post China To Outlaw All…

Read the full post at darknet.org.uk

Let’s stop copying C

Post Syndicated from Eevee original https://eev.ee/blog/2016/12/01/lets-stop-copying-c/

Ah, C. The best lingua franca language we have… because we have no other lingua franca languages.

C is fairly old — 44 years, now! — and comes from a time when there were possibly more architectures than programming languages. It works well for what it is, and what it is is a relatively simple layer of indirection atop assembly.

Alas, the popularity of C has led to a number of programming languages’ taking significant cues from its design, and parts of its design are… slightly questionable. I’ve gone through some common features that probably should’ve stayed in C and my justification for saying so. The features are listed in rough order from (I hope) least to most controversial. The idea is that C fans will give up when I complain about argument order and not even get to the part where I rag on braces. Wait, crap, I gave it away.

I’ve listed some languages that do or don’t take the same approach as C. Plenty of the listed languages have no relation to C, and some even predate it — this is meant as a cross-reference of the landscape (and perhaps a list of prior art), not a genealogy. The language selections are arbitrary and based on what I could cobble together from documentation, experiments, Wikipedia, and attempts to make sense of Rosetta Code. I don’t know everything about all of them, so I might be missing some interesting quirks. Things are especially complicated for very old languages like COBOL or Fortran, which by now have numerous different versions and variants and de facto standard extensions.

Bash” generally means zsh and ksh and other derivatives as well, and when referring to expressions, means the $(( ... )) syntax; “Unix shells” means Bourne and thus almost certainly everything else as well. I didn’t look too closely into, say, fish. Unqualified “Python” means both 2 and 3; likewise, unqualified “Perl” means both 5 and 6. Also some of the puns are perhaps a little too obtuse, but the first group listed is always C-like.

Textual inclusion

#include is not a great basis for a module system. It’s not even a module system. You can’t ever quite tell what symbols came from which files, or indeed whether particular files are necessary at all. And in languages with C-like header files, most headers include other headers include more headers, so who knows how any particular declaration is actually ending up in your code? Oh, and there’s the whole include guards thing.

It’s a little tricky to pick on individual languages here, because ultimately even the greatest module system in the world boils down to “execute this other file, and maybe do some other stuff”. I think the true differentiating feature is whether including/importing/whatevering a file creates a new namespace. If a file gets dumped into the caller’s namespace, that looks an awful lot like textual inclusion; if a file gets its own namespace, that’s a good sign of something more structured happening behind the scenes.

This tends to go hand-in-hand with how much the language relies on a global namespace. One surprising exception is Lua, which can compartmentalize required files quite well, but dumps everything into a single global namespace by default.

Quick test: if you create a new namespace and import another file within that namespace, do its contents end up in that namespace?

Included: ACS, awk, COBOL, Erlang, Forth, Fortran, most older Lisps, Perl 5 (despite that required files must return true), PHP, Ruby, Unix shells.

Excluded: Ada, Clojure, D, Haskell, Julia, Lua (the file’s return value is returned from require), Nim, Node (similar to Lua), Perl 6, Python, Rust.

Special mention: ALGOL appears to have been designed with the assumption that you could include other code by adding its punch cards to your stack. C#, Java, OCaml, and Swift all have some concept of “all possible code that will be in this program”, sort of like C with inferred headers, so imports are largely unnecessary; Java’s import really just does aliasing. Inform 7 has no namespacing, but does have a first-class concept of external libraries, but doesn’t have a way to split a single project up between multiple files.

Optional block delimiters

Old and busted and responsible for gotofail:

1
2
if (condition)
    thing;

New hotness, which reduces the amount of punctuation overall and eliminates this easy kind of error:

1
2
3
if condition {
    thing;
}

To be fair, and unlike most of these complaints, the original idea was a sort of clever consistency: the actual syntax was merely if (expr) stmt, and also, a single statement could always be replaced by a block of statements. Unfortunately, the cuteness doesn’t make up for the ease with which errors sneak in. If you’re stuck with a language like this, I advise you always use braces, possibly excepting the most trivial cases like immediately returning if some argument is NULL. Definitely do not do this nonsense, which I saw in actual code not 24 hours ago.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
for (x = ...)
    for (y = ...) {
        ...
    }

    // more code

    for (x = ...)
        for (y = ...)
            buffer[y][x] = ...

The only real argument for omitting the braces is that the braces take up a lot of vertical space, but that’s mostly a problem if you put each { on its own line, and you could just not do that.

Some languages use keywords instead of braces, and in such cases it’s vanishingly rare to make the keywords optional.

Blockheads: ACS, awk, C#, D, Erlang (kinda?), Java, JavaScript.

New kids on the block: Go, Perl 6, Rust, Swift.

Had their braces removed: Ada, ALGOL, BASIC, COBOL, CoffeeScript, Forth, Fortran (but still requires parens), Haskell, Lua, Ruby.

Special mention: Inform 7 has several ways to delimit blocks, none of them vulnerable to this problem. Perl 5 requires both the parentheses and the braces… but it lets you leave off the semicolon on the last statement. Python just uses indentation to delimit blocks in the first place, so you can’t have a block look wrong. Lisps exist on a higher plane of existence where the very question makes no sense.

Bitwise operator precedence

For ease of transition from B, in C, the bitwise operators & | ^ have lower precedence than the comparison operators == and friends. That means they happen later. For binary math operators, this is nonsense.

1
2
3
1 + 2 == 3  // (1 + 2) == 3
1 * 2 == 3  // (1 * 2) == 3
1 | 2 == 3  // 1 | (2 == 3)

Many other languages have copied C’s entire set of operators and their precedence, including this. Because a new language is easier to learn if its rules are familiar, you see. Which is why we still, today, have extremely popular languages maintaining compatibility with a language from 1969 — so old that it probably couldn’t get a programming job.

Honestly, if your language is any higher-level than C, I’m not sure bit operators deserve to be operators at all. Free those characters up to do something else. Consider having a first-class bitfield type; then 99% of the use of bit operations would go away.

Quick test: 1 & 2 == 2 evaluates to 1 with C precedence, false otherwise. Or just look at a precedence table: if equality appears between bitwise ops and other math ops, that’s C style.

A bit wrong: C#, D, expr, JavaScript, Perl 5, PHP.

Wisened up: Bash, F# (ops are &&& ||| ^^^), Go, Julia, Lua (bitwise ops are new in 5.3), Perl 6 (ops are ?& ?| ?^), Python, Ruby, SQL, Swift.

Special mention: Java has C’s precedence, but forbids using bitwise operators on booleans, so the quick test is a compile-time error. Lisp-likes have no operator precedence.

Negative modulo

The modulo operator, %, finds the remainder after division. Thus you might think that this always holds:

1
0 <= a % b < abs(b)

But no — if a is negative, C will produce a negative value. This is so a / b * b + a % b is always equal to a. Truncating integer division rounds towards zero, so the sign of a % b always needs to be away from zero.

I’ve never found this behavior (or the above equivalence) useful. An easy example is that checking for odd numbers with x % 2 == 1 will fail for negative numbers, which produce -1. But the opposite behavior can be pretty handy.

Consider the problem of having n items that you want to arrange into rows with c columns. A calendar, say; you want to include enough empty cells to fill out the last row. n % c gives you the number of items on the last row, so c - n % c seems like it will give you the number of empty spaces. But if the last row is already full, then n % c is zero, and c - n % c equals c! You’ll have either a double-width row or a spare row of empty cells. Fixing this requires treating n % c == 0 as a special case, which is unsatisfying.

Ah, but if we have positive %, the answer is simply… -n % c! Consider this number line for n = 5 and c = 3:

1
2
-6      -3       0       3       6
 | - x x | x x x | x x x | x x - |

a % b tells you how far to count down to find a multiple of b. For positive a, that means “backtracking” over a itself and finding a smaller number. For negative a, that means continuing further away from zero. If you look at negative numbers as the mirror image of positive numbers, then % on a positive number tells you how to much to file off to get a multiple, whereas % on a negative number tells you how much further to go to get a multiple. 5 % 3 is 2, but -5 % 3 is 1. And of course, -6 % 3 is still zero, so that’s not a special case.

Positive % effectively lets you choose whether to round up or down. It doesn’t come up often, but when it’s handy, it’s really handy.

(I have no strong opinion on what 5 % -3 should be; I don’t think I’ve ever tried to use % with a negative divisor. Python makes it negative; Pascal makes it positive. Wikipedia has a whole big chart.)

Quick test: -5 % 3 is -2 with C semantics, 1 with “positive” semantics.

Leftovers: Bash, C#, D, expr, Go, Java, JavaScript, OCaml, PowerShell, PHP, Rust, Scala, SQL, Swift, VimL, Visual Basic. Notably, some of these languages don’t even have integer division.

Paying dividends: Dart, MUMPS (#), Perl, Python, R (%%), Ruby, Smalltalk (\\\\), Standard ML, Tcl.

Special mention: Ada, Haskell, Julia, many Lisps, MATLAB, VHDL, and others have separate mod (Python-like) and rem (C-like) operators. CoffeeScript has separate % (C-like) and %% (Python-like) operators.

Leading zero for octal

Octal notation like 0777 has three uses.

One: to make a file mask to pass to chmod().

Two: to confuse people when they write 013 and it comes out as 11.

Three: to confuse people when they write 018 and get a syntax error.

If you absolutely must have octal (?!) in your language, it’s fine to use 0o777. Really. No one will mind. Or you can go the whole distance and allow literals written in any base, as several languages do.

Gets a zero: awk (gawk only), Bash, Clojure, Go, Groovy, Java, JavaScript, m4, Perl 5, PHP, Python 2, Scala.

G0od: ECMAScript 6, Eiffel (0c — cute!), F#, Haskell, Julia, Nemerle, Nim, OCaml, Perl 6, Python 3, Racket (#o), Ruby, Scheme (#o), Swift, Tcl.

Based literals: Ada (8#777#), Bash (8#777), Erlang (8#777), Icon (8r777), J (8b777), Perl 6 (:8<777>), PostScript (8#777), Smalltalk (8r777).

Special mention: BASIC uses &O and &H prefixes for octal and hex. bc and Forth allow the base used to interpret literals to be changed on the fly, via ibase and BASE respectively. C#, D, expr, Lua, and Standard ML have no octal literals at all. Some COBOL extensions use O# and H#/X# prefixes for octal and hex. Fortran uses the slightly odd O'777' syntax.

No power operator

Perhaps this makes sense in C, since it doesn’t correspond to an actual instruction on most CPUs, but in JavaScript? If you can make + work for strings, I think you can add a **.

If you’re willing to ditch the bitwise operators (or lessen their importance a bit), you can even use ^, as most people would write in regular ASCII text.

Powerless: ACS, C#, Eiffel, Erlang, expr, Forth, Go.

Two out of two stars: Ada, ALGOL ( works too), Bash, COBOL, CoffeeScript, Fortran, F#, Groovy, OCaml, Perl, PHP, Python, Ruby.

I tip my hat: awk, BASIC, bc, COBOL, fish, Lua.

Otherwise powerful: APL (), D (^^).

Special mention: Lisps tend to have a named function rather than a dedicated operator (e.g. Math/pow in Clojure, expt in Common Lisp), but since operators are regular functions, this doesn’t stand out nearly so much. Haskell uses all three of ^, ^^, and ** for typing reasons.

C-style for loops

This construct is bad. It very rarely matches what a human actually wants to do, which 90% of the time is “go through this list of stuff” or “count from 1 to 10”. A C-style for obscures those wishes. The syntax is downright goofy, too: nothing else in the language uses ; as a delimiter and repeatedly executes only part of a line. It’s like a tuple of statements.

I said in my previous post about iteration that having an iteration protocol requires either objects or closures, but I realize that’s not true. I even disproved it in the same post. Lua’s own iteration protocol can be implemented without closures — the semantics of for involve keeping a persistent state value and passing it to the iterator function every time. It could even be implemented in C! Awkwardly. And with a bunch of macros. Which aren’t hygenic in C. Hmm, well.

Loopy: ACS, bc, Fortran.

Cool and collected: C#, Clojure, D, Delphi (recent), Eiffel (recent), Go, Groovy, Icon, Inform 7, Java, Julia, Logo, Lua, Nemerle, Nim, Objective-C, Perl, PHP, PostScript, Prolog, Python, R, Rust, Scala, Smalltalk, Swift, Tcl, Unix shells, Visual Basic.

Special mention: Functional languages and Lisps are laughing at the rest of us here. awk has for...in, but it doesn’t iterate arrays in order which makes it rather less useful. JavaScript has both for...in and for...of, but both are differently broken, so you usually end up using C-style for or external iteration. BASIC has an ergonomic numeric loop, but no iteration loop. Ruby mostly uses external iteration, and its for block is actually expressed in those terms.

Switch with default fallthrough

We’ve been through this before. Wanting completely separate code per case is, by far, the most common thing to want to do. It makes no sense to have to explicitly opt out of the more obvious behavior.

Breaks my heart: C#, Java, JavaScript.

Follows through: Ada, BASIC, CoffeeScript, Go (has a fallthrough statement), Lisps, Swift (has a fallthrough statement), Unix shells.

Special mention: D requires break, but requires something one way or the other — implicit fallthrough is disallowed except for empty cases. Perl 5 historically had no switch block built in, but it comes with a Switch module, and the last seven releases have had an experimental given block which I stress is still experimental. Python has no switch block. Erlang, Haskell, and Rust have pattern-matching instead (which doesn’t allow fallthrough at all).

Type first

1
int foo;

In C, this isn’t too bad. You get into problems when you remember that it’s common for type names to be all lowercase.

1
foo * bar;

Is that a useless expression, or a declaration? It depends entirely on whether foo is a variable or a type.

It gets a little weirder when you consider that there are type names with spaces in them. And storage classes. And qualifiers. And sometimes part of the type comes after the name.

1
extern const volatile _Atomic unsigned long long int * restrict foo[];

That’s not even getting into the syntax for types of function pointers, which might have arbitrary amounts of stuff after the variable name.

And then C++ came along with generics, which means a type name might also have other type names nested arbitrarily deep.

1
extern const volatile std::unordered_map<unsigned long long int, std::unordered_map<const long double * const, const std::vector<std::basic_string<char>>::const_iterator>> foo;

And that’s just a declaration! Imagine if there were an assignment in there too.

The great thing about static typing is that I know the types of all the variables, but that advantage is somewhat lessened if I can’t tell what the variables are.

Between type-first, function pointer syntax, Turing-complete duck-typed templates, and C++’s initialization syntax, there are several ways where parsing C++ is ambiguous or even undecidable! “Undecidable” here means that there exist C++ programs which cannot even be parsed into a syntax tree, because the same syntax means two different things depending on whether some expression is a value or a type, and that question can depend on an endlessly recursive template instantiation. (This is also a great example of ambiguity, where x * y(z) could be either an expression or a declaration.)

Contrast with, say, Rust:

1
let x: ... = ...;

This is easy to parse, both for a human and a computer. The thing before the colon must be a variable name, and it stands out immediately; the thing after the colon must be a type name. Even better, Rust has pretty good type inference, so the type is probably unnecessary anyway.

Of course, languages with no type declarations whatsoever are immune to this problem.

Most vexing: Java, Perl 6

Looks Lovely: Python 3 (annotation syntax and the typing module), Rust, Swift, TypeScript

Weak typing

Please note: this is not the opposite of static typing. Weak typing is more about the runtime behavior of values — if I try to use a value of type T as though it were of type U, will it be implicitly converted?

C lets you assign pointers to int variables and then take square roots of them, which seems like a bad idea to me. C++ agreed and nixed this, but also introduced the ability to make your own custom types implicitly convertible to as many other types you want.

This one is pretty clearly a spectrum, and I don’t have a clear line. For example, I don’t fault Python for implicitly converting between int and float, because int is infinite-precision and float is 64-bit, so it’s usually fine. But I’m a lot more suspicious of C, which lets you assign an int to a char without complaint. (Well, okay. Literal integers in C are ints, which poses a slight problem.)

I do count a combined addition/concatenation operator that accepts different types of arguments as a form of weak typing.

Weak: JavaScript (+), Unix shells (everything’s a string, but even arrays/scalars are somewhat interchangeable)

Strong: Rust (even numeric upcasts must be explicit).

Special mention: Perl 5 is weak, but it avoids most of the ambiguity by having entirely separate sets of operators for string vs numeric operations. Python 2 is mostly strong, but that whole interchangeable bytes/text thing sure caused some ruckus.

Integer division

Hey, new programmers!” you may find yourself saying. “Don’t worry, it’s just like math, see? Here’s how to use $LANGUAGE as a calculator.”

Oh boy!” says your protégé. “Let’s see what 7 ÷ 2 is! Oh, it’s 3. I think the computer is broken.”

They’re right! It is broken. I have genuinely seen a non-trivial number of people come into #python thinking division is “broken” because of this.

To be fair, C is pretty consistent about making math operations always produce a value whose type matches one of the arguments. It’s also unclear whether such division should produce a float or a double. Inferring from context would make sense, but that’s not something C is really big on.

Quick test: 7 / 2 is 3½, not 3.

Integrous: Bash, bc, C#, D, expr, F#, Fortran, Go, OCaml, Python 2, Ruby, Rust (hard to avoid).

Afloat: awk (no integers), Clojure (produces a rational!), Groovy, JavaScript (no integers), Lua (no integers until 5.3), Nim, Perl 5 (no integers), Perl 6, PHP, Python 3.

Special mention: Haskell disallows / on integers. Nim, Perl 6, Python, and probably others have separate integral division operators: div, div, and //, respectively.

Bytestrings

Strings” in C are arrays of 8-bit characters. They aren’t really strings at all, since they can’t hold the vast majority of characters without some further form of encoding. Exactly what the encoding is and how to handle it is left entirely up to the programmer. This is a pain in the ass.

Some languages caught wind of this Unicode thing in the 90s and decided to solve this problem once and for all by making “wide” strings with 16-bit characters. (Even C95 has this, in the form of wchar_t* and L"..." literals.) Unicode, you see, would never have more than 65,536 characters.

Whoops, so much for that. Now we have strings encoded as UTF-16 rather than UTF-8, so we’re paying extra storage cost and we still need to write extra code to do basic operations right. Or we forget, and then later we have to track down a bunch of wonky bugs because someone typed a 💩.

Note that handling characters/codepoints is very different from handling glyphs, i.e. the distinct shapes you see on screen. Handling glyphs doesn’t even really make sense outside the context of a font, because fonts are free to make up whatever ligatures they want. Remember “diverse” emoji? Those are ligatures of three to seven characters, completely invented by a font vendor. A programming language can’t reliably count the display length of that, especially when new combining behaviors could be introduced at any time.

Also, it doesn’t matter how you solve this problem, as long as it appears to be solved. I believe Ruby uses bytestrings, for example, but they know their own encoding, so they can be correctly handled as sequences of codepoints. Having a separate non-default type or methods does not count, because everyone will still use the wrong thing first — sorry, Python 2.

Quick test: what’s the length of “💩”? If 1, you have real unencoded strings. If 2, you have UTF-16 strings. If 4, you have UTF-8 strings. If something else, I don’t know what the heck is going on.

Totally bytes: Lua, Python 2 (separate unicode type).

Comes up short: Java, JavaScript.

One hundred emoji: Python 3, Ruby, Rust.

Special mention: Perl 5 gets the quick test right if you put use utf8; at the top of the file, but Perl 5’s Unicode support is such a confusing clusterfuck that I can’t really give it a 💯.

Autoincrement and autodecrement

I don’t think there are too many compelling reasons to have ++. It means the same as += 1, which is still nice and short. The only difference is that people can do stupid unreadable tricks with ++.

One exception: it is possible to overload ++ in ways that don’t make sense as += 1 — for example, C++ uses ++ to advance iterators, which may do any arbitrary work under the hood.

Double plus ungood:

Double plus good: Python

Special mention: Perl 5 and PHP both allow ++ on strings, in which case it increments letters or something, but I don’t know if much real code has ever used this.

!

A pet peeve. Spot the difference:

1
2
3
4
5
6
if (looks_like_rain()) {
    ...
}
if (!looks_like_rain()) {
    ...
}

That single ! is ridiculously subtle, which seems wrong to me when it makes an expression mean its polar opposite. Surely it should stick out like a sore thumb. The left parenthesis makes it worse, too; it blends in slightly as just noise.

It helps a bit to space after the ! in cases like this:

1
2
3
if (! looks_like_rain()) {
    ...
}

But this seems to be curiously rare. The easy solution is to just spell the operator not. At which point the other two might as well be and and or.

Interestingly enough, C95 specifies and, or, not, and some others as standard alternative spellings, though I’ve never seen them in any C code and I suspect existing projects would prefer I not use them.

Not right: ACS, awk, C#, D, Go, Groovy, Java, JavaScript, Nemerle, PHP, R, Rust, Scala, Swift, Tcl, Vala.

Spelled out: Ada, ALGOL, BASIC, COBOL, Erlang, F#, Fortran, Haskell, Lisps, Lua, Nim, OCaml, Pascal, PostScript, Python, Smalltalk, Standard ML.

Special mention: APL and Julia both use ~, which is at least easier to pick out, which is more than I can say for most of APL. bc and expr, which are really calculators, have no concept of Boolean operations. Forth and Icon, which are not calculators, don’t seem to either. Perl and Ruby have both symbolic and named Boolean operators (Perl 6 has even more), with different precedence (which inside if won’t matter), but I believe the named forms are preferred.

Single return and out parameters

Because C can only return a single value, and that value is often an indication of failure for the sake of an if, “out” parameters are somewhat common.

1
2
double x, y;
get_point(&x, &y);

It’s not immediately clear whether x and y are input or output. Sometimes they might function as both. (And of course, in this silly example, you’d be better off returning a single point struct. Or would you use a point out parameter because returning structs is potentially expensive?)

Some languages have doubled down on this by adding syntax to declare “out” parameters, which removes the ambiguity in the function definition, but makes it worse in function calls. In the above example, using & on an argument is at least a decent hint that the function wants to write to those values. If you have implicit out parameters or pass-by-reference or whatever, that would just be get_point(x, y) and you’d have no indication that those arguments are special in any way.

The vast majority of the time, this can be expressed in a more straightforward way by returning multiple values:

1
x, y = get_point()

That was intended as Python, but technically, Python doesn’t have multiple returns! It seems to, but it’s really a combination of several factors: a tuple type, the ability to make a tuple literal with just commas, and the ability to unpack a tuple via multiple assignment. In the end it works just as well. Also this is a way better use of the comma operator than in C.

But the exact same code could appear in Lua, which has multiple return/assignment as an explicit feature… and no tuples. The difference becomes obvious if you try to assign the return value to a single variable instead:

1
point = get_point()

In Python, point would be a tuple containing both return values. In Lua, point would be the x value, and y would be silently discarded. I don’t tend to be a fan of silently throwing data away, but I have to admit that Lua makes pretty good use of this in several places for “optional” return values that the caller can completely ignore if desired. An existing function can even be extended to return more values than before — that would break callers in Python, but work just fine in Lua.

(Also, to briefly play devil’s advocate: I once saw Python code that returned 14 values all with very complicated values, types, and semantics. Maybe don’t do that. I think I cleaned it up to return an object, which simplified the calling code considerably too.)

It’s also possible to half-ass this. ECMAScript 6::

1
2
3
4
5
function get_point() {
    return [1, 2];
}

var [x, y] = get_point();

It works, but it doesn’t actually look like multiple return. The trouble is that JavaScript has C’s comma operator and C’s variable declaration syntax, so neither of the above constructs could’ve left off the brackets without significantly changing the syntax:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
function get_point() {
    // Whoops!  This uses the comma operator, which evaluates to its last
    // operand, so it just returns 2
    return 1, 2;
}

// Whoops!  This is multiple declaration, where each variable gets its own "=",
// so it assigns nothing to x and the return value to y
var x, y = get_point();
// Now x is undefined and y is 2

This is still better than either out parameters or returning an explicit struct that needs manual unpacking, but it’s not as good as comma-delimited tuples. Note that some languages require parentheses around tuples (and also call them tuples), and I’m arbitrarily counting that as better than bracket.

Single return: Ada, ALGOL, BASIC, C#, COBOL, Fortran, Groovy, Java, Smalltalk.

Half-assed multiple return: C++11, D, ECMAScript 6, Erlang, PHP.

Multiple return via tuples: F#, Go, Haskell, Julia, Nemerle, Nim, OCaml, Perl (just lists really), Python, Ruby, Rust, Scala, Standard ML, Swift, Tcl.

Native multiple return: Common Lisp, Lua.

Special mention: Forth is stack-based, and all return values are simply placed on the stack, so multiple return isn’t a special case. Unix shell functions don’t return values. Visual Basic sets a return value by assigning to the function’s name (?!), so good luck fitting multiple return in there.

Silent errors

Most runtime errors in C are indicated by one of two mechanisms: returning an error code, or segfaulting. Segfaulting is pretty noisy, so that’s okay, except for the exploit potential and all.

Returning an error code kinda sucks. Those tend to be important, but nothing in the language actually reminds you to check them, and of course we silly squishy humans have the habit of assuming everything will succeed at all times. Which is how I segfaulted git two days ago: I found a spot where it didn’t check for a NULL returned as an error.

There are several alternatives here: exceptions, statically forcing the developer to check for an error code, or using something monad-like to statically force the developer to distinguish between an error and a valid return value. Probably some others. In the end I was surprised by how many languages went the exception route.

Quietly wrong: Unix shells. Wow, yeah, I’m having a hard time naming anything else. Good job, us!

Exceptional: Ada, C++, C#, D, Erlang, Forth, Java (exceptions are even part of function signature), JavaScript, Nemerle, Nim, Objective-C, OCaml, Perl 6, Python, Ruby, Smalltalk, Standard ML, Visual Basic.

Monadic: Haskell (Either), Rust (Result).

Special mention: ACS doesn’t really have many operations that can error, and those that do simply halt the script. ALGOL apparently has something called “mending” that I don’t understand. Go tends to use secondary return values, which calling code has to unpack, making them slightly harder to forget about. Lisps have conditions and call/cc, which are different things entirely. Lua and Perl 5 handle errors by taking down the whole program, but offer a construct that can catch that further up the stack, which is clumsy but enough to emulate try..catch. PHP has exceptions, and errors (which are totally different), and a lot of builtin functions that return error codes. Swift has something that looks like exceptions, but it doesn’t involve stack unwinding and does require some light annotation, so I think it’s all sugar for a monadic return value. Visual Basic, and I believe some other BASICs, decided C wasn’t bad enough and introduced the bizarre On Error Resume Next construct which does exactly what it sounds like.

Nulls

The billion dollar mistake.

I think it’s considerably worse in a statically typed language like C, because the whole point is that you can rely on the types. But a double* might be NULL, which is not actually a pointer to a double; it’s a pointer to a segfault. Other kinds of bad pointers are possible, of course, but those are more an issue of memory safety; allowing any reference to be null violates type safety. The root of the problem is treating null as a possible value of any type, when really it’s its own type entirely.

The alternatives tend to be either opt-in nullability or an “optional” generic type (a monad!) which eliminates null as its own value entirely. Notably, Swift does it both ways: optional types are indicated by a trailing ?, but that’s just syntactic sugar for Option<T>.

On the other hand, while it’s annoying to get a None where I didn’t expect one in Python, it’s not like I’m surprised. I occasionally get a string where I expected a number, too. The language explicitly leaves type concerns in my hands. My real objection is to having a static type system that lies. So I’m not going to list every single dynamic language here, because not only is it consistent with the rest of the type system, but they don’t really have any machinery to prevent this anyway.

Nothing doing: C#, D, Go, Java, Nim (non-nullable types are opt in), R.

Nullable types: Swift.

Monads: F# (Option — though technically F# also inherits null from .NET), Haskell (Maybe), Rust (Option), Swift (Optional).

Special mention: awk, Tcl, and Unix shells only have strings, so in a surprising twist, they have no concept of null whatsoever. Java recently introduced an Optional<T> type which explicitly may or may not contain a value, but since it’s still a non-primitive, it could also be null. C++17 doesn’t quite have the same problem with std::optional<T>, since non-reference values can’t be null. Inform 7’s nothing value is an object (the root of half of its type system), which means any object variable might be nothing, but any value of a more specific type cannot be nothing. JavaScript has two null values, null and undefined. Perl 6 is really big on static types, but claims its Nil object doesn’t exist, and I don’t know how to even begin to unpack that.

Assignment as expression

How common a mistake is this:

1
2
3
if (x = 3) {
    ...
}

Well, I don’t know, actually. Maybe not that common, save for among beginners. But I sort of wonder whether allowing this buys us anything. I can only think of two cases where it does. One is with something like iteration:

1
2
3
4
// Typical linked list
while (p = p->next) {
    ...
}

But this is only necessary in C in the first place because it has no first-class notion of iteration. The other is shorthand for checking that a function returned a useful value:

1
2
3
if (ptr = get_pointer()) {
    ...
}

But if a function returns NULL, that’s really an error condition, and presumably you have some other way to handle that too.

What does that leave? The only time I remotely miss this in Python (where it’s illegal) is when testing a regex. You tend to see this a lot instead.

1
2
3
m = re.match('x+y+z+', some_string)
if m:
    ...

re treats failure as an acceptable possibility and returns None, rather than raising an exception. I’m not sure whether this was the right thing to do or not, but off the top of my head I can’t think of too many other Python interfaces that sometimes return None.

Some languages go entirely the opposite direction and make everything an expression, including block constructs like if. In those languages, it makes sense for assignment to be an expression, for consistency with everything else.

Assignment’s an expression: ACS, C#, D, Java, JavaScript, Perl, PHP, Swift.

Everything’s an expression: Ruby, Rust.

Assignment’s a statement: Inform 7, Lua, Python, Unix shells.

Special mention: BASIC uses = for both assignment and equality testing — the meaning is determined from context. Functional languages generally don’t have an assignment operator. Rust has a special if let block that explicitly combines assignment with pattern matching, which is way nicer than the C approach.

No hyphens in identifiers

snake_case requires dancing on the shift key (unless you rearrange your keyboard, which is perfectly reasonable). It slows you down slightly and leads to occasional mistakes like snake-Case. The alternative is dromedaryCase, which is objectively wrong and doesn’t actually solve this problem anyway.

Why not just allow hyphens in identifiers, so we can avoid this argument and use kebab-case?

Ah, but then it’s ambiguous whether you mean an identifier or the subtraction operator. No problem: require spaces for subtraction. I don’t think a tiny way you’re allowed to make your code harder to read is really worth this clear advantage.

Low score: ACS, C#, D, Java, JavaScript, OCaml, Pascal, Perl 5, PHP, Python, Ruby, Rust, Swift, Unix shells.

Nicely-named: COBOL, CSS (and thus Sass), Forth, Inform 7, Lisps, Perl 6, XML.

Special mention: Perl has a built-in variable called $-, and Ruby has a few called $-n for various values of “n”, but these are very special cases.

Braces and semicolons

Okay. Hang on. Bear with me.

C code looks like this.

1
2
3
4
5
some block header {
    line 1;
    line 2;
    line 3;
}

The block is indicated two different ways here. The braces are for the compiler; the indentation is for humans.

Having two different ways to say the same thing means they can get out of sync. They can disagree. And that can be, as previously mentioned, really bad. This is really just a more general form of the problem of optional block delimiters.

The only solution is to eliminate one of the two. Programming languages exist for the benefit of humans, so we obviously can’t get rid of the indentation. Thus, we should get rid of the braces. QED.

As an added advantage, we reclaim all the vertical space wasted on lines containing only a }, and we can stop squabbling about where to put the {.

If you accept this, you might start to notice that there are also two different ways of indicating where a line ends: with semicolons for the compiler, and with vertical whitespace for humans. So, by the same reasoning, we should lose the semicolons.

Right? Awesome. Glad we’re all on the same page.

Some languages use keywords instead of braces, but the effect is the same. I’m not aware of any languages that use keywords instead of semicolons.

Bracing myself: C#, D, Erlang, Java, Perl, Rust.

Braces, but no semicolons: JavaScript (kinda — see below), Lua, Ruby, Swift.

Free and clear: CoffeeScript, Haskell, Python.

Special mention: Lisp, just, in general. Inform 7 has an indented style, but it still requires semicolons.

Here’s some interesting trivia. JavaScript, Lua, and Python all optionally allow semicolons at the end of a statement, but the way each language determines line continuation is very different.

JavaScript takes an “opt-out” approach: it continues reading lines until it hits a semicolon, or until reading the next line would cause a syntax error. That leaves a few corner cases like starting a new line with a (, which could look like the last thing on the previous line is a function you’re trying to call. Or you could have -foo on its own line, and it would parse as subtraction rather than unary negation. You might wonder why anyone would do that, but using unary + is one way to make function parse as an expression rather than a statement! I’m not so opposed to semicolons that I want to be debugging where the language thinks my lines end, so I just always use semicolons in JavaScript.

Python takes an “opt-in” approach: it assumes, by default, that a statement ends at the end of a line. However, newlines inside parentheses or brackets are ignored, which takes care of 99% of cases — long lines are most frequently caused by function calls (which have parentheses!) with a lot of arguments. If you really need it, you can explicitly escape a newline with \\, but this is widely regarded as incredibly ugly.

Lua avoids the problem almost entirely. I believe Lua’s grammar is designed such that it’s almost always unambiguous where a statement ends, even if you have no newlines at all. This has a few weird side effects: void expressions are syntactically forbidden in Lua, for example, so you just can’t have -foo as its own statement. Also, you can’t have code immediately following a return, because it’ll be interpreted as a return value. The upside is that Lua can treat newlines just like any other whitespace, but still not need semicolons. In fact, semicolons aren’t statement terminators in Lua at all — they’re their own statement, which does nothing. Alas, not for lack of trying, Lua does have the same ( ambiguity as JavaScript (and parses it the same way), but I don’t think any of the others exist.

Oh, and the colons that Python has at the end of its block headers, like if foo:? As far as I can tell, they serve no syntactic purpose whatsoever. Purely aesthetic.

Blaming the programmer

Perhaps one of the worst misfeatures of C is the ease with which responsibility for problems can be shifted to the person who wrote the code. “Oh, you segfaulted? I guess you forgot to check for NULL.” If only I had a computer to take care of such tedium for me!

Clearly, computers can’t be expected to do everything for us. But they can be expected to do quite a bit. Programming languages are built for humans, and they ought to eliminate the sorts of rote work humans are bad at whenever possible. A programmer is already busy thinking about the actual problem they want to solve; it’s no surprise that they’ll sometimes forget some tedious detail the language forces them to worry about.

So if you’re designing a language, don’t just copy C. Don’t just copy C++ or Java. Hell, don’t even just copy Python or Ruby. Consider your target audience, consider the problems they’re trying to solve, and try to get as much else out of the way as possible. If the same “mistake” tends to crop up over and over, look for a way to modify the language to reduce or eliminate it. And be sure to look at a lot of languages for inspiration — even ones you hate, even weird ones no one uses! A lot of clever people have had a lot of other ideas in the last 44 years.


I hope you enjoyed this accidental cross-reference of several dozen languages! I enjoyed looking through them all, though it was incredibly time-consuming. Some of them look pretty interesting; maybe give them a whirl.

Also, dammit, now I’m thinking about language design again.

Let’s stop copying C

Post Syndicated from Eevee original https://eev.ee/blog/2016/12/01/lets-stop-copying-c/

Ah, C. The best lingua franca we have… because we have no other lingua francas. Linguae franca. Surgeons general?

C is fairly old — 44 years, now! — and comes from a time when there were possibly more architectures than programming languages. It works well for what it is, and what it is is a relatively simple layer of indirection atop assembly.

Alas, the popularity of C has led to a number of programming languages’ taking significant cues from its design, and parts of its design are… slightly questionable. I’ve gone through some common features that probably should’ve stayed in C and my justification for saying so. The features are listed in rough order from (I hope) least to most controversial. The idea is that C fans will give up when I call it “weakly typed” and not even get to the part where I rag on braces. Wait, crap, I gave it away.

I’ve listed some languages that do or don’t take the same approach as C. Plenty of the listed languages have no relation to C, and some even predate it — this is meant as a cross-reference of the landscape (and perhaps a list of prior art), not a genealogy. The language selections are arbitrary and based on what I could cobble together from documentation, experiments, Wikipedia, and attempts to make sense of Rosetta Code. I don’t know everything about all of them, so I might be missing some interesting quirks. Things are especially complicated for very old languages like COBOL or Fortran, which by now have numerous different versions and variants and de facto standard extensions.

Unix shells” means some handwaved combination that probably includes bash and its descendants; for expressions, it means the (( ... )) syntax. I didn’t look too closely into, say, fish. Unqualified “Python” means both 2 and 3; likewise, unqualified “Perl” means both 5 and 6. Also some of the puns are perhaps a little too obtuse, but the first group listed is always C-like.

Textual inclusion

#include is not a great basis for a module system. It’s not even a module system. You can’t ever quite tell what symbols came from which files, or indeed whether particular files are necessary at all. And in languages with C-like header files, most headers include other headers include more headers, so who knows how any particular declaration is actually ending up in your code? Oh, and there’s the whole include guards thing.

It’s a little tricky to pick on individual languages here, because ultimately even the greatest module system in the world boils down to “execute this other file, and maybe do some other stuff”. I think the true differentiating feature is whether including/importing/whatevering a file creates a new namespace. If a file gets dumped into the caller’s namespace, that looks an awful lot like textual inclusion; if a file gets its own namespace, that’s a good sign of something more structured happening behind the scenes.

This tends to go hand-in-hand with how much the language relies on a global namespace. One surprising exception is Lua, which can compartmentalize required files quite well, but dumps everything into a single global namespace by default.

Quick test: if you create a new namespace and import another file within that namespace, do its contents end up in that namespace?

Included: ACS, awk, COBOL, Erlang, Forth, Fortran, most older Lisps, Perl 5 (despite that required files must return true), PHP, Unix shells.

Excluded: Ada, Clojure, D, Haskell, Julia, Lua (the file’s return value is returned from require), Nim, Node (similar to Lua), Perl 6, Python, Rust.

Special mention: ALGOL appears to have been designed with the assumption that you could include other code by adding its punch cards to your stack. C#, Java, OCaml, and Swift all have some concept of “all possible code that will be in this program”, sort of like C with inferred headers, so imports are largely unnecessary; Java’s import really just does aliasing. Inform 7 has no namespacing, but does have a first-class concept of external libraries, but doesn’t have a way to split a single project up between multiple files. Ruby doesn’t automatically give required files their own namespace, but doesn’t evaluate them in the caller’s namespace either.

Optional block delimiters

Old and busted and responsible for gotofail:

1
2
if (condition)
    thing;

New hotness, which reduces the amount of punctuation overall and eliminates this easy kind of error:

1
2
3
if condition {
    thing;
}

To be fair, and unlike most of these complaints, the original idea was a sort of clever consistency: the actual syntax was merely if (expr) stmt, and also, a single statement could always be replaced by a block of statements. Unfortunately, the cuteness doesn’t make up for the ease with which errors sneak in. If you’re stuck with a language like this, I advise you always use braces, possibly excepting the most trivial cases like immediately returning if some argument is NULL. Definitely do not do this nonsense, which I saw in actual code not 24 hours ago.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
for (x = ...)
    for (y = ...) {
        ...
    }

    // more code

    for (x = ...)
        for (y = ...)
            buffer[y][x] = ...

The only real argument for omitting the braces is that the braces take up a lot of vertical space, but that’s mostly a problem if you put each { on its own line, and you could just not do that.

Some languages use keywords instead of braces, and in such cases it’s vanishingly rare to make the keywords optional.

Blockheads: ACS, awk, C#, D, Erlang (kinda?), Java, JavaScript.

New kids on the block: Go, Perl 6, Rust, Swift.

Had their braces removed: Ada, ALGOL, BASIC, COBOL, CoffeeScript, Forth, Fortran (but still requires parens), Haskell, Lua, Ruby.

Special mention: Inform 7 has several ways to delimit blocks, none of them vulnerable to this problem. Perl 5 requires both the parentheses and the braces… but it lets you leave off the semicolon on the last statement. Python just uses indentation to delimit blocks in the first place, so you can’t have a block look wrong. Lisps exist on a higher plane of existence where the very question makes no sense.

Bitwise operator precedence

For ease of transition from B, in C, the bitwise operators & | ^ have lower precedence than the comparison operators == and friends. That means they happen later. For binary math operators, this is nonsense.

1
2
3
1 + 2 == 3  // (1 + 2) == 3
1 * 2 == 3  // (1 * 2) == 3
1 | 2 == 3  // 1 | (2 == 3)

Many other languages have copied C’s entire set of operators and their precedence, including this. Because a new language is easier to learn if its rules are familiar, you see. Which is why we still, today, have extremely popular languages maintaining compatibility with a language from 1969 — so old that it probably couldn’t get a programming job.

Honestly, if your language is any higher-level than C, I’m not sure bit operators deserve to be operators at all. Free those characters up to do something else. Consider having a first-class bitfield type; then 99% of the use of bit operations would go away.

Quick test: 1 & 2 == 2 evaluates to 1 with C precedence, false otherwise. Or just look at a precedence table: if equality appears between bitwise ops and other math ops, that’s C style.

A bit wrong: D, expr, JavaScript, Perl 5, PHP.

Wisened up: F# (ops are &&& ||| ^^^), Go, Julia, Lua (bitwise ops are new in 5.3), Perl 6 (ops are +& +| +^), Python, Ruby, Rust, SQL, Swift, Unix shells.

Special mention: C# and Java have C’s precedence, but forbid using bitwise operators on booleans, so the quick test is a compile-time error. Lisp-likes have no operator precedence.

Negative modulo

The modulo operator, %, finds the remainder after division. Thus you might think that this always holds:

1
0 <= a % b < abs(b)

But no — if a is negative, C will produce a negative value. (Well, since C99; before that it was unspecified, which is probably worse.) This is so a / b * b + a % b is always equal to a. Truncating integer division rounds towards zero, so the sign of a % b always needs to be away from zero.

I’ve never found this behavior (or the above equivalence) useful. An easy example is that checking for odd numbers with x % 2 == 1 will fail for negative numbers, which produce -1. But the opposite behavior can be pretty handy.

Consider the problem of having n items that you want to arrange into rows with c columns. A calendar, say; you want to include enough empty cells to fill out the last row. n % c gives you the number of items on the last row, so c - n % c seems like it will give you the number of empty spaces. But if the last row is already full, then n % c is zero, and c - n % c equals c! You’ll have either a double-width row or a spare row of empty cells. Fixing this requires treating n % c == 0 as a special case, which is unsatisfying.

Ah, but if we have positive %, the answer is simply… -n % c! Consider this number line for n = 5 and c = 3:

1
2
-6      -3       0       3       6
 | - x x | x x x | x x x | x x - |

a % b tells you how far to count down to find a multiple of b. For positive a, that means “backtracking” over a itself and finding a smaller number. For negative a, that means continuing further away from zero. If you look at negative numbers as the mirror image of positive numbers, then % on a positive number tells you how to much to file off to get a multiple, whereas % on a negative number tells you how much further to go to get a multiple. 5 % 3 is 2, but -5 % 3 is 1. And of course, -6 % 3 is still zero, so that’s not a special case.

Positive % effectively lets you choose whether to round up or down. It doesn’t come up often, but when it’s handy, it’s really handy.

(I have no strong opinion on what 5 % -3 should be; I don’t think I’ve ever tried to use % with a negative divisor. Python makes it negative; Pascal makes it positive. Wikipedia has a whole big chart.)

Quick test: -5 % 3 is -2 with C semantics, 1 with “positive” semantics.

Leftovers: C#, D, expr, Go, Java, JavaScript, OCaml, PowerShell, PHP, Rust, Scala, SQL, Swift, Unix shells, VimL, Visual Basic. Notably, some of these languages don’t even have integer division.

Paying dividends: Dart, MUMPS (#), Perl, Python, R (%%), Ruby, Smalltalk (\\\\), Standard ML, Tcl.

Special mention: Ada, Haskell, Julia, many Lisps, MATLAB, VHDL, and others have separate mod (Python-like) and rem (C-like) operators. CoffeeScript has separate % (C-like) and %% (Python-like) operators.

Leading zero for octal

Octal notation like 0777 has three uses.

One: to make a file mask to pass to chmod().

Two: to confuse people when they write 013 and it comes out as 11.

Three: to confuse people when they write 018 and get a syntax error.

If you absolutely must have octal (?!) in your language, it’s fine to use 0o777. Really. No one will mind. Or you can go the whole distance and allow literals written in any base, as several languages do.

Gets a zero: awk (gawk only), Clojure, Go, Groovy, Java, JavaScript, m4, Perl 5, PHP, Python 2, Unix shells.

G0od: ECMAScript 6, Eiffel (0c — cute!), F#, Haskell, Julia, Nemerle, Nim, OCaml, Perl 6, Python 3, Ruby, Rust, Scheme (#o), Swift, Tcl.

Based literals: Ada (8#777#), Bash (8#777), Erlang (8#777), Icon (8r777), J (8b777), Perl 6 (:8<777>), PostScript (8#777), Smalltalk (8r777).

Special mention: BASIC uses &O and &H prefixes for octal and hex. bc and Forth allow the base used to interpret literals to be changed on the fly, via ibase and BASE respectively. C#, D, expr, Lua, Scala, and Standard ML have no octal literals at all. Some COBOL extensions use O# and H#/X# prefixes for octal and hex. Fortran uses the slightly odd O'777' syntax.

No power operator

Perhaps this makes sense in C, since it doesn’t correspond to an actual instruction on most CPUs, but in JavaScript? If you can make + work for strings, I think you can add a **.

If you’re willing to ditch the bitwise operators (or lessen their importance a bit), you can even use ^, as most people would write in regular ASCII text.

Powerless: ACS, C#, Eiffel, Erlang, expr, Forth, Go.

Two out of two stars: Ada, ALGOL ( works too), COBOL, CoffeeScript, ECMAScript 7, Fortran, F#, Groovy, OCaml, Perl, PHP, Python, Ruby, Unix shells.

I tip my hat: awk, BASIC, bc, COBOL, fish, Lua.

Otherwise powerful: APL (), D (^^).

Special mention: Lisps tend to have a named function rather than a dedicated operator (e.g. Math/pow in Clojure, expt in Common Lisp), but since operators are regular functions, this doesn’t stand out nearly so much. Haskell uses all three of ^, ^^, and ** for typing reasons.

C-style for loops

This construct is bad. It very rarely matches what a human actually wants to do, which 90% of the time is “go through this list of stuff” or “count from 1 to 10”. A C-style for obscures those wishes. The syntax is downright goofy, too: nothing else in the language uses ; as a delimiter and repeatedly executes only part of a line. It’s like a tuple of statements.

I said in my previous post about iteration that having an iteration protocol requires either objects or closures, but I realize that’s not true. I even disproved it in the same post. Lua’s own iteration protocol can be implemented without closures — the semantics of for involve keeping a persistent state value and passing it to the iterator function every time. It could even be implemented in C! Awkwardly. And with a bunch of macros. Which aren’t hygenic in C. Hmm, well.

Loopy: ACS, bc, Fortran.

Cool and collected: C#, Clojure, D, Delphi (recent), ECMAScript 6, Eiffel (recent), Go, Groovy, Icon, Inform 7, Java, Julia, Logo, Lua, Nemerle, Nim, Objective-C, Perl, PHP, PostScript, Prolog, Python, R, Rust, Scala, Smalltalk, Swift, Tcl, Unix shells, Visual Basic.

Special mention: Functional languages and Lisps are laughing at the rest of us here. awk has for...in, but it doesn’t iterate arrays in order which makes it rather less useful. JavaScript (pre ES6) has both for...in and for each...in, but both are differently broken, so you usually end up using C-style for or external iteration. BASIC has an ergonomic numeric loop, but no iteration loop. Ruby mostly uses external iteration, and its for block is actually expressed in those terms.

Switch with default fallthrough

We’ve been through this before. Wanting completely separate code per case is, by far, the most common thing to want to do. It makes no sense to have to explicitly opt out of the more obvious behavior.

Breaks my heart: Java, JavaScript.

Follows through: Ada, BASIC, CoffeeScript, Go (has a fallthrough statement), Lisps, Ruby, Swift (has a fallthrough statement), Unix shells.

Special mention: C# and D require break, but require something one way or the other — implicit fallthrough is disallowed except for empty cases. Perl 5 historically had no switch block built in, but it comes with a Switch module, and the last seven releases have had an experimental given block which I stress is still experimental. Python has no switch block. Erlang, Haskell, and Rust have pattern-matching instead (which doesn’t allow fallthrough at all).

Type first

1
int foo;

In C, this isn’t too bad. You get into problems when you remember that it’s common for type names to be all lowercase.

1
foo * bar;

Is that a useless expression, or a declaration? It depends entirely on whether foo is a variable or a type.

It gets a little weirder when you consider that there are type names with spaces in them. And storage classes. And qualifiers. And sometimes part of the type comes after the name.

1
extern const volatile _Atomic unsigned long long int * restrict foo[];

That’s not even getting into the syntax for types of function pointers, which might have arbitrary amounts of stuff after the variable name.

And then C++ came along with generics, which means a type name might also have other type names nested arbitrarily deep.

1
extern const volatile std::unordered_map<unsigned long long int, std::unordered_map<const long double * const, const std::vector<std::basic_string<char>>::const_iterator>> foo;

And that’s just a declaration! Imagine if there were an assignment in there too.

The great thing about static typing is that I know the types of all the variables, but that advantage is somewhat lessened if I can’t tell what the variables are.

Between type-first, function pointer syntax, Turing-complete duck-typed templates, and C++’s initialization syntax, there are several ways where parsing C++ is ambiguous or even undecidable! “Undecidable” here means that there exist C++ programs which cannot even be parsed into a syntax tree, because the same syntax means two different things depending on whether some expression is a value or a type, and that question can depend on an endlessly recursive template instantiation. (This is also a great example of ambiguity, where x * y(z) could be either an expression or a declaration.)

Contrast with, say, Rust:

1
let x: ... = ...;

This is easy to parse, both for a human and a computer. The thing before the colon must be a variable name, and it stands out immediately; the thing after the colon must be a type name. Even better, Rust has pretty good type inference, so the type is probably unnecessary anyway.

Of course, languages with no type declarations whatsoever are immune to this problem.

Most vexing: ACS, ALGOL, C#, D (though [] goes on the type), Fortran, Java, Perl 6.

Looks Lovely: Ada, Boo, F#, Go, Python 3 (via annotation syntax and the typing module), Rust, Swift, TypeScript.

Special mention: BASIC uses trailing type sigils to indicate scalar types.

Weak typing

Please note: this is not the opposite of static typing. Weak typing is more about the runtime behavior of values — if I try to use a value of type T as though it were of type U, will it be implicitly converted?

C lets you assign pointers to int variables and then take square roots of them, which seems like a bad idea to me. C++ agreed and nixed this, but also introduced the ability to make your own custom types implicitly convertible to as many other types you want.

This one is pretty clearly a spectrum, and I don’t have a clear line. For example, I don’t fault Python for implicitly converting between int and float, because int is infinite-precision and float is 64-bit, so it’s usually fine. But I’m a lot more suspicious of C, which lets you assign an int to a char without complaint. (Well, okay. Literal integers in C are ints, which poses a slight problem.)

I do count a combined addition/concatenation operator that accepts different types of arguments as a form of weak typing.

Weak: JavaScript (+), PHP, Unix shells (almost everything’s a string, but even arrays/scalars are somewhat interchangeable).

Strong: F#, Go (explicit numeric casts), Haskell, Python, Rust (explicit numeric casts).

Special mention: ACS only has integers; even fixed-point values are stored in integers, and the compiler has no notion of a fixed-point type, making it the weakest language imaginable. C++ and Scala both allow defining implicit conversions, for better or worse. Perl 5 is weak, but it avoids most of the ambiguity by having entirely separate sets of operators for string vs numeric operations. Python 2 is mostly strong, but that whole interchangeable bytes/text thing sure caused some ruckus. Tcl only has strings.

Integer division

Hey, new programmers!” you may find yourself saying. “Don’t worry, it’s just like math, see? Here’s how to use $LANGUAGE as a calculator.”

Oh boy!” says your protégé. “Let’s see what 7 ÷ 2 is! Oh, it’s 3. I think the computer is broken.”

They’re right! It is broken. I have genuinely seen a non-trivial number of people come into #python thinking division is “broken” because of this.

To be fair, C is pretty consistent about making math operations always produce a value whose type matches one of the arguments. It’s also unclear whether such division should produce a float or a double. Inferring from context would make sense, but that’s not something C is really big on.

Quick test: 7 / 2 is 3½, not 3.

Integrous: bc, C#, D, expr, F#, Fortran, Go, OCaml, Python 2, Ruby, Rust (hard to avoid), Unix shells.

Afloat: awk (no integers), Clojure (produces a rational!), Groovy, JavaScript (no integers), Lua (no integers until 5.3), Nim, Perl 5 (no integers), Perl 6, PHP, Python 3.

Special mention: Haskell disallows / on integers. Nim, Haskell, Perl 6, Python, and probably others have separate integral division operators: div, div, div, and //, respectively.

Bytestrings

Strings” in C are arrays of 8-bit characters. They aren’t really strings at all, since they can’t hold the vast majority of characters without some further form of encoding. Exactly what the encoding is and how to handle it is left entirely up to the programmer. This is a pain in the ass.

Some languages caught wind of this Unicode thing in the 90s and decided to solve this problem once and for all by making “wide” strings with 16-bit characters. (Even C95 has this, in the form of wchar_t* and L"..." literals.) Unicode, you see, would never have more than 65,536 characters.

Whoops, so much for that. Now we have strings encoded as UTF-16 rather than UTF-8, so we’re paying extra storage cost and we still need to write extra code to do basic operations right. Or we forget, and then later we have to track down a bunch of wonky bugs because someone typed a 💩.

Note that handling characters/codepoints is very different from handling glyphs, i.e. the distinct shapes you see on screen. Handling glyphs doesn’t even really make sense outside the context of a font, because fonts are free to make up whatever ligatures they want. Remember “diverse” emoji? Those are ligatures of three to seven characters, completely invented by a font vendor. A programming language can’t reliably count the display length of that, especially when new combining behaviors could be introduced at any time.

Also, it doesn’t matter how you solve this problem, as long as it appears to be solved. I believe Ruby uses bytestrings, for example, but they know their own encoding, so they can be correctly handled as sequences of codepoints. Having a separate non-default type or methods does not count, because everyone will still use the wrong thing first — sorry, Python 2.

Quick test: what’s the length of “💩”? If 1, you have real unencoded strings. If 2, you have UTF-16 strings. If 4, you have UTF-8 strings. If something else, I don’t know what the heck is going on.

Totally bytes: Go, Lua, Python 2 (separate unicode type).

Comes up short: Java, JavaScript.

One hundred emoji: Python 3, Ruby, Rust, Swift (even gets combining characters right!).

Special mention: Go’s strings are explicitly arbitrary byte sequences, but iterating over a string with for..range decodes UTF-8 code points. Perl 5 gets the quick test right if you put use utf8; at the top of the file, but Perl 5’s Unicode support is such a confusing clusterfuck that I can’t really give it a 💯.

Hmm. This one is kind of hard to track down for sure without either knowing a lot about internals or installing fifty different interpreters/compilers.

Increment and decrement

I don’t think there are too many compelling reasons to have ++. It means the same as += 1, which is still nice and short. The only difference is that people can do stupid unreadable tricks with ++.

One exception: it is possible to overload ++ in ways that don’t make sense as += 1 — for example, C++ uses ++ to advance iterators, which may do any arbitrary work under the hood.

Double plus ungood: ACS, awk, C#, D, Go, Java, JavaScript, Perl, Unix shells, Vala.

Double plus good: Lua (which doesn’t have += either), Python, Ruby, Rust, Swift (removed in v3).

Special mention: Perl 5 and PHP both allow ++ on strings, in which case it increments letters or something, but I don’t know if much real code has ever used this.

!

A pet peeve. Spot the difference:

1
2
3
4
5
6
if (looks_like_rain()) {
    ...
}
if (!looks_like_rain()) {
    ...
}

That single ! is ridiculously subtle, which seems wrong to me when it makes an expression mean its polar opposite. Surely it should stick out like a sore thumb. The left parenthesis makes it worse, too; it blends in slightly as just noise.

It helps a bit to space after the ! in cases like this:

1
2
3
if (! looks_like_rain()) {
    ...
}

But this seems to be curiously rare. The easy solution is to just spell the operator not. At which point the other two might as well be and and or.

Interestingly enough, C95 specifies and, or, not, and some others as standard alternative spellings, though I’ve never seen them in any C code and I suspect existing projects would prefer I not use them.

Not right: ACS, awk, C#, D, Go, Groovy, Java, JavaScript, Nemerle, PHP, R, Rust, Scala, Swift, Tcl, Vala.

Spelled out: Ada, ALGOL, BASIC, COBOL, Erlang, F#, Fortran, Haskell, Inform 7, Lisps, Lua, Nim, OCaml, Pascal, PostScript, Python, Smalltalk, Standard ML.

Special mention: APL and Julia both use ~, which is at least easier to pick out, which is more than I can say for most of APL. bc and expr, which are really calculators, have no concept of Boolean operations. Forth and Icon, which are not calculators, don’t seem to either. Inform 7 often blends the negation into the verb, e.g. if the player does not have.... Perl and Ruby have both symbolic and named Boolean operators (Perl 6 has even more), with different precedence (which inside if won’t matter); I believe Perl 5 prefers the words and Ruby prefers the symbols. Perl and Ruby also both have a separate unless block, with the opposite meaning to if. Python has is not and not in operators.

Single return and out parameters

Because C can only return a single value, and that value is often an indication of failure for the sake of an if, “out” parameters are somewhat common.

1
2
double x, y;
get_point(&x, &y);

It’s not immediately clear whether x and y are input or output. Sometimes they might function as both. (And of course, in this silly example, you’d be better off returning a single point struct. Or would you use a point out parameter because returning structs is potentially expensive?)

Some languages have doubled down on this by adding syntax to declare “out” parameters, which removes the ambiguity in the function definition, but makes it worse in function calls. In the above example, using & on an argument is at least a decent hint that the function wants to write to those values. If you have implicit out parameters or pass-by-reference or whatever, that would just be get_point(x, y) and you’d have no indication that those arguments are special in any way.

The vast majority of the time, this can be expressed in a more straightforward way by returning multiple values:

1
x, y = get_point()

That was intended as Python, but technically, Python doesn’t have multiple returns! It seems to, but it’s really a combination of several factors: a tuple type, the ability to make a tuple literal with just commas, and the ability to unpack a tuple via multiple assignment. In the end it works just as well. Also this is a way better use of the comma operator than in C.

But the exact same code could appear in Lua, which has multiple return/assignment as an explicit feature… and no tuples. The difference becomes obvious if you try to assign the return value to a single variable instead:

1
point = get_point()

In Python, point would be a tuple containing both return values. In Lua, point would be the x value, and y would be silently discarded. I don’t tend to be a fan of silently throwing data away, but I have to admit that Lua makes pretty good use of this in several places for “optional” return values that the caller can completely ignore if desired. An existing function can even be extended to return more values than before — that would break callers in Python, but work just fine in Lua.

(Also, to briefly play devil’s advocate: I once saw Python code that returned 14 values all with very complicated values, types, and semantics. Maybe don’t do that. I think I cleaned it up to return an object, which simplified the calling code considerably too.)

It’s also possible to half-ass this. ECMAScript 6::

1
2
3
4
5
function get_point() {
    return [1, 2];
}

var [x, y] = get_point();

It works, but it doesn’t actually look like multiple return. The trouble is that JavaScript has C’s comma operator and C’s variable declaration syntax, so neither of the above constructs could’ve left off the brackets without significantly changing the syntax:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
function get_point() {
    // Whoops!  This uses the comma operator, which evaluates to its last
    // operand, so it just returns 2
    return 1, 2;
}

// Whoops!  This is multiple declaration, where each variable gets its own "=",
// so it assigns nothing to x and the return value to y
var x, y = get_point();
// Now x is undefined and y is 2

This is still better than either out parameters or returning an explicit struct that needs manual unpacking, but it’s not as good as comma-delimited tuples. Note that some languages require parentheses around tuples (and also call them tuples), and I’m arbitrarily counting that as better than bracket.

Single return: Ada, ALGOL, BASIC, C#, COBOL, Fortran, Groovy, Java, Smalltalk.

Half-assed multiple return: C++11, D, ECMAScript 6, Erlang, PHP.

Multiple return via tuples: F#, Haskell, Julia, Nemerle, Nim, OCaml, Perl (just lists really), Python, Ruby, Rust, Scala, Standard ML, Swift, Tcl.

Native multiple return: Common Lisp, Go, Lua.

Special mention: C# has explicit syntax for out parameters, but it’s a compile-time error to not assign to all of them, which is slightly better than C. Forth is stack-based, and all return values are simply placed on the stack, so multiple return isn’t a special case. Unix shell functions don’t return values. Visual Basic sets a return value by assigning to the function’s name (?!), so good luck fitting multiple return in there.

Silent errors

Most runtime errors in C are indicated by one of two mechanisms: returning an error code, or segfaulting. Segfaulting is pretty noisy, so that’s okay, except for the exploit potential and all.

Returning an error code kinda sucks. Those tend to be important, but nothing in the language actually reminds you to check them, and of course we silly squishy humans have the habit of assuming everything will succeed at all times. Which is how I segfaulted git two days ago: I found a spot where it didn’t check for a NULL returned as an error.

There are several alternatives here: exceptions, statically forcing the developer to check for an error code, or using something monad-like to statically force the developer to distinguish between an error and a valid return value. Probably some others. In the end I was surprised by how many languages went the exception route.

Quietly wrong: Unix shells. Wow, yeah, I’m having a hard time naming anything else. Good job, us! And even Unix shells have set -e; it’s just opt-in.

Exceptional: Ada, C++, C#, D, Erlang, Forth, Java (exceptions are even part of function signature), JavaScript, Nemerle, Nim, Objective-C, OCaml, Perl 6, Python, Ruby, Smalltalk, Standard ML, Visual Basic.

Monadic: Haskell (Either), Rust (Result).

Special mention: ACS doesn’t really have many operations that can error, and those that do simply halt the script. ALGOL apparently has something called “mending” that I don’t understand. Go tends to use secondary return values, which calling code has to unpack, making them slightly harder to forget about; it also allows both the assignment and the error check together in the header of an if. Lisps have conditions and call/cc, which are different things entirely. Lua and Perl 5 handle errors by taking down the whole program, but offer a construct that can catch that further up the stack, which is clumsy but enough to emulate try..catch. PHP has exceptions, and errors (which are totally different), and a lot of builtin functions that return error codes. Swift has something that looks like exceptions, but it doesn’t involve stack unwinding and does require some light annotation — apparently sugar for an “out” parameter holding an error. Visual Basic, and I believe some other BASICs, decided C wasn’t bad enough and introduced the bizarre On Error Resume Next construct which does exactly what it sounds like.

Nulls

The billion dollar mistake.

I think it’s considerably worse in a statically typed language like C, because the whole point is that you can rely on the types. But a double* might be NULL, which is not actually a pointer to a double; it’s a pointer to a segfault. Other kinds of bad pointers are possible, of course, but those are more an issue of memory safety; allowing any reference to be null violates type safety. The root of the problem is treating null as a possible value of any type, when really it’s its own type entirely.

The alternatives tend to be either opt-in nullability or an “optional” generic type (a monad!) which eliminates null as its own value entirely. Notably, Swift does it both ways: optional types are indicated by a trailing ?, but that’s just syntactic sugar for Option<T>.

On the other hand, while it’s annoying to get a None where I didn’t expect one in Python, it’s not like I’m surprised. I occasionally get a string where I expected a number, too. The language explicitly leaves type concerns in my hands. My real objection is to having a static type system that lies. So I’m not going to list every single dynamic language here, because not only is it consistent with the rest of the type system, but they don’t really have any machinery to prevent this anyway.

Nothing doing: C#, D, Go, Java, Nim (non-nullable types are opt in).

Nullable types: Swift (sugar for a monad).

Monads: F# (Option — though technically F# also inherits null from .NET), Haskell (Maybe), Rust (Option), Swift (Optional).

Special mention: awk, Tcl, and Unix shells only have strings, so in a surprising twist, they have no concept of null whatsoever. Java recently introduced an Optional<T> type which explicitly may or may not contain a value, but since it’s still a non-primitive, it could also be null. C++17 doesn’t quite have the same problem with std::optional<T>, since non-reference values can’t be null. Inform 7’s nothing value is an object (the root of half of its type system), which means any object variable might be nothing, but any value of a more specific type cannot be nothing. JavaScript has two null values, null and undefined. Perl 6 is really big on static types, but claims its Nil object doesn’t exist, and I don’t know how to even begin to unpack that. R and SQL have a more mathematical kind of NULL, which tends to e.g. vanish from lists.

Assignment as expression

How common a mistake is this:

1
2
3
if (x = 3) {
    ...
}

Well, I don’t know, actually. Maybe not that common, save for among beginners. But I sort of wonder whether allowing this buys us anything. I can only think of two cases where it does. One is with something like iteration:

1
2
3
4
// Typical linked list
while (p = p->next) {
    ...
}

But this is only necessary in C in the first place because it has no first-class notion of iteration. The other is shorthand for checking that a function returned a useful value:

1
2
3
if (ptr = get_pointer()) {
    ...
}

But if a function returns NULL, that’s really an error condition, and presumably you have some other way to handle that too.

What does that leave? The only time I remotely miss this in Python (where it’s illegal) is when testing a regex. You tend to see this a lot instead.

1
2
3
m = re.match('x+y+z+', some_string)
if m:
    ...

re treats failure as an acceptable possibility and returns None, rather than raising an exception. I’m not sure whether this was the right thing to do or not, but off the top of my head I can’t think of too many other Python interfaces that sometimes return None.

Freedom of expression: ACS, C#, Java, JavaScript, Perl, PHP, Swift.

Makes a statement: Inform 7, Lua, Python, Unix shells.

Special mention: BASIC uses = for both assignment and equality testing — the meaning is determined from context. D allows variable declaration as an expression, so if (int x = 3) is allowed, but regular assignment is not. Functional languages generally don’t have an assignment operator. Go disallows assignment as an expression, but assignment and a test can appear together in an if condition, and this is an idiomatic way to check success. Ruby makes everything an expression, so assignment might as well be too. Rust makes everything an expression, but assignment evaluates to the useless () value (due to ownership rules), so it’s not actually useful. Rust and Swift both have a special if let block that explicitly combines assignment with pattern matching, which is way nicer than the C approach.

No hyphens in identifiers

snake_case requires dancing on the shift key (unless you rearrange your keyboard, which is perfectly reasonable). It slows you down slightly and leads to occasional mistakes like snake-Case. The alternative is dromedaryCase, which is objectively wrong and doesn’t actually solve this problem anyway.

Why not just allow hyphens in identifiers, so we can avoid this argument and use kebab-case?

Ah, but then it’s ambiguous whether you mean an identifier or the subtraction operator. No problem: require spaces for subtraction. I don’t think a tiny way you’re allowed to make your code harder to read is really worth this clear advantage.

Low scoring: ACS, C#, D, Java, JavaScript, OCaml, Pascal, Perl 5, PHP, Python, Ruby, Rust, Swift, Unix shells.

Nicely-designed: COBOL, CSS (and thus Sass), Forth, Inform 7, Lisps, Perl 6, XML.

Special mention: Perl has a built-in variable called $-, and Ruby has a few called $-n for various values of “n”, but these are very special cases.

Braces and semicolons

Okay. Hang on. Bear with me.

C code looks like this.

1
2
3
4
5
some block header {
    line 1;
    line 2;
    line 3;
}

The block is indicated two different ways here. The braces are for the compiler; the indentation is for humans.

Having two different ways to say the same thing means they can get out of sync. They can disagree. And that can be, as previously mentioned, really bad. This is really just a more general form of the problem of optional block delimiters.

The only solution is to eliminate one of the two. Programming languages exist for the benefit of humans, so we obviously can’t get rid of the indentation. Thus, we should get rid of the braces. QED.

As an added advantage, we reclaim all the vertical space wasted on lines containing only a }, and we can stop squabbling about where to put the {.

If you accept this, you might start to notice that there are also two different ways of indicating where a line ends: with semicolons for the compiler, and with vertical whitespace for humans. So, by the same reasoning, we should lose the semicolons.

Right? Awesome. Glad we’re all on the same page.

Some languages use keywords instead of braces, but the effect is the same. I’m not aware of any languages that use keywords instead of semicolons.

Bracing myself: C#, D, Erlang, Java, Perl, Rust.

Braces, but no semicolons: Go (ASI), JavaScript (ASI — see below), Lua, Ruby, Swift.

Free and clear: CoffeeScript, Haskell, Python.

Special mention: Lisp, just, in general. Inform 7 has an indented style, but it still requires semicolons. MUMPS doesn’t support nesting at all, but I believe there are extensions that use dots to indicate it.

Here’s some interesting trivia. JavaScript, Lua, and Python all optionally allow semicolons at the end of a statement, but the way each language determines line continuation is very different.

JavaScript takes an “opt-out” approach: it continues reading lines until it hits a semicolon, or until reading the next line would cause a syntax error. (This approach is called automatic semicolon insertion.) That leaves a few corner cases like starting a new line with a (, which could look like the last thing on the previous line is a function you’re trying to call. Or you could have -foo on its own line, and it would parse as subtraction rather than unary negation. You might wonder why anyone would do that, but using unary + is one way to make function parse as an expression rather than a statement! I’m not so opposed to semicolons that I want to be debugging where the language thinks my lines end, so I just always use semicolons in JavaScript.

Python takes an “opt-in” approach: it assumes, by default, that a statement ends at the end of a line. However, newlines inside parentheses or brackets are ignored, which takes care of 99% of cases — long lines are most frequently caused by function calls (which have parentheses!) with a lot of arguments. If you really need it, you can explicitly escape a newline with \\, but this is widely regarded as incredibly ugly.

Lua avoids the problem almost entirely. I believe Lua’s grammar is designed such that it’s almost always unambiguous where a statement ends, even if you have no newlines at all. This has a few weird side effects: void expressions are syntactically forbidden in Lua, for example, so you just can’t have -foo as its own statement. Also, you can’t have code immediately following a return, because it’ll be interpreted as a return value. The upside is that Lua can treat newlines just like any other whitespace, but still not need semicolons. In fact, semicolons aren’t statement terminators in Lua at all — they’re their own statement, which does nothing. Alas, not for lack of trying, Lua does have the same ( ambiguity as JavaScript (and parses it the same way), but I don’t think any of the others exist.

Oh, and the colons that Python has at the end of its block headers, like if foo:? As far as I can tell, they serve no syntactic purpose whatsoever. Purely aesthetic.

Blaming the programmer

Perhaps one of the worst misfeatures of C is the ease with which responsibility for problems can be shifted to the person who wrote the code. “Oh, you segfaulted? I guess you forgot to check for NULL.” If only I had a computer to take care of such tedium for me!

Clearly, computers can’t be expected to do everything for us. But they can be expected to do quite a bit. Programming languages are built for humans, and they ought to eliminate the sorts of rote work humans are bad at whenever possible. A programmer is already busy thinking about the actual problem they want to solve; it’s no surprise that they’ll sometimes forget some tedious detail the language forces them to worry about.

So if you’re designing a language, don’t just copy C. Don’t just copy C++ or Java. Hell, don’t even just copy Python or Ruby. Consider your target audience, consider the problems they’re trying to solve, and try to get as much else out of the way as possible. If the same “mistake” tends to crop up over and over, look for a way to modify the language to reduce or eliminate it. And be sure to look at a lot of languages for inspiration — even ones you hate, even weird ones no one uses! A lot of clever people have had a lot of other ideas in the last 44 years.


I hope you enjoyed this accidental cross-reference of several dozen languages! I enjoyed looking through them all, though it was incredibly time-consuming. Some of them look pretty interesting; maybe give them a whirl.

Also, dammit, now I’m thinking about language design again.

Amazon SES Now Offers Dedicated IP Addresses

Post Syndicated from Cristian Smochina original https://aws.amazon.com/blogs/ses/amazon-ses-now-offers-dedicated-ip-addresses/

The SES team is pleased to announce that, starting today, you can lease IP addresses dedicated exclusively to your email sending. Prior to the availability of dedicated IPs, all Amazon SES customers sent their email through IP addresses shared by other Amazon SES customers. Although shared IPs remain the best fit for many of our customers, dedicated IPs offer several benefits that might be a great fit for your use case. This blog post explains the use cases that benefit from dedicated IPs, how to request them, and the considerations you need to take when you use them.

Why should I use dedicated IPs?

You might choose to lease dedicated IPs so that you can fully control the email coming from your IPs. Because you know exactly what mail is being sent from those IP addresses, it can make troubleshooting deliverability issues simpler.

Also, most email certification programs require you to have dedicated IPs because they demonstrate your commitment to managing your email reputation.

You might also choose dedicated IPs if you send multiple types of mail, such as marketing and transactional mail, and you want to isolate the reputation of your mail streams to improve your visibility and control over your email deliverability.

Dedicated IPs also address whitelisting and security requirements, because they are an unchanging set of IP addresses. For example, you might use SES to send yourself operational notifications such as alarms. Knowing that all mail originating from your dedicated IPs is safe, you can whitelist the dedicated IPs with your incoming mail servers to ensure that they always accept mail from those IP addresses.

Am I a good candidate for dedicated IPs?

To be a good candidate for dedicated IPs, we typically recommend a sustained and consistent sending pattern and a minimum daily sending volume of 175,000 emails per day (on average). However, if you need dedicated IPs for whitelisting or security purposes and your sending does not meet the minimum volume requirement, feel free to apply anyway, and we will evaluate your request.

SES provides extensive documentation on email-sending best practices as well as guidelines regarding IP-blocking events, but with dedicated IPs the onus is on you to ensure that you send high-quality email that your recipients expect, want, and engage with. Even though you control the email coming from your dedicated IP addresses, SES will continue to monitor SES’s entire IP space to ensure a high deliverability bar for all SES customers.

To learn more about the requirements and whether dedicated IPs are right for you, see Trade-offs Between Dedicated IPs and Shared IPs in the developer guide.

How can I request dedicated IPs?

To request dedicated IPs, open an SES Sending Limits Increase case in Support Center. You can also reach the limit increase form from the Amazon SES console. First, however, be sure to read Requesting Dedicated IPs in the developer guide for instructions on how to use an SES Sending Limits Increase case to request dedicated IPs.

How will my sending process change?

If your request for dedicated IPs is granted, you have to warm up your dedicated IPs before you continue to send emails in the usual way. Warming up your dedicated IPs before you start sending large amounts of email is needed because many receiving ISPs will not accept email from IPs that suddenly send a large volume of email. You warm up your IPs by gradually increasing the volume of emails you send through a new IP address. Each IP address needs some time to build a positive reputation. ISPs determine an IP address’s reputation in part based on the quality and the volume of email sent through that IP address. If you’re not sending a lot of email, a dedicated IP can actually hurt your deliverability because receiving servers aren’t seeing enough email to know whether to trust the IP address or not. Also, your traffic might be rejected by ISPs if you don’t warm up your IPs before sending. You can find more guidance about the warm-up process in the developer guide.

Do dedicated IPs incur an extra cost?

Yes. See the Amazon SES pricing page for dedicated IP pricing information.

Why would I choose to continue using the shared IP model instead of moving to dedicated IPs?

With SES, shared IPs are commitment-free, enable you to send with arbitrary email volume, and there is no cost beyond the regular SES email sending price. Shared IPs are also pre-warmed, which means that you can start sending a large volume of emails through them right away. With shared IPs, it is easier to maintain a good reputation with irregular sending spikes and there is no minimum usage required.

Can I use both shared IPs and dedicated IPs for my sending?

Yes. You can do this by using two different AWS accounts. Request dedicated IPs for one account, and leave the other account at its default option, which is the shared IP model. For example, you could send your transactional mail, which has a steadier sending cadence, through dedicated IPs. For large bursts of mail, you could use shared IPs instead. More guidance about sending from multiple AWS accounts can be found here.

We hope you find this feature useful! If you have any questions or comments, let us know in the SES Forum or here in the comment section of the blog.

New – Sending Metrics for Amazon Simple Email Service (SES)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-sending-metrics-for-amazon-simple-email-service-ses/

Amazon Simple Email Service (SES) focuses on deliverability – getting email through to the intended recipients. In my launch blog post (Introducing the Amazon Simple Email Service), I noted that several factors influence delivery, including the level of trust that you have earned with multiple Internet Service Providers (ISPs) and the number of complaints and bounces that you generate.

Today we are launching a new set of sending metrics for SES. There are two aspects to this feature:

Event Stream – You can configure SES to publish a JSON-formatted record to Amazon Kinesis Firehose each time a significant event (sent, rejected, delivered, bounced, or complaint generated) occurs.

Metrics – You can configure SES to publish aggregate metrics to Amazon CloudWatch. You can add one or more tags to each message and use them as CloudWatch dimensions. Tagging messages gives you the power to track deliverability based on campaign, team, department, and so forth.  You can then use this information to fine-tune your messages and your email strategy.

To learn more, read about Email Sending Metrics on the Amazon Simple Email Service Blog.


Jeff;

Debunking Trump’s "secret server"

Post Syndicated from Robert Graham original http://blog.erratasec.com/2016/11/debunking-trumps-secret-server.html

According to this Slate article, Trump has a secret server for communicating with Russia. Even Hillary has piled onto this story.

This is nonsense. The evidence available on the Internet is that Trump neither (directly) controls the domain “trump-email.com“, nor has access to the server. Instead, the domain was setup and controlled by Cendyn, a company that does marketing/promotions for hotels, including many of Trump’s hotels. Cendyn outsources the email portions of its campaigns to a company called Listrak, which actually owns/operates the physical server in a data center in Philidelphia.

In other words,  Trump’s response is (minus the political bits) likely true, supported by the evidence. It’s the conclusion I came to even before seeing the response.

When you view this “secret” server in context, surrounded by the other email servers operated by Listrak on behalf of Cendyn, it becomes more obvious what’s going on. In the same Internet address range of Trump’s servers you see a bunch of similar servers, many named [client]-email.com. In other words, trump-email.com is not intended as a normal email server you and I are familiar with, but as a server used for marketing/promotional campaigns.

It’s Cendyn that registered and who controls the trump-email.com domain, as seen in the WHOIS information. That the Trump Organization is the registrant, but not the admin, demonstrates that Trump doesn’t have direct control over it.

When the domain information was changed last September 23, it was Cendyn who did the change, not the Trump Organization. This link lists a bunch of other hotel-related domains that Cendyn likewise controls, some Trump related, some related to Trump’s hotel competitors, like Hyatt and Sheraton.

Cendyn’s claim they are reusing the server for some other purpose is likely true. If you are an enterprising journalist with $399 in your budget, you can find this out. Use the website http://reversewhois.domaintools.com/ to get a complete list of the 641 other domains controlled by Cendyn, then do an MX query for each one to find out which of them is using mail1.trump-email.com as their email server.

This is why we can’t have nice things on the Internet. Investigative journalism is dead. The Internet is full of clues like this if only somebody puts a few resources into figuring things out. For example, organizations that track spam will have information on exactly which promotions this server has been used for in the recent past. Those who operate public DNS resolvers, like Google’s 8.8.8.8, OpenDNS, or Dyn, may have knowledge which domain was related to mail1.trump-email.com.

Indeed, one journalist did call one of the public resolvers, and found other people queried this domain than the two listed in the Slate story — debunking it. I’ve heard from other DNS malware researchers (names remain anonymous) who confirm they’ve seen lookups for “mail1.trump-email.com” from all over the world, especially from tools like FireEye that process lots of spam email. One person claimed that lookups started failing for them back in late June — and thus the claim of successful responses until September are false. In other words, the “change” after the NYTimes queried Alfa Bank may not be because Cendyn (or Trump) changed anything, but because that was the first they checked and noticed that lookup errors were happening.

Since I wrote this blog post at midnight, so I haven’t confirmed this with anybody yet, but there’s a good chance that the IP address 66.216.133.29 has continued to spew spam for Trump hotels during this entire time. This would, of course would generate lookups (both reverse and forward). It seems like everyone who works for IT for a large company should be able to check their incoming email logs and see if they’ve been getting emails from that address over the last few months. If you work in IT, please check your logs for the last few months and Tweet me at @erratarob with the results, either positive or negative.

And finally, somebody associated with Alfa Bank IT operations confirms that executives like to stay at Trump hotels all the time (like in Vegas and New York), and there was a company function one of Trump’s golf courses. In other words, there’s good reason for the company to get spam from, and need to communicate with, Trump hotels to coordinate events.

And so on and so forth — there’s a lot of information out there if we just start digging.

Conclusion

That this is just normal marketing business from Cendyn and Listrak is the overwhelming logical explanation for all this. People are tempted to pull nefarious explanations out of their imaginations for things they don’t understand. But for those of us with experience in this sort of thing, what we see here is a normal messed up marketing (aka. spam) system that the Trump Organization doesn’t have control over. Knowing who owns and controls these servers, it’s unreasonable to believe that Trump is using them for secret emails. Far from “secret” or “private” servers as Hillary claims, these servers are wide open and obvious.

This post provides a logic explanation, but we can’t count on this being provably debunked until those like Dyn come forward, on the record, and show us lookups that don’t come from Alfa Bank. Or, those who work in big companies can pull records from their incoming email servers, to show that they’ve been receiving spam from that IP address over the last few months. Either of these would conclusively debunk the story.


But experts say…

But the article quotes several experts confirming the story, so how does that jibe with this blog post. The answer is that none of the experts confirmed the story.

Read more carefully. None of the identified experts confirmed the story. Instead, the experts looked at pieces, and confirmed part of the story. Vixie rightly confirmed that the pattern of DNS requests came from humans, and not automated systems. Chris Davis rightly confirmed the server doesn’t look like a normal email server.

Neither of them, however, confirmed that Trump has a secret server for communicating with the Russians. Both of their statements are consistent with what I describe above — that’s it’s a Cendyn operated server for marketing campaigns independent of the Trump Organization.


Those researchers violated their principles

The big story isn’t the conspiracy theory about Trump, but that these malware researchers exploited their privileged access for some purpose other than malware research.

Malware research consists of a lot of informal relationships. Researchers get DNS information from ISPs, from root servers, from services like Google’s 8.8.8.8 public DNS. It’s a huge privacy violation — justified on the principle that it’s for the general good. Sometimes the fact that DNS information is shared is explicit, like with Google’s service. Sometimes people don’t realize how their ISP shares information, or how many of the root DNS servers are monitored.

People should be angrily calling their ISPs and ask them if they share DNS information with untrustworthy researchers. People should be angrily asking ICANN, which is no longer controlled by the US government (sic), whether it’s their policy to share DNS lookup information with those who would attempt to change US elections.

There’s not many sources for this specific DNS information. Alfa Bank’s servers do their own resolution, direction from the root on down. It’s unlikely they were monitoring Alfa Bank’s servers directly, or monitoring Cendyn’s authoritative servers. That means some sort of passive DNS on some link in between, which is unlikley. Conversely, they could be monitoring one of the root domain servers — but this monitoring wouldn’t tell them the difference between a successful or failed lookup, which they claim to have. In short, of all the sources of “DNS malware information” I’ve heard about, none of it would deliver the information these researchers claim to have (well, except the NSA with their transatlantic undersea taps, of course).

Update: this tweet points out original post mentions getting data from “ams-ix23” node, which hints at AMS-IX, Amsterdam InterXchange, where many root server nodes are located.

Accessible games

Post Syndicated from Eevee original https://eev.ee/blog/2016/10/29/accessible-games/

I’ve now made a few small games. One of the trickiest and most interesting parts of designing them has been making them accessible.

I mean that in a very general and literal sense. I want as many people as possible to experience as much of my games as possible. Finding and clearing out unnecessary hurdles can be hard, but every one I leave risks losing a bunch of players who can’t or won’t clear it.

I’ve noticed three major categories of hurdle, all of them full of tradeoffs. Difficulty is what makes a game challenging, but if a player can’t get past a certain point, they can never see the rest of the game. Depth is great, but not everyone has 80 hours to pour into a game, and it’s tough to spend weeks of dev time on stuff most people won’t see. Distribution is a question of who can even get your game in the first place.

Here are some thoughts.

Mario Maker

Mario Maker is most notable for how accessible it is to budding game designers, which is important but also a completely different sense of accessibility.

The really nice thing about Mario Maker is that its levels are also accessible to players. Virtually everyone who’s heard of video games has heard of Mario. You don’t need to know many rules to be able to play. Move to the right, jump over/on things, and get to the flag.

(The “distribution” model is a bit of a shame, though — you need to own a particular console and a $60 game. If I want people to play a single individual level I made, that’s a lot of upfront investment to ask for. Ultimately Nintendo is in this to sell their own game more than to help people show off their own.)

But the emergent depth of Mario Maker’s myriad objects — the very property that makes the platform more than a toy — also makes it less accessible. Everyone knows you move around and jump, but not everyone knows you can pick up an item with B, or that you can put on a hat you’re carrying by pressing , or that you can spinjump on certain hazards. And these are fairly basic controls — Mario Maker contains plenty of special interactions between more obscure objects, and no manual explaining them all.

I thought it was especially interesting that Nintendo’s own comic series on building Mario Maker levels specifically points out that running jumps don’t come naturally to everyone. It’s hard to imagine too many people playing Mario Maker and not knowing how to jump while running.

And yet.

And yet, imagine being one such person, and encountering a level that requires a running jump early on. You can’t get past it. You might not even understand how to get past it; perhaps you don’t even know Mario can run. Now what? That’s it, you’re stuck. You’ll never see the rest of that level. It’s a hurdle, in a somewhat more literal sense.

Why make the level that way in the first place, then? Does any seasoned Mario player jump over a moderate-width gap and come away feeling proud for having conquered it? Seems unlikely.

I’ve tried playing through 100 Mario Challenge on Expert a number of times (without once managing to complete it), and I’ve noticed three fuzzy categories. Some levels are an arbitrary mess of hazards right from the start, so I don’t expect them to get any easier. Some levels are clearly designed as difficult obstacle courses, so again, I assume they’ll be just as hard all the way through. In both cases, if I give up and skip to the next level, I don’t feel like I’m missing out on anything — I’m not the intended audience.

But there are some Expert-ranked levels that seem pretty reasonable… until this one point where all hell breaks loose. I always wonder how deliberate those parts are, and I vaguely regret skipping them — would the rest of the level have calmed back down and been enjoyable?

That’s the kind of hurdle I think about when I see conspicuous clusters of death markers in my own levels. How many people died there and gave up? I make levels intending for people to play them, to see them through, but how many players have I turned off with some needlessly tricky part?

One of my levels is a Boo house with a few cute tricks in it. Unfortunately, I also put a ring of Boos right at the beginning that’s tricky to jump through, so it’s very easy for a player to die several times right there and never see anything else.

I wanted my Boo house to be interesting rather than difficult, but I let difficulty creep in accidentally, and so I’ve reduced the number of people who can appreciate the interestingness. Every level I’ve made since then, I’ve struggled to keep the difficulty down, and still sometimes failed. It’s easy to make a level that’s very hard; it’s surprisingly hard to make a level that’s fairly easy. All it takes is a single unintended hurdle — a tricky jump, an awkwardly-placed enemy — to start losing players.

This isn’t to say that games should never be difficult, but difficulty needs to be deliberately calibrated, and that’s a hard thing to do. It’s very easy to think only in terms of “can I beat this”, and even that’s not accurate, since you know every nook and cranny of your own level. Can you beat it blind, on the first few tries? Could someone else?

Those questions are especially important in Mario Maker, where the easiest way to encounter an assortment of levels is to play 100 Mario Challenge. You have 100 lives and need to beat 16 randomly-chosen levels. If you run out of lives, you’re done, and you have to start over. If I encounter your level here, I can’t afford to burn more than six or seven lives on it, or I’ll game over and have wasted my time. So if your level looks ridiculously hard (and not even in a fun way), I’ll just skip it and hope I get a better level next time.

I wonder if designers forget to calibrate for this. When you spend a lot of time working on something, it’s easy to imagine it exists in a vacuum, to assume that other people will be as devoted to playing it as you were to making it.

Mario Maker is an extreme case: millions of levels are available, and any player can skip to another one with the push of a button. That might be why I feel like I’ve seen a huge schism in level difficulty: most Expert levels are impossible for me, whereas most Normal levels are fairly doable with one or two rough patches. I haven’t seen much that’s in the middle, that feels like a solid challenge. I suspect that people who are very good at Mario are looking for an extreme challenge, and everyone else just wants to play some Mario, so moderate-difficulty levels just aren’t as common. The former group will be bored by them, and the latter group will skip them.

Or maybe that’s a stretch. It’s hard to generalize about the game’s pool of levels when they number in the millions, and I can’t have played more than a few hundred.

What Mario Maker has really taught me is what a hurdle looks like. The game keeps track of everywhere a player has ever died. I may not be able to watch people play my levels, but looking back at them later and seeing clumps of death markers is very powerful. Those are the places people failed. Did they stop playing after that? Did I intend for those places to be so difficult?

Doom

Doom is an interesting contrast to Mario Maker. A great many Doom maps have been produced over the past two decades, but nowhere near as many levels as Mario Maker has produced in a couple years. On the other hand, many people who still play Doom have been playing Doom this entire time, so a greater chunk of the community is really good at the game and enjoys a serious challenge.

I’ve only released a couple Doom maps of my own: Throughfare (the one I contributed to DUMP 2 earlier this year) and a few one-hour speedmaps I made earlier this week. I like building in Doom, with its interesting balance of restrictions — it’s a fairly accessible way to build an interesting 3D world, and nothing else is quite like it.

I’ve had the privilege of watching a few people play through my maps live, and I have learned some things.

The first is that the community’s love of difficulty is comically misleading. It’s not wrong, but, well, that community isn’t actually my target audience. So far I’ve “published” maps on this blog and Twitter, where my audience hasn’t necessarily even played Doom in twenty years. If at all! Some of my followers are younger than Doom.

Most notably, this creates something of a distribution problem: to play my maps, you need to install a thing (ZDoom) and kinda figure out how to use it and also get a copy of Doom 2 which probably involves spending five bucks. Less of a hurdle than getting Mario Maker, yes, but still some upfront effort.

Also, ZDoom’s default settings are… not optimal. Out of the box, it’s similar to classic Doom: no WASD, no mouselook. I don’t know who this is meant to appeal to. If you’ve never played Doom, the controls are goofy. If you’ve played other shooters, the controls are goofy. If you played Doom when it came out but not since, you probably don’t remember the controls, so they’re still goofy. Oof.

Not having mouselook is more of a problem than you’d think. If you as the designer play with mouselook, it’s really easy to put important things off the top or bottom of the screen and never realize it’ll be a problem. I watched someone play through Throughfare a few days ago and get completely stuck at what seemed to be a dead end — because he needed to drop down a hole in a small platform, and the hole was completely hidden by the status bar.

That’s actually an interesting example for another reason. Here’s the room where he got stuck.

A small room with a raised platform at the end, a metal section in the floor, and a switch on the side wall

When you press the switch, the metal plates on the ground rise up and become stairs, so you can get onto the platform. He did that, saw nowhere obvious to go, and immediately turned around and backtracked quite a ways looking for some other route.

This surprised me! The room makes no sense as a dead end. It’s not an easter egg or interesting feature; it has no obvious reward; it has a button that appears to help you progress. If I were stuck here, I’d investigate the hell out of this room — yet this player gave up almost immediately.

Not to say that the player is wrong and the level is right. This room was supposed to be trivially simple, and I regret that it became a hurdle for someone. It’s just a difference in playstyle I didn’t account for. Besides the mouselook problem, this player tended to move very quickly in general, charging straight ahead in new areas without so much as looking around; I play more slowly, looking around for nooks and crannies. He ended up missing the plasma gun for much the same reason — it was on a ledge slightly below the default view angle, making it hard to see without mouselook.

Speaking of nooks and crannies: watching someone find or miss secrets in a world I built is utterly fascinating. I’ve watched several people play Throughfare now, and the secrets are the part I love watching the most. I’ve seen people charge directly into secrets on accident; I’ve seen people run straight to a very clever secret just because they had the same idea I did; I’ve seen people find a secret switch and then not press it. It’s amazing how different just a handful of players have been.

I think the spread of secrets in Throughfare is pretty good, though I slightly regret using the same trick three times; either you get it right away and try it everywhere, or you don’t get it at all and miss out on a lot of goodies. Of course, the whole point of secrets is that not everyone will find them on the first try (or at all), so it’s probably okay to err on the trickier side.


As for the speedmaps, I’ve only watched one person play them live. The biggest hurdle was a room I made that required jumping.

Jumping wasn’t in the original Doom games. People thus don’t really expect to need to jump in Doom maps. Worse, ZDoom doesn’t even have a key bound to jump out of the box, which I only discovered later.

See, when I made the room (very quickly), I was imagining a ZDoom veteran seeing it and immediately thinking, “oh, this is one of those maps where I need to jump”. I’ve heard people say that about other maps before, so it felt like common knowledge. But it’s only common knowledge if you’re part of the community and have run into a few maps that require jumping.

The situation is made all the more complicated by the way ZDoom handles it. Maps can use a ZDoom-specific settings file to explicitly allow or forbid jumping, but the default is to allow it. The stock maps and most third-party vanilla maps won’t have this ZDoom-specific file, so jumping will be allowed, even though they’re not designed for it. Most mappers only use this file at all if they’re making something specifically for ZDoom, in which case they might as well allow jumping anyway. It’s opt-out, but the maps that don’t want it are the ones least likely to use the opt-out, so in practice everyone has to assume jumping isn’t allowed until they see some strong indication otherwise. It’s a mess. Oh, and ZDoom also supports crouching, which is even more obscure.

I probably should’ve thought of all that at the time. In my defense, you know, speedmap.

One other minor thing was that, of course, ZDoom uses the traditional Doom HUD out of the box, and plenty of people play that way on purpose. I’m used to ZDoom’s “alternative” HUD, which not only expands your field of view slightly, but also shows a permanent count of how many secrets are in the level and how many you’ve found. I love that, because it tells me how much secret-hunting I’ll need to do from the beginning… but if you don’t use that HUD (and don’t look at the count on the automap), you won’t even know whether there are secrets or not.


For a third-party example: a recent (well, late 2014) cool release was Going Down, a set of small and devilish maps presented as the floors of a building you’re traversing from the roof downwards. I don’t actually play a lot of Doom, but I liked this concept enough to actually play it, and I enjoyed the clever traps and interwoven architecture.

Then I reached MAP12, Dead End. An appropriate name, because I got stuck here. Permanently stuck. The climax of the map is too many monsters in not enough space, and it’s cleverly rigged to remove the only remaining cover right when you need it. I couldn’t beat it.

That was a year ago. I haven’t seen any of the other 20 maps beyond this point. I’m sure they’re very cool, but I can’t get to them. This one is too high a hurdle.

Granted, hopping around levels is trivially easy in Doom games, but I don’t want to cheat my way through — and anyway, if I can’t beat MAP12, what hope do I have of beating MAP27?

I feel ambivalent about this. The author describes the gameplay as “chaotic evil”, so it is meant to be very hard, and I appreciate the design of the traps… but I’m unable to appreciate any more of them.

This isn’t the author’s fault, anyway; it’s baked into the design of Doom. If you can’t beat one level, you don’t get to see any future levels. In vanilla Doom it was particularly bad: if you die, you restart the level with no weapons or armor, probably making it even harder than it was before. You can save any time, and some modern source ports like ZDoom will autosave when you start a level, but the original game never saved automatically.

Isaac’s Descent

Isaac’s Descent is the little PICO-8 puzzle platformer I made for Ludum Dare 36 a couple months ago. It worked out surprisingly well; pretty much everyone who played it (and commented on it to me) got it, finished it, and enjoyed it. The PICO-8 exports to an HTML player, too, so anyone with a keyboard can play it with no further effort required.

I was really happy with the puzzle design, especially considering I hadn’t really made a puzzle game before and was rushing to make some rooms in a very short span of time. Only two were perhaps unfair. One was the penultimate room, which involved a tricky timing puzzle, so I’m not too bothered about that. The other was this room:

A cavern with two stone slab doors, one much taller than the other, and a wooden wheel on the wall

Using the wheel raises all stone doors in the room. Stone doors open at a constant rate, wait for a fixed time, and then close again. The tricky part with this puzzle is that by the time the very tall door has opened, the short door has already closed again. The solution is simply to use the wheel again right after the short door has closed, while the tall door is still opening. The short door will reopen, while the tall door won’t be affected since it’s already busy.

This isn’t particularly difficult to figure out, but it did catch a few people, and overall it doesn’t sit particularly well with me. Using the wheel while a door is opening feels like a weird edge case, not something that a game would usually rely on, yet I based an entire puzzle around it. I don’t know. I might be overthinking this. The problem might be that “ignore the message” is a very computery thing to do and doesn’t match with how such a wheel would work in practice; perhaps I’d like the puzzle more if the wheel always interrupted whatever a door was doing and forced it to raise again.

Overall, though, the puzzles worked well.

The biggest snags I saw were control issues with the PICO-8 itself. The PICO-8 is a “fantasy console” — effectively an emulator for a console that never existed. One of the consequences of this is that the controls aren’t defined in terms of keyboard keys, but in terms of the PICO-8’s own “controller”. Unfortunately, that controller is only defined indirectly, and the web player doesn’t indicate in any way how it works.

The controller’s main inputs — the only ones a game can actually read — are a directional pad and two buttons, and , which map to z and x on a keyboard. The PICO-8 font has glyphs for and , so I used those to indicate which button does what. Unfortunately, if you aren’t familiar with the PICO-8, those won’t make a lot of sense to you. It’s nice that looks like the keyboard key it’s bound to, but looks like the wrong keyboard key. This caused a little confusion.

Well,” I hear you say, “why not just refer to the keys directly?” Ah, but there’s a very good reason the PICO-8 is defined in terms of buttons: those aren’t the only keys you can use! n and m also work, as do c and v. The PocketCHIP also allows… 0 and =, I think, which is good because z and x are directly under the arrow keys on the PocketCHIP keyboard. And of course you can play on a USB controller, or rebind the keys.

I could’ve mentioned that z and x are the defaults, but that’s wrong for the PocketCHIP, and now I’m looking at a screenful of text explaining buttons that most people won’t read anyway.

A similar problem is the pause menu, accessible with p or enter. I’d put an option on the pause menu for resetting the room you’re in, just in case, but didn’t bother to explain how to get to the pause menu.Or that a pause menu exists. Also, the ability to put custom things on the pause menu is new, so a lot of people might not even know about it. I’m sure you can see this coming: a few rooms (including the two-door one) had places you could get stuck, and without any obvious way to restart the room, a few people thought they had to start the whole game over. Whoops.

In my defense, the web player is actively working against me here: it has a “pause” link below the console, but all the link does is freeze the player, not bring up the pause menu.

This is a recurring problem, and perhaps a fundamental question of making games accessible: how much do you need to explain to people who aren’t familiar with the platform or paradigm? Should every single game explain itself? Players who don’t need the explanation can easily get irritated by it, and that’s a bad way to start a game. The PICO-8 in particular has the extra wrinkle that its cartridge space is very limited, and any kind of explanation/tutorial costs space you could be using for gameplay. On the other hand, I’ve played more than one popular PICO-8 game that was completely opaque to me because it didn’t explain its controls at all.

I’m reminded of Counterfeit Monkey, a very good interactive fiction game that goes out of its way to implement a hint system and a gentle tutorial. The tutorial knits perfectly with the story, and the hints are trivially turned off, so neither is a bother. The game also has a hard mode, which eliminates some of the more obvious solutions and gives a nod to seasoned IF players as well. The author is very interested in making interactive fiction more accessible in general, and it definitely shows. I think this game alone convinced me it’s worth the effort — I’m putting many of the same touches in my own IF foray.

Under Construction

Under Construction is the PICO-8 game that Mel and I made early this year. It’s a simple, slightly surreal, slightly obtuse platformer.

Traditional wisdom has it that you don’t want games to be obtuse. That acts as a hurdle, and loses you players. Here, though, it’s part of the experience, so the question becomes how to strike a good balance without losing the impact.

A valid complaint we heard was that the use of color is slightly inconsistent in places. For the most part, foreground objects (those you can stand on) are light and background decorations are gray, but a couple tiles break that pattern. A related problem that came up almost immediately in beta testing was that spikes were difficult to pick out. I addressed that — fairly effectively, I think — by adding a single dark red pixel to the tip of the spikes.

But the most common hurdle by far was act 3, which caught us completely by surprise. Spoilers!

From the very beginning, the world contains a lot of pillars containing eyeballs that look at you. They don’t otherwise do anything, beyond act as platforms you can stand on.

In act 2, a number of little radios appear throughout the world. Mr. 5 complains that it’s very noisy, so you need to break all the radios by jumping on them.

In act 3, the world seems largely the same… but the eyes in the pillars now turn to ❌’s when you touch them. If this happens before you make it to the end, Mr. 5 complains that he’s in pain, and the act restarts.

The correct solution is to avoid touching any of the eye pillars. But because this comes immediately after act 2, where we taught the player to jump on things to defeat them — reinforcing a very common platforming mechanic — some players thought you were supposed to jump on all of them.

I don’t know how we could’ve seen that coming. The acts were implemented one at a time and not in the order they appear in the game, so we were both pretty used to every individual mechanic before we started playing through the entire game at once. I suppose when a game is developed and tested in pieces (as most games are), the order and connection between those pieces is a weak point and needs some extra consideration.

We didn’t change the game to address this, but the manual contains a strong hint.

Under Construction also contains a couple of easter eggs and different endings. All are fairly minor changes, but they added a lot of character to the game and gave its fans something else to delve into once they’d beaten it.

Crucially, these things worked as well as they did because they weren’t accessible. Easily-accessed easter eggs aren’t really easter eggs any more, after all. I don’t think the game has any explicit indication that the ending can vary, which meant that players would only find out about it from us or other fans.

I don’t yet know the right answer for balancing these kinds of extras, and perhaps there isn’t one. If you spend a lot of time on easter eggs, multiple endings, or even just multiple paths through the game, you’re putting a lot of effort into stuff that many players will never see. On the other hand, they add an incredible amount of depth and charm to a game and reward those players who do stick around to explore.

This is a lot like the balancing act with software interfaces. You want your thing to be accessible in the sense that a newcomer can sit down and get useful work done, but you also want to reward long-time users with shortcuts and more advanced features. You don’t want to hide advanced features too much, but you also don’t want to have an interface with a thousand buttons.

How larger and better-known games deal with this

I don’t have the patience for Zelda I. I never even tried it until I got it for free on my 3DS, as part of a pack of Virtual Console games given to everyone who bought a 3DS early. I gave it a shot, but I got bored really quickly. The overworld was probably the most frustrating part: the connections between places are weird, everything looks pretty much the same, the map is not very helpful, and very little acts as a landmark. I could’ve drawn my own map, but, well, I usually can’t be bothered to do that for games.

I contrast this with Skyward Sword, which I mostly enjoyed. Ironically, one of my complaints is that it doesn’t quite have an overworld. It almost does, but they stopped most of the way, leaving us with three large chunks of world and a completely-open sky area reminiscent of Wind Waker’s ocean.

Clearly, something about huge open spaces with no barriers whatsoever appeals to the Zelda team. I have to wonder if they’re trying to avoid situations like my experience with Zelda I. If a player gets lost in an expansive overworld, either they’ll figure out where to go eventually, or they’ll give up and never see the rest of the game. Losing players that way, especially in a story-driven game, is a huge shame.

And this is kind of a problem with the medium in general. For all the lip service paid to nonlinearity and sandboxes, the vast majority of games require some core progression that’s purely linear. You may be able to wander around a huge overworld, but you still must complete these dungeons and quests in this specific order. If something prevents you from doing one of them, you won’t be able to experience the others. You have to do all of the first x parts of the game before you can see part x + 1.

This is really weird! No other media is like this. If you watch a movie or read a book or listen to a song and some part of it is inaccessible for whatever reason — the plot is poorly explained, a joke goes over your head, the lyrics are mumbled — you can still keep going and experience the rest. The stuff that comes later might even help you make sense of the part you didn’t get.

In games, these little bumps in the road can become walls.

It’s not even necessarily difficulty, or getting lost, or whatever. A lot of mobile puzzle games use the same kind of artificial progression where you can only do puzzles in sequential batches; solving enough of the available puzzles will unlock the next batch. But in the interest of padding out the length, many of these games will have dozens of trivially easy and nearly identical puzzles in the beginning, which you have to solve to get to the later interesting ones. Sometimes I’ve gotten so bored by this that I’ve given up on a game before reaching the interesting puzzles.

In a way, that’s the same problem as getting lost in an overworld. Getting lost isn’t a hard wall, after all — you can always do an exhaustive search and talk to every NPC twice. But that takes time, and it’s not fun, much like the batches of required baby puzzles. People generally don’t like playing games that waste their time.

I love the Picross “e” series on the 3DS, because over time they’ve largely figured out that this is pointless: in the latest game in the series, everything is available from the beginning. Want to do easy puzzles? Do easy puzzles. Want to skip right to the hard stuff? Sure, do that. Don’t like being told when you made a wrong move? Turn it off.

(It’s kinda funny that the same people then made Pokémon Picross, which has some of the most absurd progression I’ve ever seen. Progressing beyond the first half-dozen puzzles requires spending weeks doing a boring minigame every day to grind enough pseudocurrency to unlock more puzzles. Or you can just pay for pseudocurrency, and you’ll have unlocked pretty much the whole game instantly. It might as well just be a demo; the non-paid progression is useless.)

Chip’s Challenge also handled this pretty well. You couldn’t skip around between levels arbitrarily, which was somewhat justified by the (very light) plot. Instead, if you died or restarted enough times, the game would offer to skip you to the next level, and that would be that. You weren’t denied the rest of the game just because you couldn’t figure out an ice maze or complete some horrible nightmare like Blobnet.

I wish this sort of mechanic were more common. Not so games could be more difficult, but so games wouldn’t have to worry as much about erring on the side of ease. I don’t know how it could work for a story-driven game where much of the story is told via experiencing the game itself, though — skipping parts of Portal would work poorly. On the other hand, Portal took the very clever step of offering “advanced” versions of several levels, which were altered very slightly to break all the obvious easy solutions.

Slapping on difficulty settings is nice for non-puzzle games (and even some puzzle games), but unless your game lets you change the difficulty partway through, someone who hits a wall still has to replay the entire game to change the difficulty. (Props to Doom 4, which looks to have taken difficulty levels very seriously — some have entirely different rules, and you can change whenever you want.)

I have a few wisps of ideas for how to deal with this in Isaac HD, but I can’t really talk about them before the design of the game has solidified a little more. Ultimately, my goal is the same as with everything else I do: to make something that people have a chance to enjoy, even if they don’t otherwise like the genre.

Fixing the IoT isn’t going to be easy

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/45098.html

A large part of the internet became inaccessible today after a botnet made up of IP cameras and digital video recorders was used to DoS a major DNS provider. This highlighted a bunch of things including how maybe having all your DNS handled by a single provider is not the best of plans, but in the long run there’s no real amount of diversification that can fix this – malicious actors have control of a sufficiently large number of hosts that they could easily take out multiple providers simultaneously.

To fix this properly we need to get rid of the compromised systems. The question is how. Many of these devices are sold by resellers who have no resources to handle any kind of recall. The manufacturer may not have any kind of legal presence in many of the countries where their products are sold. There’s no way anybody can compel a recall, and even if they could it probably wouldn’t help. If I’ve paid a contractor to install a security camera in my office, and if I get a notification that my camera is being used to take down Twitter, what do I do? Pay someone to come and take the camera down again, wait for a fixed one and pay to get that put up? That’s probably not going to happen. As long as the device carries on working, many users are going to ignore any voluntary request.

We’re left with more aggressive remedies. If ISPs threaten to cut off customers who host compromised devices, we might get somewhere. But, inevitably, a number of small businesses and unskilled users will get cut off. Probably a large number. The economic damage is still going to be significant. And it doesn’t necessarily help that much – if the US were to compel ISPs to do this, but nobody else did, public outcry would be massive, the botnet would not be much smaller and the attacks would continue. Do we start cutting off countries that fail to police their internet?

Ok, so maybe we just chalk this one up as a loss and have everyone build out enough infrastructure that we’re able to withstand attacks from this botnet and take steps to ensure that nobody is ever able to build a bigger one. To do that, we’d need to ensure that all IoT devices are secure, all the time. So, uh, how do we do that?

These devices had trivial vulnerabilities in the form of hardcoded passwords and open telnet. It wouldn’t take terribly strong skills to identify this at import time and block a shipment, so the “obvious” answer is to set up forces in customs who do a security analysis of each device. We’ll ignore the fact that this would be a pretty huge set of people to keep up with the sheer quantity of crap being developed and skip straight to the explanation for why this wouldn’t work.

Yeah, sure, this vulnerability was obvious. But what about the product from a well-known vendor that included a debug app listening on a high numbered UDP port that accepted a packet of the form “BackdoorPacketCmdLine_Req” and then executed the rest of the payload as root? A portscan’s not going to show that up[1]. Finding this kind of thing involves pulling the device apart, dumping the firmware and reverse engineering the binaries. It typically takes me about a day to do that. Amazon has over 30,000 listings that match “IP camera” right now, so you’re going to need 99 more of me and a year just to examine the cameras. And that’s assuming nobody ships any new ones.

Even that’s insufficient. Ok, with luck we’ve identified all the cases where the vendor has left an explicit backdoor in the code[2]. But these devices are still running software that’s going to be full of bugs and which is almost certainly still vulnerable to at least half a dozen buffer overflows[3]. Who’s going to audit that? All it takes is one attacker to find one flaw in one popular device line, and that’s another botnet built.

If we can’t stop the vulnerabilities getting into people’s homes in the first place, can we at least fix them afterwards? From an economic perspective, demanding that vendors ship security updates whenever a vulnerability is discovered no matter how old the device is is just not going to work. Many of these vendors are small enough that it’d be more cost effective for them to simply fold the company and reopen under a new name than it would be to put the engineering work into fixing a decade old codebase. And how does this actually help? So far the attackers building these networks haven’t been terribly competent. The first thing a competent attacker would do would be to silently disable the firmware update mechanism.

We can’t easily fix the already broken devices, we can’t easily stop more broken devices from being shipped and we can’t easily guarantee that we can fix future devices that end up broken. The only solution I see working at all is to require ISPs to cut people off, and that’s going to involve a great deal of pain. The harsh reality is that this is almost certainly just the tip of the iceberg, and things are going to get much worse before they get any better.

Right. I’m off to portscan another smart socket.

[1] UDP connection refused messages are typically ratelimited to one per second, so it’ll take almost a day to do a full UDP portscan, and even then you have no idea what the service actually does.

[2] It’s worth noting that this is usually leftover test or debug code, not an overtly malicious act. Vendors should have processes in place to ensure that this isn’t left in release builds, but ha well.

[3] My vacuum cleaner crashes if I send certain malformed HTTP requests to the local API endpoint, which isn’t a good sign

comment count unavailable comments

Fixing the IoT isn’t going to be easy

Post Syndicated from Matthew Garrett original http://mjg59.dreamwidth.org/45098.html

A large part of the internet became inaccessible today after a botnet made up of IP cameras and digital video recorders was used to DoS a major DNS provider. This highlighted a bunch of things including how maybe having all your DNS handled by a single provider is not the best of plans, but in the long run there’s no real amount of diversification that can fix this – malicious actors have control of a sufficiently large number of hosts that they could easily take out multiple providers simultaneously.

To fix this properly we need to get rid of the compromised systems. The question is how. Many of these devices are sold by resellers who have no resources to handle any kind of recall. The manufacturer may not have any kind of legal presence in many of the countries where their products are sold. There’s no way anybody can compel a recall, and even if they could it probably wouldn’t help. If I’ve paid a contractor to install a security camera in my office, and if I get a notification that my camera is being used to take down Twitter, what do I do? Pay someone to come and take the camera down again, wait for a fixed one and pay to get that put up? That’s probably not going to happen. As long as the device carries on working, many users are going to ignore any voluntary request.

We’re left with more aggressive remedies. If ISPs threaten to cut off customers who host compromised devices, we might get somewhere. But, inevitably, a number of small businesses and unskilled users will get cut off. Probably a large number. The economic damage is still going to be significant. And it doesn’t necessarily help that much – if the US were to compel ISPs to do this, but nobody else did, public outcry would be massive, the botnet would not be much smaller and the attacks would continue. Do we start cutting off countries that fail to police their internet?

Ok, so maybe we just chalk this one up as a loss and have everyone build out enough infrastructure that we’re able to withstand attacks from this botnet and take steps to ensure that nobody is ever able to build a bigger one. To do that, we’d need to ensure that all IoT devices are secure, all the time. So, uh, how do we do that?

These devices had trivial vulnerabilities in the form of hardcoded passwords and open telnet. It wouldn’t take terribly strong skills to identify this at import time and block a shipment, so the “obvious” answer is to set up forces in customs who do a security analysis of each device. We’ll ignore the fact that this would be a pretty huge set of people to keep up with the sheer quantity of crap being developed and skip straight to the explanation for why this wouldn’t work.

Yeah, sure, this vulnerability was obvious. But what about the product from a well-known vendor that included a debug app listening on a high numbered UDP port that accepted a packet of the form “BackdoorPacketCmdLine_Req” and then executed the rest of the payload as root? A portscan’s not going to show that up[1]. Finding this kind of thing involves pulling the device apart, dumping the firmware and reverse engineering the binaries. It typically takes me about a day to do that. Amazon has over 30,000 listings that match “IP camera” right now, so you’re going to need 99 more of me and a year just to examine the cameras. And that’s assuming nobody ships any new ones.

Even that’s insufficient. Ok, with luck we’ve identified all the cases where the vendor has left an explicit backdoor in the code[2]. But these devices are still running software that’s going to be full of bugs and which is almost certainly still vulnerable to at least half a dozen buffer overflows[3]. Who’s going to audit that? All it takes is one attacker to find one flaw in one popular device line, and that’s another botnet built.

If we can’t stop the vulnerabilities getting into people’s homes in the first place, can we at least fix them afterwards? From an economic perspective, demanding that vendors ship security updates whenever a vulnerability is discovered no matter how old the device is is just not going to work. Many of these vendors are small enough that it’d be more cost effective for them to simply fold the company and reopen under a new name than it would be to put the engineering work into fixing a decade old codebase. And how does this actually help? So far the attackers building these networks haven’t been terribly competent. The first thing a competent attacker would do would be to silently disable the firmware update mechanism.

We can’t easily fix the already broken devices, we can’t easily stop more broken devices from being shipped and we can’t easily guarantee that we can fix future devices that end up broken. The only solution I see working at all is to require ISPs to cut people off, and that’s going to involve a great deal of pain. The harsh reality is that this is almost certainly just the tip of the iceberg, and things are going to get much worse before they get any better.

Right. I’m off to portscan another smart socket.

[1] UDP connection refused messages are typically ratelimited to one per second, so it’ll take almost a day to do a full UDP portscan, and even then you have no idea what the service actually does.

[2] It’s worth noting that this is usually leftover test or debug code, not an overtly malicious act. Vendors should have processes in place to ensure that this isn’t left in release builds, but ha well.

[3] My vacuum cleaner crashes if I send certain malformed HTTP requests to the local API endpoint, which isn’t a good sign

comment count unavailable comments