Tag Archives: software

Fear of COMITment

Post Syndicated from Eric Raymond original http://esr.ibiblio.org/?p=8375

I shipped the first release of another retro-language revival today: COMIT. Dating from 1957 (coincidentally the year I was born) this was the first string-processing language, ancestral to SNOBOL and sed and ed and Unix shell. One of the notational conventions invented in COMIT, the use of $0, $1…etc. as substitution variables, survives in all these languages.

I actually wrote most of the interpreter three years ago, when a copy of the COMIT book fell into my hands (I think A&D regular Daniel Franke was responsible for that). It wasn’t difficult – 400-odd lines of Python, barely enough to make me break a sweat. That is, until I hit the parts where the book’s description of the language was vague and inadequate.

It was 1957 and hardly anybody knew anything about how to describe a computer language systematically, so I can’t fault Dr. Victor Yngve too harshly. Where I came a particular cropper was trying to understand the intended relationship between indices into the workspace buffer and “right-half relative constituent numbers”. That defeated me, so I went off and did other things.

Over the last couple days, as part of my effort to promote my Patreon feed to a level where my medical expenses are less seriously threatening, I’ve been rebuilding all my project pages to include a Patreon button and an up-to-date list of Bronze and Institutional patrons. While doing this I tripped over the unshipped COMIT code and pondered what to do with it.

What I decided to do was ship it as-is with a 0.1 version number. The alternative would have been to choose among several different possible interpretations of the language book and quite possibly get it wrong.

I think a good rule in this kind of situation is “First, do no harm”. I’d rather ship an incomplete implementation that can be verified by eyeball, and that’s what I’ve done – I was able to extract a pretty good set of regression tests for most of the features from the language book.

If someone else cares enough, some really obsessive forensics on the documentation and its code examples might yield enough certainty about the author’s intentions to support a full reconstruction. Alas, we can’t ask him for help, as he died in 2012.

A lot of the value in this revival is putting the language documentation and a code chrestomathy in a form that’s easy to find and read, anyway. Artifacts like COMIT are interesting to study, but actually using it for anything would be perverse.

The dangerous folly of “Software as a Service”

Post Syndicated from Eric Raymond original http://esr.ibiblio.org/?p=8338

Comes the word that Salesforce.com has announced a ban on its customers selling “military-style rifles”.

The reason this ban has teeth is that the company provides “software as a service”; that is, the software you run is a client for servers that the provider owns and operates. If the provider decides it doesn’t want your business, you probably have no real recourse. OK, you could sue for tortious interference in business relationships, but that’s chancy and anyway you didn’t want to be in a lawsuit, you wanted to conduct your business.

This is why “software as a service” is dangerous folly, even worse than old-fashioned proprietary software at saddling you with a strategic business risk. You don’t own the software, the software owns you.

It’s 2019 and I feel like I shouldn’t have to restate the obvious, but if you want to keep control of your business, the software you rely on needs to be open-source. All of it. All of it. And you can’t afford to have it tethered to a service provider even if the software itself is nominally open source.

Otherwise, how do you know some political fanatic isn’t going to decide your product is unclean and chop you off at the knees? It’s rifles today, it’ll be anything that can be tagged “hateful” tomorrow – and you won’t be at the table when the victim-studies majors are defining “hate”. Even if you think you’re their ally, you can’t count on escaping the next turn of the purity spiral.

And that’s disregarding all the more mundane risks that come from the fact that your vendor’s business objectives aren’t the same as yours. This is ground I covered twenty years ago, do I really have to put on the Mr. Famous Guy cape and do the rubber-chicken circuit again? Sigh…

Business leaders should fear every piece of proprietary software and every “service” as the dangerous addictions they are. If Salesforce.com’s arrogant diktat teaches that lesson, it will have been a service indeed.

Contributor agreements considered harmful

Post Syndicated from Eric Raymond original http://esr.ibiblio.org/?p=8287

Yesterday I got email from a project asking me to wear my tribal-elder hat, looking for advice on how to re-invent its governance structure. I’m not going to name the project because they haven’t given me permission to air their problems in public, but I need to write about something that came up during the discussion, when my querent said they were thinking about requiring a contributor release form from people submitting code, “the way Apache does”.

“Don’t do it!” I said. Please don’t go the release-form route. It’s bad for the open-source community’s future every time someone does that. In the rest of this post I’ll explain why.

Every time a project says “we need you to sign a release before we’ll take your code”, it helps create a presumption that such releases are necessary – as opposed to the opposite theory, which is that the act of donating code to an open-source project constitutes in itself a voluntary cession to the project of the right to use it under the terms implied by the project’s open-source license.

Obviously one of those theories is better for open source – no prize for guessing which.

Here is the language NTPsec uses in its hacking guide:

By submitting patches to this project, you agree to allow them to be redistributed under the project’s license according to the normal forms and usages of the open-source community.

There is as much legal ground for the cession theory of contribution as there is for any idea that contributor releases are required for some nebulous kind of legal safety. There’s no governing statute and no case law on this; no dispute over an attempt to revoke a contribution has yet been adjudicated.

And here’s the ironic part: if it ever comes to a court case, one of the first things the judge is going to look at is community expectations and practice around our licenses. A jurist is supposed to do this in contract and license cases; there’s some famous case law about the interpretation of handshake contracts among Hasidic Jewish diamond merchants in New York City that makes this very clear and explicit. Where there is doubt about interpretation and no overriding problem of equity, the norms of the community within which the license/contract was arrived at should govern.

So, if the judge thinks that we expect contribution permissions to fail closed unless explicitly granted, he/she is more likely to make that happen. On the other hand, if he/she thinks that community norms treat contribution as an implied cession of certain rights in exchange for the benefits of participating in the project, that is almost certainly how the ruling will come out.

I say, therefore, that Apache and the FSF and the Golang project and everybody else requiring contributor releases are wrong. Because there is no governing law on the effect of these release forms, they are not actually protection against any risk, just a sort of ritual fundament-covering that a case of first impression could toss out in a heartbeat. Furthermore, the way they’ve gone wrong is dangerous; this ritual fundament-covering could someday bring about the very harm it was intended to prevent.

If your project has a contributor release, do our whole community a favor and scrap it. Any lawyer who tells you such a thing is necessary is talking out his ass – he doesn’t know that, and at the present state of the law he can’t know it.

(My wife Cathy, the attorney, concurs. So this advice isn’t just a layperson vaporing in ignorance.)

Instead, post a contract of adhesion on your website or in your guide for contributors. Use my language, or edit to taste. The one thing you should be sure stays in is some language equivalent to this part: “according to the normal forms and usages of the open-source community”.

That is important because, if it ever comes to a court case, we want to be able to point the judge at that as a clue that there are normal forms and usages, and that he/she can do what he/she is supposed to do – and almost certainly wants to do – by understanding them and applying them.

Am I really shipper’s only deployment case?

Post Syndicated from Eric Raymond original http://esr.ibiblio.org/?p=8281

I released shipper 1.14 just now. It takes advantage of the conventional asciidoc extension – .adoc – that GitHub and GitLab have established, to do a useful little step if it can detect that your project README and NEWS files are asciidoc.

And I wondered, as I usually do when I cut a shipper release: am I really the only user this code has? My other small projects (things like SRC and irkerd) tend to attract user communities that stick with them, but I’ve never seen any sign of that with shipper – no bug reports or RFEs coming in over the transom.

This time, it occurred to me that if I am shipper’s only user, then maybe the typical work practices of the open-source community are rather different than I thought they were. That’s a question worth raising in public, so I’m posting it here to attract some comment.

The problem shipper solves for me is this: I run, by exact count that I checked just now, 52 small projects. By ‘small’ I really mean “not NTPsec or GPSD”; some, like reposurgeon, are relatively large as codebases. What really distinguishes GPSD and NTPsec is their need to have custom websites full of resource pages, blogs, and other impedimenta; a small project, on the other hand, can make do with one page existing mainly to carry a description, recent project news, and a place to download the latest release tarball.

About a quarter of these projects are pretty dormant. On the other hand, among the dozen most active of them it’s not unusual for me to ship two or three point releases a week. Without tooling that would be a really annoying amount of hand-work – making distribution tarballs, copying them to my personal website or places like SourceForge that carry tarballs, editing web pages to publish the latest project news, emailing distribution packagers to tell them there’s a new release, posting announcements to IRC channels. Inevitably I’d make errors, for example by forgetting to email people who have asked to be notified when I ship Project X.

That’s the problem shipper solves. A shipperized project contains a control file that gathers all the metadata required to ship a release. Then its makefile (or equivalent) can have a “release” production that runs the project’s build and regression test to make sure it’s in a sane state, and if so proceeds to ship tarballs to their public download sites, template a web page with project news and documentation/download links and ship that to where it needs to live, and finally send out notifications on all required email and IRC channels.

Here’s what a typical control file looks like:

# This is not a real Debian control file, though the syntax is compatible.
# It's project metadata for the shipper tool

Package: shipper

Description: Automated shipping of open-source project releases.
 shipper is a power distribution tool for developers with multiple
 projects who do frequent releases.  It automates the tedious process
 of shipping a software release and (if desired) templating a project
 web page. It can deliver releases in correct form to SourceForge,
 Savannah.

Homepage: http://www.catb.org/~esr/shipper

XBS-IRC-Channel: irc://chat.freenode.net/shipper

XBS-Repository-URL: https://gitlab.com/esr/shipper

XBS-Logo: shipper-logo.png

#XBS-Project-Tags: packaging, distribution

XBS-Validate: make check

The shipper code figures out everything it needs to do from that or from looking at the files in your source tree – for example, it assumes that any file with an “.html” extension that exists in the top level of your project tree at the time it’s run should be shipped to the project website along with the templated index page. About all you need to pass it on each invocation is a release number.

From the user’s point of view, the good thing about a shipperized project is that the templated page puts the project blurb, links to up-to-date documentation, and project news in one place that’s easy to find – no need to go digging for them.

My shipperized projects also have a “refresh” production that just updates the project page. This is useful, for example, when I find a typo in the documentation or blurb and want to correct that without shipping an actual code release. Fix it, run “make refresh”, done.

The only real cost this tool imposes other than making the initial control file is that your NEWS file has to be in a particular format in order for it to extract the latest item: stanzas separated by blank lines, each with a recognizable date on the first line, newest at the top.
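
For illustration, here is a minimal sketch – in Go, though shipper itself is written in Python – of what pulling the latest item out of a NEWS file in that format amounts to. The sample stanza text is made up:

package main

import (
    "fmt"
    "strings"
)

// latestNewsStanza returns the first stanza of a NEWS file laid out the
// way shipper expects: stanzas separated by blank lines, newest at the
// top, a recognizable date on the first line of each stanza.
func latestNewsStanza(news string) string {
    var stanza []string
    for _, line := range strings.Split(news, "\n") {
        if strings.TrimSpace(line) == "" {
            if len(stanza) > 0 {
                break // the blank line after the first stanza ends it
            }
            continue // tolerate leading blank lines
        }
        stanza = append(stanza, line)
    }
    return strings.Join(stanza, "\n")
}

func main() {
    // Made-up sample content in the expected layout.
    news := "1.14: 2019-03-02\n  Recognize the .adoc extension.\n\n1.13: 2019-01-01\n  Earlier changes.\n"
    fmt.Println(latestNewsStanza(news))
}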

Over the years this little tool has saved me from immense amounts of process friction and substantially reduced the defect rate in what I actually ship. I can release more frequently, being more responsive to bugs and RFEs, because doing so isn’t much work. It’s easy to use, it’s well documented, it’s very customizable (you can set up your own web page template, etc). So…why isn’t anybody else using it? Why do I never get bug reports or RFEs the way I so frequently do on my other projects?

The possibility that occurs to me now is that maybe it solves a problem nobody else really has to the same extent. While shipper is useful for eliminating stupid finger errors while shipping even if you have just one project, it really comes into its own if you have a dozen. So maybe people aren’t doing that? Is being a lead on over a dozen projects more of an outlier than I thought?

The obvious alternative hypothesis is that repository-centered forges like GitHub and GitLab have eaten the role small-project pages used to have – your project page is now the README on its forge. I could almost believe that, except that (a) “stable release” is an important distinction for a lot of projects, those tarballs have to live somewhere, (b) forge READMEs also aren’t so good for advertising things like project mailing lists, and (c) because of a and b I see a lot of small-project web pages when I go browsing.

Another hypothesis is that there are lots of people who would find shipper useful, but they don’t know it exists. Also possible – but when I write other things that seem comparably useful, like SRC or irkerd or deheader, they don’t seem to have any trouble being found by their natural constituents. My project issue trackers tell the tale!

Is it really normal for hackers to be working on so few projects at a time, and releasing so infrequently, that they never experience release-process friction as a problem? Perhaps so – but I’m curious what my commenters think.

Declarative is greater than imperative

Post Syndicated from Eric Raymond original http://esr.ibiblio.org/?p=8270

Sometimes I’m a helpless victim of my urges.

A while back – very late in 2016 – I started work on a program called loccount. This project originally had two purposes.

One is that I wanted a better, faster replacement for David Wheeler’s sloccount tool, which I was using to collect statistics on the amount of virtuous code shrinkage in NTPsec. David is good people and sloccount is a good idea, but internally it’s a slow and messy pile of kludges – so much so that it seems to have exceeded his capacity to maintain; at time of writing in 2019 it hadn’t been updated since 2004. I’d been thinking about writing a better replacement, in Python, for a while.

Then I realized this problem was about perfectly sized to be my learn-Go project. Small enough to be tractable, large enough to not be entirely trivial. And there was the interesting prospect of using channels/goroutines to parallelize the data collection. I got it working well enough for NTP statistics pretty quickly, though I didn’t issue a first public release until a little over a month later (mainly because I wanted to have a good test corpus in place to demonstrate correctness). And the parallelized code was both satisfyingly fast and really pretty. I was quite pleased.

The only problem was that the implementation, having been a deliberately straight translation of sloccount’s algorithms in order to preserve comparability of the reports, was a bit of a grubby pile internally. Less so than sloccount’s, because it was all in one language, but still. It’s difficult for me to leave that kind of thing alone; the urge to clean it up becomes like a maddening itch.

The rest of this post is about what happened when I succumbed. I got two lessons from this experience: one reinforcement of a thing I already knew, and one who-would-have-thought-it-could-go-this-far surprise. I also learned some interesting things about the landscape of programming languages.

For those of you who haven’t used sloccount, it counts source lines of code (SLOC) in a code tree – that is, lines that are non-blank and not comments. SLOC counts are useful for complexity estimation and predicting defect incidence. SLOC is especially good for tracking how the complexity (and hence, for example, the potential attack surface) of a codebase has changed over time. There are more elaborate ways of making numbers that aim at similar usefulness; the best known is called “cyclomatic complexity” and there’s another metric called LLOC (logical lines of code) that loccount can also tally for many languages. But it turns out empirically that it’s quite difficult to do better than SLOC for forecasting defect incidence, at least at the present state of our knowledge. More complex measures don’t seem to gain much; there is some evidence that LLOC does, but even that is disputable.

I wanted to use loccount to post hard numbers about the continuing SLOC shrinkage of NTPsec. We’ve shrunk it by a little better than 4:1 since we forked it, from 231 KLOC to 55 KLOC, and that’s a big deal – a large reduction in attack surface and an equally large improvement in auditability. People have trouble believing a change that dramatic unless you can show them numbers and how to reproduce those numbers; I wanted to be able to say “Here’s loccount and its test suite. Here’s our repository. See for yourself.”

But the code was still grubby, with multiple different rather ad-hoc parsers in it inherited from sloccount for different categories of languages; sloccount supports 27 in total. I didn’t like that, and over the next two years I would occasionally return to the code, gradually refactoring and simplifying.

Why parsers? Think about what you have to do to count SLOC. You need to run through a source file recognizing the following events (a minimal counting sketch follows the list):

  • Start of comment.
  • End of comment.
  • Newline (so you can bump the SLOC counter when you’re outside a comment).
  • Start of string literal (so you start ignoring what would otherwise look like a comment start).
  • End of string (so you stop ignoring comment starts).
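
Here, for concreteness, is a minimal sketch in Go of that event loop for a single C-like file. It is not loccount’s actual code – the real thing is table-driven and handles many families and flags – just the shape of the state machine the list above implies:

package main

import "fmt"

// countSLOC counts source lines of code in a C-like source text:
// lines containing at least one character that is neither whitespace
// nor inside a comment.
func countSLOC(src string) int {
    sloc := 0
    inBlock := false  // inside a /* ... */ block comment
    inLine := false   // inside a // winged comment
    inString := false // inside a "..." string literal
    lineHasCode := false
    for i := 0; i < len(src); i++ {
        c := src[i]
        switch {
        case c == '\n':
            if lineHasCode {
                sloc++
            }
            lineHasCode = false
            inLine = false // winged comments end at the newline
        case inBlock:
            if c == '*' && i+1 < len(src) && src[i+1] == '/' {
                inBlock = false
                i++
            }
        case inLine:
            // ignore everything to end of line
        case inString:
            lineHasCode = true
            if c == '\\' {
                i++ // skip the escaped character
            } else if c == '"' {
                inString = false
            }
        case c == '/' && i+1 < len(src) && src[i+1] == '*':
            inBlock = true
            i++
        case c == '/' && i+1 < len(src) && src[i+1] == '/':
            inLine = true
            i++
        case c == '"':
            inString = true
            lineHasCode = true
        default:
            if c != ' ' && c != '\t' && c != '\r' {
                lineHasCode = true
            }
        }
    }
    if lineHasCode {
        sloc++ // final line without a trailing newline
    }
    return sloc
}

func main() {
    sample := "/* header */\nint main() {\n    // call\n    return 0; /* done */\n}\n"
    fmt.Println(countSLOC(sample)) // prints 3
}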

This is complicated by the fact that in any given language there may be multiple syntaxes for start or end comment and start or end string literal. C and most other languages have two styles of comment; /* the block comment, which can contain newlines and has an explicit end delimiter */, versus // the winged comment, which can’t contain newlines and ends at the next one.

The most naive way to handle this simple grammar would be to write a custom parser for each language. Sometimes, when the language syntax is especially grotty, you have to do that. There’s still a bespoke parser for Perl inside loccount. But most languages fall into fairly large syntactic families in which all the members can be parsed alike. Sloccount, supporting 27 languages, had at least four of these: C-like, Pascal-like, scripting-language-like, and Fortran-like.

Now, if you’re me, learning that most languages fall into families like that is interesting in itself. But the number of outliers that don’t fit, like Lisp and assembler (which sloccount can tally) and really odd ones like APL (which it can’t) is annoying. I can’t look at code like that without wanting to clean it up, simplify it, fold special cases into general ones, and learn about the problem space as I do so. Victim of my urges, I am.

I’ve written more parsers for messy data than I can remember, so I know the strategies for this. Let’s call the bespoke-parser-for-every-language level 0, and recognizing language families level 1. Sloccount was already at level 1 and loccount inherited that. Level 2 would be reducing the number of distinct parsers, ideally to one. How do we do that?

Let’s consider two specific, important family parsers: the C family (C, C++, Objective-C, Java, C#, Javascript, Scala) and the Pascal family. The main differences are (1) the block comment syntax is /* */ in C but (* *) in Pascal, (2) Pascal has no winged comments like C’s //, and (3) C uses double quotes around string literals vs. Pascal’s single quotes.

So instead of two parsers with two functions “is there a /* next?” and “is there a (* next?”, you write one function “is there a block comment start next?” which dispatches to looking for /* or (*. Your “is there a winged comment start?” function looks for // in a source file with a .c extension and always returns false in a file with a .pas extension.

That’s level 2. You’ve replaced “look for this family-specific syntax” with “look for the start-block-comment terminal”, with the token-getter figuring out what to match based on which family the file is in. But that doesn’t handle idiosyncratic languages like – say – D, which uses /+ +/. Well, not unless you’re willing to let your start-block-comment matcher be a huge ugly switch statement with an arm not just for each family but for each outlier file extension.

And you’d need parallel big ugly switches in every other terminal – the one for end of block comment, the one for winged comment, the one for string delimiter, et tiresome cetera. Don’t do that; it’s ugly and hard to maintain.

Level 3 is where you change the big switch/case into a traits table. At start of processing you use the file extension to figure out which traits entry to use. You read your start and end block comment from that entry. It may have flags in it which declare, for example “this language uses single-quoted strings”, or “block comments nest in this language”.

This move from executable code to a traits table is really important, because it drastically reduces the complexity cost of adding support for new languages. Usually, now, each new one is just a table entry that can be easily composed from a read of the syntax part of the language specification. Declarative is greater than imperative!
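
For concreteness, here is a hedged sketch in Go of what such a traits table can look like. The field names and entries are illustrative guesses, not loccount’s actual schema:

package main

import (
    "fmt"
    "path/filepath"
    "strings"
)

// languageTraits captures the comment and string syntax of one language
// as pure data; a single generic parser reads these fields instead of
// hard-coding per-language logic.
type languageTraits struct {
    name          string
    blockStart    string // empty if the language has no block comments
    blockEnd      string
    wingedComment string // empty if none
    stringQuote   byte
    nestComments  bool // block comments nest in this language
}

// traitsByExtension maps a file extension to its traits entry.
// Supporting a new language is just another row.
var traitsByExtension = map[string]languageTraits{
    ".c":   {"C", "/*", "*/", "//", '"', false},
    ".pas": {"Pascal", "(*", "*)", "", '\'', false},
    ".d":   {"D", "/+", "+/", "//", '"', true},
}

func traitsFor(path string) (languageTraits, bool) {
    t, ok := traitsByExtension[strings.ToLower(filepath.Ext(path))]
    return t, ok
}

func main() {
    if t, ok := traitsFor("example.pas"); ok {
        fmt.Printf("%s: block comments run from %s to %s\n",
            t.name, t.blockStart, t.blockEnd)
    }
}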

The project NEWS file and repo history tell us how dramatic this effect is. In the first two years of the project’s lifetime I added about 14 languages to the original set of 27, less than one a month. Late last month I got the ontology of the traits table about right after staring long enough at the languages I’d been supporting. I immediately added five languages on one day.

Then I hit a bump because I needed more trait flags to handle the next couple of languages. But as I added the next half-dozen or so (Algol 60, F#, Dart, Cobra, VRML, Modula-2, and Kotlin) I noticed something interesting; the frequency with which I needed to add trait flags was dropping as I went. I was beginning to mine out the space of relevant syntactic variations.

Now I want to draw a contrast with the project I used to define the term Zeno tarpit. On doclifter I can always make more progress towards lifting the last few percent of increasingly weird malformed legacy documents out of the nroff muck if I’m willing to push hard enough, but the cost is that the pain never ends. Each increment of progress normally requires more effort than the last.

In loccount I’m clearly heading in the opposite direction. In the last week (and remember most of my attention is on NTPsec; this is a side project) I’ve added no fewer than 34 new languages and markup formats to loccount. My time to put in a new one is down to minutes – skim the spec, fill in a new table entry, drop a sample into the test directory, eyeball the results to make sure they’re sane, and update the news file.

The contrast with the effort required to get another 0.1% down the long tail of the man-page corpus is really extreme. What makes the difference?

OK, yes, the nroff-lifting problem is more complicated. You’re parsing for many more different kinds of terminals and relations among them. That’s true but it’s not a “true” that generates much insight. No; the real difference is that in that domain, I still have to write a parser extension – executable code – in order to make progress. In loccount, on the other hand, I’m just composing new table entries. Declarative is greater than imperative!

Actually it has become a game; I’ve made a couple of passes through Wikipedia’s List of programming languages looking for plausible candidates. I’ve put in historical relics like SNOBOL4 and B simply because I could. When you’re on a stamp-collecting streak like this, it’s natural to start wondering just how far you can push it. A victim of my urges sometimes, am I.

I do have some limits, though. I haven’t gone after esoteric languages, because those are often deliberately constructed to be so syntactically bizarre that upgrading my parser would be more like work than fun. And I really don’t care about the 17th dialect of Business BASIC or yet another proprietary database language, thank you. Also, I have in general not added languages that I judged to be academic toys intended more to generate research papers than production code, though I gave in on a few I thought to be of particular historical interest.

The lesson here is one I know I’ve written about before, but it deserves reinforcing. Declarative is greater than imperative. Simple code plus smart data is better than enough smart code to do the same job. The move from four parsers to one parser and a big trait table was a win on every level – the result is easier to understand and much easier to extend. This is still an underutilized design strategy.

There is a level 4. I may yet do something like what we did with Open Adventure and move all that smart data from a table initializer to a YAML file compiled back into the trait table at build time. Then adding new languages would usually just be an edit to the YAML, not touching the Go code at all.
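
A rough sketch of what that could look like, assuming a hypothetical languages.yaml and the third-party gopkg.in/yaml.v3 package – whether the file is read at runtime or compiled into Go source by the build, the parsing step is about this simple:

package main

import (
    "fmt"
    "os"

    "gopkg.in/yaml.v3"
)

// langEntry mirrors one stanza of the hypothetical languages.yaml;
// the field names are illustrative, not loccount's actual schema.
type langEntry struct {
    Name       string `yaml:"name"`
    Extension  string `yaml:"extension"`
    BlockStart string `yaml:"block_start"`
    BlockEnd   string `yaml:"block_end"`
    Winged     string `yaml:"winged"`
}

func main() {
    data, err := os.ReadFile("languages.yaml")
    if err != nil {
        panic(err)
    }
    var table []langEntry
    if err := yaml.Unmarshal(data, &table); err != nil {
        panic(err)
    }
    fmt.Printf("loaded %d language entries\n", len(table))
}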

I think at this point I’ve come pretty close to entirely excavating the space of production languages that might reasonably appear on a Unix system today. Now prove me wrong. Throw a language loccount -s doesn’t already list at me and let’s see if my generic parser and quirks can cope.

Some restrictions: No esolangs, please; nothing with only closed-source implementations; nothing outright extinct; no fad-of-the-week webdev crap; no academic toys only ever touched by the designer’s research group. Exceptions to these restrictions may be made in cases of exceptional historical interest – it amused me to add B and SNOBOL4 and I’d do it again.

Oh, and the current count of languages? 117. The real surprise is that getting to 117 languages was this easy. There is less variation out there than one might suppose.

How not to design a wire protocol

Post Syndicated from Eric Raymond original http://esr.ibiblio.org/?p=8254

A wire protocol is a way to pass data structures or aggregates over a serial channel between different computing environments. At the very lowest level of networking there are bit-level wire protocols to pass around data structures called “bytes”; further up the stack streams of bytes are used to serialize more complex things, starting with numbers and working up to aggregates more conventionally thought of as data structures. The one thing you generally cannot successfully pass over a wire is a memory address, so no pointers.

Designing wire protocols is, like other kinds of engineering, an art that responds to cost gradients. It’s often gotten badly wrong, partly because of clumsy technique but mostly because people have poor intuitions about those cost gradients and optimize for the wrong things. In this post I’m going to write about those cost gradients and how they push towards different regions of the protocol design space.

My authority for writing about this is that I’ve implemented endpoints for nearly two dozen widely varying wire protocols, and designed at least one wire protocol that has to be considered widely deployed and successful by about anybody’s standards. That is the JSON profile used by many location-aware applications to communicate with GPSD and thus deployed on a dizzying number of smartphones and other embedded devices.

I’m writing about this now because I’m contemplating two wire-protocol redesigns. One is of NTPv4, the packet format used to exchange timestamps among cooperating time-service programs. The other is an unnamed new protocol in IETF draft, deployed in prototype in NTPsec and intended to be used for key exchange among NTP daemons authenticating to each other.

Here’s how not to do it…

NTPv4 is a really classic example of one extreme in wire protocol design. A base NTP packet is 48 bytes of packed binary blob that looks like this:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |LI | VN  |Mode |    Stratum    |     Poll      |  Precision    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         Root Delay                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         Root Dispersion                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Reference ID                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                     Reference Timestamp (64)                  +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                      Origin Timestamp (64)                    +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                      Receive Timestamp (64)                   +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                      Transmit Timestamp (64)                  +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The numbers are bit widths. If I showed you an actual packet dump it would be a random-looking blob of characters with no significance at the character level; only the bits matter.

It’s not very relevant to this episode what the detailed semantics of those fields are, though you can make some guesses from the names and probably be right; just think of it as a clock sample being passed around. The only two we’re going to care about here are VN, which is a three-bit protocol version field normally set to 0b100 = 4, and Mode – three more bits of packet type. Most of the others are interpreted as binary numbers except for “Reference ID”, which is either an IPv4 address or a 4-character string.

Here’s a GPSD report that exemplifies the opposite extreme in wire-protocol design. This is an actual if somewhat old Time-Position-Velocity packet capture from the GPSD documentation:

{"class":"TPV","time":"2010-04-30T11:48:20.10Z","ept":0.005,
               "lat":46.498204497,"lon":7.568061439,"alt":1327.689,
                "epx":15.319,"epy":17.054,"epv":124.484,"track":10.3797,
                "speed":0.091,"climb":-0.085,"eps":34.11,"mode":3}

Those of you with a web-services background will recognize this as a JSON profile.

You don’t have to guess what the principal fields in this report mean; they have tags that tell you. I’ll end the suspense by telling you that “track” is a course bearing and the fields beginning with “e” are 95%-confidence error bars for some of the others. But again, the detailed field semantics don’t matter much to this episode; what we want to do here is focus on the properties of the GPSD protocol itself and how they contrast with NTPv4.

The most obvious difference is discoverability. Unless you know you’re looking at an NTP packet in advance, seeing the data gives you no clue what it means. On the other hand, a GPSD packet is full of meaning to the naked eye even if you’ve never seen one before, and the rest is pretty transparent once you know what the field tags mean.
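
That discoverability also translates directly into cheap client code. Here is a sketch in Go that decodes the TPV report above with nothing but the standard library; fields the struct doesn’t name (such as the error bars) are silently ignored, which is part of what keeps the format extensible:

package main

import (
    "encoding/json"
    "fmt"
)

// tpv names only the TPV fields this sketch cares about; anything else
// in the JSON is ignored without complaint.
type tpv struct {
    Class string  `json:"class"`
    Time  string  `json:"time"`
    Lat   float64 `json:"lat"`
    Lon   float64 `json:"lon"`
    Alt   float64 `json:"alt"`
    Track float64 `json:"track"`
    Speed float64 `json:"speed"`
    Mode  int     `json:"mode"`
}

func main() {
    raw := `{"class":"TPV","time":"2010-04-30T11:48:20.10Z","ept":0.005,
        "lat":46.498204497,"lon":7.568061439,"alt":1327.689,
        "epx":15.319,"epy":17.054,"epv":124.484,"track":10.3797,
        "speed":0.091,"climb":-0.085,"eps":34.11,"mode":3}`
    var report tpv
    if err := json.Unmarshal([]byte(raw), &report); err != nil {
        panic(err)
    }
    fmt.Printf("%s: lat %.6f lon %.6f course %.1f speed %.3f\n",
        report.Class, report.Lat, report.Lon, report.Track, report.Speed)
}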

Another big difference is bit density. Every bit in an NTPv4 packet is significant; you’re squeezing the information into as short a packet as is even theoretically possible. The GPSD packet, on the other hand, has syntactic framing and tags that tell you about itself, not its payload.

These two qualities are diametrically opposed. The bits you spend on making a wire protocol discoverable are bits you’re not spending on payload. That both extremes exist in the world is a clue: it means there’s no one right way to do things, and the cost gradients around wire protocols differ wildly in different deployments.

Before I get to a direct examination of those cost gradients I’m going to point out a couple of other contrasting properties. One is that the base NTPv4 packet has a fixed length locked in; it’s 48 bytes, it’s never going to be anything but 48 bytes, and the 32- or 64-bit precision of the numeric fields can never change. The GPSD packet embodies the opposite choice; on the one hand it is variable-length as the number of decimal digits in the data items change, on the other hand it is quite easy to see how to ship more precision in the GPSD packet if and when it’s available.

Hardware independence is another important difference. A decimal digit string is a decimal digit string; there’s no real ambiguity about how to interpret it, certainly not if you’ve ever seen a JSON-based protocol before. The binary words in an NTPv4 packet, on the other hand, may need to be byte-swapped to turn into local machine words, and the packet itself does not imply what the right decoding is. You need to have prior knowledge that they’re big-endian…and getting this kind of detail wrong (byte-swapping when you shouldn’t, or failing to when you should) is a really notorious defect attractor.
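
For contrast, here is a sketch in Go that pulls just the version, mode, and transmit timestamp out of a raw 48-byte NTPv4 packet, using the field layout in the diagram above. Every offset, every bit position, and the big-endian byte order is prior knowledge the code has to carry:

package main

import (
    "encoding/binary"
    "fmt"
)

// parseNTPHeader extracts a few fields from a raw NTPv4 packet.  The
// offsets follow the diagram above; everything is big-endian (network
// byte order), and nothing in the packet itself tells you that.
func parseNTPHeader(pkt []byte) (version, mode uint8, transmit uint64, err error) {
    if len(pkt) < 48 {
        return 0, 0, 0, fmt.Errorf("short packet: %d bytes", len(pkt))
    }
    version = (pkt[0] >> 3) & 0x07 // VN: bits 2-4 of the first byte
    mode = pkt[0] & 0x07           // Mode: bits 5-7
    transmit = binary.BigEndian.Uint64(pkt[40:48])
    return version, mode, transmit, nil
}

func main() {
    var pkt [48]byte
    pkt[0] = 4<<3 | 3 // LI=0, VN=4, Mode=3 (client)
    binary.BigEndian.PutUint64(pkt[40:], 0xE16D2A9C80000000) // arbitrary demo timestamp
    v, m, tx, _ := parseNTPHeader(pkt[:])
    fmt.Printf("version %d, mode %d, transmit 0x%016X\n", v, m, tx)
}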

More generally, these protocols differ greatly in two related qualities; extensibility is one. The other doesn’t have a term of art; it’s whether data encoded in the protocol can mix gracefully with other payload types traveling on the same wire. I’ll call it “sociability”.

(And why does sociability matter? One reason is because the friction cost of poking new holes for new protocols in network firewalls is considerable; it triggers security concerns. This is why so much stuff is multiplexed on HTTP port 80 these days; it isn’t only for convenience with browsers.)

Adding a new field to a JSON datagram (or, more generally, to any other kind of self-describing protocol) is not difficult. Even if you’ve never seen JSON before, it’s pretty easy to see how a new field named (say) “acceleration” with a numeric value would fit in. Having different kinds of datagrams on the wire is also no problem because there’s a class field. GPSD actually ships several other reports besides TPV over the same service port.

It’s trickier to see how to do the analogous things with an NTPv4 packet. It is possible, and I’m now going to walk you through some fairly painful details not because they’re so important in themselves but because they illustrate some systematic problems with packed binary protocols in general. There will be no quiz afterwards and you can forget them once you’ve absorbed the general implications.

In fact NTPv4 has an extension-field mechanism, but it depends on a quirk of the transmission path: NTPv4 packets are UDP datagrams and arrive with a length. This gives you a dodge; if you see a length longer than 48 bytes, you can assume the rest is a sequence of extension fields. Here’s what those look like:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Type field             |      Payload length          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload (variable)                     |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Good luck eyeballing that! It’s simple in concept, but it’s more binary blob. As with the base packet, you need a special tool like wireshark and a detailed spec in front of you just to interpret the type fields, let alone whatever wacky private encodings get invented for the payload parts.

Actually, this last section was partly a fib. Detecting NTPv4 extension fields is tricky because it interacts with a different, older extension – an optional cryptographic signature which can itself have two different lengths:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Key Identifier                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        dgst (128 or 160)                      |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It is possible to work out whether one or both kinds of extension are present by doing some tricky modular arithmetic, but I’ve tortured you enough without getting into exactly how. The thing to take away is that gymnastics are required compared to what it takes to add extensions to a JSON-based protocol, and this isn’t any accident or evidence that NTPv4 is especially ill-designed. This kind of complexity is generic to packed binary protocols, and that has implications we’ll focus in on when we get to cost gradients.

In fact NTPv4 was not badly designed for its time – the Internet protocol design tradition is pretty healthy. I’ve seen (and been forced by standards to implement) much worse. For please-make-it-stop awfulness not much beats, for example, the binary packet protocol used in Marine AIS (Automatic Identification system). One of its packet types, 22 (Channel Management), even has a critical mode bit controlling the interpretation of an address field located after the address field rather than before. That is wrap-a-tire-iron-around-somebody’s-head stupid; it complicates writing a streaming decoder and is certain to attract bugs. By comparison the NTPv4 design is, with all its quirks, quite graceful.

It is also worth noting that we had a narrow escape here. UDP protocols are now unusual, because they have no retransmission guarantees. Under TCP, you don’t get a whole datagram and a length when you read off the network. A TCP equivalent of the NTPv4 packet protocol would either have been fixed at 48 bytes with no extensions forever, or would have needed to give you some way to compute the expected packet length from data that’s within a minimum-size distance of the start of packet.

JSON evades this whole set of complications by having an unambiguous end delimiter. In general under TCP your packets need to have either that or an early length field. Computing a length from some constellation of mode bits is also available in principle, but it’s asking for trouble. It is…say it with me now…a defect attractor. In fact it took six years after the NTPv4 RFC to issue a correction that clarified the edge cases in combination of crypto-signature and extensions.
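
As a tiny illustration of delimiter framing, here is a sketch in Go that splits a stream of newline-terminated JSON objects – the way a GPSD client can frame what it reads off the socket – with the sample data made up for the demo:

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "strings"
)

// readReports frames a byte stream into JSON objects by splitting on
// the newline the sender emits after each one.  No length field, no
// mode-bit arithmetic, no modular trickery.
func readReports(stream string) {
    scanner := bufio.NewScanner(strings.NewReader(stream))
    for scanner.Scan() {
        var msg map[string]interface{}
        if err := json.Unmarshal(scanner.Bytes(), &msg); err != nil {
            continue // skip anything malformed
        }
        fmt.Println("got a", msg["class"], "report")
    }
}

func main() {
    readReports("{\"class\":\"TPV\",\"mode\":3}\n{\"class\":\"SKY\"}\n")
}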

What about sociability? The key to it is those version and mode fields. They’re at fixed locations in the packet’s first 32-bit word. We could use them to dispatch among different ways of interpreting everything past those first 8 bits, allowing the field structure and packet length to vary.

NTPv4 does in fact do this. You might actually see two different kinds of packet structure on the wire. The diagram above shows a mode 2 or 3 packet; there’s a mode 6 packet used for control and monitoring that (leaving out an optional authentication trailer) looks like this instead:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |LI | VN  | 6   |R|E|M|  Opcode  |          Sequence            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               Status           |       Association ID         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               Offset           |            Count             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      .                                                               .
      .                        Payload (variable)                     .
      .                                                               .
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The count field tells you the length of the variable part. Self-description!

Two packet structures, seven potential mode values. You might be wondering what happened to the other five – and in fact this illustrates one of the problems with the small fixed-length fields in packed-binary formats. Here’s the relevant table from RFC5905:

                      +-------+--------------------------+
                      | Value | Meaning                  |
                      +-------+--------------------------+
                      | 0     | reserved                 |
                      | 1     | symmetric active         |
                      | 2     | symmetric passive        |
                      | 3     | client                   |
                      | 4     | server                   |
                      | 5     | broadcast                |
                      | 6     | NTP control message      |
                      | 7     | reserved for private use |
                      +-------+--------------------------+

You don’t have to know the detailed meanings of all of these to get that the mode field mixes information about the packet structure with other control bits. In fact values 1 through 5 all have the same structure, mode 6 has the one I just diagrammed, and all bets are off with mode 7.

When you’re optimizing for highest bit density – which is what they were doing in 1985 when this protocol was originally designed – the temptation to do this sort of thing is pretty near irresistible. The result, 34 years later, is that all the bits are taken! We can hardly get any more multivalent with this field without committing a backward incompatibility – not a really safe thing to do when there are lots of big-iron legacy implementations still out there, pinned in place by certification requirements and sheer bureaucratic inertia.

OK, in theory we could claim mode 0. But I’ve seen several of the decoders out there and I would not warrant that a mode 0 packet won’t ever slip past anyone’s sanity checks to be misinterpreted as something else. On the other hand, decoders do check the version field; they have to, because versions 0 to 3 have existed and there could be ancient time servers out there still speaking them. So the version field gives us a way out; as long as the version field reads 5, 6, or 7, the rest of the packet past that first byte could look like anything we like and can write an RFC for.

I’ve walked you through this maze to illustrate an important point: packed binary formats are extremely brittle under the pressure of changing requirements. They’re unsociable, difficult to extend without defect-attracting trickery, and eventually you run into hard limits due to the fixed field sizes.

NTP has a serious functional problem that stems directly from this. Its timestamps are 64-bit but only half of that is whole seconds; those counters are going to wrap around in 2036, a couple of years before the more widely anticipated Unix-timestamp turnover in 2038. In theory the existing implementations will cope with this smoothly using more clever modular arithmetic. In practice, anybody who knows enough to have gamed out the possible failure scenarios is nervous…and the more we know the more nervous-making it gets.

This is why I’m thinking about NTPv5 now. 2019 is not too soon. Closing the circle, all this would have been avoided if NTP timestamps had looked like “2010-04-30T11:48:20.10Z”, with variable-length integer and decimal parts, from the beginning. So why wasn’t it done that way?

To address that question let’s start by looking at where the advantages in self-describing textual formats vs. packed binary ones stack up. For self-describing: auditability, hardware independence, extensibility, and sociability. For packed binary: highest possible bit density.

A lot of people would add “faster, simpler decoding” to the list of advantages for packed binary. But this (at least in the “simpler” part) is exactly where people’s design intuitions often start to go wrong, and the history of NTPv4 demonstrates why. Packed protocols start out with “simpler”, but they don’t stay that way. In the general case you always end up doing things like tricky modular arithmetic to get around those fixed limits. You always exhaust your mode-bit space eventually. The “faster” advantage does tend to be stable over time, but the “simpler” does not.

(And oh, boy, will you ever have this lesson pounded home to you if, as I did, you spend a decade on GPSD implementing decoders for at least nineteen different GPS wire protocols.)

If you are a typically arrogant software engineer or EE, you may be thinking at this point “But I’m smarter! And I’ve learned from the past! I can design an optimum-bit-density wire-format that avoids these problems!”

And this is my forty years of field experience, with specific and proven domain expertise in wire-protocol design, telling you: This. Never. Happens. The limitations of that style of protocol are inherent, and they are more binding than you understand. You aren’t smart enough to evade them, I’m not, and nobody else is either.

Which brings us back to the question of why NTPv4 was designed the way it was. And when it is still appropriate to design wire protocols in that packed-binary style. Which means that now it’s time to look at cost gradients and deployment environments.

One clue is that the NTP wire protocol was designed decades ago, when computing cycles and bits-per-second on the wire were vastly more expensive than they are now. We can put numbers on that. NTP was designed under the cost profile of early ARPANET to operate well with connection speeds not much higher than 50 Kbps. Today (2019) average U.S. broadband speeds are around 64 Mbps. That’s a factor of about 10^3 difference. Over the same period processor speeds have gone up by about 10^3-10^4. There’s room for argument there based on different performance measures, but assuming the low end of that range we’re still looking at about the same cost change as bits on the wire.

Now let me throw an interesting number at you that I hope brings home the implications of that change. A few weeks ago we at NTPsec had an email conversation with a guy who is running time service out of the National Metrology Institute in Germany. This is undoubtedly Deutschland’s most heavily loaded Stratum 1 NTP provider.

We were able to get his requests-per-second figure, do a bit of back-of-the-envelope calculation, and work out that the production NTP load on a national time authority for the most populous nation in Europe (excluding transcontinental Russia) wouldn’t come even close to maxing out my home broadband or even just one of the Raspberry Pi 3s on the windowsill above my desk. With all six of them and only a modest bandwidth increase I could probably come pretty close to servicing the Stratum 2 sites of the entire planet in a pinch, if only because time-service demand per head is so much lower outside North America/Europe/Japan.

Now that I have your attention, let’s look at the fundamentals behind this. That 10^3 drop tracks the change in one kind of protocol cost that is basically thermodynamic. How much power do you have to use, what kind of waste heat do you generate, if you throw enough hardware at your application to handle its expected transaction load? Most of what are normally thought of as infrastructure costs (equipping your data center, etc.) are derivative of that thermodynamic cost. And that is the cost you are minimizing with a packed binary format.

In the case of NTP, we’ve just seen that cost is trivial. The reason for this is instructive and not difficult to work out. It’s because NTP transaction loads per user are exceptionally low. This ain’t streaming video, folks – what it takes to keep two machines synchronized is a 48-byte call and a 48-byte response at intervals which (as I look at a live peers display just now) average about 38 minutes.

There’s just a whisper of a nuance of a hint there that just mmmaybe, three decades after it was first deployed, optimizing NTP for bit density on the wire might not be the most productive use of our effort!

Maybe, in another application with 10^3 more transaction volume per user, or with a 10^3 increase in userbase numbers, we’d incur as much thermodynamic cost as landed on a typical NTP server in 1981, and a packed binary format would make the kind of optimization sense it did then. But that was then, this is now, and peoples’ intuitions about this tend to be grossly out of whack. It’s almost as though a lot of software engineers and EEs who really ought to know better are still living in 1981 even when they weren’t born yet.

OK, so what should we be optimizing NTP for instead? Take a moment to think about this before you keep reading, because the answer is really stupid obvious.

We should be designing to minimize the cost of human attention. Like thermodynamic cost, attention cost unifies a lot of things we normally think of as separate line items. Initial development. Test. Debugging. Coping with the long-term downstream defect rate. Configuring working setups. Troubleshooting not-quite-working setups. And – this is where you should start to hear triumphant music – dealing gracefully with changes in requirements.

It is also a relevant fact that the cost of human attention has not dropped by 10^3 along with thermodynamic cost per unit of information output since 1981. To a good first approximation it has held constant. Labor-hours are labor-hours are labor-hours.

Now let’s review where the advantages of discoverable/textual formats are. Auditability. Hardware independence. Sociability. Extensibility. These are all attention-cost minimizers. They’re, very specifically, enablers of forward design. In the future you’ll always need what a Go player calls “aji” (potential to extend and maneuver). Discoverable textual wire protocols are good at that; packed binary protocols are bad at it.

But I’m absolutely not here to propose a cost model under which discoverability is in a simple linear struggle with bit-density that discoverability always wins in the end. That’s what you might think if you notice that the ratio between attention cost and thermodynamic cost keeps shifting to favor discoverability as thermodynamic cost falls while attention cost stays near constant. But there’s a third factor that our NTP estimation has already called out.

That factor is transaction volume. If you pull that low enough, your thermodynamic costs nearly vanish and packed binary formats look obviously idiotic. That’s where we are with NTP service today. Consequently, my design sketch for NTPv5 is a JSON profile.

On the other hand, suppose you’re running a Google-sized data center, the kind that’s so big you need to site it very near cheap power as though it were an aluminum smelter. Power and heat dissipation are your major running costs; it’s all about the thermodynamics, baby.

Even in that kind of deployment, NTP service will still be thermodynamically cheap. But there will be lots of other wire protocols in play that have transaction volumes many orders of magnitude higher…and now you know why protocol buffers, which are sure enough packed binary, are a good idea.

The thought I want to leave you all with is this: to design wire protocols well, you need to know what your cost drivers really are, how their relative magnitudes stack up. And – I’m sorry, but this needs emphasizing – I constantly run into engineers (even very bright and capable ones) whose intuitions about this are spectacularly, ludicrously wrong.

You, dear reader, might be one of them. If it surprised you that a credit-card-sized hobby computer could supply Stratum 1 service for a major First-World country, you are one of them. Time to check your assumptions.

I think I know why people get stuck this way. It’s what Frederic Bastiat called a “things seen versus things not seen” problem in economic estimation. We over-focus on metrics we can visualize, measure, and estimate crisply; thermodynamic costs and the benefits of perfect bit density tend to be like that. Attention costs are squishier and more contingent, it’s more difficult to value options, and it’s especially easy to underestimate the attention cost of having to do a major re-engineering job in the future because the design you’re sketching today is too rigid and didn’t leave you the option to avoid a disruption.

One of the deeper meanings of the quote “Premature optimization is the root of all evil” (often misattributed to Donald Knuth but actually by Tony Hoare) is that you should constantly beware of doing that. Nassim Taleb, the “Black Swan” guy, would rightly call it fragilista behavior, over-planner’s arrogance. In the real world, aji usually beats arrogance – not every time, but that’s the way to bet.

Announcing loccount 2.0 – now up to 74 languages

Post Syndicated from Eric Raymond original http://esr.ibiblio.org/?p=8250

I just released the 2.0 version of loccount.

This is a major release with many new features and upgrades. It’s gone well beyond just being a faster, cleaner, bug-fixed port of David A. Wheeler’s sloccount. The count of supported languages is now up to 74 from sloccount’s 30. But the bigger change is that for 33 of those languages the tool can now deliver a statement count (LLOC = Logical Lines Of Code) as well as a line count (SLOC = Source Lines of Code, ignoring whitespace and comments).

To go with this, the tool can now perform COCOMO II cost and schedule estimation based on LLOC as well as COCOMO I based on SLOC.

The manual page includes the following cautions:

SLOC/LLOC figures should be used with caution. While they do predict project costs and defect incidence reasonably well, they are not appropriate for use as ‘productivity’ measures; good code is often less bulky than bad code. Comparing SLOC across languages is also dubious, as differing languages can have very different complexity per line.

With these qualifications, SLOC/LLOC does have some other uses. It is quite effective for tracking changes in complexity and attack surface as a codebase evolves over time.

That’s how I’ve used it – to track the drastic reduction in NTPsec’s codebase size. It was also my finger exercise in learning Go. For those of you who enjoy reading code, there are a couple of points of interest…

One is how much of the program’s intelligence is in tables rather than executable code. Adding support for a new language is usually as simple as adding a new table entry and a test load. Some of the internal changes in 2.0 relate to enriching the table format – for example, two things you can now declare are that a language uses the C preprocessor and that C-style backslash escapes are implemented in its string constants. I’ve often advocated this kind of programming by declarative specification in my writings; here you can see it in action.
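
As a rough illustration of what such a table entry might look like (the struct and field names here are hypothetical, not lifted from loccount’s source):

package main

import "fmt"

// languageSpec is a hypothetical table entry of the kind described above:
// everything the counter needs to know about a language is plain data.
type languageSpec struct {
	name         string
	extensions   []string
	eolComment   string // winged-comment leader, e.g. "//" or "#"
	blockStart   string
	blockEnd     string
	usesCPP      bool   // honor C-preprocessor conventions
	cBackslashes bool   // C-style backslash escapes in string constants
	eolStatement string // statement terminator; empty means no LLOC support
}

var table = []languageSpec{
	{"c", []string{".c"}, "//", "/*", "*/", true, true, ";"},
	{"awk", []string{".awk"}, "#", "", "", false, true, ";"},
}

func main() {
	// Adding a language is just appending another literal to the table.
	for _, spec := range table {
		fmt.Printf("%-4s extensions=%v LLOC-capable=%v\n",
			spec.name, spec.extensions, spec.eolStatement != "")
	}
}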

The contrast with sloccount is extreme; sloccount was clever, but it’s a messy pile of scripting-language hacks that makes adding a new language – or, really, changing anything about it at all – rather difficult. This is probably why it hasn’t been updated since 2004.

Another point of interest is the way the program uses Go’s CSP concurrency primitives. It’s a pattern that could generalize to other ways of data-mining file trees. Walk the tree, spawning a thread to gather stats on each file; each thread writes a summary to one single rendezvous channel; the main thread reads summary blocks off the rendezvous channel and aggregates them into a report. There’s no explicit lock or release anywhere, all the synchronization is supplied by the channel’s atomicity guarantees.

That’s pretty simple, but as a readily accessible demonstration of why CSP rocks compared to conventional mailbox-and-mutex synchronization its very simplicity makes it hard to beat.

Ironically, one of the source languages this tool, written in Go, cannot deliver LLOC reports for is Go itself. That’s because Go doesn’t have an explicit end-of-statement marker; counting statements would in principle require a full parse. Something might be done with the assumption that the input source is in the canonical form that go fmt produces.
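
For what it’s worth, the full-parse route is not much work in Go itself, because the standard library ships a parser; here is a rough sketch of counting statement nodes:

package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
)

// countStatements parses one Go source file and counts ast.Stmt nodes,
// which is roughly what an LLOC figure for Go would have to mean.
func countStatements(filename string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, nil, 0)
	if err != nil {
		return 0, err
	}
	count := 0
	ast.Inspect(file, func(n ast.Node) bool {
		if _, ok := n.(ast.Stmt); ok {
			count++
		}
		return true
	})
	return count, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: llocgo file.go")
		os.Exit(1)
	}
	n, err := countStatements(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(n)
}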

Plans for future releases:

* Beat Go into not barfing on ABC comments? (We have a test load)

* Tell Objective-C .m from MUMPS .m with a verifier?

* sloccount handles different asm comment conventions. We should too.

* How to weight EAF: https://dwheeler.com/sloccount/sloccount.html#cocomo

* Check that we handle every extension in sloccount’s list.

* Detect inline assembler in C? https://en.wikipedia.org/wiki/Inline_assembler.

Here’s the set of languages supported: ada algol60 asm autotools awk c c# c++ c-header clojure clojurescript clu cobol cobra csh css d dart eiffel elisp erlang expect f# fortran fortran03 fortran90 fortran95 go haskell icon java javascript kotlin lex lisp lua m4 makefile ml modula2 modula3 mumps oberon objective-c occam pascal perl php php3 php4 php5 php6 php7 pl/1 pop11 prolog python rebol ruby rust sather scheme scons sed shell simula sql swift tcl verilog vhdl vrml waf yacc.

If something you actually use isn’t on there, send me a code sample with the name of the language in it. The code samples should contain a block comment and a winged comment as well as code comprising at least one full statement. “Hello, world” will do.

NEW – Machine Learning algorithms and model packages now available in AWS Marketplace

Post Syndicated from Shaun Ray original https://aws.amazon.com/blogs/aws/new-machine-learning-algorithms-and-model-packages-now-available-in-aws-marketplace/

At AWS, our mission is to put machine learning in the hands of every developer. That’s why in 2017 we launched Amazon SageMaker. Since then it has become one of the fastest-growing services in AWS history, used by thousands of customers globally. Customers using Amazon SageMaker can use the optimized algorithms offered in Amazon SageMaker, run fully managed MXNet, TensorFlow, PyTorch, and Chainer algorithms, or bring their own algorithms and models. When it comes to building their own machine learning models, many customers spend significant time developing algorithms and models that are solutions to problems that have already been solved.


Introducing Machine Learning in AWS Marketplace

I am pleased to announce the new Machine Learning category of products offered by AWS Marketplace, which includes more than 150 algorithms and model packages, with more coming every day. AWS Marketplace offers a tailored selection for vertical industries like retail (35 products), media (19 products), manufacturing (17 products), healthcare and life sciences (HCLS, 15 products), and more. Customers can find solutions to critical use cases like breast cancer prediction, lymphoma classification, hospital readmissions, loan risk prediction, vehicle recognition, retail localization, botnet attack detection, automotive telematics, motion detection, demand forecasting, and speech recognition.

Customers can search and browse a list of algorithms and model packages in AWS Marketplace. Once customers have subscribed to a machine learning solution, they can deploy it directly from the SageMaker console, a Jupyter Notebook, the SageMaker SDK, or the AWS CLI. Amazon SageMaker protects buyers’ data by employing security measures such as static scans, network isolation, and runtime monitoring.

The intellectual property of sellers on the AWS Marketplace is protected by encrypting the algorithms and model package artifacts in transit and at rest, using secure (SSL) connections for communications, and ensuring role-based access for deployment of artifacts. AWS provides a secure way for the sellers to monetize their work with a frictionless self-service process to publish their algorithms and model packages.


Machine Learning category in Action

Having tried to build my own models in the past, I sure am excited about this feature. After browsing through the available algorithms and model packages from AWS Marketplace, I’ve decided to try the Deep Vision vehicle recognition model, published by Deep Vision AI. This model will allow us to identify the make, model and type of car from a set of uploaded images. You could use this model for insurance claims, online car sales, and vehicle identification in your business.

I continue to subscribe and accept the default options of recommended instance type and region. I read and accept the subscription contract, and I am ready to get started with our model.

My subscription is listed in the Amazon SageMaker console and is ready to use. Deploying the model with Amazon SageMaker is the same as for any other model package; I complete the steps in this guide to create and deploy our endpoint.

With our endpoint deployed I can start asking the model questions. In this case I will be using a single image of a car; the model is trained to detect the make, model, and year information from any angle. First, I will start off with a Volvo XC70 and see what results I get:

Results:

{'result': [{'mmy': {'make': 'Volvo', 'score': 0.97, 'model': 'Xc70', 'year': '2016-2016'}, 'bbox': {'top': 146, 'left': 50, 'right': 1596, 'bottom': 813}, 'View': 'Front Left View'}]}

My model has detected the make, model and year correctly for the supplied image. I was recently on holiday in the UK and stayed with a relative who had a McLaren 570S supercar. The thought that crossed my mind as the gull-wing doors opened for the first time and I was about to sit in the car was how much the insurance excess would cost if things went wrong! Quite apt for our use case today.

Results:

{'result': [{'mmy': {'make': 'Mclaren', 'score': 0.95, 'model': '570S', 'year': '2016-2017'}, 'bbox': {'top': 195, 'left': 126, 'right': 757, 'bottom': 494}, 'View': 'Front Right View'}]}

The score (0.95) measures how confident the model is that the result is right. The range of the score is 0.0 to 1.0. The model is highly confident about the McLaren, with the make, model and year all correct. Impressive results for a relatively rare car on the road. I test a few more cars given to me by the launch team, who are excitedly looking over my shoulder, and now it’s time to wrap up.

Within ten minutes, I have been able to choose a model package, deploy an endpoint, and accurately detect the make, model and year of vehicles, with no data scientists, no expensive GPUs for training, and no code to write. You can be sure I will be subscribing to a whole lot more of these models from AWS Marketplace throughout re:Invent week and trying to solve other use cases in less than 15 minutes!

The machine learning category in AWS Marketplace can be accessed through the Amazon SageMaker console or directly through AWS Marketplace itself. Once an algorithm or model has been subscribed to, it is accessible via the console, SDK, and AWS CLI. Algorithms and models from AWS Marketplace can be deployed just like any other model or algorithm, by selecting the AWS Marketplace option as your package source. Once you have chosen an algorithm or model, you can deploy it to Amazon SageMaker by following this guide.
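
If you prefer code to the console, here is a hedged sketch of calling a deployed endpoint with the AWS SDK for Go; the endpoint name, image path, and content type are hypothetical and depend on the model package you subscribed to.

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sagemakerruntime"
)

func main() {
	// Read the image we want classified (path is hypothetical).
	img, err := os.ReadFile("car.jpg")
	if err != nil {
		log.Fatal(err)
	}

	sess := session.Must(session.NewSession())
	client := sagemakerruntime.New(sess)

	// The endpoint name is whatever you chose when deploying the model package.
	out, err := client.InvokeEndpoint(&sagemakerruntime.InvokeEndpointInput{
		EndpointName: aws.String("vehicle-recognition-endpoint"), // hypothetical
		ContentType:  aws.String("image/jpeg"),
		Body:         img,
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out.Body)) // JSON along the lines of the results shown above
}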


Availability & Pricing

Customers pay a subscription fee for the use of an algorithm or model package and the AWS resource fee. AWS Marketplace provides a consolidated monthly bill for all purchased subscriptions.

At launch, AWS Marketplace for Machine Learning includes algorithms and models from Deep Vision AI Inc, Knowledgent, RocketML, Sensifai, Cloudwick Technologies, Persistent Systems, Modjoul, H2Oai Inc, Figure Eight [Crowdflower], Intel Corporation, AWS Gluon Model Zoos, and more with new sellers being added regularly. If you are interested in selling machine learning algorithms and model packages, please reach out to [email protected]


NEW – AWS Marketplace makes it easier to govern software procurement with Private Marketplace

Post Syndicated from Shaun Ray original https://aws.amazon.com/blogs/aws/new-aws-marketplace-makes-it-easier-to-govern-software-procurement-with-private-marketplace/

Over six years ago, we launched AWS Marketplace with the ambitious goal of providing users of the cloud with the software applications and infrastructure they needed to run their business. Today, more than 200,000 active AWS customers are using software from AWS Marketplace, from categories such as security, data and analytics, log analysis and machine learning. Those customers use over 650 million hours a month of Amazon EC2 for products in AWS Marketplace and have more than 950,000 active software subscriptions. AWS Marketplace offers 35 categories and more than 4,500 software listings from more than 1,400 Independent Software Vendors (ISVs) to help you on your cloud journey, no matter what stage of adoption you are at.

Customers have told us that they love the flexibility and myriad of options that AWS Marketplace provides. Today, I am excited to announce we are offering even more flexibility for AWS Marketplace with the launch of Private Marketplace from AWS Marketplace.

Private Marketplace is a new feature that enables you to create a custom digital catalog of pre-approved products from AWS Marketplace. As an administrator, you can select products that meet your procurement policies and make them available for your users. You can also further customize Private Marketplace with company branding, such as logo, messaging, and color scheme. All controls for Private Marketplace apply across your entire AWS Organizations entity, and you can define fine-grained controls using Identity and Access Management for roles such as: administrator, subscription manager and end user.

Once you enable Private Marketplace, users within your AWS Organization are redirected to Private Marketplace when they sign in to AWS Marketplace. Now, your users can quickly find, buy, and deploy products knowing they are pre-approved.


Private Marketplace in Action

To get started, we need to be using a master account; if you have a single account, it will automatically be classified as a master account. If you are using a member account in an AWS Organization, the master account will need to enable Private Marketplace access. Once that is done, you can add subscription managers and administrators through AWS Identity and Access Management (IAM) policies.


1- My account meets the requirement of being a master account, so I can proceed to create a Private Marketplace. I click “Create Private Marketplace” and am redirected to the admin page, where I can whitelist products from AWS Marketplace. To grant other users access to approve products for listing, I can use AWS Organizations policies to grant the AWSMarketplaceManageSubscriptions role.

2- I select some popular software and operating systems from the list and add them to Private Marketplace. Once selected, we can see our whitelisted products.

3- One thing that I appreciate, and I am sure administrators of their organization’s Private Marketplace will too, is the ability to customize the style and branding to bring it in line with the company. In this case, we can choose the name, logo, color, and description of our Private Marketplace.

4- After a couple of minutes we have our freshly minted Private Marketplace ready to go. There is an explicit step that we need to complete to push our Private Marketplace live, which allows us to create and edit without enabling access to users.


5- For the next part, we will switch to a member account and see what our Private Marketplace looks like.

6- We can see the five pieces of software I whitelisted and our customizations to our Private Marketplace. We can also see that these products are “Approved for Procurement” and can be subscribed to by our end users. Other products are still discoverable by our users, but cannot be subscribed to until an administrator whitelists the product.


Conclusion

Users in a Private Marketplace can launch products knowing that all products in their Private Marketplace comply with their company’s procurement policies. When users search for products in Private Marketplace, they can see which products are labeled as “Approved for Procurement” and quickly filter between their company’s catalog and the full catalog of software products in AWS Marketplace.


Pricing and Availability

Subscription costs remain the same as for any product in AWS Marketplace once consumed. Private Marketplace from AWS Marketplace is available in all commercial regions today.


Flight Sim Company Threatens Reddit Mods Over “Libelous” DRM Posts

Post Syndicated from Andy original https://torrentfreak.com/flight-sim-company-threatens-reddit-mods-over-libellous-drm-posts-180604/

Earlier this year, in an effort to deal with piracy of their products, flight simulator company FlightSimLabs took drastic action by installing malware on customers’ machines.

The story began when a Reddit user reported something unusual in his download of FlightSimLabs’ A320X module. A file – test.exe – was being flagged up as a ‘Chrome Password Dump’ tool, something which rang alarm bells among flight sim fans.

As additional information was made available, the story became even more sensational. After first dodging the issue with carefully worded statements, FlightSimLabs admitted that it had installed a password dumper onto ALL users’ machines – whether they were pirates or not – in an effort to catch a particular software cracker and launch legal action.

It was an incredible story that no doubt did damage to FlightSimLabs’ reputation. But now the company is at the center of a new storm, again involving anti-piracy measures and again focused on Reddit.

Just before the weekend, Reddit user /u/walkday reported finding something unusual in his A320X module, the same module that caused the earlier controversy.

“The latest installer of FSLabs’ A320X puts two cmdhost.exe files under ‘system32\’ and ‘SysWOW64\’ of my Windows directory. Despite the name, they don’t open a command-line window,” he reported.

“They’re a part of the authentication because, if you remove them, the A320X won’t get loaded. Does someone here know more about cmdhost.exe? Why does FSLabs give them such a deceptive name and put them in the system folders? I hate them for polluting my system folder unless, of course, it is a dll used by different applications.”

Needless to say, the news that FSLabs were putting files into system folders named to make them look like system files was not well received.

“Hiding something named to resemble Window’s “Console Window Host” process in system folders is a huge red flag,” one user wrote.

“It’s a malware tactic used to deceive users into thinking the executable is a part of the OS, thus being trusted and not deleted. Really dodgy tactic, don’t trust it and don’t trust them,” opined another.

With a disenchanted Reddit userbase simmering away in the background, FSLabs took to Facebook with a statement to quieten down the masses.

“Over the past few hours we have become aware of rumors circulating on social media about the cmdhost file installed by the A320-X and wanted to clear up any confusion or misunderstanding,” the company wrote.

“cmdhost is part of our eSellerate infrastructure – which communicates between the eSellerate server and our product activation interface. It was designed to reduce the number of product activation issues people were having after the FSX release – which have since been resolved.”

The company noted that the file had been checked by all major anti-virus companies and everything had come back clean, which does indeed appear to be the case. Nevertheless, the critical Reddit thread remained, bemoaning the actions of a company which probably should have known better than to irritate fans after February’s debacle. In response, however, FSLabs did just that once again.

In private messages to the moderators of the /r/flightsim sub-Reddit, FSLabs’ Marketing and PR Manager Simon Kelsey suggested that the mods should do something about the thread in question or face possible legal action.

“Just a gentle reminder of Reddit’s obligations as a publisher in order to ensure that any libelous content is taken down as soon as you become aware of it,” Kelsey wrote.

Noting that FSLabs welcomes “robust fair comment and opinion”, Kelsey gave the following advice.

“The ‘cmdhost.exe’ file in question is an entirely above board part of our anti-piracy protection and has been submitted to numerous anti-virus providers in order to verify that it poses no threat. Therefore, ANY suggestion that current or future products pose any threat to users is absolutely false and libelous,” he wrote, adding:

“As we have already outlined in the past, ANY suggestion that any user’s data was compromised during the events of February is entirely false and therefore libelous.”

Noting that FSLabs would “hate for lawyers to have to get involved in this”, Kelsey advised the /r/flightsim mods to ensure that no such claims were allowed to remain on the sub-Reddit.

But after not receiving the response he would’ve liked, Kelsey wrote once again to the mods. He noted that “a number of unsubstantiated and highly defamatory comments” remained online and warned that if something wasn’t done to clean them up, he would have “no option” than to pass the matter to FSLabs’ legal team.

Like the first message, this second effort also failed to have the desired effect. In fact, the moderators’ response was to post an open letter to Kelsey and FSLabs instead.

“We sincerely disagree that you ‘welcome robust fair comment and opinion’, demonstrated by the censorship on your forums and the attempted censorship on our subreddit,” the mods wrote.

“While what you do on your forum is certainly your prerogative, your rules do not extend to Reddit nor the r/flightsim subreddit. Removing content you disagree with is simply not within our purview.”

The letter, which is worth reading in full, refutes Kelsey’s claims and also suggests that critics of FSLabs may have been subjected to Reddit vote manipulation and coordinated efforts to discredit them.

What will happen next is unclear but the matter has now been placed in the hands of Reddit’s administrators, who have agreed to deal with Kelsey and FSLabs personally.

It’s a little early to say for sure but it seems unlikely that this will end in a net positive for FSLabs, no matter what decision Reddit’s admins take.


ISP Questions Impartiality of Judges in Copyright Troll Cases

Post Syndicated from Andy original https://torrentfreak.com/isp-questions-impartiality-of-judges-in-copyright-troll-cases-180602/

Following in the footsteps of similar operations around the world, two years ago the copyright trolling movement landed on Swedish shores.

The pattern was a familiar one, with trolls harvesting IP addresses from BitTorrent swarms and tracing them back to Internet service providers. Then, after presenting evidence to a judge, the trolls obtained orders that compelled ISPs to hand over their customers’ details. From there, the trolls demanded cash payments to make supposed lawsuits disappear.

It’s a controversial business model that rarely receives outside praise. Many ISPs have tried to slow down the flood but most eventually grow tired of battling to protect their customers. The same cannot be said of Swedish ISP Bahnhof.

The ISP, which is also a strong defender of privacy, has become known for fighting back against copyright trolls. Indeed, to thwart them at the very first step, the company deletes IP address logs after just 24 hours, which prevents its customers from being targeted.

Bahnhof says that the copyright business appeared “dirty and corrupt” right from the get-go, so it now operates Utpressningskollen.se, a web portal where the ISP publishes data on Swedish legal cases in which copyright owners demand customer data from ISPs through the Patent and Market Courts.

Over the past two years, Bahnhof says it has documented 76 cases, of which six are still ongoing, 11 have been waived, and the majority, 59, have been decided in favor of mainly movie companies. Bahnhof says that when it discovered that 59 out of the 76 cases benefited one party, it felt a need to investigate.

In a detailed report compiled by Bahnhof Communicator Carolina Lindahl and sent to TF, the ISP reveals that it examined the individual decision-makers in the cases before the Courts and found five judges with “questionable impartiality.”

“One of the judges, we can call them Judge 1, has closed 12 of the cases, of which two have been waived and the other 10 have benefitted the copyright owner, mostly movie companies,” Lindahl notes.

“Judge 1 apparently has written several articles in the magazine NIR – Nordiskt Immateriellt Rättsskydd (Nordic Intellectual Property Protection) – which is mainly supported by Svenska Föreningen för Upphovsrätt, the Swedish Association for Copyright (SFU).

“SFU is a member-financed group centered around copyright that publishes articles, hands out scholarships, arranges symposiums, etc. On their website they have a public calendar where Judge 1 appears regularly.”

Bahnhof says that the financiers of the SFU are Sveriges Television AB (Sweden’s national public TV broadcaster), Filmproducenternas Rättsförening (a legally-oriented association for filmproducers), BMG Chrysalis Scandinavia (a media giant) and Fackförbundet för Film och Mediabranschen (a union for the movie and media industry).

“This means that Judge 1 is involved in a copyright association sponsored by the film and media industry, while also judging in copyright cases with the film industry as one of the parties,” the ISP says.

Bahnhof’s also has criticism for Judge 2, who participated as an event speaker for the Swedish Association for Copyright, and Judge 3 who has written for the SFU-supported magazine NIR. According to Lindahl, Judge 4 worked for a bureau that is partly owned by a board member of SFU, who also defended media companies in a “high-profile” Swedish piracy case.

That leaves Judge 5, who handled 10 of the copyright troll cases documented by Bahnhof, waiving one and deciding the remaining nine in favor of a movie company plaintiff.

“Judge 5 has been questioned before and even been accused of bias while judging a high-profile piracy case almost ten years ago. The accusations of bias were motivated by the judge’s membership of SFU and the Swedish Association for Intellectual Property Rights (SFIR), an association with several important individuals of the Swedish copyright community as members, who all defend, represent, or sympathize with the media industry,” Lindahl says.

Bahnhof hasn’t named any of the judges nor has it provided additional details on the “high-profile” case. However, anyone who remembers the infamous trial of ‘The Pirate Bay Four’ a decade ago might recall complaints from the defense (1,2,3) that several judges involved in the case were members of pro-copyright groups.

While there were plenty of calls to consider them biased, in May 2010 the Supreme Court ruled otherwise, a fact Bahnhof recognizes.

“Judge 5 was never sentenced for bias by the court, but regardless of the court’s decision this is still a judge who shares values and has personal connections with [the media industry], and as if that weren’t enough, the judge has induced an additional financial aspect by participating in events paid for by said party,” Lindahl writes.

“The judge has parties and interest holders in their personal network, a private engagement in the subject and a financial connection to one party – textbook characteristics of bias which would make anyone suspicious.”

The decision-makers of the Patent and Market Court and their relations.

The ISP notes that all five judges have connections to the media industry in the cases they judge, which isn’t a great starting point for returning “objective and impartial” results. In its summary, however, the ISP is scathing of the overall system, one in which court cases “almost looked rigged” and appear to be decided in favor of the movie company even before reaching court.

In general, however, Bahnhof says that the processes show a lack of individual attention, such as the court blindly accepting questionable IP address evidence supplied by infamous anti-piracy outfit MaverickEye.

“The court never bothers to control the media company’s only evidence (lists generated by MaverickMonitor, which has proven to be an unreliable software), the court documents contain several typos of varying severity, and the same standard texts are reused in several different cases,” the ISP says.

“The court documents show a lack of care and control, something that can easily be taken advantage of by individuals with shady motives. The findings and discoveries of this investigation are strengthened by the pure numbers mentioned in the beginning which clearly show how one party almost always wins.

“If this is caused by bias, cheating, partiality, bribes, political agenda, conspiracy or pure coincidence we can’t say for sure, but the fact that this process has mainly generated money for the film industry, while citizens have been robbed of their personal integrity and legal certainty, indicates what forces lie behind this machinery,” Bahnhof’s Lindahl concludes.


Majority of Canadians Consume Online Content Legally, Survey Finds

Post Syndicated from Andy original https://torrentfreak.com/majority-of-canadians-consume-online-content-legally-survey-finds-180531/

Back in January, a coalition of companies and organizations with ties to the entertainment industries called on local telecoms regulator CRTC to implement a national website blocking regime.

Under the banner of Fairplay Canada, members including Bell, Cineplex, Directors Guild of Canada, Maple Leaf Sports and Entertainment, Movie Theatre Association of Canada, and Rogers Media, spoke of an industry under threat from marauding pirates. But just how serious is this threat?

The results of a new survey commissioned by Innovation Science and Economic Development Canada (ISED) in collaboration with the Department of Canadian Heritage (PCH) aim to shine light on the problem by revealing the online content consumption habits of citizens in the Great White North.

While there are interesting findings for those on both sides of the site-blocking debate, the situation seems somewhat removed from the Armageddon scenario predicted by the entertainment industries.

Carried out among 3,301 Canadians aged 12 years and over, the Kantar TNS study aims to cover copyright infringement in six key content areas – music, movies, TV shows, video games, computer software, and eBooks. Attitudes and behaviors are also touched upon while measuring the effectiveness of Canada’s copyright measures.

General Digital Content Consumption

In its introduction, the report notes that 28 million Canadians used the Internet in the three-month study period to November 27, 2017. Of those, 22 million (80%) consumed digital content. Around 20 million (73%) streamed or accessed content, 16 million (59%) downloaded content, while 8 million (28%) shared content.

Music, TV shows and movies all battled for first place in the consumption ranks, with 48%, 48%, and 46% respectively.

Copyright Infringement

According to the study, the majority of Canadians do things completely by the book. An impressive 74% of media-consuming respondents said that they’d only accessed material from legal sources in the preceding three months.

The remaining 26% admitted to accessing at least one illegal file in the same period. Of those, just 5% said that all of their consumption was from illegal sources, with movies (36%), software (36%), TV shows (34%) and video games (33%) the most likely content to be consumed illegally.

Interestingly, the study found that few demographic factors – such as gender, region, rural and urban, income, employment status and language – play a role in illegal content consumption.

“We found that only age and income varied significantly between consumers who infringed by downloading or streaming/accessing content online illegally and consumers who did not consume infringing content online,” the report reads.

“More specifically, the profile of consumers who downloaded or streamed/accessed infringing content skewed slightly younger and towards individuals with household incomes of $100K+.”

Licensed services much more popular than pirate haunts

It will come as no surprise that Netflix was the most popular service with consumers, with 64% having used it in the past three months. Sites like YouTube and Facebook were a big hit too, visited by 36% and 28% of content consumers respectively.

Overall, 74% of online content consumers use licensed services for content while 42% use social networks. Under a third (31%) use a combination of peer-to-peer (BitTorrent), cyberlocker platforms, or linking sites. Stream-ripping services are used by 9% of content consumers.

“Consumers who reported downloading or streaming/accessing infringing content only are less likely to use licensed services and more likely to use peer-to-peer/cyberlocker/linking sites than other consumers of online content,” the report notes.

Attitudes towards legal consumption & infringing content

In common with similar surveys over the years, the Kantar research looked at the reasons why people consume content from various sources, both legal and otherwise.

Convenience (48%), speed (36%) and quality (34%) were the most-cited reasons for using legal sources. An interesting 33% of respondents said they use legal sites to avoid using illegal sources.

On the illicit front, 54% of those who obtained unauthorized content in the previous three months said they did so due to it being free, with 40% citing convenience and 34% mentioning speed.

Almost six out of ten (58%) said lower costs would encourage them to switch to official sources, with 47% saying they’d move if legal availability was improved.

Canada’s ‘Notice-and-Notice’ warning system

People in Canada who share content on peer-to-peer systems like BitTorrent without permission run the risk of receiving an infringement notice warning them to stop. These are sent by copyright holders via users’ ISPs and the hope is that the shock of receiving a warning will turn consumers back to the straight and narrow.

The study reveals that 10% of online content consumers over the age of 12 have received one of these notices but what kind of effect have they had?

“Respondents reported that receiving such a notice resulted in the following: increased awareness of copyright infringement (38%), taking steps to ensure password protected home networks (27%), a household discussion about copyright infringement (27%), and discontinuing illegal downloading or streaming (24%),” the report notes.

While these are all positives for the entertainment industries, Kantar reports that almost a quarter (24%) of people who receive a notice simply ignore them.

Stream-ripping

Once upon a time, people obtaining music via P2P networks was cited as the music industry’s greatest threat but, with the advent of sites like YouTube, so-called stream-ripping is the latest bogeyman.

According to the study, 11% of Internet users say they’ve used a stream-ripping service. They are most likely to be male (62%) and predominantly 18 to 34 (52%) years of age.

“Among Canadians who have used a service to stream-rip music or entertainment, nearly half (48%) have used stream-ripping sites, one-third have used downloader apps (38%), one-in-seven (14%) have used a stream-ripping plug-in, and one-in-ten (10%) have used stream-ripping software,” the report adds.

Set-Top Boxes and VPNs

Few general piracy studies would be complete in 2018 without touching on set-top devices and Virtual Private Networks and this report doesn’t disappoint.

More than one in five (21%) respondents aged 12+ reported using a VPN, with the main purpose of securing communications and Internet browsing (57%).

A relatively modest 36% said they use a VPN to access free content while 32% said the aim was to access geo-blocked content unavailable in Canada. Just over a quarter (27%) said that accessing content from overseas at a reasonable price was the main motivator.

One in ten (10%) of respondents reported using a set-top box, with 78% stating they use them to access paid-for content. Interestingly, only a small number say they use the devices to infringe.

“A minority use set-top boxes to access other content that is not legal or they are unsure if it is legal (16%), or to access live sports that are not legal or they are unsure if it is legal (11%),” the report notes.

“Individuals who consumed a mix of legal and illegal content online are more likely to use VPN services (42%) or TV set-top boxes (21%) than consumers who only downloaded or streamed/accessed legal content.”

Kantar says that the findings of the report will be used to help policymakers evaluate how Canada’s Copyright Act is coping with a changing market and technological developments.

“This research will provide the necessary information required to further develop copyright policy in Canada, as well as to provide a foundation to assess the effectiveness of the measures to address copyright infringement, should future analysis be undertaken,” it concludes.

The full report can be found here (pdf)


Hiring a Director of Sales

Post Syndicated from Yev original https://www.backblaze.com/blog/hiring-a-director-of-sales/

Backblaze is hiring a Director of Sales. This is a critical role for Backblaze as we continue to grow the team. We need a strong leader who has experience in scaling a sales team and who has an excellent track record for exceeding goals by selling Software as a Service (SaaS) solutions. In addition, this leader will need to be highly motivated, as well as able to create and develop a highly motivated, success-oriented sales team that has fun and enjoys what they do.

The History of Backblaze from our CEO
In 2007, after a friend’s computer crash caused her some suffering, we realized that with every photo, video, song, and document going digital, everyone would eventually lose all of their information. Five of us quit our jobs to start a company with the goal of making it easy for people to back up their data.

Like many startups, for a while we worked out of a co-founder’s one-bedroom apartment. Unlike most startups, we made an explicit agreement not to raise funding during the first year. We would then touch base every six months and decide whether to raise or not. We wanted to focus on building the company and the product, not on pitching and slide decks. And critically, we wanted to build a culture that understood money comes from customers, not the magical VC giving tree. Over the course of 5 years we built a profitable, multi-million dollar revenue business — and only then did we raise a VC round.

Fast forward 10 years later and our world looks quite different. You’ll have some fantastic assets to work with:

  • A brand millions recognize for openness, ease-of-use, and affordability.
  • A computer backup service that stores over 500 petabytes of data, has recovered over 30 billion files for hundreds of thousands of paying customers — most of whom self-identify as being the people that find and recommend technology products to their friends.
  • Our B2 service that provides the lowest cost cloud storage on the planet at 1/4th the price Amazon, Google or Microsoft charges. While being a newer product on the market, it already has over 100,000 IT and developers signed up as well as an ecosystem building up around it.
  • A growing, profitable and cash-flow positive company.
  • And last, but most definitely not least: a great sales team.

You might be saying, “sounds like you’ve got this under control — why do you need me?” Don’t be misled. We need you. Here’s why:

  • We have a great team, but we are in the process of expanding and we need to develop a structure that will easily scale and provide the most success to drive revenue.
  • We just launched our outbound sales efforts and we need someone to help develop that into a fully successful program that’s building a strong pipeline and closing business.
  • We need someone to work with the marketing department and figure out how to generate more inbound opportunities that the sales team can follow up on and close.
  • We need someone who will work closely in developing the skills of our current sales team and build a path for career growth and advancement.
  • We want someone to manage our Customer Success program.

So that’s a bit about us. What are we looking for in you?

Experience: As a sales leader, you will strategically build and drive the territory’s sales pipeline by assembling and leading a skilled team of sales professionals. This leader should be familiar with generating, developing and closing software subscription (SaaS) opportunities. We are looking for a self-starter who can manage a team and make an immediate impact of selling our Backup and Cloud Storage solutions. In this role, the sales leader will work closely with the VP of Sales, marketing staff, and service staff to develop and implement specific strategic plans to achieve and exceed revenue targets, including new business acquisition as well as build out our customer success program.

Leadership: We have an experienced team who’s brought us to where we are today. You need to have the people and management skills to get them excited about working with you. You need to be a strong leader and compassionate about developing and supporting your team.

Data driven and creative: The data has to show something makes sense before we scale it up. However, without creativity, it’s easy to say “the data shows it’s impossible” or to find a local maximum. Whether it’s deciding how to scale the team, figuring out what our outbound sales efforts should look like or putting a plan in place to develop the team for career growth, we’ve seen a bit of creativity get us places a few extra dollars couldn’t.

Jive with our culture: Strong leaders affect culture and the person we hire for this role may well shape, not only fit into, ours. But to shape the culture you have to be accepted by the organism, which means a certain set of shared values. We default to openness with our team, our customers, and everyone if possible. We love initiative — without arrogance or dictatorship. We work to create a place people enjoy showing up to work. That doesn’t mean ping pong tables and foosball (though we do try to have perks & fun), but it means people are friendly, non-political, working to build a good service but also a good place to work.

Do the work: Ideas and strategy are critical, but good execution makes them happen. We’re looking for someone who can help the team execute both from the perspective of being capable of guiding and organizing, but also someone who is hands-on themselves.

Additional Responsibilities needed for this role:

  • Recruit, coach, mentor, manage and lead a team of sales professionals to achieve yearly sales targets. This includes closing new business and expanding upon existing clientele.
  • Expand the customer success program to provide the best customer experience possible resulting in upsell opportunities and a high retention rate.
  • Develop effective sales strategies and deliver compelling product demonstrations and sales pitches.
  • Acquire and develop the appropriate sales tools to make the team efficient in their daily work flow.
  • Apply a thorough understanding of the marketplace, industry trends, funding developments, and products to all management activities and strategic sales decisions.
  • Ensure that sales department operations function smoothly, with the goal of facilitating sales and/or closings; operational responsibilities include accurate pipeline reporting and sales forecasts.
  • This position will report directly to the VP of Sales and will be staffed in our headquarters in San Mateo, CA.

Requirements:

  • 7 – 10+ years of successful sales leadership experience as measured by sales performance against goals.
  • Experience in developing skill sets and providing career growth and opportunities through advancement of team members.
  • Background in selling SaaS technologies with a strong track record of success.
  • Strong presentation and communication skills.
  • Must be able to travel occasionally nationwide.
  • BA/BS degree required

Think you want to join us on this adventure?
Send an email to jobscontact@backblaze.com with the subject “Director of Sales.” (Recruiters and agencies, please don’t email us.) Include a resume and answer these two questions:

  1. How would you approach evaluating the current sales team and what is your process for developing a growth strategy to scale the team?
  2. What are the goals you would set for yourself in the 3 month and 1-year timeframes?

Thank you for taking the time to read this and I hope that this sounds like the opportunity for which you’ve been waiting.

Backblaze is an Equal Opportunity Employer.

The post Hiring a Director of Sales appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Randomly generated, thermal-printed comics

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/random-comic-strip-generation-vomit-comic-robot/

Python code creates curious, wordless comic strips at random, spewing them from the thermal printer mouth of a laser-cut body reminiscent of Disney Pixar’s WALL-E: meet the Vomit Comic Robot!

The age of the thermal printer!

Thermal printers allow you to instantly print photos, data, and text using a few lines of code, with no need for ink. More and more makers are using this handy, low-maintenance bit of kit for truly creative projects, from Pierre Muth’s tiny PolaPi-Zero camera to the sound-printing Waves project by Eunice Lee, Matthew Zhang, and Bomani McClendon (and our own Secret Santa Babbage).

Vomiting robots

Interaction designer and developer Cadin Batrack, whose background is in game design and interactivity, has built the Vomit Comic Robot, which creates “one-of-a-kind comics on demand by processing hand-drawn images through a custom software algorithm.”

The robot is made up of a Raspberry Pi 3, a USB thermal printer, and a handful of LEDs.


At the press of a button, Processing code selects one of a set of Cadin’s hand-drawn empty comic grids and then randomly picks images from a library to fill in the gaps.


Each image is associated with data that allows the code to fit it correctly into the available panels. Cadin says this about the concept behind his build:

Although images are selected and placed randomly, the comic panel format suggests relationships between elements. Our minds create a story where there is none in an attempt to explain visuals created by a non-intelligent machine.

The Raspberry Pi saves the final image as a high-resolution PNG file (so that Cadin can sell prints on thick paper via Etsy), and a Python script sends it to be vomited up by the thermal printer.


For more about the Vomit Comic Robot, check out Cadin’s blog. If you want to recreate it, you can find the info you need in the Imgur album he has put together.

We ❤ cute robots

We have a soft spot for cute robots here at Pi Towers, and of course we make no exception for the Vomit Comic Robot. If, like us, you’re a fan of adorable bots, check out Mira, the tiny interactive robot by Alonso Martinez, and Peeqo, the GIF bot by Abhishek Singh.


The post Randomly generated, thermal-printed comics appeared first on Raspberry Pi.

Hong Kong Customs Arrest Pirate Streaming Device Vendors

Post Syndicated from Andy original https://torrentfreak.com/hong-kong-customs-arrest-pirate-streaming-device-vendors-180529/

As Internet-capable set-top boxes pour into homes across all populated continents, authorities seem almost powerless to come up with a significant response to the growing threat.

In standard form these devices, which are often Android-based, are entirely legal. However, when configured with specialist software they become piracy powerhouses providing access to all content imaginable, often at copyright holders’ expense.

A large proportion of these devices come from Asia, China in particular, but it’s relatively rare to hear of enforcement action in that part of the world. That changed this week with an announcement from Hong Kong customs detailing a series of raids in the areas of Sham Shui Po and Wan Chai.

After conducting an in-depth investigation with the assistance of copyright holders, on May 25 and 26 Customs and Excise officers launched Operation Trojan Horse, carrying out a series of raids on four premises selling suspected piracy-configured set-top boxes.

During the operation, officers arrested seven men and one woman aged between 18 and 45. Four of them were shop owners and the other four were salespeople. Around 354 suspected ‘pirate’ boxes were seized with an estimated market value of HK$320,000 (US$40,700).

“In the past few months, the department has stepped up inspections of hotspots for TV set-top boxes,” a statement from authorities reads.

“We have discovered that some shops have sold suspected illegal set-top boxes that bypass the copyright protection measures imposed by copyright holders of pay television programs allowing people to watch pay television programs for free.”

Some of the devices seized by Hong Kong Customs

During a press conference yesterday, a representative from the Customs Copyright and Trademark Investigations (Action) Division said that in the run-up to the 2018 World Cup, measures against copyright infringement will be strengthened both offline and online.

The announcement was welcomed by the Cable and Satellite Broadcasting Association of Asia’s (CASBAA) Coalition Against Piracy, which is backed by industry heavyweights including Disney, Fox, HBO Asia, NBCUniversal, Premier League, Turner Asia-Pacific, A&E Networks, Astro, BBC Worldwide, National Basketball Association, TV5MONDE, Viacom International, and others.

“We commend the great work of Hong Kong Customs in clamping down on syndicates who profit from the sale of Illicit Streaming Devices,” said General Manager Neil Gane.

“The prevalence of ISDs in Hong Kong and across South East Asia is staggering. The criminals who sell ISDs, as well as those who operate the ISD networks and pirate websites, are profiting from the hard work of talented creators, seriously damaging the legitimate content ecosystem as well as exposing consumers to dangerous malware.”

Malware warnings are very prevalent these days, but they’re not something the majority of set-top box owners have a problem with. Indeed, a study carried out by Sycamore Research found that pirates aren’t easily deterred by such warnings.

Nevertheless, there are definite risks for individuals selling devices when they’re configured for piracy.

Recent cases, particularly in the UK, have shown that hefty jail sentences can hit offenders while over in the United States (1,2,3), lawsuits filed by the Alliance for Creativity and Entertainment (ACE) have the potential to end in unfavorable rulings for multiple defendants.

Although rarely reported, offenders in Hong Kong also face stiff sentences for this kind of infringement including large fines and custodial sentences of up to four years.


Getting Rid of Your Mac? Here’s How to Securely Erase a Hard Drive or SSD

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/how-to-wipe-a-mac-hard-drive/


What do I do with a Mac that still has personal data on it? Do I take out the disk drive and smash it? Do I sweep it with a really strong magnet? Is there a difference in how I handle a hard drive (HDD) versus a solid-state drive (SSD)? Well, taking a sledgehammer or projectile weapon to your old machine is certainly one way to make the data irretrievable, and it can be enormously cathartic as long as you follow appropriate safety and disposal protocols. But there are far less destructive ways to make sure your data is gone for good. Let me introduce you to secure erasing.

Which Type of Drive Do You Have?

Before we start, you need to know whether you have an HDD or an SSD. To find out, or at least to make sure, click on the Apple menu and select “About this Mac.” Once there, select the “Storage” tab to see which type of drive is in your system.

The first example, below, shows a SATA Disk (HDD) in the system.

SATA HDD

In the next case, we see we have a Solid State SATA Drive (SSD), plus a Mac SuperDrive.

Mac storage dialog showing SSD

The third screen shot shows an SSD, as well. In this case it’s called “Flash Storage.”

Flash Storage

Make Sure You Have a Backup

Before you get started, you’ll want to make sure that any important data on your hard drive has moved somewhere else. OS X’s built-in Time Machine backup software is a good start, especially when paired with Backblaze. You can learn more about using Time Machine in our Mac Backup Guide.

With a local backup copy in hand and secure cloud storage, you know your data is always safe no matter what happens.

Once you’ve verified your data is backed up, roll up your sleeves and get to work. The key is OS X Recovery — a special part of the Mac operating system since OS X 10.7 “Lion.”

How to Wipe a Mac Hard Disk Drive (HDD)

NOTE: If you’re interested in wiping an SSD, see below.

    1. Make sure your Mac is turned off.
    2. Press the power button.
    3. Immediately hold down the command and R keys.
    4. Wait until the Apple logo appears.
    5. Select “Disk Utility” from the OS X Utilities list. Click Continue.
    6. Select the disk you’d like to erase by clicking on it in the sidebar.
    7. Click the Erase button.
    8. Click the Security Options button.
    9. The Security Options window includes a slider that enables you to determine how thoroughly you want to erase your hard drive.

There are four notches to that Security Options slider. “Fastest” is quick but insecure — data could potentially be rebuilt using a file recovery app. Moving that slider to the right introduces progressively more secure erasing. Disk Utility’s most secure level erases the information used to access the files on your disk, then writes zeroes across the disk surface seven times to help remove any trace of what was there. This setting conforms to the DoD 5220.22-M specification.

  1. Once you’ve selected the level of secure erasing you’re comfortable with, click the OK button.
  2. Click the Erase button to begin. Bear in mind that the more secure method you select, the longer it will take. The most secure methods can add hours to the process.

Once it’s done, the Mac’s hard drive will be clean as a whistle and ready for its next adventure: a fresh installation of OS X, being donated to a relative or a local charity, or just sent to an e-waste facility. Of course you can still drill a hole in your disk or smash it with a sledgehammer if it makes you happy, but now you know how to wipe the data from your old computer with much less ruckus.

The above instructions apply to older Macintoshes with HDDs. What do you do if you have an SSD?

Securely Erasing SSDs, and Why Not To

Most new Macs ship with solid state drives (SSDs). Only the iMac and Mac mini ship with regular hard drives anymore, and even those are available in pure SSD variants if you want.

If your Mac comes equipped with an SSD, Apple’s Disk Utility software won’t actually let you zero the hard drive.

Wait, what?

In a tech note posted to Apple’s own online knowledgebase, Apple explains that you don’t need to securely erase your Mac’s SSD:

With an SSD drive, Secure Erase and Erasing Free Space are not available in Disk Utility. These options are not needed for an SSD drive because a standard erase makes it difficult to recover data from an SSD.

In fact, some folks will tell you not to zero out the data on an SSD, since it can cause wear and tear on the memory cells that, over time, can affect its reliability. I don’t think that’s nearly as big an issue as it used to be — SSD reliability and longevity have improved.

If “Standard Erase” doesn’t quite make you feel comfortable that your data can’t be recovered, there are a couple of options.

FileVault Keeps Your Data Safe

One way to make sure that your SSD’s data remains secure is to use FileVault. FileVault is whole-disk encryption for the Mac. With FileVault engaged, you need a password to access the information on your hard drive. Without it, that data is encrypted.

There’s one potential downside of FileVault — if you lose your password or the encryption key, you’re screwed: You’re not getting your data back any time soon. Based on my experience working at a Mac repair shop, losing a FileVault key happens more frequently than it should.

When you first set up a new Mac, you’re given the option of turning FileVault on. If you don’t do it then, you can turn on FileVault at any time by clicking on your Mac’s System Preferences, clicking on Security & Privacy, and clicking on the FileVault tab. Be warned, however, that the initial encryption process can take hours, as will decryption if you ever need to turn FileVault off.

With FileVault turned on, you can restart your Mac into its Recovery System (by restarting the Mac while holding down the command and R keys) and erase the hard drive using Disk Utility, once you’ve unlocked it (by selecting the disk, clicking the File menu, and clicking Unlock). That deletes the FileVault key, which means any data on the drive is useless.

FileVault doesn’t impact the performance of most modern Macs, though I’d suggest only using it if your Mac has an SSD, not a conventional hard disk drive.

Securely Erasing Free Space on Your SSD

If you don’t want to take Apple’s word for it, if you’re not using FileVault, or if you simply want the extra peace of mind, there is a way to securely erase the free space on your SSD. It’s a little more involved, but it works.

Before we get into the nitty-gritty, let me state for the record that this really isn’t necessary, which is why Apple has made it hard to do. But if you’re set on it, you’ll need to use Apple’s Terminal app. Terminal provides command line access to the OS X operating system. Terminal lives in the Utilities folder, but you can also reach it from the Mac’s Recovery System. Once your Mac has booted into the Recovery partition, click the Utilities menu and select Terminal to launch it.

From a Terminal command line, type:

diskutil secureErase freespace VALUE /Volumes/DRIVE

That tells your Mac to securely erase the free space on your SSD. You’ll need to change VALUE to a number between 0 and 4: 0 is a single-pass run of zeroes; 1 is a single-pass run of random numbers; 2 is a 7-pass erase; 3 is a 35-pass erase; and 4 is a 3-pass erase. DRIVE should be changed to the name of your hard drive. To run a 7-pass erase of the free space on an SSD named “JohnB-Macbook”, you would enter the following:

diskutil secureErase freespace 2 /Volumes/JohnB-Macbook

And remember, if the name of your Mac’s hard drive contains a space, you need to escape the space with a backslash. For example, to run a 35-pass erase on a hard drive called “Macintosh HD” you enter the following:

diskutil secureErase freespace 3 /Volumes/Macintosh\ HD
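
If you’re not sure exactly how the volume name appears (spaces and all), you can list what’s mounted first and copy the name from there:

ls /Volumes
diskutil list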

Something to remember is that the more extensive the erase procedure, the longer it will take.

When Erasing is Not Enough — How to Destroy a Drive

If you absolutely, positively need to be sure that all the data on a drive is irretrievable, see this Scientific American article (with contributions by Gleb Budman, Backblaze CEO), How to Destroy a Hard Drive — Permanently.

The post Getting Rid of Your Mac? Here’s How to Securely Erase a Hard Drive or SSD appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Measuring the throughput for Amazon MQ using the JMS Benchmark

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/measuring-the-throughput-for-amazon-mq-using-the-jms-benchmark/

This post is courtesy of Alan Protasio, Software Development Engineer, Amazon Web Services

Just like compute and storage, messaging is a fundamental building block of enterprise applications. Message brokers (aka “message-oriented middleware”) enable different software systems, often written in different languages, on different platforms, running in different locations, to communicate and exchange information. Mission-critical applications, such as CRM and ERP, rely on message brokers to work.

A common performance consideration for customers deploying a message broker in a production environment is the throughput of the system, measured as messages per second. This is important to know so that application environments (hosts, threads, memory, etc.) can be configured correctly.

In this post, we demonstrate how to measure the throughput for Amazon MQ, a new managed message broker service for ActiveMQ, using JMS Benchmark. It should take 15–20 minutes to set up the environment and about an hour to run the benchmark. We also provide some tips on how to configure Amazon MQ for optimal throughput.

Benchmarking throughput for Amazon MQ

ActiveMQ can be used for a number of use cases. These range from simple fire-and-forget tasks (that is, asynchronous processing) and low-latency request-reply patterns to buffering requests before they are persisted to a database.

The throughput of Amazon MQ is largely dependent on the use case. For example, if you have non-critical workloads such as gathering click events for a non-business-critical portal, you can use ActiveMQ in a non-persistent mode and get extremely high throughput with Amazon MQ.

On the flip side, if you have a critical workload where durability is extremely important (meaning that you can’t lose a message), then you are bound by the I/O capacity of your underlying persistence store. We recommend using mq.m4.large for the best results. The mq.t2.micro instance type is intended for product evaluation; its performance is limited by its lower memory and burstable CPU.

Tip: To improve your throughput with Amazon MQ, make sure that you have consumers processing messages as fast as (or faster than) your producers are pushing them.

Because it’s impossible to talk about how the broker (ActiveMQ) behaves for each and every use case, we walk through how to set up your own benchmark for Amazon MQ using our favorite open-source benchmarking tool: JMS Benchmark. We are fans of the JMS Benchmark suite because it’s easy to set up and deploy, and comes with a built-in visualizer of the results.

Non-Persistent Scenarios – Queue latency as you scale producer throughput

JMS Benchmark nonpersistent scenarios

Getting started

At the time of publication, you can create an mq.m4.large single-instance broker for testing for $0.30 per hour (US pricing).

This walkthrough covers the following tasks:

  1. Create and configure the broker.
  2. Create an EC2 instance to run your benchmark.
  3. Configure the security groups.
  4. Run the benchmark.

Step 1 – Create and configure the broker
Create and configure the broker using Tutorial: Creating and Configuring an Amazon MQ Broker.
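
If you script your environments, the broker can also be created from the AWS CLI. The sketch below is indicative only: it omits the networking options (subnets and security groups) you’ll need in your own VPC and uses placeholder credentials, so check aws mq create-broker help for the full set of flags.

$ aws mq create-broker \
    --broker-name MyBroker \
    --engine-type ACTIVEMQ \
    --deployment-mode SINGLE_INSTANCE \
    --host-instance-type mq.m4.large \
    --users Username={activemq-user},Password={activemq-password}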

Step 2 – Create an EC2 instance to run your benchmark
Launch the EC2 instance using Step 1: Launch an Instance. We recommend choosing the m5.large instance type.

Step 3 – Configure the security groups
Make sure that all the security groups are correctly configured to let the traffic flow between the EC2 instance and your broker.

  1. Sign in to the Amazon MQ console.
  2. From the broker list, choose the name of your broker (for example, MyBroker).
  3. In the Details section, under Security and network, choose the name of your security group or choose the expand icon.
  4. From the security group list, choose your security group.
  5. At the bottom of the page, choose Inbound, Edit.
  6. In the Edit inbound rules dialog box, add a rule to allow traffic between your instance and the broker:
    • Choose Add Rule.
    • For Type, choose Custom TCP.
    • For Port Range, type the ActiveMQ SSL port (61617).
    • For Source, leave Custom selected and then type the security group of your EC2 instance.
    • Choose Save.

Your broker can now accept the connection from your EC2 instance.
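
If you manage your infrastructure from the command line instead of the console, the same inbound rule can be added with the AWS CLI; substitute your broker’s security group ID and your EC2 instance’s security group ID for the two placeholders:

$ aws ec2 authorize-security-group-ingress \
    --group-id {broker-security-group-id} \
    --protocol tcp \
    --port 61617 \
    --source-group {ec2-security-group-id}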

Step 4 – Run the benchmark
Connect to your EC2 instance using SSH and run the following commands:

$ cd ~
$ curl -L https://github.com/alanprot/jms-benchmark/archive/master.zip -o master.zip
$ unzip master.zip
$ cd jms-benchmark-master
$ chmod a+x bin/*
$ env \
  SERVER_SETUP=false \
  SERVER_ADDRESS={activemq-endpoint} \
  ACTIVEMQ_TRANSPORT=ssl \
  ACTIVEMQ_PORT=61617 \
  ACTIVEMQ_USERNAME={activemq-user} \
  ACTIVEMQ_PASSWORD={activemq-password} \
  ./bin/benchmark-activemq

After the benchmark finishes, you can find the results in the ~/reports directory. As you may notice, the performance of ActiveMQ varies based on the number of consumers, producers, destinations, and message size.
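
The reports are easiest to browse in a local web browser. One way to copy them down, assuming an Amazon Linux instance (and therefore the ec2-user account) and the key pair you launched it with:

$ scp -r -i {key-pair-file} ec2-user@{ec2-public-dns}:~/reports ./jms-benchmark-reports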

Amazon MQ architecture

The last piece of context you need in order to interpret the benchmark results is how Amazon MQ is architected.

Amazon MQ is architected to be highly available (HA) and durable. For HA, we recommend using the multi-AZ option. After a message is sent to Amazon MQ in persistent mode, the message is written to the highly durable message store that replicates the data across multiple nodes in multiple Availability Zones. Because of this replication, for some use cases you may see a reduction in throughput as you migrate to Amazon MQ. Customers have told us they appreciate the benefits of message replication as it helps protect durability even in the face of the loss of an Availability Zone.

Conclusion

We hope this gives you an idea of how Amazon MQ performs. We encourage you to run tests to simulate your own use cases.

To learn more, see the Amazon MQ website. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.

Enchanting images with Inky Lines, a Pi‑powered polargraph

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/enchanting-images-inky-lines-pi-powered-polargraph/

A hanging plotter, also known as a polar plotter or polargraph, is a machine for drawing images on a vertical surface. It does so by using motors to control the length of two cords that form a V shape, supporting a pen where they meet. We’ve featured one on this blog before: Norbert “HomoFaciens” Heinz’s video is a wonderfully clear introduction to how a polargraph works and what you have to consider when you’re putting one together.

Today, we look at Inky Lines, by John Proudlock. With it, John is creating a series of captivating and beautiful pieces, and with his most recent work, each rendering of an image is unique.

The Inky Lines plotter draws a flock of seagulls in blue ink on white paper. The print head is suspended near the bottom left corner of the image, as the pen inks the wing of a gull

An evolving project

The project isn’t new – John has been working on it for at least a couple of years – but it is constantly evolving. When we first spotted it, John had just implemented code to allow the plotter to produce mesmeric, spiralling patterns.

A blue spiral pattern featuring overlapping "bubbles"
A dense pink spiral pattern, featuring concentric circles and reminiscent of a mandala
A blue spirograph-type pattern formed of large overlapping squares, each offset from its neighbour by a few degrees, producing a four-spiral-armed "galaxy" shape where lines overlap. The plotter's print head is visible in a corner of the image

But we’re skipping ahead. Let’s go back to the beginning.

From pixels to motor movements

John starts by providing an image, usually no more than 100 pixels wide, to a Raspberry Pi. Custom software that he wrote evaluates the darkness of each pixel and selects a pattern of a suitable density to represent it.

The two cords supporting the plotter’s pen are wound around the shafts of two stepper motors, such that the movement of the motors controls the length of the cords: the program next calculates how much each motor must move in order to produce the pattern. The Raspberry Pi passes corresponding instructions to two motor circuits, which transform the signals to a higher voltage and pass them to the stepper motors. These turn by very precise amounts, winding or unwinding the cords and, very slowly, dragging the pen across the paper.
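
The post doesn’t include John’s code, but the geometry behind any hanging plotter is simple enough to sketch here (the symbols are ours, not John’s). If the two motor shafts are a horizontal distance \(D\) apart and the pen hangs at the point \((x, y)\) measured from the left shaft, the two cord lengths follow from Pythagoras:

\[ L_1 = \sqrt{x^2 + y^2}, \qquad L_2 = \sqrt{(D - x)^2 + y^2} \]

To move the pen, the software works out the change in each cord length and divides it by the length of cord wound per motor step (the spool circumference divided by the motor’s steps per revolution) to get the number of steps to send to each stepper.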

A Raspberry Pi in a case, with a wide flex connected to a GPIO header
The Inky Lines plotter's print head, featuring cardboard and tape, draws an apparently random squiggle
A large area of apparently random pattern drawn by the plotter

John explains,

Suspended in-between the two motors is a print head, made out of a new 3-d modelling material I’ve been prototyping called cardboard. An old coat hanger and some velcro were also used.

(He’s our kind of maker.)

Unique images

The earlier drawings that John made used a repeatable method to render image files as lines on paper. That is, if the machine drew the same image a number of times, each copy would be identical. More recently, though, he has been using a method that yields random movements of the pen:

The pen point is guided around the image, but moves to each new point entirely at random. Up close this looks like a chaotic squiggle, but from a distance of a couple of meters, the human eye (and brain) make order from the chaos and view an infinite number of shades and a smoother, less mechanical image.

An apparently chaotic squiggle

This method means that no matter how many times the polargraph repeats the same image, each copy will be unique.

A gallery of work

Inky Lines’ website and its Instagram feed offer a collection of wonderful pieces John has drawn with his polargraph, and he discusses the different techniques and types of image that he is exploring.

A 3 x 3 grid of varied and colourful images from inkylinespolargraph's Instagram feed

They range from holiday photographs, processed to extract particular features and rendered in silhouette, to portraits, made with a single continuous line that can be several hundred metres long, to generative spirograph-style images like those pictured above, created by an algorithm rather than rendered from a source image.

The post Enchanting images with Inky Lines, a Pi‑powered polargraph appeared first on Raspberry Pi.