Software Freedom Is Elementary, My Dear Watson.

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2011/03/01/watson.html

I’ve watched the game show, Jeopardy!, regularly since its
Trebek-hosted relaunch on 1984-09-10. I even remember distinctly the
Final Jeopardy question that night as “This date is the first day of
the new millennium”. At the age of 11, I got the answer wrong, falling
for the incorrect “What is 2000-01-01?”, but I recalled this memory
eleven years ago during the debates regarding when the millennium
turnover happened.

I had periods of my life where I watched Jeopardy! only
rarely, but in recent years (as I’ve become more of a student of games,
in part because of poker), I’ve watched Jeopardy! almost
nightly over dinner with my wife. I’ve learned that I’m unlikely to
excel as a Jeopardy! player myself because (a) I read slowly
and (b) my recall of facts, while reasonably strong, is not
instantaneous. I thus haven’t tried out for the show, but I’m
nevertheless a fan of strong players.

Jeopardy! isn’t my only spectator game. Right after
college, even though I’m a worse-than-mediocre chess player, I watched
with excitement as Deep Blue played and defeated Kasparov. Kasparov has
disputed the results and how much humans were actually involved, but
even so, such interference was minimal (between matches) and the
demonstration still showed computer algorithmic mastery of chess.

Of course, the core algorithms that Deep Blue used were well known and
often implemented. I learned α-β pruning in my undergraduate
AI course and it was clear that a sufficiently fast computer, given a
few strong heuristics, could win at almost any full-information game
with a reasonable branching factor. And computers typically do these
days.
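
For readers who haven’t seen α-β pruning, here is a minimal sketch in
Python. It is a textbook-style toy, not Deep Blue’s code: the game tree
is a hypothetical nested list whose leaves are already-computed
heuristic scores, and the names `alphabeta`, `TREE`, and `visited` are
made up for this illustration.

```python
import math

# Hypothetical two-ply game tree: the root is the maximizing player's
# move, each inner list is the minimizing opponent's reply, and each
# integer is a heuristic score for the resulting position.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Return the minimax value of node, skipping branches that cannot matter.

    alpha is the best score the maximizer can already guarantee;
    beta is the best score the minimizer can already guarantee.
    visited, if given, collects every leaf that was actually evaluated.
    """
    if visited is None:
        visited = []
    if not isinstance(node, list):  # leaf: a heuristic score
        visited.append(node)
        return node

    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizer would never allow this line
                break
        return best

    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:  # the maximizer already has something better
            break
    return best


if __name__ == "__main__":
    seen = []
    print(alphabeta(TREE, visited=seen), len(seen))
```

On this toy tree the search evaluates only seven of the nine leaves: once
the second reply is known to be worth at most 2, its remaining leaves
cannot change the result. In a real engine the same cutoff, applied at
every level, is what lets faster hardware search much deeper.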

I suppose I never really thought about the issues of Deep Blue being
released as Free Software. First, because I was not as involved with
Free Software then as I am now, and also, as near as anyone could tell,
Deep Blue’s software was probably not useful for anything other than
playing chess, and its primary power was in its ability to go very deep
(hence the name, I guess) in the search tree. In short, Deep Blue was
primarily a hardware, not a software, success story.

It was, nevertheless, impressive, and last month, I saw the next
installment in this IBM story. I watched with interest as IBM’s
Watson defeated two champion Jeopardy! players. Ken
Jennings, for one, even welcomed our new computer overlords.

Watson beating Jeopardy! is, frankly, a lot more
innovative than Deep Blue beating chess. Most don’t know this about me,
but I came very close to focusing my career on PhD work in Natural
Language Processing; I believe fundamentally it’s the area of AI most in
need of attention and research. Watson is a shining example of success
in modern NLP, and I actually believe some of the IBM hype about
how Watson’s technology can be applied elsewhere, such as medical
information systems. Indeed, IBM has announced a deal with Columbia
University Medical Center to adapt the system for medical diagnostics.
(Perhaps Watson’s next TV appearance will be on House.)

This all sounds great to most people, but my real concern is the
freedom of the software. We’ve shown in the software freedom community
that to advance software and improve it, sharing the software is
essential. Technology locked up in a vaulted cave doesn’t allow all the
great minds to collaborate. Just as we don’t lock up libraries so that
only the guilded overlords have access, we shouldn’t restrict the best
software technology by keeping it proprietary.

Indeed, Eric Brown, at his Linux Foundation End User Linux Summit talk,
told us that Watson relied heavily on the publicly available software
freedom codebase, such as GNU/Linux, Hadoop, and other FLOSS
components. They clearly couldn’t do their work without building upon the
work we shared with IBM, yet IBM apparently ignores its moral obligation to
reciprocate.

So, I just point-blank asked Brown why Watson is proprietary. Of
course, I long ago learned to never ask a confrontational question from
the crowd at a technical talk without knowing what the answer is likely to
be. Brown answered in the way I expected: “We’re working with
Universities to provide a framework for their research”. I followed
up, asking when he would actually release the sources and what the
license would be. He dodged the question, and instead speculated about
what licenses IBM sometimes likes to use when it does choose to release
code; he did not indicate if Watson’s sources will ever be released. In
short, the answer from IBM is clear: Watson’s general ideas
will be shared with academics, but the source code won’t be.

This point is precisely one of the reasons I didn’t pursue a career in
academic Computer Science. Since most jobs — including
professorships at Universities — for PhDs in Computer Science
require that any code written be kept proprietary, most
Computer Science researchers have convinced themselves that code doesn’t
matter; only publishing ideas does. This belief is so pervasive that I
knew something like this would be Brown’s response to my query. (I was
even so sure, I wrote almost this entire blog post before I asked the
question.)

I’d easily agree that publishing papers is better than the technology
being only a trade secret. At least we can learn a little bit about the
work. But in all but the pure theoretical areas of Computer
Science, code is written to exemplify, test, and exercise the
ideas. Merely publishing papers and not the code is akin to a chemist
publishing final results but nothing about the methodologies or raw
data. Science, in such cases, is unverifiable and unreproducible. If
we accepted such in fields other than CS, we’d have accepted the idea
that cold fusion was discovered in 1989.

I don’t think I’m going to convince IBM to release Watson’s sources as
Free Software. What I do hope is that perhaps this blog post convinces
a few more people that we just shouldn’t accept that Computer Science is
advanced by researchers who give us flashy demos and code-less
research papers. I, for one, welcome our computer overlords…but only
if I can study and modify their source code.