Tag Archives: Nokia

Create SLUG! It’s just like Snake, but with a slug

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/slug-snake/

Recreate Snake, the favourite mobile phone game from the late nineties, using a slug*, a Raspberry Pi, a Sense HAT, and our free resource!

Raspberry Pi Sense HAT Slug free resource

*A virtual slug. Not a real slug. Please leave the real slugs out in nature.

Snake SLUG!

Move aside, Angry Birds! On your bike, Pokémon Go! When it comes to the cream of the crop of mobile phone games, Snake holds the top spot.

Snake Nokia Game

I could while away the hours…

You may still have an old Nokia 3310 lost in the depths of a drawer somewhere — the drawer that won’t open all the way because something inside is jammed at an odd angle. So it will be far easier to grab your Pi and Sense HAT, or use the free Sense HAT emulator (online or on Raspbian), and code Snake SLUG yourself. In doing so, you can introduce the smaller residents of your household to the best reptile-focused game ever made…now with added mollusc.

The resource

To try out the game for yourself, head to our resource page, where you’ll find the online Sense HAT emulator embedded and ready to roll.

Raspberry Pi Sense HAT Slug free resource

It’ll look just like this, and you can use your computer’s arrow keys to direct your slug toward her tasty treats.

From there, you’ll be taken on a step-by-step journey from zero to SLUG glory while coding your own version of the game in Python. On the way, you’ll learn to work with two-dimensional lists and to use the Sense HAT’s pixel display and joystick input. And by completing the resource, you’ll expand your understanding of applying abstraction and decomposition to solve more complex problems, in line with our Digital Making Curriculum.
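
To get a feel for those building blocks first, here is a minimal sketch (not the resource’s actual code) of the two core ideas: the playing field held as a two-dimensional list, and joystick events steering the slug around the 8×8 LED matrix. It assumes the sense_hat library; on the emulator, swap the import for sense_emu.

    import time
    from sense_hat import SenseHat  # on the emulator: from sense_emu import SenseHat

    sense = SenseHat()
    GREEN, BLACK = (0, 255, 0), (0, 0, 0)

    # The playing field is a two-dimensional list: grid[y][x] is 1 where the slug has been
    grid = [[0] * 8 for _ in range(8)]
    x, y = 4, 4
    grid[y][x] = 1

    moves = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    while True:
        for event in sense.stick.get_events():
            if event.action == "pressed" and event.direction in moves:
                dx, dy = moves[event.direction]
                x, y = (x + dx) % 8, (y + dy) % 8   # wrap at the matrix edges
                grid[y][x] = 1
        # flatten the 2D list into the 64 pixels the LED matrix expects
        sense.set_pixels([GREEN if cell else BLACK for row in grid for cell in row])
        time.sleep(0.05)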

The Sense HAT

The Raspberry Pi Sense HAT was originally designed and made as part of the Astro Pi mission in December 2015. With an 8×8 RGB LED matrix, a joystick, and a plethora of on-board sensors including an accelerometer, gyroscope, and magnetometer, it’s a great add-on for your digital making toolkit, and excellent for projects involving data collection and evaluation.

You can find more of our free Sense HAT tutorials here, including for making Flappy Bird Astronaut, a marble maze, and Pong.

The post Create SLUG! It’s just like Snake, but with a slug appeared first on Raspberry Pi.

2017 Holiday Gift Guide — Backblaze Style

Post Syndicated from Yev original https://www.backblaze.com/blog/2017-holiday-gift-guide-backblaze-style/


Here at Backblaze we have a lot of folks who are all about technology. With the holiday season fast approaching, you might have all of your gift buying already finished — but if not, we put together a list of things that the employees here at Backblaze are pretty excited about giving (and/or receiving) this year.

Smart Homes:

It’s no secret that having a smart home is the new hotness, and many of the items below can be used to turbocharge your home’s ascent into the future:

Raspberry Pi
The holidays are all about eating pie — well, why not get a pie of a different type for the DIY fan in your life?

Wyze Cam
An inexpensive way to keep a close eye on all your favorite people…and intruders!

Snooz
Have trouble falling asleep? Try this portable white noise machine. Also great for the office!

Amazon Echo Dot
Need a cheap way to keep track of your schedule or play music? The Echo Dot is a great entry into the smart home of your dreams!

Google Wifi
These little fellows make it easy to Wifi-ify your entire home, even if it’s larger than the average shoe box here in Silicon Valley. Google Wifi acts as a mesh router and seamlessly covers your whole dwelling. Have a mansion? Buy more!

Google Home
This is the Google variant of the Amazon Echo Dot. It’s more expensive (priced closer to the full-size Amazon Echo) but has better sound quality and is tied into the Google ecosystem.

Nest Thermostat
This is a smart thermostat. What better way to score points with the in-laws than installing one of these bad boys in their home — and then making it freezing cold randomly in the middle of winter from the comfort of your couch!

Wearables:

Homes aren’t the only things that should be smart. Your body should also get the chance to be all that it can be:

Apple AirPods
You’ve seen these all over the place, and the truth is they do a pretty good job of making sounds appear in your ears.

Bose SoundLink Wireless Headphones
If you like over-the-ear headphones, these noise canceling ones work great, are wireless and lovely. There’s no better way to ignore people this holiday season!

Garmin Fenix 5 Watch
This watch is all about fitness. If you enjoy fitness. This watch is the fitness watch for your fitness needs.

Apple Watch
The Apple Watch is a wonderful gadget that will light up any movie theater this holiday season.

Nokia Steel Health Watch
If you’re into mixing analogue and digital, this is a pretty neat little gadget.

Fossil Smart Watch
This stylish watch is a pretty neat way to dip your toe into smartwatches and activity trackers.

Pebble Time Steel Smart Watch
Some people call this the greatest smartwatch of all time. Those people might be named Yev. This watch is great at sending you notifications from your phone, and not needing to be charged every day. Bellissimo!

Random Goods:

A few of the holiday gift suggestions that we got were a bit off-kilter, but we do have a lot of interesting folks in the office. Hopefully, you might find some of these as interesting as they do:

Wireless Qi Charger
Wireless chargers are pretty great in that you don’t have to deal with dongles. There are even kits to make your electronics “wirelessly chargeable” which is pretty great!

Self-Heating Coffee Mug
Love coffee? Hate lukewarm coffee? What if your coffee cup heated itself? Brilliant!

Yeast Stirrer
Yeast. It makes beer. And bread! Sometimes you need to stir it. What cooler way to stir your yeast than with this industrial stirrer?

Toto Washlet
This one is self explanatory. You know the old rhyme: happy butts, everyone’s happy!

Good luck out there this holiday season!


The post 2017 Holiday Gift Guide — Backblaze Style appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

GNOME Foundation partners with Purism to support its efforts to build the Librem 5 smartphone

Post Syndicated from ris original https://lwn.net/Articles/734325/rss

Last week KDE announced that they were working with Purism on the Librem 5 smartphone. The GNOME Foundation has also provided its endorsement and support of Purism’s efforts to build the Librem 5. “As part of the collaboration, if the campaign is successful the GNOME Foundation plans to enhance GNOME shell and general performance of the system with Purism to enable features on the Librem 5.

Various GNOME technologies are used extensively in embedded devices today, and GNOME developers have experienced some of the challenges that face mobile computing specifically with the Nokia 770, N800 and N900, the One Laptop Per Child project’s XO laptop and FIC’s Neo1973 mobile phone.”

100 days of postmarketOS

Post Syndicated from corbet original https://lwn.net/Articles/732792/rss

The postmarketOS distribution looks back at its first 100 days. “One of our previously stated goals is using the mainline Linux kernel on as many mobile devices as possible. This is not as easy as it might sound, since many Linux-based smartphones (Android) require binary drivers which depend on very specific kernel versions. It’s a tremendous task to rewrite these drivers to work with the current kernel APIs. Nevertheless, some people have been doing that since long before postmarketOS existed. In the case of the Nokia N900 this has been going on for some number of years and almost all components are now supported in the mainline kernel. This has allowed us to use the mainline kernel as the default kernel for the N900, jumping from Maemo’s 2.6.x to mainline 4.12!”

Scripto: the distraction-free writing tool

Post Syndicated from Laura Clay original https://www.raspberrypi.org/blog/scripto-distraction-free-writing-tool/

When I’m not copyediting for the lovely folks at Raspberry Pi and The MagPi, I write fiction, often involving teenagers running away from murderous water-horses. One of the main problems I have is avoiding distractions, like reading Brooding YA Hero‘s adventures on Twitter, the constant nagging thought that the next email will be another rejection, and the copious amounts of Taylor Lautner GIFs on Tumblr.

Taylor Lautner

Sorry, where was I?

Before you know it, it’s 5pm and your word count’s looking grim. Fortunately, Edinburgh’s Craig Lam (@siliconeidolon) has come up with an ingenious solution in the form of the Scripto, a distraction-eliminating writing tool with a Pi Zero at its heart. He explains:

‘Writing creatively to a deadline is psychologically challenging: any distraction or obstacle can derail the process. Multifunction devices like laptops are rife with distractions; writing longhand is painful; dedicated word processors are out-of-production or severely flawed. These observations inspired a vision for a distraction-free, portable, convenient, and supportive tool for writers.’

Procrastination

The Scripto, devised as part of Craig’s BSc project, takes the familiar form of a laptop, but with a difference: the only connection to the outside world is to cloud backup. No refreshing social media feeds, no ‘I’ll just check my email once more’… just glorious, uninterrupted writing.

Scripto

The Scripto has a Pixel Qi touchscreen readable in sunlight, with a photovoltaic strip to provide extra power on top of the ten hours of battery life. No more squinting at a laptop on a summer day. The Pi boots into a fullscreen, adapted version of FocusWriter and provides a host of motivational features, like timers for ‘word sprints’ and word count targets.

Scripto UI

What I love about the Scripto is how personal you can make it. Changeable covers are a nice nod to the old Nokia days (one for the 90s kids), and a companion app to track your word count is a great motivator. (Anyone who’s participated in National Novel Writing Month will know about the joys of racing to 50,000 words and watching your daily graph zooming upwards.)

The focus is on ease of use and low cost, just like the Pi Zero that powers it. The case material will be biodegradable – and café-proof, for those of us often writing hunched over a cappuccino and crumbly cake – and the word processor can come preloaded with the Open Dyslexic font, which is another plus. Social sustainability was important when Craig planned the Scripto; he specifically mentions our educational initiatives like Picademy as a factor when choosing a single-board computer to run the device.

Scripto side view

Craig is considering crowdfunding the Scripto, and I’m sure many writers will be queuing up for one of these nifty devices.

While you wait for the Scripto, you could read some brilliant books by Laura Lam, who told me all about her husband’s project. Or you could try short stories by some other Laura, available on Amazon, Smashwords, and in Blackwell’s Edinburgh. Ahem.

The post Scripto: the distraction-free writing tool appeared first on Raspberry Pi.

Raspberry Pi astrophotography

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/raspberry-pi-astrophotography/

Tonight marks the appearance of the brightest supermoon to grace the sky since 1948, appearing 30% brighter and 14% bigger than the usual glowing orb. The moon will not be this close again until November 2034.

Given this, and assuming the sky remains clear enough tonight to catch a glimpse, here’s one of several Raspberry Pi-powered astrophotography projects to get your creative senses tingling.

Having already created a similar project with a Nokia Lumia, TJ “Lifetime tinkerer” Emsley decided to try attaching a Raspberry Pi and Camera Module to a newly adopted Tesco 45X refractor telescope. They added a $10 USB shield, wireless NIC, and the usual setup components, and the project was underway.

TJ EMSLEY Moon Photography

TJ designed and 3D-printed a mount and bracket; the files are available on Thingiverse for those interested in building their own. The two-part design allows for use with various telescopes, thanks to an adjustable eyepiece adapter.

A Pi Zero fits onto the bracket, the Pi camera snug to the eyepiece, and the build is ready.

TJ EMSLEY Moon Photography

The Pi runs code written by TJ, allowing for image preview and exposure adjustments. You can choose between video and still images, and you can trigger the camera via a keyboard; this way, you never have to touch the adapter, so you don’t unsettle the camera while capturing an image.
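
TJ’s code isn’t on GitHub yet, so as a rough starting point, here is a sketch along the same lines using the picamera library. The resolution, filenames, and key bindings below are arbitrary choices of ours, not TJ’s.

    import picamera

    with picamera.PiCamera(resolution=(2592, 1944)) as camera:
        camera.start_preview()   # live preview while you frame the moon
        while True:
            key = input("[+/-] exposure, [c]apture, [v]ideo, [q]uit: ").strip()
            if key == "+":
                camera.exposure_compensation = min(25, camera.exposure_compensation + 1)
            elif key == "-":
                camera.exposure_compensation = max(-25, camera.exposure_compensation - 1)
            elif key == "c":
                camera.capture("moon.jpg")        # triggered without touching the scope
            elif key == "v":
                camera.start_recording("moon.h264")
                input("Recording; press Enter to stop ")
                camera.stop_recording()
            elif key == "q":
                break
        camera.stop_preview()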

TJ will eventually be uploading the project to GitHub, but a short search will help you to build your own camera code (start here), so why not share your astrophotography with us in the comments below?

Enjoy the supermoon!

The post Raspberry Pi astrophotography appeared first on Raspberry Pi.

Rethinking PID 1

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/systemd.html

If you are well connected or good at reading between the lines
you might already know what this blog post is about. But even then
you may find this story interesting. So grab a cup of coffee,
sit down, and read what’s coming.

This blog story is long, so even though I can only recommend
reading the long story, here’s the one sentence summary: we are
experimenting with a new init system and it is fun.

Here’s the code. And here’s the story:

Process Identifier 1

On every Unix system there is one process with the special
process identifier 1. It is started by the kernel before all other
processes and is the parent process for all those other processes
that have nobody else to be child of. Due to that it can do a lot
of stuff that other processes cannot do. And it is also
responsible for some things that other processes are not
responsible for, such as bringing up and maintaining userspace
during boot.

Historically on Linux the software acting as PID 1 was the
venerable sysvinit package, though it had been showing its age for
quite a while. Many replacements have been suggested, but only one of
them really took off: Upstart, which has by now found
its way into all major distributions.

As mentioned, the central responsibility of an init system is
to bring up userspace. And a good init system does that
fast. Unfortunately, the traditional SysV init system was not
particularly fast.

For a fast and efficient boot-up two things are crucial:

  • To start less.
  • And to start more in parallel.

What does that mean? Starting less means starting fewer
services or deferring the starting of services until they are
actually needed. There are some services where we know that they
will be required sooner or later (syslog, D-Bus system bus, etc.),
but for many others this isn’t the case. For example, bluetoothd
does not need to be running unless a bluetooth dongle is actually
plugged in or an application wants to talk to its D-Bus
interfaces. Same for a printing system: unless the machine
physically is connected to a printer, or an application wants to
print something, there is no need to run a printing daemon such as
CUPS. Avahi: if the machine is not connected to a
network, there is no need to run Avahi, unless some application wants
to use its APIs. And even SSH: as long as nobody wants to contact
your machine there is no need to run it, provided it is then
started on the first connection. (And admit it, on most machines
where sshd might be listening somebody connects to it only every
other month or so.)

Starting more in parallel means that if we have
to run something, we should not serialize its start-up (as sysvinit
does), but run it all at the same time, so that the available
CPU and disk IO bandwidth is maxed out, and hence
the overall start-up time minimized.

Hardware and Software Change Dynamically

Modern systems (especially general purpose OS) are highly
dynamic in their configuration and use: they are mobile, different
applications are started and stopped, different hardware added and
removed again. An init system that is responsible for maintaining
services needs to listen to hardware and software
changes. It needs to dynamically start (and sometimes stop)
services as they are needed to run a program or enable some
hardware.

Most current systems that try to parallelize boot-up still
synchronize the start-up of the various daemons involved: since
Avahi needs D-Bus, D-Bus is started first, and only when D-Bus
signals that it is ready, Avahi is started too. Similar for other
services: libvirtd and X11 need HAL (well, I am considering the
Fedora 13 services here, ignore that HAL is obsolete), hence HAL
is started first, before libvirtd and X11 are started. And
libvirtd also needs Avahi, so it waits for Avahi too. And all of
them require syslog, so they all wait until syslog is fully
started up and initialized. And so on.

Parallelizing Socket Services

This kind of start-up synchronization results in the
serialization of a significant part of the boot process. Wouldn’t
it be great if we could get rid of the synchronization and
serialization cost? Well, we can, actually. For that, we need to
understand what exactly the daemons require from each other, and
why their start-up is delayed. For traditional Unix daemons,
there’s one answer to it: they wait until the socket the other
daemon offers its services on is ready for connections. Usually
that is an AF_UNIX socket in the file-system, but it could be
AF_INET[6], too. For example, clients of D-Bus wait until
/var/run/dbus/system_bus_socket can be connected to,
clients of syslog wait for /dev/log, clients of CUPS wait
for /var/run/cups/cups.sock and NFS mounts wait for
/var/run/rpcbind.sock and the portmapper IP port, and so
on. And think about it, this is actually the only thing they wait
for!

Now, if that’s all they are waiting for, if we manage to make
those sockets available for connection earlier and only actually
wait for that instead of the full daemon start-up, then we can
speed up the entire boot and start more processes in parallel. So,
how can we do that? Actually quite easily in Unix-like systems: we
can create the listening sockets before we actually start
the daemon, and then just pass the socket during exec()
to it. That way, we can create all sockets for all
daemons in one step in the init system, and then in a second step
run all daemons at once. If a service needs another, and it is not
fully started up, that’s completely OK: what will happen is that
the connection is queued in the providing service and the client
will potentially block on that single request. But only that one
client will block and only on that one request. Also, dependencies
between services will no longer necessarily have to be configured
to allow proper parallelized start-up: if we start all sockets at
once and a service needs another it can be sure that it can
connect to its socket.
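
To make that concrete, here is a minimal Python sketch of the init-side trick; the socket path and daemon name are hypothetical. We bind and listen first, then hand the open file descriptor to the daemon during exec().

    import os, socket

    # Create the listening socket before the daemon exists; clients that
    # connect early are simply queued in the kernel's listen backlog.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind("/run/example.sock")                    # hypothetical path
    sock.listen(128)
    os.set_inheritable(sock.fileno(), True)

    if os.fork() == 0:
        os.dup2(sock.fileno(), 3)                     # hand it over at a known fd
        os.execv("/usr/sbin/exampled", ["exampled"])  # hypothetical daemon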

Because this is at the core of what is following, let me say
this again, with different words and by example: if you start
syslog and various syslog clients at the same time, what will
happen in the scheme pointed out above is that the messages of the
clients will be added to the /dev/log socket buffer. As
long as that buffer doesn’t run full, the clients will not have to
wait in any way and can immediately proceed with their start-up. As
soon as syslog itself has finished start-up, it will dequeue all
messages and process them. Another example: we start D-Bus and
several clients at the same time. If a synchronous bus
request is sent and hence a reply expected, what will happen is
that the client will have to block, however only that one client
and only until D-Bus managed to catch up and process it.

Basically, the kernel socket buffers help us to maximize
parallelization, and the ordering and synchronization is done by
the kernel, without any further management from userspace! And if
all the sockets are available before the daemons actually start-up,
dependency management also becomes redundant (or at least
secondary): if a daemon needs another daemon, it will just connect
to it. If the other daemon is already started, this will
immediately succeed. If it isn’t started but in the process of
being started, the first daemon will not even have to wait for it,
unless it issues a synchronous request. And even if the other
daemon is not running at all, it can be auto-spawned. From the
first daemon’s perspective there is no difference, hence dependency
management becomes mostly unnecessary or at least secondary, and
all of this in optimal parallelization and optionally with
on-demand loading. On top of this, this is also more robust, because
the sockets stay available regardless whether the actual daemons
might temporarily become unavailable (maybe due to crashing). In
fact, you can easily write a daemon with this that can run, and
exit (or crash), and run again and exit again (and so on), and all
of that without the clients noticing or losing any request.

It’s a good time for a pause, go and refill your coffee mug,
and be assured, there is more interesting stuff following.

But first, let’s clear a few things up: is this kind of logic
new? No, it certainly is not. The most prominent system that works
like this is Apple’s launchd system: on MacOS the listening of the
sockets is pulled out of all daemons and done by launchd. The
services themselves hence can all start up in parallel and
dependencies need not be configured for them. And that is
actually a really ingenious design, and the primary reason why
MacOS manages to provide the fantastic boot-up times it
provides. I can highly recommend this
video
where the launchd folks explain what they are
doing. Unfortunately this idea never really caught on outside of the Apple
camp.

The idea is actually even older than launchd. Prior to launchd
the venerable inetd worked much like this: sockets were
centrally created in a daemon that would start the actual service
daemons passing the socket file descriptors during
exec(). However the focus of inetd certainly
wasn’t local services, but Internet services (although later
reimplementations supported AF_UNIX sockets, too). It also wasn’t a
tool to parallelize boot-up or even useful for getting implicit
dependencies right.

For TCP sockets inetd was primarily used in a way that
for every incoming connection a new daemon instance was
spawned. That meant that for each connection a new
process was spawned and initialized, which is not a
recipe for high-performance servers. However, right from the
beginning inetd also supported another mode, where a
single daemon was spawned on the first connection, and that single
instance would then go on and also accept the follow-up connections
(that’s what the wait/nowait option in
inetd.conf was for, a particularly badly documented
option, unfortunately.) Per-connection daemon starts probably gave
inetd its bad reputation for being slow. But that’s not entirely
fair.

Parallelizing Bus Services

Modern daemons on Linux tend to provide services via D-Bus
instead of plain AF_UNIX sockets. Now, the question is, for those
services, can we apply the same parallelizing boot logic as for
traditional socket services? Yes, we can, D-Bus already has all
the right hooks for it: using bus activation a service can be
started the first time it is accessed. Bus activation also gives
us the minimal per-request synchronisation we need for starting up
the providers and the consumers of D-Bus services at the same
time: if we want to start Avahi at the same time as CUPS (side
note: CUPS uses Avahi to browse for mDNS/DNS-SD printers), then we
can simply run them at the same time, and if CUPS is quicker than
Avahi via the bus activation logic we can get D-Bus to queue the
request until Avahi manages to establish its service name.
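
For illustration, a minimal sketch using the dbus-python bindings; the service name and method are hypothetical. Merely calling a method on an activatable name is enough to make the bus spawn the provider and queue the request.

    import dbus

    bus = dbus.SystemBus()
    # If org.example.Frobnicator is activatable but not yet running, the bus
    # starts it and holds this call until the name is established.
    proxy = bus.get_object("org.example.Frobnicator", "/org/example/Frobnicator")
    print(proxy.Ping(dbus_interface="org.example.Frobnicator"))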

So, in summary: the socket-based service activation and the
bus-based service activation together enable us to start
all daemons in parallel, without any further
synchronization. Activation also allows us to do lazy-loading of
services: if a service is rarely used, we can just load it the
first time somebody accesses the socket or bus name, instead of
starting it during boot.

And if that’s not great, then I don’t know what is
great!

Parallelizing File System Jobs

If you look at
the serialization graphs of the boot process
of current
distributions, there are more synchronisation points than just
daemon start-ups: most prominently there are file-system related
jobs: mounting, fscking, quota. Right now, on boot-up a lot of
time is spent idling to wait until all devices that are listed in
/etc/fstab show up in the device tree and are then
fsck’ed, mounted, quota checked (if enabled). Only after that is
fully finished do we go on and boot the actual services.

Can we improve this? It turns out we can. Harald Hoyer came up
with the idea of using the venerable autofs system for this:

Just like a connect() call shows that a service is
interested in another service, an open() (or a similar
call) shows that a service is interested in a specific file or
file-system. So, in order to improve how much we can parallelize
we can make those apps wait only if a file-system they are looking
for is not yet mounted and readily available: we set up an autofs
mount point, and then when our file-system has finished fsck and quota
checks during normal boot-up we replace it with the real mount. While the
file-system is not ready yet, the access will be queued by the
kernel and the accessing process will block, but only that one
daemon and only that one access. And this way we can begin
starting our daemons even before all file systems have been fully
made available — without them missing any files, and maximizing
parallelization.

Parallelizing file system jobs and service jobs does
not make sense for /, after all that’s where the service
binaries are usually stored. However, for file-systems such as
/home, that usually are bigger, even encrypted, possibly
remote and seldom accessed by the usual boot-up daemons, this
can improve boot time considerably. It is probably not necessary
to mention this, but virtual file systems, such as
procfs or sysfs should never be mounted via autofs.

I wouldn’t be surprised if some readers might find integrating
autofs in an init system a bit fragile and even weird, and maybe
more on the “crackish” side of things. However, having played
around with this extensively I can tell you that this actually
feels quite right. Using autofs here simply means that we can
create a mount point without having to provide the backing file
system right-away. In effect it hence only delays accesses. If an
application tries to access an autofs file-system and we take very
long to replace it with the real file-system, it will hang in an
interruptible sleep, meaning that you can safely cancel it, for
example via C-c. Also note that at any point, if the mount point
should not be mountable in the end (maybe because fsck failed), we
can just tell autofs to return a clean error code (like
ENOENT). So, I guess what I want to say is that even though
integrating autofs into an init system might appear adventurous at
first, our experimental code has shown that this idea works
surprisingly well in practice — if it is done for the right
reasons and the right way.

Also note that these should be direct autofs
mounts, meaning that from an application perspective there’s
little effective difference between a classic mount point and one
based on autofs.

Keeping the First User PID Small

Another thing we can learn from the MacOS boot-up logic is
that shell scripts are evil. Shell is fast and shell is slow. It
is fast to hack, but slow in execution. The classic sysvinit boot
logic is modelled around shell scripts. Whether it is
/bin/bash or any other shell (that was written to make
shell scripts faster), in the end the approach is doomed to be
slow. On my system the scripts in /etc/init.d call
grep at least 77 times. awk is called 92
times, cut 23 and sed 74. Every time those
commands (and others) are called, a process is spawned, the
libraries searched, some start-up stuff like i18n and so on set up
and more. And then after seldom doing more than a trivial string
operation the process is terminated again. Of course, that has to
be incredibly slow. No other language but shell would do something like
that. On top of that, shell scripts are also very fragile, and
change their behaviour drastically based on environment variables
and suchlike, stuff that is hard to oversee and control.

So, let’s get rid of shell scripts in the boot process! Before
we can do that we need to figure out what they are currently
actually used for: well, the big picture is that most of the time,
what they do is actually quite boring. Most of the scripting is
spent on trivial setup and tear-down of services, and should be
rewritten in C, either in separate executables, or moved into the
daemons themselves, or simply be done in the init system.

It is not likely that we can get rid of shell scripts during
system boot-up entirely anytime soon. Rewriting them in C takes
time, in a few cases does not really make sense, and sometimes
shell scripts are just too handy to do without. But we can
certainly make them less prominent.

A good metric for measuring shell script infestation of the
boot process is the PID number of the first process you can start
after the system is fully booted up. Boot up, log in, open a
terminal, and type echo $$. Try that on your Linux
system, and then compare the result with MacOS! (Hint, it’s
something like this: Linux PID 1823; MacOS PID 154, measured on
test systems we own.)

Keeping Track of Processes

A central part of a system that starts up and maintains
services should be process babysitting: it should watch
services. Restart them if they shut down. If they crash it should
collect information about them, and keep it around for the
administrator, and cross-link that information with what is
available from crash dump systems such as abrt, and in logging
systems like syslog or the audit system.

It should also be capable of shutting down a service
completely. That might sound easy, but is harder than you
think. Traditionally on Unix a process that does double-forking
can escape the supervision of its parent, and the old parent will
not learn about the relation of the new process to the one it
actually started. An example: currently, a misbehaving CGI script
that has double-forked is not terminated when you shut down
Apache. Furthermore, you will not even be able to figure out its
relation to Apache, unless you know it by name and purpose.

So, how can we keep track of processes, so that they cannot
escape the babysitter, and that we can control them as one unit
even if they fork a gazillion times?

Different people came up with different solutions for this. I
am not going into much detail here, but let’s at least say that
approaches based on ptrace or the netlink connector (a kernel
interface which allows you to get a netlink message each time any
process on the system fork()s or exit()s) that some people have
investigated and implemented, have been criticised as ugly and not
very scalable.

So what can we do about this? Well, for quite a while now the
kernel has known about Control Groups (aka “cgroups”). Basically
they allow the creation of a
hierarchy of groups of processes. The hierarchy is directly
exposed in a virtual file-system, and hence easily accessible. The
group names are basically directory names in that file-system. If
a process belonging to a specific cgroup fork()s, its child will
become a member of the same group. Unless it is privileged and has
access to the cgroup file system it cannot escape its
group. Originally, cgroups have been introduced into the kernel
for the purpose of containers: certain kernel subsystems can
enforce limits on resources of certain groups, such as limiting
CPU or memory usage. Traditional resource limits (as implemented
by setrlimit()) are (mostly) per-process. cgroups on the
other hand let you enforce limits on entire groups of
processes. cgroups are also useful to enforce limits outside of
the immediate container use case. You can use it for example to
limit the total amount of memory or CPU Apache and all its
children may use. Then, a misbehaving CGI script can no longer
escape your setrlimit() resource control by simply
forking away.

In addition to container and resource limit enforcement cgroups
are very useful to keep track of daemons: cgroup membership is
securely inherited by child processes, they cannot escape. There’s
a notification system available so that a supervisor process can
be notified when a cgroup runs empty. You can find the cgroups of
a process by reading /proc/$PID/cgroup. cgroups hence
make a very good choice to keep track of processes for babysitting
purposes.
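
As a rough sketch of the bookkeeping this enables (the hierarchy mount point and group path are hypothetical, and the exact cgroup file layout varies between kernel versions):

    import os

    GROUP = "/sys/fs/cgroup/systemd/example.service"   # hypothetical group

    def move_into_group(pid):
        # Put the daemon into its own cgroup; children it forks stay inside.
        os.makedirs(GROUP, exist_ok=True)
        with open(os.path.join(GROUP, "cgroup.procs"), "w") as f:
            f.write(str(pid))

    def members():
        # Everything the daemon has forked, however many times.
        with open(os.path.join(GROUP, "cgroup.procs")) as f:
            return [int(line) for line in f]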

Controlling the Process Execution Environment

A good babysitter should not only oversee and control when a
daemon starts, ends or crashes, but also set up a good, minimal,
and secure working environment for it.

That means setting obvious process parameters such as the
setrlimit() resource limits, user/group IDs or the
environment block, but does not end there. The Linux kernel gives
users and administrators a lot of control over processes (some of
it is rarely used, currently). For each process you can set CPU
and IO scheduler controls, the capability bounding set, CPU
affinity or of course cgroup environments with additional limits,
and more.

As an example, ioprio_set() with
IOPRIO_CLASS_IDLE is a great way to minimize the effect
of locate‘s updatedb on system interactivity.
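
In code, a babysitter applies such knobs in the child between fork() and exec(); a minimal Python sketch, with arbitrary example values:

    import os, resource

    def prepare_child():
        # Runs in the child, after fork() and before exec().
        resource.setrlimit(resource.RLIMIT_NOFILE, (1024, 1024))  # fd limit
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))          # no core dumps
        os.nice(5)          # lower CPU priority
        os.umask(0o022)
        os.chdir("/")       # well-defined working directory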

On top of that certain high-level controls can be very useful,
such as setting up read-only file system overlays based on
read-only bind mounts. That way one can run certain daemons so
that all (or some) file systems appear read-only to them, so that
EROFS is returned on every write request. As such this can be used
to lock down what daemons can do, in a fashion similar to a poor
man’s SELinux policy system (but this certainly doesn’t replace
SELinux, don’t get any bad ideas, please).

Finally logging is an important part of executing services:
ideally every bit of output a service generates should be logged
away. An init system should hence provide logging to daemons it
spawns right from the beginning, and connect stdout and stderr to
syslog or in some cases even /dev/kmsg which in many
cases makes a very useful replacement for syslog (embedded folks,
listen up!), especially in times where the kernel log buffer is
configured ridiculously large out-of-the-box.

On Upstart

To begin with, let me emphasize that I actually like the code
of Upstart, it is very well commented and easy to
follow. It’s certainly something other projects should learn
from (including my own).

That being said, I can’t say I agree with the general approach
of Upstart. But first, a bit more about the project:

Upstart does not share code with sysvinit, and its
functionality is a super-set of it, and provides compatibility to
some degree with the well-known SysV init scripts. Its main
feature is its event-based approach: starting and stopping of
processes is bound to “events” happening in the system, where an
“event” can be a lot of different things, such as: a network
interface becomes available or some other software has been
started.

Upstart does service serialization via these events: if the
syslog-started event is triggered this is used as an
indication to start D-Bus since it can now make use of Syslog. And
then, when dbus-started is triggered,
NetworkManager is started, since it may now use
D-Bus, and so on.

One could say that this way the actual logical dependency tree
that exists and is understood by the admin or developer is
translated and encoded into event and action rules: every logical
“a needs b” rule that the administrator/developer is aware of
becomes a “start a when b is started” plus “stop a when b is
stopped”. In some way this certainly is a simplification:
especially for the code in Upstart itself. However I would argue
that this simplification is actually detrimental. First of all,
the logical dependency system does not go away, the person who is
writing Upstart files must now translate the dependencies manually
into these event/action rules (actually, two rules for each
dependency). So, instead of letting the computer figure out what
to do based on the dependencies, the user has to manually
translate the dependencies into simple event/action rules. Also,
because the dependency information has never been encoded it is
not available at runtime, effectively meaning that an
administrator who tries to figure out why something
happened, i.e. why a is started when b is started, has no chance
of finding that out.

Furthermore, the event logic turns around all dependencies,
from the feet onto their head. Instead of minimizing the
amount of work (which is something that a good init system should
focus on, as pointed out in the beginning of this blog story), it
actually maximizes the amount of work to do during
operations. Or in other words, instead of having a clear goal and
only doing the things it really needs to do to reach the goal, it
does one step, and then after finishing it, it does all
steps that possibly could follow it.

Or to put it simpler: the fact that the user just started D-Bus
is in no way an indication that NetworkManager should be started
too (but this is what Upstart would do). It’s right the other way
round: when the user asks for NetworkManager, that is definitely
an indication that D-Bus should be started too (which is certainly
what most users would expect, right?).

A good init system should start only what is needed, and that
on-demand. Either lazily or parallelized and in advance. However
it should not start more than necessary, particularly not
everything installed that could use that service.

Finally, I fail to see the actual usefulness of the event
logic. It appears to me that most events that are exposed in
Upstart actually are not punctual in nature, but have duration: a
service starts, is running, and stops. A device is plugged in, is
available, and is plugged out again. A mount point is in the
process of being mounted, is fully mounted, or is being
unmounted. A power plug is plugged in, the system runs on AC, and
the power plug is pulled. Only a minority of the events an init
system or process supervisor should handle are actually punctual,
most of them are tuples of start, condition, and stop. This
information is again not available in Upstart, because it focuses
on singular events, and ignores durable dependencies.

Now, I am aware that some of the issues I pointed out above are
in some way mitigated by certain more recent changes in Upstart,
particularly condition based syntaxes such as start on
(local-filesystems and net-device-up IFACE=lo)
in Upstart
rule files. However, to me this appears mostly as an attempt to
fix a system whose core design is flawed.

Besides that Upstart does OK for babysitting daemons, even though
some choices might be questionable (see above), and there are certainly a lot
of missed opportunities (see above, too).

There are other init systems besides sysvinit, Upstart and
launchd. Most of them offer little of substance beyond Upstart or
sysvinit. The most interesting other contender is Solaris SMF,
which supports proper dependencies between services. However, in
many ways it is overly complex and, let’s say, a bit academic
with its excessive use of XML and new terminology for known
things. It is also closely bound to Solaris specific features such
as the contract system.

Putting it All Together: systemd

Well, this is another good time for a little pause, because
after I have hopefully explained above what I think a good PID 1
should be doing and what the current most used system does, we’ll
now come to where the beef is. So, go and refill your coffee mug
again. It’s going to be worth it.

You probably guessed it: what I suggested above as requirements
and features for an ideal init system is actually available now,
in a (still experimental) init system called systemd, and
which I hereby want to announce. Again, here’s the
code.
And here’s a quick rundown of its features, and the
rationale behind them:

systemd starts up and supervises the entire system (hence the
name…). It implements all of the features pointed out above and
a few more. It is based around the notion of units. Units
have a name and a type. Since their configuration is usually
loaded directly from the file system, these unit names are
actually file names. Example: a unit avahi.service is
read from a configuration file by the same name, and of course
could be a unit encapsulating the Avahi daemon. There are several
kinds of units:

  1. service: these are the most obvious kind of unit:
    daemons that can be started, stopped, restarted, reloaded. For
    compatibility with SysV we not only support our own
    configuration files for services, but also are able to read
    classic SysV init scripts, in particular we parse the LSB
    header, if it exists. /etc/init.d is hence not much
    more than just another source of configuration.
  2. socket: this unit encapsulates a socket in the
    file-system or on the Internet. We currently support AF_INET,
    AF_INET6, AF_UNIX sockets of the types stream, datagram, and
    sequential packet. We also support classic FIFOs as
    transport. Each socket unit has a matching
    service unit, that is started if the first connection
    comes in on the socket or FIFO. Example: nscd.socket
    starts nscd.service on an incoming connection.
  3. device: this unit encapsulates a device in the
    Linux device tree. If a device is marked for this via udev
    rules, it will be exposed as a device unit in
    systemd. Properties set with udev can be used as
    configuration source to set dependencies for device units.
  4. mount: this unit encapsulates a mount point in the
    file system hierarchy. systemd monitors all mount points as
    they come and go, and can also be used to mount or
    unmount mount-points. /etc/fstab is used here as an
    additional configuration source for these mount points, similar to
    how SysV init scripts can be used as additional configuration
    source for service units.
  5. automount: this unit type encapsulates an automount
    point in the file system hierarchy. Each automount
    unit has a matching mount unit, which is started
    (i.e. mounted) as soon as the automount directory is
    accessed.
  6. target: this unit type is used for logical
    grouping of units: instead of actually doing anything by itself
    it simply references other units, which thereby can be controlled
    together. Examples for this are: multi-user.target,
    which is a target that basically plays the role of run-level 5 on
    a classic SysV system, or bluetooth.target which is
    requested as soon as a bluetooth dongle becomes available and
    which simply pulls in bluetooth related services that otherwise
    would not need to be started: bluetoothd and
    obexd and suchlike.
  7. snapshot: similar to target units
    snapshots do not actually do anything themselves and their only
    purpose is to reference other units. Snapshots can be used to
    save/rollback the state of all services and units of the init
    system. Primarily it has two intended use cases: to allow the
    user to temporarily enter a specific state such as “Emergency
    Shell”, terminating current services, and provide an easy way to
    return to the state before, pulling up all services again that
    got temporarily pulled down. And to ease support for system
    suspending: still many services cannot correctly deal with
    system suspend, and it is often a better idea to shut them down
    before suspend, and restore them afterwards.

All these units can have dependencies between each other (both
positive and negative, i.e. ‘Requires’ and ‘Conflicts’): a device
can have a dependency on a service, meaning that as soon as a
device becomes available a certain service is started. Mounts get
an implicit dependency on the device they are mounted from. Mounts
also get implicit dependencies on mounts that are their prefixes
(i.e. a mount /home/lennart implicitly gets a dependency
added to the mount for /home) and so on.

A short list of other features:

  1. For each process that is spawned, you may control: the
    environment, resource limits, working and root directory, umask,
    OOM killer adjustment, nice level, IO class and priority, CPU policy
    and priority, CPU affinity, timer slack, user id, group id,
    supplementary group ids, readable/writable/inaccessible
    directories, shared/private/slave mount flags,
    capabilities/bounding set, secure bits, CPU scheduler reset of
    fork, private /tmp name-space, cgroup control for
    various subsystems. Also, you can easily connect
    stdin/stdout/stderr of services to syslog, /dev/kmsg,
    arbitrary TTYs. If connected to a TTY for input systemd will make
    sure a process gets exclusive access, optionally waiting or enforcing
    it.
  2. Every executed process gets its own cgroup (currently by
    default in the debug subsystem, since that subsystem is not
    otherwise used and does not much more than the most basic
    process grouping), and it is very easy to configure systemd to
    place services in cgroups that have been configured externally,
    for example via the libcgroups utilities.
  3. The native configuration files use a syntax that closely
    follows the well-known .desktop files. It is a simple syntax for
    which parsers exist already in many software frameworks. Also, this
    allows us to rely on existing tools for i18n for service
    descriptions, and similar. Administrators and developers don’t
    need to learn a new syntax.
  4. As mentioned, we provide compatibility with SysV init
    scripts. We take advantages of LSB and Red Hat chkconfig headers
    if they are available. If they aren’t we try to make the best of
    the otherwise available information, such as the start
    priorities in /etc/rc.d. These init scripts are simply
    considered a different source of configuration, hence an easy
    upgrade path to proper systemd services is available. Optionally
    we can read classic PID files for services to identify the main
    pid of a daemon. Note that we make use of the dependency
    information from the LSB init script headers, and translate
    those into native systemd dependencies. Side note: Upstart is
    unable to harvest and make use of that information. Boot-up on a
    plain Upstart system with mostly LSB SysV init scripts will
    hence not be parallelized, a similar system running systemd
    however will. In fact, for Upstart all SysV scripts together
    make one job that is executed, they are not treated
    individually, again in contrast to systemd where SysV init
    scripts are just another source of configuration and are all
    treated and controlled individually, much like any other native
    systemd service.
  5. Similarly, we read the existing /etc/fstab
    configuration file, and consider it just another source of
    configuration. Using the comment= fstab option you can
    even mark /etc/fstab entries to become systemd
    controlled automount points.
  6. If the same unit is configured in multiple configuration
    sources (e.g. /etc/systemd/system/avahi.service exists,
    and /etc/init.d/avahi too), then the native
    configuration will always take precedence, the legacy format is
    ignored, allowing an easy upgrade path and packages to carry
    both a SysV init script and a systemd service file for a
    while.
  7. We support a simple templating/instance mechanism. Example:
    instead of having six configuration files for six gettys, we
    only have one getty@.service file which gets instantiated to
    getty@tty2.service and suchlike. The interface part can
    even be inherited by dependency expressions, i.e. it is easy to
    encode that a service dhcpcd@eth0.service pulls in
    avahi-autoipd@eth0.service, while leaving the
    eth0 string wild-carded.
  8. For socket activation we support full compatibility with the
    traditional inetd modes, as well as a very simple mode that
    tries to mimic launchd socket activation and is recommended for
    new services. The inetd mode only allows passing one socket to
    the started daemon, while the native mode supports passing
    arbitrary numbers of file descriptors. We also support one
    instance per connection, as well as one instance for all
    connections modes. In the former mode we name the cgroup the
    daemon will be started in after the connection parameters, and
    utilize the templating logic mentioned above for this. Example:
    sshd.socket might spawn services
    sshd@192.168.0.1-4711-192.168.0.2-22.service with a
    cgroup of sshd@.service/192.168.0.1-4711-192.168.0.2-22
    (i.e. the IP address and port numbers are used in the instance
    names. For AF_UNIX sockets we use PID and user id of the
    connecting client). This provides a nice way for the
    administrator to identify the various instances of a daemon and
    control their runtime individually. The native socket passing
    mode is very easily implementable in applications: if
    $LISTEN_FDS is set it contains the number of sockets
    passed and the daemon will find them sorted as listed in the
    .service file, starting from file descriptor 3 (a
    nicely written daemon could also use fstat() and
    getsockname() to identify the sockets in case it
    receives more than one). In addition we set $LISTEN_PID
    to the PID of the daemon that shall receive the fds, because
    environment variables are normally inherited by sub-processes and
    hence could confuse processes further down the chain. Even
    though this socket passing logic is very simple to implement in
    daemons, we will provide a BSD-licensed reference implementation
    that shows how to do this. We have ported a couple of existing
    daemons to this new scheme.
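    (A minimal daemon-side sketch of this fd-passing protocol follows
    after this list.)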
  9. We provide compatibility with /dev/initctl to a
    certain extent. This compatibility is in fact implemented with a
    FIFO-activated service, which simply translates these legacy
    requests to D-Bus requests. Effectively this means the old
    shutdown, poweroff and similar commands from
    Upstart and sysvinit continue to work with
    systemd.
  10. We also provide compatibility with utmp and
    wtmp. Possibly even to an extent that is far more
    than healthy, given how crufty utmp and wtmp
    are.
  11. systemd supports several kinds of
    dependencies between units. After/Before can be used to fix
    the ordering how units are activated. It is completely
    orthogonal to Requires and Wants, which
    express a positive requirement dependency, either mandatory, or
    optional. Then, there is Conflicts which
    expresses a negative requirement dependency. Finally, there are
    three further, less used dependency types.
  12. systemd has a minimal transaction system. Meaning: if a unit
    is requested to start up or shut down we will add it and all its
    dependencies to a temporary transaction. Then, we will
    verify if the transaction is consistent (i.e. whether the
    ordering via After/Before of all units is
    cycle-free). If it is not, systemd will try to fix it up, and
    removes non-essential jobs from the transaction that might
    remove the loop. Also, systemd tries to suppress non-essential
    jobs in the transaction that would stop a running
    service. Non-essential jobs are those which the original request
    did not directly include but which were pulled in by
    Wants type of dependencies. Finally we check whether
    the jobs of the transaction contradict jobs that have already
    been queued, and optionally the transaction is aborted then. If
    all worked out and the transaction is consistent and minimized
    in its impact it is merged with all already outstanding jobs and
    added to the run queue. Effectively this means that before
    executing a requested operation, we will verify that it makes
    sense, fixing it if possible, and only failing if it really cannot
    work.
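    (A toy sketch of this ordering check follows after this list.)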
  13. We record start/exit time as well as the PID and exit status
    of every process we spawn and supervise. This data can be used
    to cross-link daemons with their data in abrtd, auditd and
    syslog. Think of a UI that will highlight crashed daemons for
    you, and allows you to easily navigate to the respective UIs for
    syslog, abrt, and auditd that will show the data generated from
    and for this daemon on a specific run.
  14. We support reexecution of the init process itself at any
    time. The daemon state is serialized before the reexecution and
    deserialized afterwards. That way we provide a simple way to
    facilitate init system upgrades as well as handover from an
    initrd daemon to the final daemon. Open sockets and autofs
    mounts are properly serialized away, so that they stay
    connectible all the time, in a way that clients will not even
    notice that the init system reexecuted itself. Also, the fact
    that a big part of the service state is encoded anyway in the
    cgroup virtual file system would even allow us to resume
    execution without access to the serialization data. The
    reexecution code paths are actually mostly the same as the init
    system configuration reloading code paths, which
    guarantees that reexecution (which is probably more seldom
    triggered) gets similar testing as reloading (which is probably
    more common).
  15. Starting the work of removing shell scripts from the boot
    process we have recoded part of the basic system setup in C and
    moved it directly into systemd. Among that is mounting of the API
    file systems (i.e. virtual file systems such as /proc,
    /sys and /dev) and setting of the
    host-name.
  16. Server state is introspectable and controllable via
    D-Bus. This is not complete yet but quite extensive.
  17. While we want to emphasize socket-based and bus-name-based
    activation, and we hence support dependencies between sockets and
    services, we also support traditional inter-service
    dependencies. We support multiple ways for such a service to
    signal its readiness: by forking and having the start process
    exit (i.e. traditional daemonize() behaviour), as well
    as by watching the bus until a configured service name appears.
  18. There’s an interactive mode which asks for confirmation each
    time a process is spawned by systemd. You may enable it by
    passing systemd.confirm_spawn=1 on the kernel command
    line.
  19. With the systemd.default= kernel command line
    parameter you can specify which unit systemd should start on
    boot-up. Normally you’d specify something like
    multi-user.target here, but another choice could even
    be a single service instead of a target, for example
    out-of-the-box we ship a service emergency.service that
    is similar in its usefulness as init=/bin/bash, however
    has the advantage of actually running the init system, hence
    offering the option to boot up the full system from the
    emergency shell.
  20. There’s a minimal UI that allows you to
    start/stop/introspect services. It’s far from complete but
    useful as a debugging tool. It’s written in Vala (yay!) and goes
    by the name of systemadm.
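
Picking up the forward reference from item 8, here is a minimal daemon-side sketch in Python of the fd-passing convention described there ($LISTEN_FDS, $LISTEN_PID, descriptors starting at 3). The fallback socket path is hypothetical.

    import os, socket

    SD_LISTEN_FDS_START = 3   # first passed descriptor, per item 8 above

    def listen_fds():
        # Only trust the variables if they were addressed to *this* process.
        if os.environ.get("LISTEN_PID") != str(os.getpid()):
            return []
        count = int(os.environ.get("LISTEN_FDS", "0"))
        return [socket.socket(fileno=SD_LISTEN_FDS_START + i) for i in range(count)]

    socks = listen_fds()
    if not socks:
        # Not socket-activated: bind ourselves, the traditional way.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.bind("/run/example.sock")   # hypothetical path
        sock.listen(128)
        socks = [sock]

And picking up item 12, a toy sketch of the consistency check on a transaction’s ordering. It only detects an After/Before cycle; real systemd additionally prunes non-essential jobs to try to break such a loop.

    def find_ordering_cycle(after):
        """after maps each unit to the set of units it is ordered After."""
        WHITE, GREY, BLACK = 0, 1, 2
        colour = {}

        def visit(unit, path):
            colour[unit] = GREY
            for dep in after.get(unit, ()):
                if colour.get(dep, WHITE) == GREY:
                    return path + [dep]          # found a cycle
                if colour.get(dep, WHITE) == WHITE:
                    cycle = visit(dep, path + [dep])
                    if cycle:
                        return cycle
            colour[unit] = BLACK
            return None

        for unit in after:
            if colour.get(unit, WHITE) == WHITE:
                cycle = visit(unit, [unit])
                if cycle:
                    return cycle
        return None                              # ordering is cycle-free

    # e.g. find_ordering_cycle({"a.service": {"b.service"},
    #                           "b.service": {"a.service"}}) exposes the loop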

It should be noted that systemd uses many Linux-specific
features, and does not limit itself to POSIX. That unlocks a lot
of functionality a system that is designed for portability to
other operating systems cannot provide.

Status

All the features listed above are already implemented. Right
now systemd can already be used as a drop-in replacement for
Upstart and sysvinit (at least as long as there aren’t too many
native upstart services yet. Thankfully most distributions don’t
carry too many native Upstart services yet.)

However, testing has been minimal, our version number is
currently at an impressive 0. Expect breakage if you run this in
its current state. That said, overall it should be quite stable
and some of us already boot their normal development systems with
systemd (in contrast to VMs only). YMMV, especially if you try
this on distributions we developers don’t use.

Where is This Going?

The feature set described above is certainly already
comprehensive. However, we have a few more things on our plate. I
don’t really like speaking too much about big plans but here’s a
short overview in which direction we will be pushing this:

We want to add at least two more unit types: swap
shall be used to control swap devices the same way we
already control mounts, i.e. with automatic dependencies on the
device tree devices they are activated from, and
suchlike. timer shall provide functionality similar to
cron, i.e. starting services based on time events, the
focus being both monotonic clock and wall-clock/calendar
events (i.e. “start this 5h after it last ran” as well as “start
this every Monday at 5 am”).
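
For a concrete sense of the two clock types involved, here is an
illustrative sketch (not systemd code; the function names are made
up, and the calendar arithmetic for “every Monday 5 am” is omitted)
using the kernel's timerfd API:

    #include <string.h>
    #include <sys/timerfd.h>
    #include <time.h>

    /* "start this 5h after it last ran": relative, monotonic clock */
    int monotonic_timer(void) {
            struct itimerspec its;
            memset(&its, 0, sizeof(its));
            its.it_value.tv_sec = 5 * 60 * 60;

            int fd = timerfd_create(CLOCK_MONOTONIC, 0);
            timerfd_settime(fd, 0, &its, NULL);   /* relative timeout */
            return fd;          /* becomes readable when the timer fires */
    }

    /* "start this every Monday 5 am": absolute, wall clock */
    int calendar_timer(time_t next_monday_5am) {
            struct itimerspec its;
            memset(&its, 0, sizeof(its));
            its.it_value.tv_sec = next_monday_5am;

            int fd = timerfd_create(CLOCK_REALTIME, 0);
            timerfd_settime(fd, TFD_TIMER_ABSTIME, &its, NULL);
            return fd;
    }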

More importantly however, it is also our plan to experiment with
systemd not only for optimizing boot times, but also to make it
the ideal session manager, to replace (or possibly just augment)
gnome-session, kdeinit and similar daemons. The problem sets of a
session manager and an init system are very similar: quick start-up
is essential, and babysitting processes is the focus. Using the
same code for both hence suggests itself. Apple recognized that
and does just that with launchd. And so should we: socket and bus
based activation and parallelization is something session services
and system services can benefit from equally.

I should probably note that all three of these features are
already partially available in the current code base, but not
complete yet. For example, you can already run systemd just fine
as a normal user; it will detect that it is being run that way,
and support for this mode has been available since the very
beginning and sits in the very core. (It is also exceptionally
useful for debugging! This works fine even without having the
system otherwise converted to systemd for booting.)

However, there are some things we probably should fix in the
kernel and elsewhere before finishing work on this: we need swap
status change notifications from the kernel, similar to how we can
already subscribe to mount changes; we want a notification when
CLOCK_REALTIME jumps relative to CLOCK_MONOTONIC; we
want to allow normal processes to get some init-like powers; and
we need a well-defined place where we can put user sockets. None
of these issues are really essential for systemd, but they'd
certainly improve things.
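
The mount-change subscription referenced above is, for what it's
worth, already usable from plain userspace; here is a minimal sketch
(the function name is made up): the kernel flags
/proc/self/mountinfo with an exceptional condition whenever the
mount table changes.

    #include <fcntl.h>
    #include <poll.h>

    /* Block until a mount or unmount happens somewhere in the system,
     * then return the fd so the caller can reread the mount table. */
    int wait_for_mount_change(void) {
            int fd = open("/proc/self/mountinfo", O_RDONLY);
            struct pollfd p = { .fd = fd, .events = POLLPRI };

            poll(&p, 1, -1);    /* POLLPRI/POLLERR signalled on changes */
            return fd;
    }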

You Want to See This in Action?

Currently, there are no tarball releases, but it should be
straightforward to check out the code from our repository. In
addition, to have something to start with, here's a tarball with
unit configuration files that allows an otherwise unmodified
Fedora 13 system to work with systemd. We have no RPMs to offer
you for now.

An easier way is to download this Fedora 13 qemu image, which
has been prepared for systemd. In the grub menu you can select
whether you want to boot the system with Upstart or systemd. Note
that this system is only minimally modified. Service information
is read exclusively from the existing SysV init scripts. Hence it
will not take advantage of the full socket and bus-based
parallelization pointed out above, however it will interpret the
parallelization hints from the LSB headers, and hence boots faster
than the Upstart system, which in Fedora does not employ any
parallelization at the moment. The image is configured to output
debug information on the serial console, as well as writing it to
the kernel log buffer (which you may access with dmesg).
You might want to run qemu configured with a virtual
serial terminal. All passwords are set to systemd.
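
For instance, an invocation along these lines should work (the image
file name is a placeholder for whatever you saved the download as;
-m and -serial are standard qemu options):

    qemu-kvm -m 512 -serial stdio fedora13-systemd.img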

Even simpler than downloading and booting the qemu image is
looking at pretty screen-shots. Since an init system usually is
well hidden beneath the user interface, some shots of
systemadm and ps must do:

systemadm

That’s systemadm showing all loaded units, with more detailed
information on one of the getty instances.

ps

That’s an excerpt of the output of ps xaf -eo
pid,user,args,cgroup
showing how neatly the processes are
sorted into the cgroup of their service. (The fourth column is the
cgroup, the debug: prefix is shown because we use the
debug cgroup controller for systemd, as mentioned earlier. This is
only temporary.)

Note that both of these screenshots show an only minimally
modified Fedora 13 Live CD installation, where services are
exclusively loaded from the existing SysV init scripts. Hence,
this does not use socket or bus activation for any existing
service.

Sorry, no bootcharts or hard data on start-up times for the
moment. We’ll publish that as soon as we have fully parallelized
all services from the default Fedora install. Then, we’ll welcome
you to benchmark the systemd approach, and provide our own
benchmark data as well.

Well, presumably everybody will keep bugging me about this, so
here are two numbers I’ll tell you. However, they are completely
unscientific, as they were measured on a VM (single CPU) with the
stopwatch on my watch. Fedora 13 booting up with Upstart takes
27s, with systemd we reach 24s (from grub to gdm, same system,
same settings, the shorter value of two boot-ups, one immediately
following the other). Note however that this shows
nothing more than the speedup effect reached by using the LSB
dependency information parsed from the init script headers for
parallelization. Socket or bus based activation was not utilized
for this, and hence these numbers are unsuitable to assess the
ideas pointed out above. Also, systemd was set to debug verbosity
levels on a serial console. So again, this benchmark data has
barely any value.

Writing Daemons

An ideal daemon for use with systemd does a few things
differently than they were traditionally done. Later on, we will
publish a longer guide explaining and suggesting how to write a
daemon for use with systemd. Basically, things get simpler for
daemon developers:

  • We ask daemon writers not to fork or even double fork
    in their processes, but to run their event loop from the initial
    process systemd starts for you. Also, don't call setsid().
  • Don't drop user privileges in the daemon itself, leave this
    to systemd and configure it in systemd service configuration
    files. (There are exceptions here. For example, for some daemons
    there are good reasons to drop privileges inside the daemon
    code, after an initialization phase that requires elevated
    privileges.)
  • Don't write PID files.
  • Grab a name on the bus.
  • You may rely on systemd for logging: you are welcome to log
    whatever you need to stderr.
  • Let systemd create and watch sockets for you, so that socket
    activation works. Hence, interpret $LISTEN_FDS and
    $LISTEN_PID as described above (see the sketch after this
    list).
  • Use SIGTERM for requesting shutdown of your daemon.
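
To make the socket-activation points concrete, here is a minimal
sketch of the daemon side of the handshake. This is not the
reference implementation promised above; error handling is omitted,
and the constant reflects the convention described earlier that
passed descriptors start at file descriptor 3:

    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define LISTEN_FDS_START 3

    /* Returns the number of sockets passed by the init system, or 0
     * if we were started without socket activation. The fds, if any,
     * are LISTEN_FDS_START .. LISTEN_FDS_START + n - 1. */
    static int listen_fds(void) {
            const char *pid = getenv("LISTEN_PID");
            const char *fds = getenv("LISTEN_FDS");

            if (!pid || !fds)
                    return 0;               /* not socket activated */

            if ((pid_t) atol(pid) != getpid())
                    return 0;               /* env meant for another process */

            return atoi(fds);
    }

    int main(void) {
            int n = listen_fds();

            if (n <= 0) {
                    /* fall back: create and bind the sockets ourselves,
                     * so the same binary works on non-systemd systems */
            }

            /* run the event loop here, in this very process: no fork(),
             * no setsid(), accept() on the inherited descriptors */
            return 0;
    }

Note how the fallback branch keeps the daemon usable without systemd,
which matches the compatibility point made below.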

The list above is very similar to what Apple recommends for
daemons compatible with launchd. It should be easy to extend
daemons that already support launchd activation to support systemd
activation as well.

Note that systemd also supports daemons not written in this
style perfectly well, if only for compatibility reasons (launchd
has only limited support for that). As mentioned, this even
extends to existing inetd-capable daemons, which can be used
unmodified for socket activation by systemd.

So, yes, should systemd prove itself in our experiments and get
adopted by the distributions, it would make sense to port at least
those services that are started by default to use socket or
bus-based activation. We have written proof-of-concept patches,
and the porting turned out to be very easy. Also, we can leverage
the work that has already been done for launchd, to a certain
extent. Moreover, adding support for socket-based activation does
not make a service incompatible with non-systemd systems.

FAQs

Who’s behind this?
Well, the current code-base is mostly my work, Lennart
Poettering (Red Hat). However, the design in all its details is
the result of close cooperation between Kay Sievers (Novell) and
me. Other people involved are Harald Hoyer (Red Hat), Dhaval
Giani (formerly IBM), and a few others from various
companies such as Intel, SUSE and Nokia.
Is this a Red Hat project?
No, this is my personal side project. Also, let me emphasize
this: the opinions reflected here are my own. They are not
the views of my employer, or Ronald McDonald, or anyone
else.
Will this come to Fedora?
If our experiments prove that this approach works out, and
discussions in the Fedora community show support for this, then
yes, we’ll certainly try to get this into Fedora.
Will this come to OpenSUSE?
Kay’s pursuing that, so something similar as for Fedora applies here, too.
Will this come to Debian/Gentoo/Mandriva/MeeGo/Ubuntu/[insert your favourite distro here]?
That’s up to them. We’d certainly welcome their interest, and help with the integration.
Why didn’t you just add this to Upstart, why did you invent something new?
Well, the point of the part about Upstart above was to show
that the core design of Upstart is flawed, in our
opinion. Starting completely from scratch suggests itself if the
existing solution appears flawed in its core. However, note that
we took a lot of inspiration from Upstart’s code-base
otherwise.
If you love Apple launchd so much, why not adopt that?
launchd is a great invention, but I am not convinced that it
would fit well into Linux, nor that it is suitable for a system
like Linux with its immense scalability and flexibility to
numerous purposes and uses.
Is this an NIH project?
Well, I hope that I managed to explain in the text above why
we came up with something new, instead of building on Upstart or
launchd. We came up with systemd due to technical
reasons, not political reasons.
Don’t forget that it is Upstart that includes
a library called NIH
(which is kind of a reimplementation of glib) — not systemd!
Will this run on [insert non-Linux OS here]?
Unlikely. As pointed out, systemd uses many Linux-specific
APIs (such as epoll, signalfd, libudev, cgroups, and numerous
others), so a port to other operating systems does not appear to
us to make a lot of sense. Also, we, the people involved, are
unlikely to be interested in merging possible ports to other
platforms and working with the constraints this introduces. That
said, git supports branches and rebasing quite well, in case
people really want to do a port.
Actually, portability is even more limited than just to other
OSes: we require a very recent Linux kernel, glibc, libcgroup and
libudev. No support for less-than-current Linux systems, sorry.
If folks want to implement something similar for other
operating systems, the preferred mode of cooperation is probably
that we help you identify which interfaces can be shared with
your system, to make life easier for daemon writers to support
both systemd and your system's counterpart to it. The focus
should probably be on sharing interfaces, not code.
I hear [fill one in here: the Gentoo boot system, initng,
Solaris SMF, runit, uxlaunch, …] is an awesome init system and
also does parallel boot-up, so why not adopt that?
Well, before we started this we actually had a very close
look at the various systems, and none of them did what we had in
mind for systemd (with the exception of launchd, of course). If
you cannot see that, then please read again what I wrote
above.

Contributions

We are very interested in patches and help. It should be common
sense that every Free Software project can only benefit from the
widest possible external contributions. That is particularly true
for a core part of the OS, such as an init system. We value your
contributions and hence do not require copyright assignment (very
much unlike Canonical/Upstart!). And also, we use git, everybody's
favourite VCS, yay!

We are particularly interested in help getting systemd to work
on other distributions, besides Fedora and OpenSUSE. (Hey, anybody
from Debian, Gentoo, Mandriva, MeeGo looking for something to do?)
But even beyond that we are keen to attract contributors on every
level: we welcome C hackers and packagers, as well as folks
interested in writing documentation or contributing a logo.

Community

At this time we only have a source code repository and an IRC
channel (#systemd on Freenode). There's no mailing list, web site
or bug tracking system yet. We'll probably set something up on
freedesktop.org soon. If you have any questions or want to contact
us otherwise, we invite you to join us on IRC!

Update: our GIT repository has moved.

Rethinking PID 1

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/systemd.html

If you are well connected or good at reading between the lines
you might already know what this blog post is about. But even then
you may find this story interesting. So grab a cup of coffee,
sit down, and read what’s coming.

This blog story is long, so even though I can only recommend
reading the long story, here’s the one sentence summary: we are
experimenting with a new init system and it is fun.

Here’s the code. And here’s the story:

Process Identifier 1

On every Unix system there is one process with the special
process identifier 1. It is started by the kernel before all other
processes and is the parent process for all those other processes
that have nobody else to be child of. Due to that it can do a lot
of stuff that other processes cannot do. And it is also
responsible for some things that other processes are not
responsible for, such as bringing up and maintaining userspace
during boot.

Historically on Linux the software acting as PID 1 was the
venerable sysvinit package, though it had been showing its age for
quite a while. Many replacements have been suggested, only one of
them really took off: Upstart, which has by now found
its way into all major distributions.

As mentioned, the central responsibility of an init system is
to bring up userspace. And a good init system does that
fast. Unfortunately, the traditional SysV init system was not
particularly fast.

For a fast and efficient boot-up two things are crucial:

To start less.

And to start more in parallel.

What does that mean? Starting less means starting fewer
services or deferring the starting of services until they are
actually needed. There are some services where we know that they
will be required sooner or later (syslog, D-Bus system bus, etc.),
but for many others this isn’t the case. For example, bluetoothd
does not need to be running unless a bluetooth dongle is actually
plugged in or an application wants to talk to its D-Bus
interfaces. Same for a printing system: unless the machine
physically is connected to a printer, or an application wants to
print something, there is no need to run a printing daemon such as
CUPS. Avahi: if the machine is not connected to a
network, there is no need to run Avahi, unless some application wants
to use its APIs. And even SSH: as long as nobody wants to contact
your machine there is no need to run it, as long as it is then
started on the first connection. (And admit it, on most machines
where sshd might be listening somebody connects to it only every
other month or so.)

Starting more in parallel means that if we have
to run something, we should not serialize its start-up (as sysvinit
does), but run it all at the same time, so that the available
CPU and disk IO bandwidth is maxed out, and hence
the overall start-up time minimized.

Hardware and Software Change Dynamically

Modern systems (especially general purpose OS) are highly
dynamic in their configuration and use: they are mobile, different
applications are started and stopped, different hardware added and
removed again. An init system that is responsible for maintaining
services needs to listen to hardware and software
changes. It needs to dynamically start (and sometimes stop)
services as they are needed to run a program or enable some
hardware.

Most current systems that try to parallelize boot-up still
synchronize the start-up of the various daemons involved: since
Avahi needs D-Bus, D-Bus is started first, and only when D-Bus
signals that it is ready, Avahi is started too. Similar for other
services: livirtd and X11 need HAL (well, I am considering the
Fedora 13 services here, ignore that HAL is obsolete), hence HAL
is started first, before livirtd and X11 are started. And
libvirtd also needs Avahi, so it waits for Avahi too. And all of
them require syslog, so they all wait until Syslog is fully
started up and initialized. And so on.

Parallelizing Socket Services

This kind of start-up synchronization results in the
serialization of a significant part of the boot process. Wouldn’t
it be great if we could get rid of the synchronization and
serialization cost? Well, we can, actually. For that, we need to
understand what exactly the daemons require from each other, and
why their start-up is delayed. For traditional Unix daemons,
there’s one answer to it: they wait until the socket the other
daemon offers its services on is ready for connections. Usually
that is an AF_UNIX socket in the file-system, but it could be
AF_INET[6], too. For example, clients of D-Bus wait that
/var/run/dbus/system_bus_socket can be connected to,
clients of syslog wait for /dev/log, clients of CUPS wait
for /var/run/cups/cups.sock and NFS mounts wait for
/var/run/rpcbind.sock and the portmapper IP port, and so
on. And think about it, this is actually the only thing they wait
for!

Now, if that’s all they are waiting for, if we manage to make
those sockets available for connection earlier and only actually
wait for that instead of the full daemon start-up, then we can
speed up the entire boot and start more processes in parallel. So,
how can we do that? Actually quite easily in Unix-like systems: we
can create the listening sockets before we actually start
the daemon, and then just pass the socket during exec()
to it. That way, we can create all sockets for all
daemons in one step in the init system, and then in a second step
run all daemons at once. If a service needs another, and it is not
fully started up, that’s completely OK: what will happen is that
the connection is queued in the providing service and the client
will potentially block on that single request. But only that one
client will block and only on that one request. Also, dependencies
between services will no longer necessarily have to be configured
to allow proper parallelized start-up: if we start all sockets at
once and a service needs another it can be sure that it can
connect to its socket.

Because this is at the core of what is following, let me say
this again, with different words and by example: if you start
syslog and and various syslog clients at the same time, what will
happen in the scheme pointed out above is that the messages of the
clients will be added to the /dev/log socket buffer. As
long as that buffer doesn’t run full, the clients will not have to
wait in any way and can immediately proceed with their start-up. As
soon as syslog itself finished start-up, it will dequeue all
messages and process them. Another example: we start D-Bus and
several clients at the same time. If a synchronous bus
request is sent and hence a reply expected, what will happen is
that the client will have to block, however only that one client
and only until D-Bus managed to catch up and process it.

Basically, the kernel socket buffers help us to maximize
parallelization, and the ordering and synchronization is done by
the kernel, without any further management from userspace! And if
all the sockets are available before the daemons actually start-up,
dependency management also becomes redundant (or at least
secondary): if a daemon needs another daemon, it will just connect
to it. If the other daemon is already started, this will
immediately succeed. If it isn’t started but in the process of
being started, the first daemon will not even have to wait for it,
unless it issues a synchronous request. And even if the other
daemon is not running at all, it can be auto-spawned. From the
first daemon’s perspective there is no difference, hence dependency
management becomes mostly unnecessary or at least secondary, and
all of this in optimal parallelization and optionally with
on-demand loading. On top of this, this is also more robust, because
the sockets stay available regardless whether the actual daemons
might temporarily become unavailable (maybe due to crashing). In
fact, you can easily write a daemon with this that can run, and
exit (or crash), and run again and exit again (and so on), and all
of that without the clients noticing or loosing any request.

It’s a good time for a pause, go and refill your coffee mug,
and be assured, there is more interesting stuff following.

But first, let’s clear a few things up: is this kind of logic
new? No, it certainly is not. The most prominent system that works
like this is Apple’s launchd system: on MacOS the listening of the
sockets is pulled out of all daemons and done by launchd. The
services themselves hence can all start up in parallel and
dependencies need not to be configured for them. And that is
actually a really ingenious design, and the primary reason why
MacOS manages to provide the fantastic boot-up times it
provides. I can highly recommend this
video
where the launchd folks explain what they are
doing. Unfortunately this idea never really took on outside of the Apple
camp.

The idea is actually even older than launchd. Prior to launchd
the venerable inetd worked much like this: sockets were
centrally created in a daemon that would start the actual service
daemons passing the socket file descriptors during
exec(). However the focus of inetd certainly
wasn’t local services, but Internet services (although later
reimplementations supported AF_UNIX sockets, too). It also wasn’t a
tool to parallelize boot-up or even useful for getting implicit
dependencies right.

For TCP sockets inetd was primarily used in a way that
for every incoming connection a new daemon instance was
spawned. That meant that for each connection a new
process was spawned and initialized, which is not a
recipe for high-performance servers. However, right from the
beginning inetd also supported another mode, where a
single daemon was spawned on the first connection, and that single
instance would then go on and also accept the follow-up connections
(that’s what the wait/nowait option in
inetd.conf was for, a particularly badly documented
option, unfortunately.) Per-connection daemon starts probably gave
inetd its bad reputation for being slow. But that’s not entirely
fair.

Parallelizing Bus Services

Modern daemons on Linux tend to provide services via D-Bus
instead of plain AF_UNIX sockets. Now, the question is, for those
services, can we apply the same parallelizing boot logic as for
traditional socket services? Yes, we can, D-Bus already has all
the right hooks for it: using bus activation a service can be
started the first time it is accessed. Bus activation also gives
us the minimal per-request synchronisation we need for starting up
the providers and the consumers of D-Bus services at the same
time: if we want to start Avahi at the same time as CUPS (side
note: CUPS uses Avahi to browse for mDNS/DNS-SD printers), then we
can simply run them at the same time, and if CUPS is quicker than
Avahi via the bus activation logic we can get D-Bus to queue the
request until Avahi manages to establish its service name.

So, in summary: the socket-based service activation and the
bus-based service activation together enable us to start
all daemons in parallel, without any further
synchronization. Activation also allows us to do lazy-loading of
services: if a service is rarely used, we can just load it the
first time somebody accesses the socket or bus name, instead of
starting it during boot.

And if that’s not great, then I don’t know what is
great!

Parallelizing File System Jobs

If you look at
the serialization graphs of the boot process
of current
distributions, there are more synchronisation points than just
daemon start-ups: most prominently there are file-system related
jobs: mounting, fscking, quota. Right now, on boot-up a lot of
time is spent idling to wait until all devices that are listed in
/etc/fstab show up in the device tree and are then
fsck’ed, mounted, quota checked (if enabled). Only after that is
fully finished we go on and boot the actual services.

Can we improve this? It turns out we can. Harald Hoyer came up
with the idea of using the venerable autofs system for this:

Just like a connect() call shows that a service is
interested in another service, an open() (or a similar
call) shows that a service is interested in a specific file or
file-system. So, in order to improve how much we can parallelize
we can make those apps wait only if a file-system they are looking
for is not yet mounted and readily available: we set up an autofs
mount point, and then when our file-system finished fsck and quota
due to normal boot-up we replace it by the real mount. While the
file-system is not ready yet, the access will be queued by the
kernel and the accessing process will block, but only that one
daemon and only that one access. And this way we can begin
starting our daemons even before all file systems have been fully
made available — without them missing any files, and maximizing
parallelization.

Parallelizing file system jobs and service jobs does
not make sense for /, after all that’s where the service
binaries are usually stored. However, for file-systems such as
/home, that usually are bigger, even encrypted, possibly
remote and seldom accessed by the usual boot-up daemons, this
can improve boot time considerably. It is probably not necessary
to mention this, but virtual file systems, such as
procfs or sysfs should never be mounted via autofs.

I wouldn’t be surprised if some readers might find integrating
autofs in an init system a bit fragile and even weird, and maybe
more on the “crackish” side of things. However, having played
around with this extensively I can tell you that this actually
feels quite right. Using autofs here simply means that we can
create a mount point without having to provide the backing file
system right-away. In effect it hence only delays accesses. If an
application tries to access an autofs file-system and we take very
long to replace it with the real file-system, it will hang in an
interruptible sleep, meaning that you can safely cancel it, for
example via C-c. Also note that at any point, if the mount point
should not be mountable in the end (maybe because fsck failed), we
can just tell autofs to return a clean error code (like
ENOENT). So, I guess what I want to say is that even though
integrating autofs into an init system might appear adventurous at
first, our experimental code has shown that this idea works
surprisingly well in practice — if it is done for the right
reasons and the right way.

Also note that these should be direct autofs
mounts, meaning that from an application perspective there’s
little effective difference between a classic mount point and one
based on autofs.

Keeping the First User PID Small

Another thing we can learn from the MacOS boot-up logic is
that shell scripts are evil. Shell is fast and shell is slow. It
is fast to hack, but slow in execution. The classic sysvinit boot
logic is modelled around shell scripts. Whether it is
/bin/bash or any other shell (that was written to make
shell scripts faster), in the end the approach is doomed to be
slow. On my system the scripts in /etc/init.d call
grep at least 77 times. awk is called 92
times, cut 23 and sed 74. Every time those
commands (and others) are called, a process is spawned, the
libraries searched, some start-up stuff like i18n and so on set up
and more. And then after seldom doing more than a trivial string
operation the process is terminated again. Of course, that has to
be incredibly slow. No other language but shell would do something like
that. On top of that, shell scripts are also very fragile, and
change their behaviour drastically based on environment variables
and suchlike, stuff that is hard to oversee and control.

So, let’s get rid of shell scripts in the boot process! Before
we can do that we need to figure out what they are currently
actually used for: well, the big picture is that most of the time,
what they do is actually quite boring. Most of the scripting is
spent on trivial setup and tear-down of services, and should be
rewritten in C, either in separate executables, or moved into the
daemons themselves, or simply be done in the init system.

It is not likely that we can get rid of shell scripts during
system boot-up entirely anytime soon. Rewriting them in C takes
time, in a few case does not really make sense, and sometimes
shell scripts are just too handy to do without. But we can
certainly make them less prominent.

A good metric for measuring shell script infestation of the
boot process is the PID number of the first process you can start
after the system is fully booted up. Boot up, log in, open a
terminal, and type echo $$. Try that on your Linux
system, and then compare the result with MacOS! (Hint, it’s
something like this: Linux PID 1823; MacOS PID 154, measured on
test systems we own.)

Keeping Track of Processes

A central part of a system that starts up and maintains
services should be process babysitting: it should watch
services. Restart them if they shut down. If they crash it should
collect information about them, and keep it around for the
administrator, and cross-link that information with what is
available from crash dump systems such as abrt, and in logging
systems like syslog or the audit system.

It should also be capable of shutting down a service
completely. That might sound easy, but is harder than you
think. Traditionally on Unix a process that does double-forking
can escape the supervision of its parent, and the old parent will
not learn about the relation of the new process to the one it
actually started. An example: currently, a misbehaving CGI script
that has double-forked is not terminated when you shut down
Apache. Furthermore, you will not even be able to figure out its
relation to Apache, unless you know it by name and purpose.

So, how can we keep track of processes, so that they cannot
escape the babysitter, and that we can control them as one unit
even if they fork a gazillion times?

Different people came up with different solutions for this. I
am not going into much detail here, but let’s at least say that
approaches based on ptrace or the netlink connector (a kernel
interface which allows you to get a netlink message each time any
process on the system fork()s or exit()s) that some people have
investigated and implemented, have been criticised as ugly and not
very scalable.

So what can we do about this? Well, since quite a while the
kernel knows Control
Groups
(aka “cgroups”). Basically they allow the creation of a
hierarchy of groups of processes. The hierarchy is directly
exposed in a virtual file-system, and hence easily accessible. The
group names are basically directory names in that file-system. If
a process belonging to a specific cgroup fork()s, its child will
become a member of the same group. Unless it is privileged and has
access to the cgroup file system it cannot escape its
group. Originally, cgroups have been introduced into the kernel
for the purpose of containers: certain kernel subsystems can
enforce limits on resources of certain groups, such as limiting
CPU or memory usage. Traditional resource limits (as implemented
by setrlimit()) are (mostly) per-process. cgroups on the
other hand let you enforce limits on entire groups of
processes. cgroups are also useful to enforce limits outside of
the immediate container use case. You can use it for example to
limit the total amount of memory or CPU Apache and all its
children may use. Then, a misbehaving CGI script can no longer
escape your setrlimit() resource control by simply
forking away.

In addition to container and resource limit enforcement cgroups
are very useful to keep track of daemons: cgroup membership is
securely inherited by child processes, they cannot escape. There’s
a notification system available so that a supervisor process can
be notified when a cgroup runs empty. You can find the cgroups of
a process by reading /proc/$PID/cgroup. cgroups hence
make a very good choice to keep track of processes for babysitting
purposes.

Controlling the Process Execution Environment

A good babysitter should not only oversee and control when a
daemon starts, ends or crashes, but also set up a good, minimal,
and secure working environment for it.

That means setting obvious process parameters such as the
setrlimit() resource limits, user/group IDs or the
environment block, but does not end there. The Linux kernel gives
users and administrators a lot of control over processes (some of
it is rarely used, currently). For each process you can set CPU
and IO scheduler controls, the capability bounding set, CPU
affinity or of course cgroup environments with additional limits,
and more.

As an example, ioprio_set() with
IOPRIO_CLASS_IDLE is a great away to minimize the effect
of locate’s updatedb on system interactivity.

On top of that certain high-level controls can be very useful,
such as setting up read-only file system overlays based on
read-only bind mounts. That way one can run certain daemons so
that all (or some) file systems appear read-only to them, so that
EROFS is returned on every write request. As such this can be used
to lock down what daemons can do similar in fashion to a poor
man’s SELinux policy system (but this certainly doesn’t replace
SELinux, don’t get any bad ideas, please).

Finally logging is an important part of executing services:
ideally every bit of output a service generates should be logged
away. An init system should hence provide logging to daemons it
spawns right from the beginning, and connect stdout and stderr to
syslog or in some cases even /dev/kmsg which in many
cases makes a very useful replacement for syslog (embedded folks,
listen up!), especially in times where the kernel log buffer is
configured ridiculously large out-of-the-box.

On Upstart

To begin with, let me emphasize that I actually like the code
of Upstart, it is very well commented and easy to
follow. It’s certainly something other projects should learn
from (including my own).

That being said, I can’t say I agree with the general approach
of Upstart. But first, a bit more about the project:

Upstart does not share code with sysvinit, and its
functionality is a super-set of it, and provides compatibility to
some degree with the well known SysV init scripts. It’s main
feature is its event-based approach: starting and stopping of
processes is bound to “events” happening in the system, where an
“event” can be a lot of different things, such as: a network
interfaces becomes available or some other software has been
started.

Upstart does service serialization via these events: if the
syslog-started event is triggered this is used as an
indication to start D-Bus since it can now make use of Syslog. And
then, when dbus-started is triggered,
NetworkManager is started, since it may now use
D-Bus, and so on.

One could say that this way the actual logical dependency tree
that exists and is understood by the admin or developer is
translated and encoded into event and action rules: every logical
“a needs b” rule that the administrator/developer is aware of
becomes a “start a when b is started” plus “stop a when b is
stopped”. In some way this certainly is a simplification:
especially for the code in Upstart itself. However I would argue
that this simplification is actually detrimental. First of all,
the logical dependency system does not go away, the person who is
writing Upstart files must now translate the dependencies manually
into these event/action rules (actually, two rules for each
dependency). So, instead of letting the computer figure out what
to do based on the dependencies, the user has to manually
translate the dependencies into simple event/action rules. Also,
because the dependency information has never been encoded it is
not available at runtime, effectively meaning that an
administrator who tries to figure our why something
happened, i.e. why a is started when b is started, has no chance
of finding that out.

Furthermore, the event logic turns around all dependencies,
from the feet onto their head. Instead of minimizing the
amount of work (which is something that a good init system should
focus on, as pointed out in the beginning of this blog story), it
actually maximizes the amount of work to do during
operations. Or in other words, instead of having a clear goal and
only doing the things it really needs to do to reach the goal, it
does one step, and then after finishing it, it does all
steps that possibly could follow it.

Or to put it simpler: the fact that the user just started D-Bus
is in no way an indication that NetworkManager should be started
too (but this is what Upstart would do). It’s right the other way
round: when the user asks for NetworkManager, that is definitely
an indication that D-Bus should be started too (which is certainly
what most users would expect, right?).

A good init system should start only what is needed, and that
on-demand. Either lazily or parallelized and in advance. However
it should not start more than necessary, particularly not
everything installed that could use that service.

Finally, I fail to see the actual usefulness of the event
logic. It appears to me that most events that are exposed in
Upstart actually are not punctual in nature, but have duration: a
service starts, is running, and stops. A device is plugged in, is
available, and is plugged out again. A mount point is in the
process of being mounted, is fully mounted, or is being
unmounted. A power plug is plugged in, the system runs on AC, and
the power plug is pulled. Only a minority of the events an init
system or process supervisor should handle are actually punctual,
most of them are tuples of start, condition, and stop. This
information is again not available in Upstart, because it focuses
in singular events, and ignores durable dependencies.

Now, I am aware that some of the issues I pointed out above are
in some way mitigated by certain more recent changes in Upstart,
particularly condition based syntaxes such as start on
(local-filesystems and net-device-up IFACE=lo) in Upstart
rule files. However, to me this appears mostly as an attempt to
fix a system whose core design is flawed.

Besides that Upstart does OK for babysitting daemons, even though
some choices might be questionable (see above), and there are certainly a lot
of missed opportunities (see above, too).

There are other init systems besides sysvinit, Upstart and
launchd. Most of them offer little substantial more than Upstart or
sysvinit. The most interesting other contender is Solaris SMF,
which supports proper dependencies between services. However, in
many ways it is overly complex and, let’s say, a bit academic
with its excessive use of XML and new terminology for known
things. It is also closely bound to Solaris specific features such
as the contract system.

Putting it All Together: systemd

Well, this is another good time for a little pause, because
after I have hopefully explained above what I think a good PID 1
should be doing and what the current most used system does, we’ll
now come to where the beef is. So, go and refill you coffee mug
again. It’s going to be worth it.

You probably guessed it: what I suggested above as requirements
and features for an ideal init system is actually available now,
in a (still experimental) init system called systemd, and
which I hereby want to announce. Again, here’s the
code.
And here’s a quick rundown of its features, and the
rationale behind them:

systemd starts up and supervises the entire system (hence the
name…). It implements all of the features pointed out above and
a few more. It is based around the notion of units. Units
have a name and a type. Since their configuration is usually
loaded directly from the file system, these unit names are
actually file names. Example: a unit avahi.service is
read from a configuration file by the same name, and of course
could be a unit encapsulating the Avahi daemon. There are several
kinds of units:

service: these are the most obvious kind of unit:
daemons that can be started, stopped, restarted, reloaded. For
compatibility with SysV we not only support our own
configuration files for services, but also are able to read
classic SysV init scripts, in particular we parse the LSB
header, if it exists. /etc/init.d is hence not much
more than just another source of configuration.

socket: this unit encapsulates a socket in the
file-system or on the Internet. We currently support AF_INET,
AF_INET6, AF_UNIX sockets of the types stream, datagram, and
sequential packet. We also support classic FIFOs as
transport. Each socket unit has a matching
service unit, that is started if the first connection
comes in on the socket or FIFO. Example: nscd.socket
starts nscd.service on an incoming connection.

device: this unit encapsulates a device in the
Linux device tree. If a device is marked for this via udev
rules, it will be exposed as a device unit in
systemd. Properties set with udev can be used as
configuration source to set dependencies for device units.

mount: this unit encapsulates a mount point in the
file system hierarchy. systemd monitors all mount points how
they come and go, and can also be used to mount or
unmount mount-points. /etc/fstab is used here as an
additional configuration source for these mount points, similar to
how SysV init scripts can be used as additional configuration
source for service units.

automount: this unit type encapsulates an automount
point in the file system hierarchy. Each automount
unit has a matching mount unit, which is started
(i.e. mounted) as soon as the automount directory is
accessed.

target: this unit type is used for logical
grouping of units: instead of actually doing anything by itself
it simply references other units, which thereby can be controlled
together. Examples for this are: multi-user.target,
which is a target that basically plays the role of run-level 5 on
classic SysV system, or bluetooth.target which is
requested as soon as a bluetooth dongle becomes available and
which simply pulls in bluetooth related services that otherwise
would not need to be started: bluetoothd and
obexd and suchlike.

snapshot: similar to target units
snapshots do not actually do anything themselves and their only
purpose is to reference other units. Snapshots can be used to
save/rollback the state of all services and units of the init
system. Primarily it has two intended use cases: to allow the
user to temporarily enter a specific state such as “Emergency
Shell”, terminating current services, and provide an easy way to
return to the state before, pulling up all services again that
got temporarily pulled down. And to ease support for system
suspending: still many services cannot correctly deal with
system suspend, and it is often a better idea to shut them down
before suspend, and restore them afterwards.

All these units can have dependencies between each other (both
positive and negative, i.e. ‘Requires’ and ‘Conflicts’): a device
can have a dependency on a service, meaning that as soon as a
device becomes available a certain service is started. Mounts get
an implicit dependency on the device they are mounted from. Mounts
also gets implicit dependencies to mounts that are their prefixes
(i.e. a mount /home/lennart implicitly gets a dependency
added to the mount for /home) and so on.

A short list of other features:

For each process that is spawned, you may control: the
environment, resource limits, working and root directory, umask,
OOM killer adjustment, nice level, IO class and priority, CPU policy
and priority, CPU affinity, timer slack, user id, group id,
supplementary group ids, readable/writable/inaccessible
directories, shared/private/slave mount flags,
capabilities/bounding set, secure bits, CPU scheduler reset of
fork, private /tmp name-space, cgroup control for
various subsystems. Also, you can easily connect
stdin/stdout/stderr of services to syslog, /dev/kmsg,
arbitrary TTYs. If connected to a TTY for input systemd will make
sure a process gets exclusive access, optionally waiting or enforcing
it.

Every executed process gets its own cgroup (currently by
default in the debug subsystem, since that subsystem is not
otherwise used and does not much more than the most basic
process grouping), and it is very easy to configure systemd to
place services in cgroups that have been configured externally,
for example via the libcgroups utilities.

The native configuration files use a syntax that closely
follows the well-known .desktop files. It is a simple syntax for
which parsers exist already in many software frameworks. Also, this
allows us to rely on existing tools for i18n for service
descriptions, and similar. Administrators and developers don’t
need to learn a new syntax.

As mentioned, we provide compatibility with SysV init
scripts. We take advantages of LSB and Red Hat chkconfig headers
if they are available. If they aren’t we try to make the best of
the otherwise available information, such as the start
priorities in /etc/rc.d. These init scripts are simply
considered a different source of configuration, hence an easy
upgrade path to proper systemd services is available. Optionally
we can read classic PID files for services to identify the main
pid of a daemon. Note that we make use of the dependency
information from the LSB init script headers, and translate
those into native systemd dependencies. Side note: Upstart is
unable to harvest and make use of that information. Boot-up on a
plain Upstart system with mostly LSB SysV init scripts will
hence not be parallelized, a similar system running systemd
however will. In fact, for Upstart all SysV scripts together
make one job that is executed, they are not treated
individually, again in contrast to systemd where SysV init
scripts are just another source of configuration and are all
treated and controlled individually, much like any other native
systemd service.

Similarly, we read the existing /etc/fstab
configuration file, and consider it just another source of
configuration. Using the comment= fstab option you can
even mark /etc/fstab entries to become systemd
controlled automount points.

If the same unit is configured in multiple configuration
sources (e.g. /etc/systemd/system/avahi.service exists,
and /etc/init.d/avahi too), then the native
configuration will always take precedence, the legacy format is
ignored, allowing an easy upgrade path and packages to carry
both a SysV init script and a systemd service file for a
while.

We support a simple templating/instance mechanism. Example:
instead of having six configuration files for six gettys, we
only have one [email protected] file which gets instantiated to
[email protected] and suchlike. The interface part can
even be inherited by dependency expressions, i.e. it is easy to
encode that a service [email protected] pulls in
[email protected], while leaving the
eth0 string wild-carded.

For socket activation we support full compatibility with the
traditional inetd modes, as well as a very simple mode that
tries to mimic launchd socket activation and is recommended for
new services. The inetd mode only allows passing one socket to
the started daemon, while the native mode supports passing
arbitrary numbers of file descriptors. We also support one
instance per connection, as well as one instance for all
connections modes. In the former mode we name the cgroup the
daemon will be started in after the connection parameters, and
utilize the templating logic mentioned above for this. Example:
sshd.socket might spawn services
[email protected] with a
cgroup of [email protected]/192.168.0.1-4711-192.168.0.2-22
(i.e. the IP address and port numbers are used in the instance
names. For AF_UNIX sockets we use PID and user id of the
connecting client). This provides a nice way for the
administrator to identify the various instances of a daemon and
control their runtime individually. The native socket passing
mode is very easily implementable in applications: if
$LISTEN_FDS is set it contains the number of sockets
passed and the daemon will find them sorted as listed in the
.service file, starting from file descriptor 3 (a
nicely written daemon could also use fstat() and
getsockname() to identify the sockets in case it
receives more than one). In addition we set $LISTEN_PID
to the PID of the daemon that shall receive the fds, because
environment variables are normally inherited by sub-processes and
hence could confuse processes further down the chain. Even
though this socket passing logic is very simple to implement in
daemons, we will provide a BSD-licensed reference implementation
that shows how to do this. We have ported a couple of existing
daemons to this new scheme.

We provide compatibility with /dev/initctl to a
certain extent. This compatibility is in fact implemented with a
FIFO-activated service, which simply translates these legacy
requests to D-Bus requests. Effectively this means the old
shutdown, poweroff and similar commands from
Upstart and sysvinit continue to work with
systemd.

We also provide compatibility with utmp and
wtmp. Possibly even to an extent that is far more
than healthy, given how crufty utmp and wtmp
are.

systemd supports several kinds of
dependencies between units. After/Before can be used to fix
the ordering how units are activated. It is completely
orthogonal to Requires and Wants, which
express a positive requirement dependency, either mandatory, or
optional. Then, there is Conflicts which
expresses a negative requirement dependency. Finally, there are
three further, less used dependency types.

systemd has a minimal transaction system. Meaning: if a unit
is requested to start up or shut down we will add it and all its
dependencies to a temporary transaction. Then, we will
verify if the transaction is consistent (i.e. whether the
ordering via After/Before of all units is
cycle-free). If it is not, systemd will try to fix it up, and
removes non-essential jobs from the transaction that might
remove the loop. Also, systemd tries to suppress non-essential
jobs in the transaction that would stop a running
service. Non-essential jobs are those which the original request
did not directly include but which where pulled in by
Wants type of dependencies. Finally we check whether
the jobs of the transaction contradict jobs that have already
been queued, and optionally the transaction is aborted then. If
all worked out and the transaction is consistent and minimized
in its impact it is merged with all already outstanding jobs and
added to the run queue. Effectively this means that before
executing a requested operation, we will verify that it makes
sense, fixing it if possible, and only failing if it really cannot
work.

We record start/exit time as well as the PID and exit status
of every process we spawn and supervise. This data can be used
to cross-link daemons with their data in abrtd, auditd and
syslog. Think of an UI that will highlight crashed daemons for
you, and allows you to easily navigate to the respective UIs for
syslog, abrt, and auditd that will show the data generated from
and for this daemon on a specific run.

We support reexecution of the init process itself at any
time. The daemon state is serialized before the reexecution and
deserialized afterwards. That way we provide a simple way to
facilitate init system upgrades as well as handover from an
initrd daemon to the final daemon. Open sockets and autofs
mounts are properly serialized away, so that they stay
connectible all the time, in a way that clients will not even
notice that the init system reexecuted itself. Also, the fact
that a big part of the service state is encoded anyway in the
cgroup virtual file system would even allow us to resume
execution without access to the serialization data. The
reexecution code paths are actually mostly the same as the init
system configuration reloading code paths, which
guarantees that reexecution (which is probably more seldom
triggered) gets similar testing as reloading (which is probably
more common).

Starting the work of removing shell scripts from the boot
process we have recoded part of the basic system setup in C and
moved it directly into systemd. Among that is mounting of the API
file systems (i.e. virtual file systems such as /proc,
/sys and /dev.) and setting of the
host-name.

Server state is introspectable and controllable via
D-Bus. This is not complete yet but quite extensive.

While we want to emphasize socket-based and bus-name-based
activation, and we hence support dependencies between sockets and
services, we also support traditional inter-service
dependencies. We support multiple ways how such a service can
signal its readiness: by forking and having the start process
exit (i.e. traditional daemonize() behaviour), as well
as by watching the bus until a configured service name appears.

There’s an interactive mode which asks for confirmation each
time a process is spawned by systemd. You may enable it by
passing systemd.confirm_spawn=1 on the kernel command
line.

With the systemd.default= kernel command line
parameter you can specify which unit systemd should start on
boot-up. Normally you’d specify something like
multi-user.target here, but another choice could even
be a single service instead of a target, for example
out-of-the-box we ship a service emergency.service that
is similar in its usefulness as init=/bin/bash, however
has the advantage of actually running the init system, hence
offering the option to boot up the full system from the
emergency shell.

There’s a minimal UI that allows you to
start/stop/introspect services. It’s far from complete but
useful as a debugging tool. It’s written in Vala (yay!) and goes
by the name of systemadm.

It should be noted that systemd uses many Linux-specific
features, and does not limit itself to POSIX. That unlocks a lot
of functionality a system that is designed for portability to
other operating systems cannot provide.

Status

All the features listed above are already implemented. Right
now systemd can already be used as a drop-in replacement for
Upstart and sysvinit (at least as long as there aren’t too many
native upstart services yet. Thankfully most distributions don’t
carry too many native Upstart services yet.)

However, testing has been minimal, our version number is
currently at an impressive 0. Expect breakage if you run this in
its current state. That said, overall it should be quite stable
and some of us already boot their normal development systems with
systemd (in contrast to VMs only). YMMV, especially if you try
this on distributions we developers don’t use.

Where is This Going?

The feature set described above is certainly already
comprehensive. However, we have a few more things on our plate. I
don’t really like speaking too much about big plans but here’s a
short overview in which direction we will be pushing this:

We want to add at least two more unit types: swap
shall be used to control swap devices the same way we
already control mounts, i.e. with automatic dependencies on the
device tree devices they are activated from, and
suchlike. timer shall provide functionality similar to
cron, i.e. starts services based on time events, the
focus being both monotonic clock and wall-clock/calendar
events. (i.e. “start this 5h after it last ran” as well as “start
this every monday 5 am”)

More importantly however, it is also our plan to experiment with
systemd not only for optimizing boot times, but also to make it
the ideal session manager, to replace (or possibly just augment)
gnome-session, kdeinit and similar daemons. The problem set of a
session manager and an init system are very similar: quick start-up
is essential and babysitting processes the focus. Using the same
code for both uses hence suggests itself. Apple recognized that
and does just that with launchd. And so should we: socket and bus
based activation and parallelization is something session services
and system services can benefit from equally.

I should probably note that all three of these features are
already partially available in the current code base, but not
complete yet. For example, you can already run systemd just fine
as a normal user, and it will detect that it is run that way;
support for this mode has been available since the very beginning,
and is in the very core. (It is also exceptionally useful for
debugging! This works fine even without having the system
otherwise converted to systemd for booting.)

However, there are some things we probably should fix in the
kernel and elsewhere before finishing work on this: we
need swap status change notifications from the kernel similar to
how we can already subscribe to mount changes; we want a
notification when CLOCK_REALTIME jumps relative to
CLOCK_MONOTONIC; we want to allow normal processes to get
some init-like powers; and we need a well-defined place where we
can put user sockets. None of these issues are really essential
for systemd, but they’d certainly improve things.

You Want to See This in Action?

Currently, there are no tarball releases, but it should be
straightforward to check out the code from our
repository. In addition, to have something to start with, here’s
a tarball with unit configuration files that allows an
otherwise unmodified Fedora 13 system to work with systemd. We
have no RPMs to offer you for now.

An easier way is to download this Fedora 13 qemu image, which
has been prepared for systemd. In the grub menu you can select
whether you want to boot the system with Upstart or systemd. Note
that this system is only minimally modified. Service information
is read exclusively from the existing SysV init scripts; hence it
will not take advantage of the full socket and bus-based
parallelization pointed out above. However, it will interpret the
parallelization hints from the LSB headers, and hence boots faster
than the Upstart system, which in Fedora does not employ any
parallelization at the moment. The image is configured to output
debug information on the serial console, as well as writing it to
the kernel log buffer (which you may access with dmesg).
You might want to run qemu configured with a virtual
serial terminal. All passwords are set to systemd.

Even simpler than downloading and booting the qemu image is
looking at pretty screen-shots. Since an init system usually is
well hidden beneath the user interface, some shots of
systemadm and ps must do:

[screenshot: systemadm]

That’s systemadm showing all loaded units, with more detailed
information on one of the getty instances.

[screenshot: ps output]

That’s an excerpt of the output of ps xaf -eo
pid,user,args,cgroup showing how neatly the processes are
sorted into the cgroup of their service. (The fourth column is the
cgroup, the debug: prefix is shown because we use the
debug cgroup controller for systemd, as mentioned earlier. This is
only temporary.)

Note that both of these screenshots show an only minimally
modified Fedora 13 Live CD installation, where services are
exclusively loaded from the existing SysV init scripts. Hence,
this does not use socket or bus activation for any existing
service.

Sorry, no bootcharts or hard data on start-up times for the
moment. We’ll publish that as soon as we have fully parallelized
all services from the default Fedora install. Then, we’ll welcome
you to benchmark the systemd approach, and provide our own
benchmark data as well.

Well, presumably everybody will keep bugging me about this, so
here are two numbers I’ll tell you. However, they are completely
unscientific, as they were measured for a VM (single CPU) and
with the stopwatch function of my watch. Fedora 13 booting up with
Upstart takes 27s; with systemd we reach 24s (from grub to gdm,
same system, same settings, shorter value of two boot-ups, one
immediately following the other). Note however that this shows
nothing more than the speedup effect reached by using the LSB
dependency information parsed from the init script headers for
parallelization. Socket or bus based activation was not utilized
for this, and hence these numbers are unsuitable to assess the
ideas pointed out above. Also, systemd was set to debug verbosity
levels on a serial console. So again, this benchmark data has
barely any value.

Writing Daemons

An ideal daemon for use with systemd does a few things
differently than they were traditionally done. Later on, we will
publish a longer guide explaining and suggesting how to write a daemon for use
with systemd. Basically, things get simpler for daemon
developers:

We ask daemon writers not to fork or even double fork
in their processes, but run their event loop from the initial process
systemd starts for you. Also, don’t call setsid().

Don’t drop user privileges in the daemon itself, leave this
to systemd and configure it in systemd service configuration
files. (There are exceptions here. For example, for some daemons
there are good reasons to drop privileges inside the daemon
code, after an initialization phase that requires elevated
privileges.)

Don’t write PID files.

Grab a name on the bus.

You may rely on systemd for logging: you are welcome to log
whatever you need to log to stderr.

Let systemd create and watch sockets for you, so that socket
activation works. Hence, interpret $LISTEN_FDS and
$LISTEN_PID as described above (see the sketch after this list).

Use SIGTERM for requesting shutdowns from your daemon.
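
To illustrate the $LISTEN_FDS/$LISTEN_PID point, here is a minimal C sketch of the check a daemon might do at start-up — passed sockets start at file descriptor 3, per the convention described above. This is not official systemd code, and error handling is elided:

    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define LISTEN_FDS_START 3  /* first passed socket, by convention */

    /* Returns the number of listening sockets passed in by the init
     * system, or 0 if we have to create our sockets ourselves. */
    static int listen_fds(void) {
            const char *pid_str = getenv("LISTEN_PID");
            const char *fds_str = getenv("LISTEN_FDS");

            if (!pid_str || !fds_str)
                    return 0;

            /* Ignore the variables if they were destined for another process. */
            if ((pid_t) atol(pid_str) != getpid())
                    return 0;

            return atoi(fds_str);
    }

    int main(void) {
            int n = listen_fds();

            if (n > 0) {
                    /* fds LISTEN_FDS_START .. LISTEN_FDS_START+n-1 are
                     * already bound and listening: just accept() on them. */
            } else {
                    /* Not socket activated: socket(), bind(), listen()
                     * as usual. */
            }

            /* ... run the event loop here; no fork(), no setsid() ... */
            return 0;
    }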

The list above is very similar to what Apple
recommends for daemons compatible with launchd. It should be
easy to extend daemons that already support launchd
activation to support systemd activation as well.

Note that systemd supports daemons not written in this style
perfectly well, too, if only for compatibility reasons (launchd has
only limited support for that). As mentioned, this even extends to
existing inetd-capable daemons, which can be used unmodified for
socket activation by systemd.

So, yes, should systemd prove itself in our experiments and get
adopted by the distributions, it would make sense to port at least
those services that are started by default to use socket or
bus-based activation. We have written proof-of-concept patches,
and the porting turned out to be very easy. Also, we can leverage
the work that has already been done for launchd, to a certain
extent. Moreover, adding support for socket-based activation does
not make a service incompatible with non-systemd systems.

FAQs

Who’s behind this?

Well, the current code-base is mostly my work, Lennart
Poettering (Red Hat). However, the design in all its details is
the result of close cooperation between Kay Sievers (Novell) and
me. Other people involved are Harald Hoyer (Red Hat), Dhaval
Giani (formerly IBM), and a few others from various
companies such as Intel, SUSE and Nokia.

Is this a Red Hat project?

No, this is my personal side project. Also, let me emphasize
this: the opinions reflected here are my own. They are not
the views of my employer, or Ronald McDonald, or anyone
else.

Will this come to Fedora?

If our experiments prove that this approach works out, and
discussions in the Fedora community show support for this, then
yes, we’ll certainly try to get this into Fedora.

Will this come to OpenSUSE?

Kay’s pursuing that, so something similar to what I said about Fedora applies here, too.

Will this come to Debian/Gentoo/Mandriva/MeeGo/Ubuntu/[insert your favourite distro here]?

That’s up to them. We’d certainly welcome their interest, and help with the integration.

Why didn’t you just add this to Upstart, why did you invent something new?

Well, the point of the part about Upstart above was to show
that the core design of Upstart is flawed, in our
opinion. Starting completely from scratch suggests itself if the
existing solution appears flawed in its core. However, note that
we took a lot of inspiration from Upstart’s code-base
otherwise.

If you love Apple launchd so much, why not adopt that?

launchd is a great invention, but I am not convinced that it
would fit well into Linux, nor that it is suitable for a system
like Linux with its immense scalability and flexibility to
numerous purposes and uses.

Is this an NIH project?

Well, I hope that I managed to explain in the text above why
we came up with something new, instead of building on Upstart or
launchd. We came up with systemd due to technical
reasons, not political reasons.

Don’t forget that it is Upstart that includes
a library called NIH
(which is kind of a reimplementation of glib) — not systemd!

Will this run on [insert non-Linux OS here]?

Unlikely. As pointed out, systemd uses many Linux-specific
APIs (such as epoll, signalfd, libudev, cgroups, and numerous
others), so a port to other operating systems appears to us as not
making a lot of sense. Also, we, the people involved, are
unlikely to be interested in merging possible ports to other
platforms and working with the constraints this introduces. That said,
git supports branches and rebasing quite well, in case
people really want to do a port.

Actually, portability is even more limited than just to other OSes: we require a very
recent Linux kernel, glibc, libcgroup and libudev. No support for
less-than-current Linux systems, sorry.

If folks want to implement something similar for other
operating systems, the preferred mode of cooperation is probably
that we help you identify which interfaces can be shared with
your system, to make life easier for daemon writers to support
both systemd and your systemd counterpart. Probably, the focus should be
to share interfaces, not code.

I hear [fill one in here: the Gentoo boot system, initng,
Solaris SMF, runit, uxlaunch, …] is an awesome init system and
also does parallel boot-up, so why not adopt that?

Well, before we started this we actually had a very close
look at the various systems, and none of them did what we had in
mind for systemd (with the exception of launchd, of course). If
you cannot see that, then please read again what I wrote
above.


Contributions

We are very interested in patches and help. It should be common
sense that every Free Software project can only benefit from the
widest possible external contributions. That is particularly true
for a core part of the OS, such as an init system. We value your
contributions and hence do not require copyright assignment (very
much unlike Canonical/Upstart!). And also, we use git,
everybody’s favourite VCS, yay!

We are particularly interested in help getting systemd to work
on other distributions, besides Fedora and OpenSUSE. (Hey, anybody
from Debian, Gentoo, Mandriva, MeeGo looking for something to do?)
But even beyond that we are keen to attract contributors on every
level: we welcome C hackers and packagers, as well as folks who are
interested in writing documentation or contributing a logo.

Community

At this time we only have a source code repository and an IRC
channel (#systemd on Freenode). There’s no mailing list, web site
or bug tracking system. We’ll probably set something up on
freedesktop.org soon. If you have any questions or want to contact
us otherwise, we invite you to join us on IRC!

Update: our GIT repository has moved.

Will Nokia Ever Realize Open Source Is Not a Panacea?

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2011/08/18/open-source-not-panacea.html

I was pretty sure there was something wrong with the whole thing in
fall of 2009, when they first asked me. A Nokia employee contacted me
to ask if I’d be willing to be a director of the Symbian Foundation
(or so I thought that’s what they were asking — read on). I wrote
them a thoughtful response explaining my then-current concerns about
Symbian:

the poor choice of
the Eclipse Public
License
for the eventual code,

the fact that Symbian couldn’t be built in any software freedom
system environment, and

that the Symbian source code that had been released thus far didn’t
actually run on any existing phones.

I nevertheless offered to serve as a director for one year, and I would
resign at that point if the problems that I’d listed weren’t
resolved.

I figured that was quite a laundry list. I also figured that they
probably wouldn’t be interested anyway once they saw my list.
Amusingly, they still were. But then, I realized what was really going
on.

In response to my laundry list, I got back a rather disturbing response
that showed a confusion in my understanding. I wasn’t being invited to
join the board of the Symbian Foundation. They had asked me instead to
serve as a Director of a small USA entity (that they heralded as
Symbian DevCo) that would then be permitted one Representative of
the Symbian Foundation itself, which was, in turn, a trade association
controlled by dozens of proprietary software companies.

In fact, this Nokia employee said that they planned to channel all
individual developers toward this Symbian DevCo in the USA,
and that would be the only voice these developers would have in
the direction of Symbian. It would be one tiny voice against the dozens of
proprietary software companies that controlled the real Symbian Foundation,
a trade association.

Anyone who has worked in the non-profit sector, or even contributed to
any real software freedom project, can see what’s deeply wrong
there. However, my response wasn’t to refuse. I wrote back and said
clearly why this was failing completely to create a software freedom
community that could survive vibrantly. I pointed out the way the Linux
community is structured: the Linux Foundation is a trade
association for companies — and, while they do fund Linus’ salary,
they don’t control his activities, or those of any other developer.
Meanwhile, the individual Linux developers have all the real authority:
from community structure, to licensing, to holding copyrights, to
technical decision-making. I pointed out that if they wanted Symbian to
succeed, they should emulate Linux as much as they could. I suggested
Nokia immediately change the whole structure to have developers in
charge of the project, and have a path for Symbian DevCo to ultimately
be the primary organization in charge of the codebase, while Symbian
Foundation could remain the trade association, roughly akin to the Linux
Foundation. I offered to help them do that.

You might guess that I never got a reply to that email. It was thus no
surprise to me in the least what happened to Symbian after that:

In December 2010 (nearly 13 months to the day after my email exchange
described above), the Symbian Foundation shut down all its websites.

In February 2011, Nokia announced its partnership with Microsoft to
prefer Windows Phone 7 on its phones.

In April 2011, Nokia announced that Symbian would no longer be
available as Free Software.

In June 2011, Nokia announced that some other consulting company would
take over proprietary development of Symbian.

So, within 17 months of the Symbian Foundation’s inquiry asking me to help
run Symbian DevCo, the (Open Source) Symbian project was
canceled entirely, the codebase was once again proprietary (with
a few of the old codedumps floating around on other sites),
and the Symbian Foundation now consists only of a single webpage
filled with double-speak.

Of course, even if Nokia had tried its hardest to build an actual
software freedom community, Symbian still had a good chance of
failing, as I pointed out in March 2010. But, if Nokia had actually
tried to release control and let developers have some authority, Symbian might
have had a fighting chance as Free Software. As it turned out, Nokia
threw some code over the wall, gave all the power to decide what happens
to a bunch of proprietary software companies, and then hung it all out
to dry. It’s a shining example of how to liberate software in a way
that will guarantee its deprecation in short order.

Of course, we now know that during all this time, Nokia was busy
preparing a backroom deal that would end its
always-burgeoning-but-never-complete affiliation with software freedom
by making a deal with Microsoft to control the future of Nokia. It’s a
foolish decision for software freedom; whether it’s a good business
decision surely isn’t for me to judge. (After all, I haven’t worked in
the for-profit sector for fifteen years for a reason.)

It’s true that I’ve always given a hard time to Maemo (and to MeeGo as
well). Those involved from inside Nokia spent the last six months
telling me that MeeGo is run by completely different people at Nokia,
and Nokia did recently launch yet another MeeGo-based product. I’ve meanwhile
gotten the impression that Nokia is one of those companies whose
executives are more like wealthy Romans who like to pit their champions
against each other in the arena to see who wins; Nokia’s various
divisions appear to be in constant competition with each other. I
imagine someone running the place has read too much Ayn Rand.

Of course, it now seems that MeeGo hasn’t, in Nokia’s view,
“survived as the fittest”. I learned today (thanks
to jwildeboer) that, in Elop’s words, there is no returning to MeeGo,
even if the N9 turns out to be a hit. Nokia’s commitment to
Maemo/MeeGo, while it did last at least four years or so, is now gone
too, as they begin their march to Microsoft’s funeral dirge. Yet
another FLOSS project Nokia got serious about, coordinated poorly,
and ultimately gave up on.

Considering Nokia’s bad trajectory led me to think about how
Open Source companies tend to succeed. I’ve noticed something
interesting, which I’ve confirmed by talking to a lot of employees of
successful Open Source companies. The successful ones — those
that get something useful done for software freedom while also making
some cash (i.e., the true promise of Open Source) — let the
developers run the software projects themselves. Such
companies don’t relegate the developers into a small
non-profit that has to lobby dozens of proprietary software companies to
actually make an impact. They
don’t throw code over the wall — rather, they
fund developers who make their own decisions about what to do in the
software. Ultimately, smart Open Source companies treat software
freedom development like R&D should be treated: fund it and see what
comes out and try to build a business model after something’s already
working. Companies like Nokia, by contrast, constantly put their carts
in front of all the horses and wonder why those horses whinny loudly at
them but don’t write any code.

Open Source slowly became a fad during the DotCom era, and it strangely
remains such. A lot of companies follow fads, particularly when they
can’t figure what else to do. The fad becomes a quick-fix solution. Of
course, for those of us that started as volunteers and enthusiasts in
1991 or earlier, software freedom isn’t some new attraction at
P. T. Barnum’s circus. It’s a community where we belong and collaborate
to improve society. Companies are welcomed to join us for the ride, but
only if they put developers and users in charge.

Meanwhile, my personal postscript to my old conversation with Nokia
arrived in my inbox late in May 2011. I received an extremely vague email
from a lawyer at Nokia. She wanted really badly to figure out how to
quickly dump some software project — and she wouldn’t tell me what
it was — into the Software Freedom Conservancy. Of course, I’m
sure this lawyer knows nothing about the history of the Symbian project
wooing me for directorship of Symbian DevCo, nor all the other history of
why “throwing code over the wall” into a non-profit is
rarely known to work, particularly for Nokia. I sent her a response
explaining all the problems with her request, and, true to Nokia’s
style, she didn’t even bother to respond thanking me for my
time.

I can’t wait to see what project Nokia dumps over the wall next, and
then, in another 17 months (or if they really want to lead us
on, four years), decides to proprietarize or abandon it because, they’ll
say, this open-sourcing thing just doesn’t work. Yet, so many
companies make money with it. The short answer is: Nokia,
you keep doing it wrong!

Update (2011-08-24): Boudewijn Rempt argued another side of this
question. He says the Calligra suite is a counterexample of Nokia
getting a FLOSS project right. I don’t know enough about Calligra to
agree or disagree.

Canonical, Ltd. Finally On Record: Seeking Open Core

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2010/10/17/shuttleworth-admits-it.html

I’ve written before about my deep skepticism regarding the true motives
of Canonical, Ltd.’s advocacy and demand for for-profit corporate
copyright assignment without promises to adhere to copyleft. I’ve
often asked Canonical employees, including Jono Bacon, Amanda Brock,
Jane Silber, Mark Shuttleworth himself, and — in the comments of this
very blog post — Matt Asay, to explain (a) why exactly they demand
copyright assignment on their projects, rather than merely having
contributors agree to the GNU GPL formally (like projects such as Linux
do), and (b) why, having received a contributor’s copyright assignment,
Canonical, Ltd. refuses to promise to keep the software copylefted and
never proprietarize it (FSF, for example, has always done the latter in
its assignments). When I ask these questions of Canonical, Ltd.
employees, they invariably artfully change the subject.

I’ve actually been asking these questions for at least a year and a
half, but I really began to get worried earlier this year
when Mark
Shuttleworth falsely claimed
that Canonical, Ltd.’s copyright
assignment was no different than the FSF’s copyright assignment.
That event made it clear to me that there was a job of salesmanship
going on: Canonical, Ltd. was trying to sell something to the community that
the community neither wants nor needs, and trying to reuse the good name
of other people and organizations to do it.

Since that interview in February, Canonical, Ltd. has launched a
manipulatively named product called
“Project
Harmony”
. They market this product as a “summit”
of sorts — purported to have no determined agenda other than
to discuss the issue of contributor agreements and copyright
assignment, and come to a community consensus on this. Their
goal, however, was merely to get community members to lend their good
names to the process. Indeed, Canonical, Ltd. has oft attempted to use
the involvement of good people to make it seem as if Canonical, Ltd.’s
agenda is endorsed by many. In
fact, FSF
recently distanced itself from the process
because of Canonical,
Ltd.’s actions in this
regard. Simon
Phipps had similarly distanced himself before that
.

Nevertheless, it seems Canonical, Ltd. now believes that they’ve
succeeded in their sales job, because they’ve now confessed their true
motive. In an IRC Q&A session last Thursday [0], Shuttleworth
finally admitted that his goal is to increase the amount
of “Open Core” activity. Specifically, Shuttleworth says at 15:21 (and
following):

[C]ompare Qt and Gtk, Qt has a contribution agreement, Gtk doesn’t, for a
while, back in the bubble, Sun, Red Hat, Ximian and many other companies
threw money at Gtk and it grew and improved very quickly but, then they
lost interest, and it has stagnated. Qt was owned by Trolltech it was open
source (GPL) but because of the contribution agreement they had many
options including proprietary licensing, which is just fine with me
alongside the GPL and later, because they owned Qt completely, they were
an attractive acquisition for Nokia, All in all, the Qt ecosystem has
benefitted and the Gtk ecosystem hasn’t.

It takes some careful analysis to parse what’s going on here. First of
all, Shuttleworth is glossing over a lot of complicated Qt history. Qt
started with a non-FaiF license, which later became the QPL, a
GPL-incompatible Free Software license. After a few years
of this oddball, license-proliferation-style
software freedom license, Trolltech stumbled upon the “Open
Core” model (likely inspired by MySQL AB), and switched to the GPL.
When Nokia bought Trolltech, Nokia itself discovered that full-on “Open Core”
was bad for the code base, and (as I heralded at the
time) relicensed the codebase to the LGPL (the same license used by
Gtk). A few months after that, Nokia abandoned
copyright assignment completely for Qt as well! (I.e., Shuttleworth
is just wrong on this point entirely.) In fact, Shuttleworth, rather
than supporting his pro-Open-Core argument, actually gave the prime
example of Nokia/Trolltech’s lesson learned: “don’t do an
Open-Core-style contributor agreement, you’ll regret it”.
(RMS also recently published a good essay on this subject.)

Furthermore, Shuttleworth also completely ignores plenty of historical angst in
communities that rely on Qt, which often had difficulty getting bugfixes
upstream and faced other such challenges when dealing with a for-profit-controlled
“Open Core” library. (These were, in fact, among the
reasons Nokia gave in May 2009 for the change in policy.) Indeed, if the proprietary
relicensing business is what made Trolltech such a lucrative acquisition
for Nokia, why did they abandon the business model entirely within four
months of the acquisition?

Shuttleworth’s “lucrative acquisition” point, though, has
some validity. Namely, “Open Core” makes wealthy,
profit-driven types (e.g., VCs) drool. Meanwhile, people like me,
Simon Phipps, NASA’s Chris Kemp, John Mark Walker, Tarus Balog
and many others are either very skeptical about “Open Core”, or
dead-set against it. The reason it’s meeting with so much opposition is
that “Open Core” is a VC-friendly way to control all the
copyright “assets” while pretending to actually have the
goal of building an Open Source community. The real goal of “Open
Core”, of course, is a bait-and-switch move. (Details on that are
beyond the scope of this post and are well covered in the links I’ve
given.)

As to Shuttleworth’s argument of Gtk stagnation: after my trip
this past summer to GUADEC, I’m quite convinced that the GNOME
community is extremely healthy. Indeed, as Dave Neary’s GNOME Census
shows, the GNOME codebases are well contributed to by various corporate
entities and (more importantly) volunteers.
For-profit corporate folks like Shuttleworth and his executives tend not
to like communities where a non-profit (in this case,
the GNOME Foundation)
shepherds a project and keeps the multiple for-profit interests at bay.
In fact, he dislikes this so much that when GNOME was
recently documenting its long-standing copyright policies, he sent
Silber to the GNOME Advisory Board (the first and only time Canonical,
Ltd. sent such a high-profile person to the Advisory Board) to argue
against the long-standing GNOME community preference for no
copyright assignment on its projects [1].
Silber’s primary argument was that it was unreasonable for individual
contributors to even ask to keep their own copyrights, since
Canonical, Ltd. puts in the bulk of the work on their projects that
require copyright assignment. Her argument was, in other words, an
anti-software-freedom equality argument: a for-profit company is more
valuable to the community than the individual contributor. Fortunately,
the GNOME Foundation didn’t fall for this, and continued its work with
Intel to get the Clutter codebase free of copyright assignment (and that
work has since succeeded). It’s also particularly ironic that, a few months
later, Neary showed that the very company making that argument
contributes 22% less to the GNOME codebase than the volunteers
Silber once argued don’t contribute enough to warrant keeping their
copyrights.

So, why have Shuttleworth and his staff been on a year-long campaign to
convince everyone to embrace “Open Core” and give up all
their rights that copyleft provides? Well, in the same IRC log (at
15:15) I quoted above, Shuttleworth admits that he has some work
left to do to make Canonical, Ltd. profitable. And therein lies the
connection: Shuttleworth admits Canonical, Ltd.’s profitability is a
major goal (which is probably obvious). Then, in his next answer, he
explains at great length how lucrative and important “Open
Core” is. We should accept “Open Core”, Shuttleworth
argues, merely because it’s so important that Canonical, Ltd. be
profitable.

Shuttleworth’s argument reminds me of a story
that Michael Moore (who famously made the
documentary Roger and Me, and has since made other
documentaries) told at a book-signing in the mid-1990s. Moore said (I’m
paraphrasing from memory here, BTW):

Inevitably, I end up on planes next to some corporate executive. They
look at me a few times, and then say: Hey, I know you, you’re Roger
Moore [audience laughs]. What I want to know, is what the hell have you
got against profit? What’s wrong with profit, anyway? The
answer I give is simple: There’s nothing wrong with profit at all. The
question I’m raising is: What lengths are acceptable to achieve profit?
We all agree that we can’t exploit child labor and other such things, even
if that helps profitability. Yet, once upon a time, these sorts of
horrible policies were acceptable for corporations. So, my point is that
we still need more changes to balance the push for profit with what’s
right for workers.

I quote this at length to make it abundantly clear: I’m not opposed to
Canonical, Ltd. making a profit by supporting software freedom. I’m
glad that Shuttleworth has contributed a non-trivial part of his
personal wealth to start a company that employs many excellent
FLOSS
developers (and even sometimes lets those developers work on upstream
projects). But the question really is: Are the values of software
freedom worth giving up merely to make Canonical, Ltd. profitable?
Should we just accept proprietary network services like UbuntuOne,
integrated on nearly every menu of the desktop, as reasonable merely
because they might help Canonical, Ltd. make a few bucks? Do we think
we should abandon copyleft’s assurances of fair treatment to all, and
hand over full proprietarization powers on GPL’d software to for-profit
companies, merely so they can employ a few FLOSS developers to work
primarily on non-upstream projects?

I don’t think so. I’m often critical of Red Hat, but one thing they do
get right in this regard is a healthy encouragement of their developers
to start, contribute to, and maintain upstream projects that live in the
community rather than inside Red Hat. Red Hat currently allows its
engineers to keep their own copyrights and license them under whatever
license the upstream project uses, binding them to the terms of the
copyleft licenses (when the upstream project is copylefted). For
projects generated inside Red Hat, after experimenting
with the sorts of CLAs that I’m complaining about, they learned
from the mistake and corrected it (although unfortunately, Red
Hat hasn’t universally corrected the problem). For the most part,
Red Hat encourages outside contributors to contribute under their own
copyright, under the outbound license Red Hat chose for its projects
(some of which are also copylefted). Red Hat’s newer policies have some
flaws (details of which are beyond the scope of this post), but they’re
orders of magnitude better than the copyright assignment intimidation
tactics that other companies, like Canonical, Ltd., now employ.

So, don’t let a friendly name like “Harmony” fool you.
Our community has some key infrastructure, such as the copyleft itself,
that actually keeps us harmonious. Contributor
agreements aren’t created equal, and therefore we should oppose the
idea that contributor and assignment agreements should be set to the
lowest common denominator to enable a for-profit corporate land-grab
that Shuttleworth and other “Open Core” proponents seek.
I also strongly advise the organizations and individuals who are
assisting Canonical, Ltd. in this goal to stop immediately,
particularly now that Shuttleworth has announced his “Open
Core” plans.

Update (2010-10-18): In comments, many people have,
quite correctly, argued that I have not proved that Canonical,
Ltd. has plans to go “Open Core” with their
copyright-assigned copyleft products. Such comments are correct; I
intended this article to be an opinion piece, not a logical proof. I
further agree that without absolute proof, the title of
this blog post is an exaggeration. (I didn’t change it, as that seemed
disingenuous after the fact).

Anyway, to be clear, the only thing the chain of events described
above proves is that Canonical, Ltd. wants “Open
Core” as a possibility for the future. That part is
trivially true: if they didn’t want to reserve the possibility, they’d
simply make a promise-back in their assignment to keep the software as
Free Software. The only reason not to make an
FSF-style promise-back is that you want to reserve the
possibility of proprietary relicensing.

Meanwhile, even though I cannot construct a logical proof of it, I
still believe the only possible explanation for this 1+ year marketing
campaign described above is that Canonical, Ltd. is moving toward
“Open Core” for those projects on which they are the sole
copyright holder. I have asked others to offer alternative
explanations of why Canonical, Ltd. is carrying out this campaign: I
agree that there could exist another logical explanation other than the
one I’ve presented. If someone can come up with one, then I would be
happy to link to it here.

Finally, if Canonical, Ltd. comes out with a statement that they’ll
switch to using FSF’s promise-back in their assignments, I will be very
happy to admit I was wrong. The outcome I want is for individual
developers to be treated right by corporations in control of particular
codebases; I would much rather that happen than be correct in my
opinions.

[0] I originally credited OMG Ubuntu as publishing
Shuttleworth’s comments as an interview. Their reformatting
of his comments temporarily confused me, and I thought they’d
done an interview. Thanks to @gotunandan, who pointed this out.

[1] Ironically, the debate had nothing to do with a Canonical, Ltd.
codebase, since their contributions amount to so little (1%) of the
GNOME codebase anyway. The debate was about the Clutter/Intel
situation, which has since been resolved.

Responses Not In the Identica Thread:

  • Alex Hudson’s blog post
  • Discussion on Hacker News
  • LWN comments
  • Matt Aslett’s response, and my response to him
  • Ingolf Schaefer’s blog post, which only allows comments with a Google
    Account, so I comment below instead (to be clear, I’m not criticizing
    Ingolf’s choice of Google-account-to-comment, especially since I make
    everyone who wants to comment here sign up for identi.ca ;):
Ingolf, you noted that you’d rather I not try to read between the lines
to deduce that proprietary relicensing and/or “Open Core” is
where Canonical, Ltd.’s marketing is leading. I disagree; I think it’s
useful to consider what seems a likely end-outcome here. My primary
goal is to draw attention to it now in hopes of preventing it from
happening. My best possible outcome is that I get proved wrong, and
Canonical makes a promise-back in their assignment and/or CLA.

Meanwhile, I don’t think they can go “Open Core”
and/or pursue proprietary relicensing for all of Ubuntu, as you are saying.
They aren’t the sole copyright holder in most of Ubuntu. The places where
they can pursue these options are Launchpad, pbuilder, upstart, and
the other projects that require a CLA and/or assignment.

I don’t know for sure that they’ll do this, as I say above; I can
deduce no other explanation. As I keep saying, if someone else has
another possible explanation for the Canonical, Ltd. behavior I list
above, I’m happy to link to it here. I can’t see any other reason:
they’d surely have made an FSF-style promise-back in their CLA by now
if they didn’t want to hold proprietarization open as a possibility.

Rethinking PID 1

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/systemd.html

If you are well connected or good at reading between the lines
you might already know what this blog post is about. But even then
you may find this story interesting. So grab a cup of coffee,
sit down, and read what’s coming.

This blog story is long, so even though I can only recommend
reading the long story, here’s the one sentence summary: we are
experimenting with a new init system and it is fun.

Here’s the code. And here’s the story:

Process Identifier 1

On every Unix system there is one process with the special
process identifier 1. It is started by the kernel before all other
processes and is the parent process for all those other processes
that have nobody else to be a child of. Due to that it can do a lot
of stuff that other processes cannot do. And it is also
responsible for some things that other processes are not
responsible for, such as bringing up and maintaining userspace
during boot.

Historically on Linux the software acting as PID 1 was the
venerable sysvinit package, though it had been showing its age for
quite a while. Many replacements have been suggested, but only one of
them really took off: Upstart, which has by now found
its way into all major distributions.

As mentioned, the central responsibility of an init system is
to bring up userspace. And a good init system does that
fast. Unfortunately, the traditional SysV init system was not
particularly fast.

For a fast and efficient boot-up two things are crucial:

  • To start less.
  • And to start more in parallel.

What does that mean? Starting less means starting fewer
services or deferring the starting of services until they are
actually needed. There are some services where we know that they
will be required sooner or later (syslog, D-Bus system bus, etc.),
but for many others this isn’t the case. For example, bluetoothd
does not need to be running unless a bluetooth dongle is actually
plugged in or an application wants to talk to its D-Bus
interfaces. Same for a printing system: unless the machine
physically is connected to a printer, or an application wants to
print something, there is no need to run a printing daemon such as
CUPS. Avahi: if the machine is not connected to a
network, there is no need to run Avahi, unless some application wants
to use its APIs. And even SSH: as long as nobody wants to contact
your machine there is no need to run it, provided it is then
started on the first connection. (And admit it, on most machines
where sshd might be listening, somebody connects to it only every
other month or so.)

Starting more in parallel means that if we have
to run something, we should not serialize its start-up (as sysvinit
does), but run it all at the same time, so that the available
CPU and disk IO bandwidth is maxed out, and hence
the overall start-up time minimized.

Hardware and Software Change Dynamically

Modern systems (especially general purpose OS) are highly
dynamic in their configuration and use: they are mobile, different
applications are started and stopped, different hardware added and
removed again. An init system that is responsible for maintaining
services needs to listen to hardware and software
changes. It needs to dynamically start (and sometimes stop)
services as they are needed to run a program or enable some
hardware.

Most current systems that try to parallelize boot-up still
synchronize the start-up of the various daemons involved: since
Avahi needs D-Bus, D-Bus is started first, and only when D-Bus
signals that it is ready is Avahi started too. Similar for other
services: libvirtd and X11 need HAL (well, I am considering the
Fedora 13 services here, ignore that HAL is obsolete), hence HAL
is started first, before libvirtd and X11 are started. And
libvirtd also needs Avahi, so it waits for Avahi too. And all of
them require syslog, so they all wait until syslog is fully
started up and initialized. And so on.

Parallelizing Socket Services

This kind of start-up synchronization results in the
serialization of a significant part of the boot process. Wouldn’t
it be great if we could get rid of the synchronization and
serialization cost? Well, we can, actually. For that, we need to
understand what exactly the daemons require from each other, and
why their start-up is delayed. For traditional Unix daemons,
there’s one answer to it: they wait until the socket the other
daemon offers its services on is ready for connections. Usually
that is an AF_UNIX socket in the file-system, but it could be
AF_INET[6], too. For example, clients of D-Bus wait until
/var/run/dbus/system_bus_socket can be connected to,
clients of syslog wait for /dev/log, clients of CUPS wait
for /var/run/cups/cups.sock, and NFS mounts wait for
/var/run/rpcbind.sock and the portmapper IP port, and so
on. And think about it: this is actually the only thing they wait
for!

Now, if that’s all they are waiting for, if we manage to make
those sockets available for connection earlier and only actually
wait for that instead of the full daemon start-up, then we can
speed up the entire boot and start more processes in parallel. So,
how can we do that? Actually quite easily in Unix-like systems: we
can create the listening sockets before we actually start
the daemon, and then just pass the socket during exec()
to it. That way, we can create all sockets for all
daemons in one step in the init system, and then in a second step
run all daemons at once. If a service needs another, and it is not
fully started up, that’s completely OK: what will happen is that
the connection is queued in the providing service and the client
will potentially block on that single request. But only that one
client will block and only on that one request. Also, dependencies
between services will no longer necessarily have to be configured
to allow proper parallelized start-up: if we start all sockets at
once and a service needs another it can be sure that it can
connect to its socket.
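
As a minimal sketch of this idea (not systemd’s actual code — the socket path and daemon binary below are made-up placeholders, the environment variable convention is the $LISTEN_FDS/$LISTEN_PID one discussed later in this post, and error handling is mostly elided):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    int main(void) {
            /* Create and bind the listening socket before the daemon exists. */
            int fd = socket(AF_UNIX, SOCK_STREAM, 0);
            struct sockaddr_un sa = { .sun_family = AF_UNIX };
            strncpy(sa.sun_path, "/var/run/example.sock", sizeof(sa.sun_path) - 1);
            unlink(sa.sun_path);
            if (bind(fd, (struct sockaddr*) &sa, sizeof(sa)) < 0 ||
                listen(fd, SOMAXCONN) < 0)
                    return 1;

            /* Clients may connect() right away; connections queue in the
             * kernel until the daemon gets around to accept()ing them. */
            if (fork() == 0) {
                    char pid_str[24];
                    dup2(fd, 3);            /* passed fds start at fd 3 */
                    snprintf(pid_str, sizeof(pid_str), "%ld", (long) getpid());
                    setenv("LISTEN_PID", pid_str, 1);
                    setenv("LISTEN_FDS", "1", 1);
                    execl("/usr/sbin/example-daemon", "example-daemon", (char*) NULL);
                    _exit(1);
            }
            return 0;
    }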

Because this is at the core of what is following, let me say
this again, with different words and by example: if you start
syslog and various syslog clients at the same time, what will
happen in the scheme pointed out above is that the messages of the
clients will be added to the /dev/log socket buffer. As
long as that buffer doesn’t run full, the clients will not have to
wait in any way and can immediately proceed with their start-up. As
soon as syslog itself has finished start-up, it will dequeue all
messages and process them. Another example: we start D-Bus and
several clients at the same time. If a synchronous bus
request is sent and hence a reply expected, what will happen is
that the client will have to block, however only that one client
and only until D-Bus managed to catch up and process it.

Basically, the kernel socket buffers help us to maximize
parallelization, and the ordering and synchronization is done by
the kernel, without any further management from userspace! And if
all the sockets are available before the daemons actually start-up,
dependency management also becomes redundant (or at least
secondary): if a daemon needs another daemon, it will just connect
to it. If the other daemon is already started, this will
immediately succeed. If it isn’t started but in the process of
being started, the first daemon will not even have to wait for it,
unless it issues a synchronous request. And even if the other
daemon is not running at all, it can be auto-spawned. From the
first daemon’s perspective there is no difference, hence dependency
management becomes mostly unnecessary or at least secondary, and
all of this in optimal parallelization and optionally with
on-demand loading. On top of this, this is also more robust, because
the sockets stay available regardless of whether the actual daemons
might temporarily become unavailable (maybe due to crashing). In
fact, you can easily write a daemon with this that can run, and
exit (or crash), and run again and exit again (and so on), and all
of that without the clients noticing or losing any request.

It’s a good time for a pause, go and refill your coffee mug,
and be assured, there is more interesting stuff following.

But first, let’s clear a few things up: is this kind of logic
new? No, it certainly is not. The most prominent system that works
like this is Apple’s launchd system: on MacOS the listening on the
sockets is pulled out of all daemons and done by launchd. The
services themselves hence can all start up in parallel and
dependencies need not be configured for them. And that is
actually a really ingenious design, and the primary reason why
MacOS manages to provide the fantastic boot-up times it
provides. I can highly recommend this
video where the launchd folks explain what they are
doing. Unfortunately this idea never really caught on outside of the Apple
camp.

The idea is actually even older than launchd. Prior to launchd
the venerable inetd worked much like this: sockets were
centrally created in a daemon that would start the actual service
daemons passing the socket file descriptors during
exec(). However the focus of inetd certainly
wasn’t local services, but Internet services (although later
reimplementations supported AF_UNIX sockets, too). It also wasn’t a
tool to parallelize boot-up or even useful for getting implicit
dependencies right.

For TCP sockets inetd was primarily used in a way that
for every incoming connection a new daemon instance was
spawned. That meant that for each connection a new
process was spawned and initialized, which is not a
recipe for high-performance servers. However, right from the
beginning inetd also supported another mode, where a
single daemon was spawned on the first connection, and that single
instance would then go on and also accept the follow-up connections
(that’s what the wait/nowait option in
inetd.conf was for, a particularly badly documented
option, unfortunately.) Per-connection daemon starts probably gave
inetd its bad reputation for being slow. But that’s not entirely
fair.

Parallelizing Bus Services

Modern daemons on Linux tend to provide services via D-Bus
instead of plain AF_UNIX sockets. Now, the question is, for those
services, can we apply the same parallelizing boot logic as for
traditional socket services? Yes, we can, D-Bus already has all
the right hooks for it: using bus activation a service can be
started the first time it is accessed. Bus activation also gives
us the minimal per-request synchronisation we need for starting up
the providers and the consumers of D-Bus services at the same
time: if we want to start Avahi at the same time as CUPS (side
note: CUPS uses Avahi to browse for mDNS/DNS-SD printers), then we
can simply run them at the same time, and if CUPS is quicker than
Avahi via the bus activation logic we can get D-Bus to queue the
request until Avahi manages to establish its service name.
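
For illustration, here’s a rough C sketch of a client explicitly requesting bus activation via libdbus — org.example.Daemon is a hypothetical service name, and in practice simply sending a method call to an activatable but not-yet-running name triggers the same logic:

    #include <stdio.h>
    #include <dbus/dbus.h>

    int main(void) {
            DBusError err;
            dbus_error_init(&err);

            DBusConnection *bus = dbus_bus_get(DBUS_BUS_SYSTEM, &err);
            if (!bus) {
                    fprintf(stderr, "failed to connect: %s\n", err.message);
                    dbus_error_free(&err);
                    return 1;
            }

            /* Ask the bus to start whatever implements this name, if it
             * is not running yet; the bus queues requests meanwhile. */
            dbus_uint32_t result;
            if (!dbus_bus_start_service_by_name(bus, "org.example.Daemon",
                                                0, &result, &err)) {
                    fprintf(stderr, "activation failed: %s\n", err.message);
                    dbus_error_free(&err);
                    return 1;
            }
            return 0;
    }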

So, in summary: the socket-based service activation and the
bus-based service activation together enable us to start
all daemons in parallel, without any further
synchronization. Activation also allows us to do lazy-loading of
services: if a service is rarely used, we can just load it the
first time somebody accesses the socket or bus name, instead of
starting it during boot.

And if that’s not great, then I don’t know what is
great!

Parallelizing File System Jobs

If you look at
the serialization graphs of the boot process
of current
distributions, there are more synchronisation points than just
daemon start-ups: most prominently there are file-system related
jobs: mounting, fscking, quota. Right now, on boot-up a lot of
time is spent idling to wait until all devices that are listed in
/etc/fstab show up in the device tree and are then
fsck’ed, mounted, quota checked (if enabled). Only after that is
fully finished do we go on and boot the actual services.

Can we improve this? It turns out we can. Harald Hoyer came up
with the idea of using the venerable autofs system for this:

Just like a connect() call shows that a service is
interested in another service, an open() (or a similar
call) shows that a service is interested in a specific file or
file-system. So, in order to improve how much we can parallelize
we can make those apps wait only if a file-system they are looking
for is not yet mounted and readily available: we set up an autofs
mount point, and then when our file-system has finished fsck and quota
checking during the normal boot-up, we replace it by the real mount. While the
file-system is not ready yet, the access will be queued by the
kernel and the accessing process will block, but only that one
daemon and only that one access. And this way we can begin
starting our daemons even before all file systems have been fully
made available — without them missing any files, and maximizing
parallelization.

Parallelizing file system jobs and service jobs does
not make sense for /, after all that’s where the service
binaries are usually stored. However, for file-systems such as
/home, that usually are bigger, even encrypted, possibly
remote and seldom accessed by the usual boot-up daemons, this
can improve boot time considerably. It is probably not necessary
to mention this, but virtual file systems, such as
procfs or sysfs, should never be mounted via autofs.

I wouldn’t be surprised if some readers might find integrating
autofs in an init system a bit fragile and even weird, and maybe
more on the “crackish” side of things. However, having played
around with this extensively I can tell you that this actually
feels quite right. Using autofs here simply means that we can
create a mount point without having to provide the backing file
system right-away. In effect it hence only delays accesses. If an
application tries to access an autofs file-system and we take very
long to replace it with the real file-system, it will hang in an
interruptible sleep, meaning that you can safely cancel it, for
example via C-c. Also note that at any point, if the mount point
should not be mountable in the end (maybe because fsck failed), we
can just tell autofs to return a clean error code (like
ENOENT). So, I guess what I want to say is that even though
integrating autofs into an init system might appear adventurous at
first, our experimental code has shown that this idea works
surprisingly well in practice — if it is done for the right
reasons and the right way.

Also note that these should be direct autofs
mounts, meaning that from an application perspective there’s
little effective difference between a classic mount point and one
based on autofs.

Keeping the First User PID Small

Another thing we can learn from the MacOS boot-up logic is
that shell scripts are evil. Shell is fast and shell is slow. It
is fast to hack, but slow in execution. The classic sysvinit boot
logic is modelled around shell scripts. Whether it is
/bin/bash or any other shell (that was written to make
shell scripts faster), in the end the approach is doomed to be
slow. On my system the scripts in /etc/init.d call
grep at least 77 times. awk is called 92
times, cut 23 and sed 74. Every time those
commands (and others) are called, a process is spawned, the
libraries searched, some start-up stuff like i18n and so on set up
and more. And then after seldom doing more than a trivial string
operation the process is terminated again. Of course, that has to
be incredibly slow. No other language but shell would do something like
that. On top of that, shell scripts are also very fragile, and
change their behaviour drastically based on environment variables
and suchlike, stuff that is hard to oversee and control.

So, let’s get rid of shell scripts in the boot process! Before
we can do that we need to figure out what they are currently
actually used for: well, the big picture is that most of the time,
what they do is actually quite boring. Most of the scripting is
spent on trivial setup and tear-down of services, and should be
rewritten in C, either in separate executables, or moved into the
daemons themselves, or simply be done in the init system.

It is not likely that we can get rid of shell scripts during
system boot-up entirely anytime soon. Rewriting them in C takes
time, in a few cases does not really make sense, and sometimes
shell scripts are just too handy to do without. But we can
certainly make them less prominent.

A good metric for measuring shell script infestation of the
boot process is the PID number of the first process you can start
after the system is fully booted up. Boot up, log in, open a
terminal, and type echo $$. Try that on your Linux
system, and then compare the result with MacOS! (Hint, it’s
something like this: Linux PID 1823; MacOS PID 154, measured on
test systems we own.)

Keeping Track of Processes

A central part of a system that starts up and maintains
services should be process babysitting: it should watch
services. Restart them if they shut down. If they crash it should
collect information about them, and keep it around for the
administrator, and cross-link that information with what is
available from crash dump systems such as abrt, and in logging
systems like syslog or the audit system.

It should also be capable of shutting down a service
completely. That might sound easy, but is harder than you
think. Traditionally on Unix a process that does double-forking
can escape the supervision of its parent, and the old parent will
not learn about the relation of the new process to the one it
actually started. An example: currently, a misbehaving CGI script
that has double-forked is not terminated when you shut down
Apache. Furthermore, you will not even be able to figure out its
relation to Apache, unless you know it by name and purpose.

So, how can we keep track of processes, so that they cannot
escape the babysitter, and that we can control them as one unit
even if they fork a gazillion times?

Different people came up with different solutions for this. I
am not going into much detail here, but let’s at least say that
approaches based on ptrace or the netlink connector (a kernel
interface which allows you to get a netlink message each time any
process on the system fork()s or exit()s) that some people have
investigated and implemented, have been criticised as ugly and not
very scalable.

So what can we do about this? Well, for quite a while now the
kernel has known Control Groups (aka “cgroups”). Basically they
allow the creation of a
hierarchy of groups of processes. The hierarchy is directly
exposed in a virtual file-system, and hence easily accessible. The
group names are basically directory names in that file-system. If
a process belonging to a specific cgroup fork()s, its child will
become a member of the same group. Unless it is privileged and has
access to the cgroup file system it cannot escape its
group. Originally, cgroups have been introduced into the kernel
for the purpose of containers: certain kernel subsystems can
enforce limits on resources of certain groups, such as limiting
CPU or memory usage. Traditional resource limits (as implemented
by setrlimit()) are (mostly) per-process. cgroups on the
other hand let you enforce limits on entire groups of
processes. cgroups are also useful to enforce limits outside of
the immediate container use case. You can use it for example to
limit the total amount of memory or CPU Apache and all its
children may use. Then, a misbehaving CGI script can no longer
escape your setrlimit() resource control by simply
forking away.

In addition to container and resource limit enforcement cgroups
are very useful to keep track of daemons: cgroup membership is
securely inherited by child processes, they cannot escape. There’s
a notification system available so that a supervisor process can
be notified when a cgroup runs empty. You can find the cgroups of
a process by reading /proc/$PID/cgroup. cgroups hence
make a very good choice to keep track of processes for babysitting
purposes.
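
As a small sketch of how this can be used (assuming a cgroup hierarchy is already mounted at the hypothetical path /cgroup/debug — the mount point varies between systems), this is roughly how a babysitter could place a service in its own group, and how anyone can inspect membership afterwards:

    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    int main(void) {
            char line[512];
            FILE *f;

            /* Create a group for the service; every process we put in
             * here, and all its descendants, will stay in this group. */
            if (mkdir("/cgroup/debug/example.service", 0755) < 0)
                    perror("mkdir");

            /* Move ourselves into the group: writing "0" to the tasks
             * file means "the writing process itself". */
            f = fopen("/cgroup/debug/example.service/tasks", "w");
            if (f) {
                    fprintf(f, "0\n");
                    fclose(f);
            }

            /* Membership is visible to everybody via /proc/$PID/cgroup. */
            f = fopen("/proc/self/cgroup", "r");
            if (!f)
                    return 1;
            while (fgets(line, sizeof(line), f))
                    fputs(line, stdout);
            fclose(f);
            return 0;
    }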

Controlling the Process Execution Environment

A good babysitter should not only oversee and control when a
daemon starts, ends or crashes, but also set up a good, minimal,
and secure working environment for it.

That means setting obvious process parameters such as the
setrlimit() resource limits, user/group IDs or the
environment block, but does not end there. The Linux kernel gives
users and administrators a lot of control over processes (some of
it is rarely used, currently). For each process you can set CPU
and IO scheduler controls, the capability bounding set, CPU
affinity or of course cgroup environments with additional limits,
and more.

As an example, ioprio_set() with
IOPRIO_CLASS_IDLE is a great way to minimize the effect
of locate’s updatedb on system interactivity.
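
A quick sketch of that call (glibc offers no wrapper for it, so it goes through syscall(2); the constants below are copied from the kernel’s I/O priority interface):

    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Constants from the kernel's I/O priority interface. */
    #define IOPRIO_CLASS_SHIFT 13
    #define IOPRIO_CLASS_IDLE   3
    #define IOPRIO_WHO_PROCESS  1

    int main(void) {
            /* Put the calling process into the idle I/O class: it then
             * only gets disk time that nobody else is asking for. */
            if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                        IOPRIO_CLASS_IDLE << IOPRIO_CLASS_SHIFT) < 0) {
                    perror("ioprio_set");
                    return 1;
            }

            /* ... now run the I/O-heavy work, e.g. updatedb ... */
            return 0;
    }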

On top of that certain high-level controls can be very useful,
such as setting up read-only file system overlays based on
read-only bind mounts. That way one can run certain daemons so
that all (or some) file systems appear read-only to them, so that
EROFS is returned on every write request. As such this can be used
to lock down what daemons can do similar in fashion to a poor
man’s SELinux policy system (but this certainly doesn’t replace
SELinux, don’t get any bad ideas, please).

Finally, logging is an important part of executing services:
ideally every bit of output a service generates should be logged
away. An init system should hence provide logging to daemons it
spawns right from the beginning, and connect stdout and stderr to
syslog or in some cases even /dev/kmsg, which in many
cases makes a very useful replacement for syslog (embedded folks,
listen up!), especially in times when the kernel log buffer is
configured ridiculously large out-of-the-box.
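
A sketch of the /dev/kmsg variant, as a supervisor might wire it up
in the child between fork() and exec():

    #include <fcntl.h>
    #include <unistd.h>

    /* Point the daemon's stdout/stderr at the kernel log buffer, so
     * that everything it prints can be retrieved with dmesg. A
     * supervisor could equally connect these fds to a syslog socket. */
    int redirect_output_to_kmsg(void) {
            int fd = open("/dev/kmsg", O_WRONLY);
            if (fd < 0)
                    return -1;
            if (dup2(fd, STDOUT_FILENO) < 0 || dup2(fd, STDERR_FILENO) < 0)
                    return -1;
            if (fd > STDERR_FILENO)
                    close(fd);
            return 0;
    }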

On Upstart

To begin with, let me emphasize that I actually like the code
of Upstart, it is very well commented and easy to
follow. It’s certainly something other projects should learn
from (including my own).

That being said, I can’t say I agree with the general approach
of Upstart. But first, a bit more about the project:

Upstart does not share code with sysvinit; its functionality
is a super-set of it, and it provides compatibility to
some degree with the well-known SysV init scripts. Its main
feature is its event-based approach: starting and stopping of
processes is bound to “events” happening in the system, where an
“event” can be a lot of different things, such as: a network
interface becomes available or some other software has been
started.

Upstart does service serialization via these events: if the
syslog-started event is triggered this is used as an
indication to start D-Bus since it can now make use of Syslog. And
then, when dbus-started is triggered,
NetworkManager is started, since it may now use
D-Bus, and so on.

One could say that this way the actual logical dependency tree
that exists and is understood by the admin or developer is
translated and encoded into event and action rules: every logical
“a needs b” rule that the administrator/developer is aware of
becomes a “start a when b is started” plus “stop a when b is
stopped”. In some way this certainly is a simplification:
especially for the code in Upstart itself. However I would argue
that this simplification is actually detrimental. First of all,
the logical dependency system does not go away, the person who is
writing Upstart files must now translate the dependencies manually
into these event/action rules (actually, two rules for each
dependency). So, instead of letting the computer figure out what
to do based on the dependencies, the user has to manually
translate the dependencies into simple event/action rules. Also,
because the dependency information has never been encoded it is
not available at runtime, effectively meaning that an
administrator who tries to figure out why something
happened, i.e. why a is started when b is started, has no chance
of finding that out.

Furthermore, the event logic turns all dependencies around,
from their feet onto their head. Instead of minimizing the
amount of work (which is something that a good init system should
focus on, as pointed out in the beginning of this blog story), it
actually maximizes the amount of work to do during
operations. Or in other words, instead of having a clear goal and
only doing the things it really needs to do to reach the goal, it
does one step, and then after finishing it, it does all
steps that possibly could follow it.

Or to put it more simply: the fact that the user just started D-Bus
is in no way an indication that NetworkManager should be started
too (but this is what Upstart would do). It’s right the other way
round: when the user asks for NetworkManager, that is definitely
an indication that D-Bus should be started too (which is certainly
what most users would expect, right?).

A good init system should start only what is needed, and that
on-demand. Either lazily or parallelized and in advance. However
it should not start more than necessary, particularly not
everything installed that could use that service.

Finally, I fail to see the actual usefulness of the event
logic. It appears to me that most events that are exposed in
Upstart actually are not punctual in nature, but have duration: a
service starts, is running, and stops. A device is plugged in, is
available, and is plugged out again. A mount point is in the
process of being mounted, is fully mounted, or is being
unmounted. A power plug is plugged in, the system runs on AC, and
the power plug is pulled. Only a minority of the events an init
system or process supervisor should handle are actually punctual,
most of them are tuples of start, condition, and stop. This
information is again not available in Upstart, because it focuses
on singular events, and ignores durable dependencies.

Now, I am aware that some of the issues I pointed out above are
in some way mitigated by certain more recent changes in Upstart,
particularly condition based syntaxes such as start on
(local-filesystems and net-device-up IFACE=lo)
in Upstart
rule files. However, to me this appears mostly as an attempt to
fix a system whose core design is flawed.

Besides that, Upstart does OK at babysitting daemons, even though
some choices might be questionable (see above), and there are certainly a lot
of missed opportunities (see above, too).

There are other init systems besides sysvinit, Upstart and
launchd. Most of them offer little of substance beyond Upstart or
sysvinit. The most interesting other contender is Solaris SMF,
which supports proper dependencies between services. However, in
many ways it is overly complex and, let’s say, a bit academic
with its excessive use of XML and new terminology for known
things. It is also closely bound to Solaris specific features such
as the contract system.

Putting it All Together: systemd

Well, this is another good time for a little pause, because
after I have hopefully explained above what I think a good PID 1
should be doing and what the current most used system does, we’ll
now come to where the beef is. So, go and refill your coffee mug
again. It’s going to be worth it.

You probably guessed it: what I suggested above as requirements
and features for an ideal init system is actually available now,
in a (still experimental) init system called systemd,
which I hereby want to announce. Again, here’s the
code.
And here’s a quick rundown of its features, and the
rationale behind them:

systemd starts up and supervises the entire system (hence the
name…). It implements all of the features pointed out above and
a few more. It is based around the notion of units. Units
have a name and a type. Since their configuration is usually
loaded directly from the file system, these unit names are
actually file names. Example: a unit avahi.service is
read from a configuration file by the same name, and of course
could be a unit encapsulating the Avahi daemon. There are several
kinds of units:

  1. service: these are the most obvious kind of unit:
    daemons that can be started, stopped, restarted, reloaded. For
    compatibility with SysV we not only support our own
    configuration files for services, but also are able to read
    classic SysV init scripts, in particular we parse the LSB
    header, if it exists. /etc/init.d is hence not much
    more than just another source of configuration.
  2. socket: this unit encapsulates a socket in the
    file-system or on the Internet. We currently support AF_INET,
    AF_INET6, AF_UNIX sockets of the types stream, datagram, and
    sequential packet. We also support classic FIFOs as
    transport. Each socket unit has a matching
    service unit, that is started if the first connection
    comes in on the socket or FIFO. Example: nscd.socket
    starts nscd.service on an incoming connection.
  3. device: this unit encapsulates a device in the
    Linux device tree. If a device is marked for this via udev
    rules, it will be exposed as a device unit in
    systemd. Properties set with udev can be used as
    configuration source to set dependencies for device units.
  4. mount: this unit encapsulates a mount point in the
    file system hierarchy. systemd monitors all mount points as
    they come and go, and can also be used to mount or
    unmount mount-points. /etc/fstab is used here as an
    additional configuration source for these mount points, similar to
    how SysV init scripts can be used as additional configuration
    source for service units.
  5. automount: this unit type encapsulates an automount
    point in the file system hierarchy. Each automount
    unit has a matching mount unit, which is started
    (i.e. mounted) as soon as the automount directory is
    accessed.
  6. target: this unit type is used for logical
    grouping of units: instead of actually doing anything by itself
    it simply references other units, which thereby can be controlled
    together. Examples for this are: multi-user.target,
    which is a target that basically plays the role of run-level 5 on a
    classic SysV system, or bluetooth.target, which is
    requested as soon as a bluetooth dongle becomes available and
    which simply pulls in bluetooth related services that otherwise
    would not need to be started: bluetoothd and
    obexd and suchlike.
  7. snapshot: similar to target units
    snapshots do not actually do anything themselves and their only
    purpose is to reference other units. Snapshots can be used to
    save/rollback the state of all services and units of the init
    system. Primarily it has two intended use cases: to allow the
    user to temporarily enter a specific state such as “Emergency
    Shell”, terminating current services, and provide an easy way to
    return to the state before, pulling up all services again that
    got temporarily pulled down. And to ease support for system
    suspending: still many services cannot correctly deal with
    system suspend, and it is often a better idea to shut them down
    before suspend, and restore them afterwards.

All these units can have dependencies between each other (both
positive and negative, i.e. ‘Requires’ and ‘Conflicts’): a device
can have a dependency on a service, meaning that as soon as a
device becomes available a certain service is started. Mounts get
an implicit dependency on the device they are mounted from. Mounts
also get implicit dependencies on mounts that are their prefixes
(i.e. a mount /home/lennart implicitly gets a dependency
added to the mount for /home) and so on.

A short list of other features:

  1. For each process that is spawned, you may control: the
    environment, resource limits, working and root directory, umask,
    OOM killer adjustment, nice level, IO class and priority, CPU policy
    and priority, CPU affinity, timer slack, user id, group id,
    supplementary group ids, readable/writable/inaccessible
    directories, shared/private/slave mount flags,
    capabilities/bounding set, secure bits, CPU scheduler reset on
    fork, private /tmp name-space, cgroup control for
    various subsystems. Also, you can easily connect
    stdin/stdout/stderr of services to syslog, /dev/kmsg,
    arbitrary TTYs. If connected to a TTY for input systemd will make
    sure a process gets exclusive access, optionally waiting or enforcing
    it.
  2. Every executed process gets its own cgroup (currently by
    default in the debug subsystem, since that subsystem is not
    otherwise used and does not much more than the most basic
    process grouping), and it is very easy to configure systemd to
    place services in cgroups that have been configured externally,
    for example via the libcgroups utilities.
  3. The native configuration files use a syntax that closely
    follows the well-known .desktop files. It is a simple syntax for
    which parsers exist already in many software frameworks. Also, this
    allows us to rely on existing tools for i18n for service
    descriptions, and similar. Administrators and developers don’t
    need to learn a new syntax (see the sketch after this list).
  4. As mentioned, we provide compatibility with SysV init
    scripts. We take advantage of LSB and Red Hat chkconfig headers
    if they are available. If they aren’t, we try to make the best of
    the otherwise available information, such as the start
    priorities in /etc/rc.d. These init scripts are simply
    considered a different source of configuration, hence an easy
    upgrade path to proper systemd services is available. Optionally
    we can read classic PID files for services to identify the main
    pid of a daemon. Note that we make use of the dependency
    information from the LSB init script headers, and translate
    those into native systemd dependencies. Side note: Upstart is
    unable to harvest and make use of that information. Boot-up on a
    plain Upstart system with mostly LSB SysV init scripts will
    hence not be parallelized, a similar system running systemd
    however will. In fact, for Upstart all SysV scripts together
    make one job that is executed, they are not treated
    individually, again in contrast to systemd where SysV init
    scripts are just another source of configuration and are all
    treated and controlled individually, much like any other native
    systemd service.
  5. Similarly, we read the existing /etc/fstab
    configuration file, and consider it just another source of
    configuration. Using the comment= fstab option you can
    even mark /etc/fstab entries to become systemd
    controlled automount points.
  6. If the same unit is configured in multiple configuration
    sources (e.g. /etc/systemd/system/avahi.service exists,
    and /etc/init.d/avahi too), then the native
    configuration will always take precedence, the legacy format is
    ignored, allowing an easy upgrade path and packages to carry
    both a SysV init script and a systemd service file for a
    while.
  7. We support a simple templating/instance mechanism. Example:
    instead of having six configuration files for six gettys, we
    only have one getty@.service file which gets instantiated to
    getty@tty2.service and suchlike. The interface part can
    even be inherited by dependency expressions, i.e. it is easy to
    encode that a service dhcpcd@eth0.service pulls in
    avahi-autoipd@eth0.service, while leaving the
    eth0 string wild-carded.
  8. For socket activation we support full compatibility with the
    traditional inetd modes, as well as a very simple mode that
    tries to mimic launchd socket activation and is recommended for
    new services. The inetd mode only allows passing one socket to
    the started daemon, while the native mode supports passing
    arbitrary numbers of file descriptors. We also support one
    instance per connection, as well as one instance for all
    connections modes. In the former mode we name the cgroup the
    daemon will be started in after the connection parameters, and
    utilize the templating logic mentioned above for this. Example:
    sshd.socket might spawn services
    sshd@192.168.0.1-4711-192.168.0.2-22.service with a
    cgroup of sshd@.service/192.168.0.1-4711-192.168.0.2-22
    (i.e. the IP address and port numbers are used in the instance
    names. For AF_UNIX sockets we use PID and user id of the
    connecting client). This provides a nice way for the
    administrator to identify the various instances of a daemon and
    control their runtime individually. The native socket passing
    mode is very easily implementable in applications: if
    $LISTEN_FDS is set it contains the number of sockets
    passed and the daemon will find them sorted as listed in the
    .service file, starting from file descriptor 3 (a
    nicely written daemon could also use fstat() and
    getsockname() to identify the sockets in case it
    receives more than one). In addition we set $LISTEN_PID
    to the PID of the daemon that shall receive the fds, because
    environment variables are normally inherited by sub-processes and
    hence could confuse processes further down the chain. Even
    though this socket passing logic is very simple to implement in
    daemons, we will provide a BSD-licensed reference implementation
    that shows how to do this. We have ported a couple of existing
    daemons to this new scheme.
  9. We provide compatibility with /dev/initctl to a
    certain extent. This compatibility is in fact implemented with a
    FIFO-activated service, which simply translates these legacy
    requests to D-Bus requests. Effectively this means the old
    shutdown, poweroff and similar commands from
    Upstart and sysvinit continue to work with
    systemd.
  10. We also provide compatibility with utmp and
    wtmp. Possibly even to an extent that is far more
    than healthy, given how crufty utmp and wtmp
    are.
  11. systemd supports several kinds of
    dependencies between units. After/Before can be used to fix
    the order in which units are activated. It is completely
    orthogonal to Requires and Wants, which
    express a positive requirement dependency, either mandatory, or
    optional. Then, there is Conflicts which
    expresses a negative requirement dependency. Finally, there are
    three further, less used dependency types.
  12. systemd has a minimal transaction system. Meaning: if a unit
    is requested to start up or shut down we will add it and all its
    dependencies to a temporary transaction. Then, we will
    verify if the transaction is consistent (i.e. whether the
    ordering via After/Before of all units is
    cycle-free). If it is not, systemd will try to fix it up, and
    removes non-essential jobs from the transaction that might
    break the loop. Also, systemd tries to suppress non-essential
    jobs in the transaction that would stop a running
    service. Non-essential jobs are those which the original request
    did not directly include but which where pulled in by
    Wants type of dependencies. Finally we check whether
    the jobs of the transaction contradict jobs that have already
    been queued, and optionally the transaction is aborted then. If
    all worked out and the transaction is consistent and minimized
    in its impact it is merged with all already outstanding jobs and
    added to the run queue. Effectively this means that before
    executing a requested operation, we will verify that it makes
    sense, fixing it if possible, and only failing if it really cannot
    work.
  13. We record start/exit time as well as the PID and exit status
    of every process we spawn and supervise. This data can be used
    to cross-link daemons with their data in abrtd, auditd and
    syslog. Think of a UI that will highlight crashed daemons for
    you, and allows you to easily navigate to the respective UIs for
    syslog, abrt, and auditd that will show the data generated from
    and for this daemon on a specific run.
  14. We support reexecution of the init process itself at any
    time. The daemon state is serialized before the reexecution and
    deserialized afterwards. That way we provide a simple way to
    facilitate init system upgrades as well as handover from an
    initrd daemon to the final daemon. Open sockets and autofs
    mounts are properly serialized away, so that they stay
    connectible all the time, in a way that clients will not even
    notice that the init system reexecuted itself. Also, the fact
    that a big part of the service state is encoded anyway in the
    cgroup virtual file system would even allow us to resume
    execution without access to the serialization data. The
    reexecution code paths are actually mostly the same as the init
    system configuration reloading code paths, which
    guarantees that reexecution (which is probably triggered less
    often) gets similar testing to reloading (which is probably
    more common).
  15. Starting the work of removing shell scripts from the boot
    process we have recoded part of the basic system setup in C and
    moved it directly into systemd. Among that is mounting of the API
    file systems (i.e. virtual file systems such as /proc,
    /sys and /dev) and setting of the
    host-name.
  16. Server state is introspectable and controllable via
    D-Bus. This is not complete yet but quite extensive.
  17. While we want to emphasize socket-based and bus-name-based
    activation, and we hence support dependencies between sockets and
    services, we also support traditional inter-service
    dependencies. We support multiple ways for such a service to
    signal its readiness: by forking and having the start process
    exit (i.e. traditional daemonize() behaviour), as well
    as by watching the bus until a configured service name appears.
  18. There’s an interactive mode which asks for confirmation each
    time a process is spawned by systemd. You may enable it by
    passing systemd.confirm_spawn=1 on the kernel command
    line.
  19. With the systemd.default= kernel command line
    parameter you can specify which unit systemd should start on
    boot-up. Normally you’d specify something like
    multi-user.target here, but another choice could even
    be a single service instead of a target, for example
    out-of-the-box we ship a service emergency.service that
    is similar in its usefulness to init=/bin/bash, however
    has the advantage of actually running the init system, hence
    offering the option to boot up the full system from the
    emergency shell.
  20. There’s a minimal UI that allows you to
    start/stop/introspect services. It’s far from complete but
    useful as a debugging tool. It’s written in Vala (yay!) and goes
    by the name of systemadm.
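
To give an impression of points 3 and 11 above, here is a
hypothetical sketch of what such a unit file might look like. The
section and key names are illustrative only and may differ from what
the actual release ships:

    # Hypothetical sketch of a .desktop-style service unit; section
    # and key names are illustrative, not the shipped vocabulary.
    [Unit]
    Description=Example Daemon
    Requires=example.socket
    After=example.socket

    [Service]
    ExecStart=/usr/sbin/exampled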

It should be noted that systemd uses many Linux-specific
features, and does not limit itself to POSIX. That unlocks a lot
of functionality a system that is designed for portability to
other operating systems cannot provide.

Status

All the features listed above are already implemented. Right
now systemd can already be used as a drop-in replacement for
Upstart and sysvinit (at least as long as there aren’t too many
native Upstart services installed; thankfully most distributions
don’t carry many of those yet).

However, testing has been minimal, and our version number is
currently at an impressive 0. Expect breakage if you run this in
its current state. That said, overall it should be quite stable
and some of us already boot our normal development systems with
systemd (not just VMs). YMMV, especially if you try
this on distributions we developers don’t use.

Where is This Going?

The feature set described above is certainly already
comprehensive. However, we have a few more things on our plate. I
don’t really like speaking too much about big plans but here’s a
short overview in which direction we will be pushing this:

We want to add at least two more unit types: swap
shall be used to control swap devices the same way we
already control mounts, i.e. with automatic dependencies on the
device tree devices they are activated from, and
suchlike. timer shall provide functionality similar to
cron, i.e. starting services based on time events, the
focus being both monotonic clock and wall-clock/calendar
events (i.e. "start this 5h after it last ran" as well as "start
this every Monday at 5 am").

More importantly however, it is also our plan to experiment with
systemd not only for optimizing boot times, but also to make it
the ideal session manager, to replace (or possibly just augment)
gnome-session, kdeinit and similar daemons. The problem sets of a
session manager and an init system are very similar: quick start-up
is essential and babysitting processes the focus. Using the same
code for both uses hence suggests itself. Apple recognized that
and does just that with launchd. And so should we: socket and bus
based activation and parallelization is something session services
and system services can benefit from equally.

I should probably note that all three of these features are
already partially available in the current code base, but not
complete yet. For example, you can already run systemd just fine
as a normal user; it will detect that it is run that way, and
support for this mode has been available since the very beginning
and is in the very core. (It is also exceptionally useful for
debugging! This works fine even without having the system
otherwise converted to systemd for booting.)

However, there are some things we probably should fix in the
kernel and elsewhere before finishing work on this: we
need swap status change notifications from the kernel similar to
how we can already subscribe to mount changes; we want a
notification when CLOCK_REALTIME jumps relative to
CLOCK_MONOTONIC; we want to allow normal processes to get
some init-like powers; we need a well-defined
place where we can put user sockets. None of these issues are
really essential for systemd, but they’d certainly improve
things.

You Want to See This in Action?

Currently, there are no tarball releases, but it should be
straightforward to check out the code from our
repository. In addition, to have something to start with, here’s
a tarball with unit configuration files that allows an
otherwise unmodified Fedora 13 system to work with systemd. We
have no RPMs to offer you for now.

An easier way is to download this Fedora 13 qemu image, which
has been prepared for systemd. In the grub menu you can select
whether you want to boot the system with Upstart or systemd. Note
that this system is only minimally modified. Service information
is read exclusively from the existing SysV init scripts. Hence it
will not take advantage of the full socket and bus-based
parallelization pointed out above, however it will interpret the
parallelization hints from the LSB headers, and hence boots faster
than the Upstart system, which in Fedora does not employ any
parallelization at the moment. The image is configured to output
debug information on the serial console, as well as writing it to
the kernel log buffer (which you may access with dmesg).
You might want to run qemu configured with a virtual
serial terminal. All passwords are set to systemd.

Even simpler than downloading and booting the qemu image is
looking at pretty screen-shots. Since an init system usually is
well hidden beneath the user interface, some shots of
systemadm and ps must do:

systemadm

That’s systemadm showing all loaded units, with more detailed
information on one of the getty instances.

ps

That’s an excerpt of the output of ps xaf -eo
pid,user,args,cgroup, showing how neatly the processes are
sorted into the cgroup of their service. (The fourth column is the
cgroup; the debug: prefix is shown because we use the
debug cgroup controller for systemd, as mentioned earlier. This is
only temporary.)

Note that both of these screenshots show an only minimally
modified Fedora 13 Live CD installation, where services are
exclusively loaded from the existing SysV init scripts. Hence,
this does not use socket or bus activation for any existing
service.

Sorry, no bootcharts or hard data on start-up times for the
moment. We’ll publish that as soon as we have fully parallelized
all services from the default Fedora install. Then, we’ll welcome
you to benchmark the systemd approach, and provide our own
benchmark data as well.

Well, presumably everybody will keep bugging me about this, so
here are two numbers I’ll tell you. However, they are completely
unscientific as they are measured for a VM (single CPU) and by
using the stop timer on my watch. Fedora 13 booting up with
Upstart takes 27s, with systemd we reach 24s (from grub to gdm,
same system, same settings, shorter value of two bootups, one
immediately following the other). Note however that this shows
nothing more than the speedup effect reached by using the LSB
dependency information parsed from the init script headers for
parallelization. Socket or bus based activation was not utilized
for this, and hence these numbers are unsuitable to assess the
ideas pointed out above. Also, systemd was set to debug verbosity
levels on a serial console. So again, this benchmark data has
barely any value.

Writing Daemons

An ideal daemon for use with systemd does a few things
differently from how things were traditionally done. Later on, we will
publish a longer guide explaining and suggesting how to write a daemon for use
with systemd. Basically, things get simpler for daemon
developers:

  • We ask daemon writers not to fork or even double fork
    in their processes, but run their event loop from the initial process
    systemd starts for you. Also, don’t call setsid().
  • Don’t drop user privileges in the daemon itself, leave this
    to systemd and configure it in systemd service configuration
    files. (There are exceptions here. For example, for some daemons
    there are good reasons to drop privileges inside the daemon
    code, after an initialization phase that requires elevated
    privileges.)
  • Don’t write PID files
  • Grab a name on the bus
  • You may rely on systemd for logging: you are welcome to log
    whatever you need to log to stderr.
  • Let systemd create and watch sockets for you, so that socket
    activation works. Hence, interpret $LISTEN_FDS and
    $LISTEN_PID as described above (see the sketch after this list).
  • Use SIGTERM for requesting shut downs from your daemon.
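
To illustrate the socket activation protocol referenced in the list
above, here is a minimal sketch of how a daemon might take over the
sockets passed to it; the helper name is illustrative, not a library
API:

    #include <stdlib.h>
    #include <unistd.h>

    /* Sockets passed by systemd start at file descriptor 3. */
    #define LISTEN_FDS_START 3

    /* Returns the number of sockets passed to us, or 0 if we were not
     * socket activated. The fds are then LISTEN_FDS_START ..
     * LISTEN_FDS_START + n - 1, in the order they are listed in the
     * .service file. */
    int get_listen_fds(void) {
            const char *pid = getenv("LISTEN_PID");
            const char *fds = getenv("LISTEN_FDS");

            if (!pid || !fds)
                    return 0;

            /* $LISTEN_PID guards against the variables confusing child
             * processes they were not intended for. */
            if (atol(pid) != (long) getpid())
                    return 0;

            return atoi(fds);
    }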

The list above is very similar to what Apple
recommends for daemons compatible with launchd. It should be
easy to extend daemons that already support launchd
activation to support systemd activation as well.

Note that systemd supports daemons not written in this style
perfectly well too, if only for compatibility reasons (launchd has
only limited support for that). As mentioned, this even extends to
existing inetd capable daemons which can be used unmodified for
socket activation by systemd.

So, yes, should systemd prove itself in our experiments and get
adopted by the distributions it would make sense to port at least
those services that are started by default to use socket or
bus-based activation. We have
written proof-of-concept patches, and the porting turned out
to be very easy. Also, we can leverage the work that has already
been done for launchd, to a certain extent. Moreover, adding
support for socket-based activation does not make the service
incompatible with non-systemd systems.

FAQs

Who’s behind this?
Well, the current code-base is mostly my work, Lennart
Poettering (Red Hat). However the design in all its details is
result of close cooperation between Kay Sievers (Novell) and
me. Other people involved are Harald Hoyer (Red Hat), Dhaval
Giani (formerly IBM), and a few others from various
companies such as Intel, SUSE and Nokia.
Is this a Red Hat project?
No, this is my personal side project. Also, let me emphasize
this: the opinions reflected here are my own. They are not
the views of my employer, or Ronald McDonald, or anyone
else.
Will this come to Fedora?
If our experiments prove that this approach works out, and
discussions in the Fedora community show support for this, then
yes, we’ll certainly try to get this into Fedora.
Will this come to OpenSUSE?
Kay’s pursuing that, so something similar as for Fedora applies here, too.
Will this come to Debian/Gentoo/Mandriva/MeeGo/Ubuntu/[insert your favourite distro here]?
That’s up to them. We’d certainly welcome their interest, and help with the integration.
Why didn’t you just add this to Upstart, why did you invent something new?
Well, the point of the part about Upstart above was to show
that the core design of Upstart is flawed, in our
opinion. Starting completely from scratch suggests itself if the
existing solution appears flawed in its core. However, note that
we took a lot of inspiration from Upstart’s code-base
otherwise.
If you love Apple launchd so much, why not adopt that?
launchd is a great invention, but I am not convinced that it
would fit well into Linux, nor that it is suitable for a system
like Linux with its immense scalability and flexibility to
numerous purposes and uses.
Is this an NIH project?
Well, I hope that I managed to explain in the text above why
we came up with something new, instead of building on Upstart or
launchd. We came up with systemd due to technical
reasons, not political reasons.
Don’t forget that it is Upstart that includes
a library called NIH
(which is kind of a reimplementation of glib) — not systemd!
Will this run on [insert non-Linux OS here]?
Unlikely. As pointed out, systemd uses many Linux specific
APIs (such as epoll, signalfd, libudev, cgroups, and numerous
more), a port to other operating systems appears to us as not
making a lot of sense. Also, we, the people involved are
unlikely to be interested in merging possible ports to other
platforms and work with the constraints this introduces. That said,
git supports branches and rebasing quite well, in case
people really want to do a port.
Actually portability is even more limited than just to other OSes: we require a very
recent Linux kernel, glibc, libcgroup and libudev. No support for
less-than-current Linux systems, sorry.
If folks want to implement something similar for other
operating systems, the preferred mode of cooperation is probably
that we help you identify which interfaces can be shared with
your system, to make life easier for daemon writers to support
both systemd and your systemd counterpart. Probably, the focus should be
to share interfaces, not code.
I hear [fill one in here: the Gentoo boot system, initng,
Solaris SMF, runit, uxlaunch, …] is an awesome init system and
also does parallel boot-up, so why not adopt that?
Well, before we started this we actually had a very close
look at the various systems, and none of them did what we had in
mind for systemd (with the exception of launchd, of course). If
you cannot see that, then please read again what I wrote
above.

Contributions

We are very interested in patches and help. It should be common
sense that every Free Software project can only benefit from the
widest possible external contributions. That is particularly true
for a core part of the OS, such as an init system. We value your
contributions and hence do not require copyright assignment (Very
much unlike Canonical/Upstart
!). And also, we use git,
everybody’s favourite VCS, yay!

We are particularly interested in help getting systemd to work
on other distributions, besides Fedora and OpenSUSE. (Hey, anybody
from Debian, Gentoo, Mandriva, MeeGo looking for something to do?)
But even beyond that we are keen to attract contributors on every
level: we welcome C hackers, packagers, as well as folks interested
in writing documentation or contributing a logo.

Community

At this time we only have a source code
repository and an IRC channel (#systemd on
Freenode). There’s no mailing list, web site or bug tracking
system. We’ll probably set something up on freedesktop.org
soon. If you have any questions or want to contact us otherwise we
invite you to join us on IRC!

Update: our GIT repository has moved.

Bossa 2010/Manaus Slides

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/bossa2010.html

The slides for my talk about the audio infrastructure of Linux mobile
devices at BOSSA 2010 in Manaus/Brazil are now available
online. They are terse (as usual), and the most interesting stuff is
probably in what I said, and not so much in what I wrote in those slides. But
nonetheless I believe this might still be quite interesting for attendees as
well as non-attendees.

The talk focuses on the audio architecture of the Nokia N900 and the Palm
Pre, and of course particularly their use of PulseAudio for all things audio. I analyzed
and compared their patch sets to figure out what their priorities are, what we
should move into PulseAudio mainline, and what should better be left in their
private patch sets.

Musings on Software Freedom for Mobile Devices

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2010/03/04/mobile.html

I started using GNU/Linux and Free Software in 1992. In those days,
while everything I needed for a working computer was generally available
in software freedom, there were many components and applications that
simply did not exist. For highly technical users who did not need many
peripherals, the Free Software community had reached a state of complete
software freedom. Yet, in 1992, everyone agreed there was still much work
to be done. Even today, we still strive for a desktop and server
operating system, with all relevant applications, that grants complete
software freedom.

Looked at broadly, mobile telephone systems are not all that different
from 1992-era GNU/Linux systems. The basics are currently available as
Free, Libre, and Open Source Software (FLOSS). If you need only the bare
minimum of functionality, you can, by picking the right phone hardware,
run an almost completely FLOSS operating system and application set. Yet,
we have so far to go. This post discusses the current penetration of
FLOSS in mobile devices and offers a path forward for free software
advocates.

A Brief History

The mobile telephone market has never functioned like the traditional
computer market. Historically, the mobile user made arrangements with some
network carrier through a long-term contract. That carrier
“gave” the user a phone or discounted it as a loss-leader. Under
that system, few people take their phone hardware choice all that
seriously. Perhaps users pay a bit more for a slightly better phone, but
they nearly always pick among the limited choices provided by
the given carrier.

Meanwhile, Research in Motion was the first to provide
corporate-slave-oriented email-enabled devices. Indeed, with the very
recent focus on consumer-oriented devices like the iPhone, most users
forget that Apple is by far not the preferred fruit for the smart phone
user. Today, most people using a “smart phone” are using one
given to them by their employer to chain them to their office email 24/7.

Apple, excellent at manipulating users into paying more for a product
merely because it is shiny, also convinced everyone that now a phone
should be paid for separately, and contracts should go even longer. The
“race to mediocrity” of the phone market has ended. Phones
need real features to stand out. Phones, in fact, aren’t phones
anymore. They are small mobile computers that can also make phone calls.

If these small computers had been introduced in 1992, I suppose I’d be
left writing the Mobile GNU Manifesto, calling for developers
to start from scratch writing operating systems for these new computers,
so that all users could have software freedom. Fortunately, we have
instead been given a head start. Unlike in 1992, not every company in the
market today is completely against releasing Free Software. Specifically,
two companies have seen some value in releasing (some parts of) phone
operating systems as Free Software: Nokia and Google. However, the two
companies have done this for radically different reasons.

The Current State of Mobile Software Freedom

For its part, Nokia likely benefited greatly from the traditional
carrier system. Most of their phones were provided relatively cheaply with
contracts. Their interest in software freedom was limited and perhaps even
non-existent. Nokia sold new hardware every time a phone contract was
renewed, and the carrier paid the difference between the loss-leader price
and Nokia’s wholesale cost. The software on the devices was simple and
mostly internally developed. What incentive did Nokia have to release
software in software freedom? (Nokia realized too late this was the wrong
position, but more on that later.)

In parallel, Nokia had chased another market that I’ve never fully
understood: the tablet PC. Not big enough to be a real computer, but too
large to be a phone, these devices have been an idea looking for a user
base. Regardless of my personal views on these systems, though, GNU/Linux
remains the ideal system for these devices, and Nokia saw that. Nokia
built the Debian-ish Maemo system as a
tablet system, with no phone. However, I can count on one hand all the
people I’ve met who bothered with these devices; I just don’t think a
phone-less small computer is going to ever become the rage, even if Apple
dumps billions into marketing the iPad. (Anyone remember the Newton?)

I cannot explain, nor do I even understand, why Nokia took so long to
use Maemo as a platform for a tablet-like telephone. But, a few months
ago, they finally released
one. This N900 is among only a
few available phones that make any strides toward a fully free software
phone platform. Yet,
the list of
proprietary components required for operation
remains quite long. The
common joke is that you can’t even charge the battery on your N900 without
proprietary software.

While there are surely people inside Nokia who want more software
freedom on their devices, Nokia is fundamentally a hardware company
experimenting with software freedom in hopes that it will bolster hardware
sales. Convincing Nokia to shorten that proprietary list will prove
difficult, and the community based effort to replace that long list with
FLOSS (called Mer) faces many
challenges. (These challenges will likely increase with the recent Maemo
merger with Moblin to form MeeGo).

Fortunately, hardware companies are not the only entity interested in
phone operating systems. Google, ever-focused on routing human eyes to its
controlled advertising, realizes that even more eyes will be on mobile
computing platforms in the future. With this goal in mind, Google released
the Android/Linux system, now available on a variety of phones in varying
degrees of software freedom.

Google’s motives are completely different than Nokia’s. Technically,
Google has no hardware to sell. They do have a set of proprietary
applications that yield the “Google online experience” to
deliver Google’s advertising. From Google’s point of view, an
easy-to-adopt, licensing-unencumbered platform will broaden their
advertising market.

Thus, Android/Linux is a nearly fully non-copylefted phone operating
system platform where Linux is the only GPL licensed component essential
to Android’s operation. Ideally, Google wants to see Android adopted
broadly in both Free Software and mixed Free/proprietary deployments.
Google’s goals do not match that of the software freedom community, so in
some cases, a given Android/Linux device will give the user more software
freedom than the N900, but in many cases it will give much less.

The HTC
Dream
is the only Android/Linux device I know of where the necessary
proprietary components have been carefully analyzed. Obviously,
the “Google experience” applications are
proprietary. There also
are about
20 hardware interface libraries
that do not have source code available
in a public repository. However, when lined up against the N900 with
Maemo, Android on the HTC Dream can be used as an operational mobile
telephone and 3G Internet device using only three proprietary components:
a proprietary GSM firmware, proprietary Wifi firmware, and two audio
interface libraries. Further proprietary components are needed if you
want a working accelerometer, camera, and video codecs as their hardware
interface libraries are all proprietary.

Based on this analysis, it appears that the HTC Dream currently gives
the most software freedom among Android/Linux deployments. It is unlikely
that Google wants anything besides their applications to be proprietary.
While Google has been unresponsive when asked why these hardware interface
libraries are proprietary, it is likely that HTC, the hardware maker with
whom Google contracted, insisted that these components remain proprietary,
and perhaps
fear patent
suits like the one filed this week
are to blame here. Meanwhile,
while no detailed analysis of
the Nexus One is yet available,
it’s likely similar to the HTC Dream.

Other Android/Linux devices are now available, such as those from
Motorola and Samsung. There appears to have been no detailed analysis done
yet on the relative proprietary/freeness ratio of these Android
deployments. One can surmise that since these devices are from
traditionally proprietary hardware makers, it is unlikely that these
platforms are freer than those available from Google, whose maximal
interest in a freely available operating system is clear and in contrast
to the traditional desires of hardware makers.

Whether the software is from a hardware-maker desperately trying a new
hardware sales strategy, or an advertising salesman who wants some
influence over an operating system choice to improve ad delivery, the
software freedom community cannot assume that the stewards of these
codebases have the interests of the user community at heart. Indeed, the
interests between these disparate groups will only occasionally be
aligned. Community-oriented forks, as have begun in the Maemo community
with Mer, must begin in the Android/Linux space too. We are slowly
trying with the Replicant project, founded by myself and my colleague
Aaron Williamson.

A healthy community-oriented phone operating system project will
ultimately be an essential component to software freedom on these devices.
For example, consider the fate of the Mer project now that Nokia has
announced the merger of Maemo with Moblin. Mer does seek to cherry-pick
from various small device systems, but its focus was to create a freer
Maemo that worked on more devices. Mer now must choose between following
the Maemo in the merge with Moblin, or becoming a true fork. Ideally, the
right outcome for software freedom is a community-led effort, but there
may not be enough community interest, time and commitment to shepherd a
fork while Intel and Nokia push forward on a corporate-controlled
codebase. Further, Moblin will likely push the MeeGo project toward more
of a tablet-PC operating system than a smart phone.

A community-oriented Android/Linux fork has more hope. Google has
little to lose by encouraging and even assisting with such forks; such
effort would actually be wholly consistent with Google’s goals for wider
adoption of platforms that allow deployment of Google’s proprietary
applications. I expect that operating system
software-freedom-motivated efforts will be met with more support from
Google than from Nokia and/or Intel.

However, any operating system, even a mobile device one, needs many
applications to be useful. Google experience applications for
Android/Linux are merely the beginning of the plethora of proprietary
applications that will ultimately be available for MeeGo and Android/Linux
platforms. For FLOSS developers who don’t have a talent for low-level
device libraries and operating system software, these applications
represent a straightforward contribution towards mobile software
freedom. (Obviously, though, if one does have talent for low-level
programming, replacing the proprietary .so’s on Android/Linux would be the
optimal contribution.)

Indeed, on this point, we can take a page from Free Software history.
From the early 1990s onward, fully free GNU/Linux systems succeeded as
viable desktop and server systems because disparate groups of developers
focused simultaneously on both operating systems and application software.
We need that simultaneous diversity of improvement to actually compete
with the fully proprietary alternatives, and to ensure that the
“mostly FLOSS” systems of today are not the “barely
FLOSS” systems of tomorrow.

Careful readers have likely noticed that I have ignored Nokia’s other
release, the Symbian codebase. Every time I write or speak about the
issues of software freedom in mobile devices, I’m chastised for leaving it
out of the story. My answer is always simple: when a FLOSS version of
Symbian can be compiled from source code, using a FLOSS compiler or SDK,
and that binary can be installed onto an actual working mobile phone
device, then (and only then) will I believe that the Symbian source
release has value beyond historical interest. We have to get honest as a
community about the future of Symbian: it’s a ten-year-old proprietary
codebase designed for devices of that era that doesn’t bootstrap with any
compilers our community uses regularly. Unless there’s a radical change
to these facts, the code belongs in a museum, not running on a phone.

Also, lest my own community of hard-core FLOSS advocates flame me, I
must also mention
the Neo FreeRunner
device
and the
OpenMoko project.
This was a noble experiment: a freely specified hardware platform running
100% FLOSS. I used an OpenMoko FreeRunner myself, hoping that it would be
the mobile phone our community could rally around. I do think the device
and its (various) software stack(s) have a future as an experimental,
hobbyist device. But, just as GNU/Linux needed to focus on x86 hardware
to succeed, so must software freedom efforts in mobile systems focus on
mass-market, widely used, and widely available hardware.

Jailbreaking and the Self-Installed System

When some of us at my day-job office decided to move as close to a
software freedom phone platform as we could, we picked Android/Linux and
the HTC Dream. However, we carefully considered the idea of permission to
run one’s own software on the device. In the desktop and server system
market, this is not a concern, but on mobile systems, it is a central
question.

The holdover of those carrier-controlled agreements for phone
acquisition is the demand that devices be locked down. Devices are locked
down first to a single carrier’s network, so that devices cannot (legally)
be resold as phones ready for any network. Second, carriers believe that
they must fear the FCC if device operating systems can be reinstalled.

On the first point, Google is our best ally in this
regard. The
HTC Dream developer models, called the Android Dev Phone 1 (aka ADP1),
while somewhat more expensive than T-Mobile branded G1s, permit the user
to install any operating system on the phone, and the purchase agreement
extracts no promises from the purchaser regarding what software runs on the
device. Google has no interest in locking you to a single carrier, but
only to a single Google experience application vendor. Offering a user
“carrier freedom of choice”, while tying those users tighter
to Google applications, is probably a central part of their marketing
plans.

The second point — fear of an FCC crack down when mobile users
have software freedom — is beyond the scope of this
article. However, what Atheros has done with their Wifi devices shows that
software freedom and FCC compliance can co-exist. Furthermore, the
central piece of FCC’s concern — the GSM chipset and firmware
— runs on a separate processor in modern mobile devices. This is a
software freedom battle for another day, but it shows that the FCC can be
pacified in the meantime by keeping the GSM device a black box to the Free
Software running on the primary processor of the device.

Conclusion

Seeking software freedom on mobile devices will remain a complicated
endeavor for some time. Our community should utilize the FLOSS releases
from companies, but should not forget that, until viable community forks
exist, software freedom on these devices exists at the whim of these
companies. A traditional “get some volunteers together and write
some code” approach can achieve great advancement toward
community-oriented FLOSS systems on mobile devices. Developers could
initially focus on applications for the existing “mostly
FLOSS” platforms of MeeGo and Android/Linux. The challenging and
more urgent work is to replace lower-level proprietary components on these
systems with FLOSS alternatives, but admittedly needs special programming
skills that aren’t easy to find.

(This blog post first appeared as an article in the March 2010
issue of the Canadian online journal, The Open Source Business
Resource.)

Conferences

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/lpc-bluez-maemo-2009.html

Last week I was at the Linux Plumbers Conference in
Portland. Like last year it kicked ass and again proved to be one of the most
relevant Linux developer conferences (if not the most relevant one). I
ran the Audio MC at the conference, which was very well attended. The slides
for our four talks in the
track are available online. (My own slides are probably a bit too terse
for most readers, the interesting stuff was in the talking, not the
reading…) Personally, for me the most interesting part was to see to which
degree Nokia actually adopted PulseAudio
in the N900. While I was aware that Nokia was using it, I wasn’t aware that
their use is as comprehensive as it turned out it is. And the industry
support from other companies is really impressive too. After the main track we
had a BoF session, whose notes I’ll post a bit later. Many thanks to Paul,
Jyri, Pierre for their great talks. Unfortunately, Palm, the only manufacturer
who is actually already shipping a phone with PulseAudio, didn’t send anyone to
the conference who wanted to talk about that. Let’s hope they’ll eventually
learn that just throwing code over the wall is not how Open Source works.
Maybe they’ll send someone to next year’s LPC in Boston, where I hope to be
able to do the Audio MC again.

Right now I am at the BlueZ Summit in Stuttgart. Among other things we have
been discussing how to improve Bluetooth Audio support in PulseAudio. I guess
one could say that the Bluetooth support in PulseAudio is already one of its
highlights, in fact working better than the support on other OSes (yay, that’s
an area where Linux Audio really shines!). So up next is better support for
allowing PA to receive A2DP audio, i.e. making PA act as if it were a headset or
your hifi. Use case: send music from your mobile to your desktop’s hifi
speakers. (Actually this is already supported in current BlueZ/PA versions, but
not easily accessible.) Also Bluetooth headsets tend to support AC3 or MP3
decoding natively these days, so we should support that in PA too. Codec
handling has been on the TODO list for PA for quite some time, for the SPDIF or
HDMI cases, and Bluetooth Audio is another reason why we really should have
that.

Next week I’ll be at the Maemo Summit in Amsterdam.
Nokia kindly invited me. Unfortunately I was a bit too late to get a proper
talk accepted. That said, I am sure if enough folks are interested we could do
a little ad-hoc BoF and find some place at the venue for it. If you have any
questions regarding PA just talk to me. The N900 uses PulseAudio for all things
audio so I am quite sure we’ll have a lot to talk about.

See you in Amsterdam!

One last thing: Check out Colin’s
work to improve integration of PulseAudio and KDE!

LGPL’ing of Qt Will Encourage More Software Freedom

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2009/01/14/qt-lgpl.html

The choice between the GPL and the LGPL for a library is a complex one,
particularly when that library solves a new problem or an old problem in
a new way. Trolltech faced this decision for the Qt library, and Nokia
(who acquired Trolltech last year) has now reconsidered the question and
come to a different conclusion. Having followed this situation since
even before Qt was GPL’d, I am glad that we have successfully
encouraged the reconsideration of this decision.

Years ago, RMS wrote what many consider the definitive essay on this
subject, entitled “Why you shouldn’t use the Lesser GPL for your next
library”. A few times a year, I find myself rereading that essay because
I believe it puts forward some good points to think about when making
this decision.

Nevertheless, there is a strong case for the LGPL in many situations.
Sometimes, pure copyleft negatively impacts the goal of maximal software
freedom. The canonical example, of course, is the GNU C Library (which
was probably the first program ever LGPL’d).

Glibc was LGPL’d, in part, because it was unlikely at the time that
anyone would adopt a fully FaiF (Free as in Freedom) operating system
that didn’t allow any proprietary applications. Almost every program on
a Unix-like system combines with the C library, and if it were GPL’d,
all applications would be covered by the GPL. Users of the system
would have freedom, but encouraging the switch would be painful because
they’d have to give up all proprietary software all at once.

The GNU authors knew that there would be proprietary software for quite
some time, as our community slowly replaced each application with
freedom-respecting implementations. In the meantime, better that
proprietary software users have a FaiF C library and a FaiF operating
system to use (even with proprietary applications) while work
continued.

We now face a similar situation in the mobile device space. Most
mobile devices used today are locked down, top to bottom. It makes
sense to implement the approach we know works from our two decades of
experience — liberate the operating system first and the
applications will slowly follow.

This argument informs the decision about Qt’s licensing. Qt and its
derivatives are widely used as graphics toolkits in mobile devices.
Until now, Qt was licensed under the GPL (and before that under various
semi-Free licenses). Not only did the GPL create a “best is the enemy of
the good” situation, but those companies that rejected the GPL
could simply license a proprietary copy from Trolltech, which further
ghettoized the GPL’d versions. All that is now changing.

Beyond encouraging FaiF mobile operating systems, this change to LGPL
yields an important side benefit. While the proprietary relicensing
business is a common and legitimate business model to fund further
development, it also has some negative social side effects. The
codebase often lives in a silo, discouraging contributions from those
who don’t receive funding from the company that controls the canonical
upstream.

A change to LGPL sends a loud and clear message — the proprietary
relicensing business for Qt is over. Developers who have previously
rejected Qt because it was not community-developed might want to
reconsider that position in light of this news. We don’t know yet how
the new Qt community will be structured, but it’s now clear that Nokia,
Qt’s new copyright holder, no longer has a vested interest in
proprietary relicensing. The opportunity for a true software freedom
community around Qt’s code base has never been greater than at this
moment. A GUI programmer I am not, but I hope those who are will take a
look and see how to create the software freedom development community
that Qt needs.