
systemd for Administrators, Part VII

2011-04-12 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/blame-game.html

Here’s yet another installment of my ongoing series on systemd for Administrators:

The Blame Game

Fedora 15^[1] is the first Fedora release to sport systemd. Our
primary goal for F15 was to get everything integrated and working
well. One focus for Fedora 16 will be to further polish and speed up
what we have in the distribution now. To prepare for this cycle we
have implemented a few tools (which are already available in F15),
which can help us pinpoint where exactly the biggest problems in our
boot-up remain. With this blog story I hope to shed some light on how
to figure out what to blame for your slow boot-up, and what to do
about it. We want to allow you to put the blame where the blame
belongs: on the system component responsible.

The first utility is a very simple one: when it has finished booting
up, systemd will automatically write a log message with the time it
needed to syslog/kmsg.

systemd[1]: Startup finished in 2s 65ms 924us (kernel) + 2s 828ms 195us (initrd) + 11s 900ms 471us (userspace) = 16s 794ms 590us.

And here’s how you read this: about 2s were spent on kernel
initialization, up to the point where the initial RAM disk (initrd,
i.e. dracut) was started. A bit less than 3s were then spent in
the initrd. Finally, a bit less than 12s were spent after the
actual system init daemon (systemd) was invoked by the initrd to
bring up userspace. In total, a bit less than 17s passed from the
moment the boot loader jumped into the kernel code until systemd had
finished doing everything it needed to do at boot. This
number is nice and simple to understand, and also easy to
misunderstand: it does not include the time that is spent initializing
your GNOME session, as that is outside of the scope of the init
system. Also, in many cases this is just where systemd finished doing
everything it needed to do. Very likely some daemons are still busy
doing whatever they need to do to finish startup when this time
has elapsed. Hence: while the time logged here is a good indication of
the general boot speed, it is not necessarily how long the boot
actually feels to the user.
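To make the arithmetic above concrete, here is a small Bash sketch. It extracts the three phases from a "Startup finished" message and checks that they add up to the logged total.

It assumes exactly the message format quoted above:

- durations are made of min/s/ms/us components;
- the kernel phase is introduced by "in" and the other phases by "+".

Other systemd versions may format this line differently. The helper names to_us, phase_us and total_us are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: parse a "Startup finished" message of the format quoted above.
# Durations are assumed to be space-separated "<n>min", "<n>s", "<n>ms"
# and "<n>us" components; fractional formats are not handled.

# Convert a duration such as "2s 65ms 924us" into microseconds.
to_us() {
  local total=0 tok
  for tok in $1; do
    case "$tok" in
      *us)  total=$(( total + 10#${tok%us} )) ;;
      *ms)  total=$(( total + 10#${tok%ms} * 1000 )) ;;
      *min) total=$(( total + 10#${tok%min} * 60000000 )) ;;
      *s)   total=$(( total + 10#${tok%s} * 1000000 )) ;;
    esac
  done
  echo "$total"
}

# Microseconds spent in one phase: kernel, initrd or userspace.
phase_us() {
  local line=$1 phase=$2
  to_us "$(sed -n "s/.*[n+] \([0-9][^(]*\) ($phase).*/\1/p" <<<"$line")"
}

# Microseconds of the total reported after the "=" sign.
total_us() {
  to_us "$(sed -n 's/.*= \([0-9][^.]*\)\.*$/\1/p' <<<"$1")"
}

line='systemd[1]: Startup finished in 2s 65ms 924us (kernel) + 2s 828ms 195us (initrd) + 11s 900ms 471us (userspace) = 16s 794ms 590us.'

sum=0
for phase in kernel initrd userspace; do
  us=$(phase_us "$line" "$phase")
  printf '%-9s %9d us\n' "$phase" "$us"
  sum=$(( sum + us ))
done
printf '%-9s %9d us (logged total: %d us)\n' "sum" "$sum" "$(total_us "$line")"
```

For the message above, the computed sum and the logged total both come out to 16794590us, i.e. the 16s 794ms 590us that systemd reports.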

Also, it is a pretty superficial value: it gives no insight into which
system component systemd was waiting for all the time. To break this
down, we introduced the tool systemd-analyze blame:

$ systemd-analyze blame
  6207ms udev-settle.service
  5228ms cryptsetup@luks\x2d9899b85d\x2df790\x2d4d2a\x2da650\x2d8b7d2fb92cc3.service
   735ms NetworkManager.service
   642ms avahi-daemon.service
   600ms abrtd.service
   517ms rtkit-daemon.service
   478ms fedora-storage-init.service
   396ms dbus.service
   390ms rpcidmapd.service
   346ms systemd-tmpfiles-setup.service
   322ms fedora-sysinit-unhack.service
   316ms cups.service
   310ms console-kit-log-system-start.service
   309ms libvirtd.service
   303ms rpcbind.service
   298ms ksmtuned.service
   288ms lvm2-monitor.service
   281ms rpcgssd.service
   277ms sshd.service
   276ms livesys.service
   267ms iscsid.service
   236ms mdmonitor.service
   234ms nfslock.service
   223ms ksm.service
   218ms mcelog.service
...

This tool lists which systemd unit needed how much time to finish
initialization at boot, the worst offenders listed first. What we can
see here is that on this boot two services required more than 1s of
boot time: udev-settle.service and
cryptsetup@luks\x2d9899b85d\x2df790\x2d4d2a\x2da650\x2d8b7d2fb92cc3.service. This
tool’s output is easily misunderstood as well: it does not shed any
light on why the services in question actually need this much time, it
just determines that they did. Also note that the times listed here
might be spent “in parallel”, i.e. two services might be initializing
at the same time and thus the time spent to initialize them both is
much less than the sum of the two individual times.
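To see the parallelism effect in actual numbers, add up the blame column and compare the result with the userspace time from the "Startup finished" line.

The Bash sketch below assumes the "<n>ms <unit>" output format shown above. Other systemd versions may print values like "6.207s", which this does not handle. The helper name sum_blame_ms is made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: add up the per-unit times printed by "systemd-analyze blame".
# Only lines whose first field looks like "<n>ms" are counted; anything
# else (such as a trailing "...") is ignored.
sum_blame_ms() {
  awk '$1 ~ /^[0-9]+ms$/ { sub(/ms$/, "", $1); total += $1 }
       END { print total + 0 }'
}

# On a live system (output format permitting):
#   systemd-analyze blame | sum_blame_ms

# Self-contained example with two lines from the output above:
printf '%s\n' '  6207ms udev-settle.service' '   735ms NetworkManager.service' '...' | sum_blame_ms
# prints 6942
```

The 25 units listed above sum to 19697ms, almost 20s, although userspace boot-up as a whole took less than 12s. That is only possible because many of these units were initializing at the same time.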

Let’s have a closer look at the worst offender on this boot: a
service by the name of udev-settle.service. So why does it
take that much time to initialize, and what can we do about it? This
service actually does very little: it just waits for the device
probing being done by udev to finish and then exits. Device probing
can be slow. In this instance, for example, device probing takes more
than 6s because of the 3G modem built into the machine: when no SIM
card is inserted, it takes this long to respond
to software probe requests. The software probing is part of the logic
that makes ModemManager work and enables NetworkManager to offer easy
3G setup. An obvious reflex might now be to blame ModemManager for
having such a slow prober. But that’s actually misdirected: hardware
probing quite frequently is this slow, and in the case of ModemManager
it’s a simple fact that the 3G hardware takes this long. It is an
essential requirement for a proper hardware probing solution that
individual probers can take this much time to finish probing. The
actual culprit is something else: the fact that we actually wait for
the probing, in other words: that udev-settle.service is part
of our boot process.

So, why is udev-settle.service part of our boot process?
Well, it actually doesn’t need to be. It is pulled in by the storage
setup logic of Fedora: to be precise, by the LVM, RAID and Multipath
setup script. These storage services have not been implemented in the
way hardware detection and probing work today: they expect to be
initialized at a point in time where “all devices have been probed”,
so that they can simply iterate through the list of available disks
and do their work on it. However, on modern machinery this is not how
things actually work: hardware can come and hardware can go all the
time, during boot and during runtime. For some technologies it is not
even possible to know when the device enumeration is complete
(example: USB, or iSCSI), thus waiting for all storage devices to show
up and be probed must necessarily include a fixed delay after which it
is assumed that all devices that can show up have shown up and have
been probed. In this case all this has a very negative effect on boot
time: the storage scripts force us to delay bootup until all potential
devices have shown up and all devices that did have been probed, and
all that even though we don’t actually need most devices for anything.
This is all the more wasteful since this machine does not actually
make use of LVM, RAID or Multipath!^[2]

Knowing what we know now we can go and disable
udev-settle.service for the next boots: since neither LVM,
RAID nor Multipath is used we can mask the services in question and
thus speed up our boot a little:

# ln -s /dev/null /etc/systemd/system/udev-settle.service
# ln -s /dev/null /etc/systemd/system/fedora-wait-storage.service
# ln -s /dev/null /etc/systemd/system/fedora-storage-init.service
# systemctl daemon-reload

After restarting we can measure that the boot is now about 1s
faster. Why just 1s? Well, the second worst offender is cryptsetup
here: the machine in question has an encrypted
/home directory. For testing purposes I have stored the
passphrase in a file on disk, so that the boot-up is not delayed
because I, as the user, am a slow typist. The cryptsetup tool
unfortunately still takes more than 5s to set up the encrypted
partition. Being lazy, instead of trying to fix
cryptsetup^[3] we’ll just tape over it here^[4]:
systemd will normally wait for all file systems not marked with the
noauto option in /etc/fstab to show up, to be fscked and to
be mounted before proceeding with bootup and starting the usual system
services. In the case of /home (unlike for example
/var) we know that it is needed only very late (i.e. when the
user actually logs in). An easy fix is hence to make the mount point
available already during boot, but not actually wait until cryptsetup,
fsck and mount have finished running for it. You ask how we can make a
mount point available before actually mounting the file system behind
it? Well, systemd possesses magic powers, in the form of the
comment=systemd.automount mount option in
/etc/fstab. If you specify it, systemd will create an
automount point at /home, and if at the time of the first
access the mount point still isn’t backed by a proper file
system, systemd will wait for the device, fsck it and mount it.
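As an illustration, an /etc/fstab entry for /home using this option might look like the line below.

The device path, file system type, dump flag and fsck pass number are placeholders, not values from the machine above. Only the comment=systemd.automount option matters here. Later systemd releases document this option as x-systemd.automount.

```
# <device>             <mount point>  <type>  <options>                           <dump>  <pass>
/dev/mapper/luks-home  /home          ext4    defaults,comment=systemd.automount  1       2
```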

And here’s the result with this change to /etc/fstab
made:

systemd[1]: Startup finished in 2s 47ms 112us (kernel) + 2s 663ms 942us (initrd) + 5s 540ms 522us (userspace) = 10s 251ms 576us.

Nice! With a few fixes we took about 6.5s off our boot time (from
16.8s down to 10.3s). And these two changes are only fixes for the two
most superficial problems. With a bit of love and detail work there’s
a lot of additional room for improvement. In fact, on a different
machine, a more than two-year-old X300 laptop (which even back then
wasn’t the fastest machine on earth), and with a bit of decrufting, we
now have boot times of around 4s (total), with a reasonably complete
GNOME system. And even there, there’s still plenty of headroom.

systemd-analyze blame is a nice and simple tool for
tracking down slow services. However, it suffers from a big problem: it
does not visualize how the parallel execution of the services actually
diminishes the price one pays for slow-starting services. For that we
have prepared systemd-analyze plot for you. Use it like
this:

$ systemd-analyze plot > plot.svg
$ eog plot.svg

It creates pretty graphs, showing the time services spent to start
up in relation to the other services. It currently doesn’t visualize
explicitly which services wait for which ones, but with a bit of
guesswork this is easily seen nonetheless.

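To see the effect of our two little optimizations, here are two
graphs generated with systemd-analyze plot, the first before
and the other after our change:

[Figure: systemd-analyze plot output before the two changes]
[Figure: systemd-analyze plot output after the two changes]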

(For the sake of completeness, here are the two complete outputs of
systemd-analyze blame for these two boots: before and after.)

The well-informed reader probably wonders how this relates to Michael Meeks’
bootchart. This plot and bootchart do show similar graphs, that is
true. Bootchart is by far the more powerful tool. It plots in all
detail what is happening during the boot, how much CPU and IO is
used. systemd-analyze plot shows more high-level data: which
service took how much time to initialize, and what needed to wait for
it. If you use them both together you’ll have a wonderful toolset to
figure out why your boot is not as fast as it could be.

Now, before you take these tools and start filing bugs against
the worst boot-up time offenders on your system: think twice. These
tools give you raw data; don’t misread it. As my optimization example
above hopefully shows, the blame for the slow bootup was not actually
with udev-settle.service, and not with the ModemManager
prober run by it either. It is with the subsystem that pulled this
service in in the first place. And that’s where the problem needs to
be fixed. So, file the bugs at the right places. Put the blame where
the blame belongs.

As mentioned, these three utilities are available on your Fedora 15
system out-of-the-box.

And here’s what to take home from this little blog story:

- systemd-analyze is a wonderful tool and systemd comes with profiling built in.
- Don’t misread the data these tools generate!
- With two simple changes you might be able to speed up your system by 7s!
- Fix your software if it can’t handle dynamic hardware properly!
- The Fedora default of installing the OS on an enterprise-level storage managing system might be something to rethink.

And that’s all for now. Thank you for your interest.

Footnotes

[1] Also known as the greatest Free Software OS release
ever.

[2] The right fix here is to improve the services in
question to actively listen to hotplug events via libudev or similar
and act on the devices showing up as they show up, so that we can
continue with the bootup the instant everything we really need to go
on has shown up. To get a quick bootup we should wait for what we
actually need to proceed, not for everything. Also note that the
storage services are not the only services which do not cope well with
modern dynamic hardware, and assume that the device list is static and
stays unchanged. For example, on this machine the initrd is as slow
as it is mostly because Plymouth
expects to be executed when all video devices have shown up and have
been probed. For an unknown reason (at least unknown to me) loading
the video kernel modules for my Intel graphics cards takes multiple
seconds, and hence the entire boot is delayed unnecessarily. (Here too
I’d not put the blame on the probing but on the fact that we
wait for it to complete before going on.)

[3] Well, to be precise, I actually did try to get this
fixed. Most of the delay of cryptsetup stems from the (in my eyes)
unnecessarily high default value for --iter-time in
cryptsetup. I tried to convince our cryptsetup maintainers that 100ms
as a default here is not really less secure than 1s, but well, I
failed.

[4] Of course, it’s usually not our style to just tape over
problems instead of fixing them, but this is such a nice occasion to
show off yet another cool systemd feature…

systemd for Administrators, Part VI

2011-04-08 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/changing-roots.html

Here’s another installment of my ongoing series on systemd for Administrators:

Changing Roots

As an administrator or developer you’ll sooner or later encounter chroot()
environments. The chroot() system call simply shifts what
a process and all its children consider the root directory /, thus
limiting what the process can see of the file hierarchy to a subtree
of it. chroot() environments have two primary uses:

- For security purposes: In this use a specific isolated daemon is chroot()ed into a private subdirectory, so that when exploited the attacker can see only the subdirectory instead of the full OS hierarchy: he is trapped inside the chroot() jail.
- To set up and control a debugging, testing, building, installation or recovery image of an OS: For this a whole guest operating system hierarchy is mounted or bootstrapped into a subdirectory of the host OS, and then a shell (or some other application) is started inside it, with this subdirectory turned into its /. To the shell it appears as if it was running inside a system that can differ greatly from the host OS. For example, it might run a different distribution or even a different architecture (Example: host x86_64, guest i386). It cannot see the full hierarchy of the host OS.

On a classic System-V-based operating system it is relatively easy
to use chroot() environments. For example, to start a specific daemon
for test or other reasons inside a chroot()-based guest OS tree, mount
/proc, /sys and a few other API file systems into
the tree, and then use chroot(1) to enter the chroot, and
finally run the SysV init script via /sbin/service from
inside the chroot.

On a systemd-based OS things are not that easy anymore. One of the
big advantages of systemd is that all daemons are guaranteed to be
invoked in a completely clean and independent context which is in no
way related to the context of the user asking for the service to be
started. While in sysvinit-based systems a large part of the execution
context (like resource limits, environment variables and suchlike) is
inherited from the user shell invoking the init script, in systemd the
user just notifies the init daemon, and the init daemon will then fork
off the daemon in a sane, well-defined and pristine execution context
and no inheritance of the user context parameters takes place. While
this is a formidable feature it actually breaks traditional approaches
to invoke a service inside a chroot() environment: since the actual
daemon is always spawned off PID 1 and thus inherits the chroot()
settings from it, it is irrelevant whether the client which asked for
the daemon to start is chroot()ed or not. On top of that, since
systemd actually places its local communications sockets in
/run/systemd, a process in a chroot() environment will not even
be able to talk to the init system (which, however, is probably a good
thing, and the daring can of course work around this by making use of
bind mounts).

This of course raises the question of how to use chroot()s properly in
a systemd environment. And here’s what we came up with for you, which
hopefully answers this question thoroughly and comprehensively:

Let’s cover the first usecase first: locking a daemon into a
chroot() jail for security purposes. To begin with, chroot() as a
security tool is actually quite dubious, since chroot() is not a
one-way street. It is relatively easy to escape a chroot()
environment, as even the
man page points out. Only in combination with a few other
techniques can it be made somewhat secure. Because of that, it usually
requires specific support in the applications to chroot() themselves
in a tamper-proof way. On top of that it usually requires a deep
understanding of the chroot()ed service to set up the chroot()
environment properly, for example to know which directories to bind mount from
the host tree, in order to make available all communication channels
in the chroot() the service actually needs. Putting this together,
chroot()ing software for security purposes is almost always done best
in the C code of the daemon itself. The developer knows best (or at
least should know best) how to properly lock down the
chroot(), and what minimal set of files, file systems and
directories the daemon will need inside the chroot(). These days a
number of daemons are capable of doing this; unfortunately, however, of
those running by default on a normal Fedora installation only two
do it: Avahi and
RealtimeKit. Both apparently written by the same really smart
dude. Chapeau! 😉 (Verify this easily by running ls -l /proc/*/root on your system.)

That all said, systemd of course does offer you a way to chroot()
specific daemons and manage them like any other with the usual
tools. This is supported via the RootDirectory= option in
systemd service files. Here’s an example:

[Unit]
Description=A chroot()ed Service

[Service]
RootDirectory=/srv/chroot/foobar
ExecStartPre=/usr/local/bin/setup-foobar-chroot.sh
ExecStart=/usr/bin/foobard
RootDirectoryStartOnly=yes

In this example, RootDirectory= configures where to
chroot() to before invoking the daemon binary specified with
ExecStart=. Note that the path specified in
ExecStart= needs to refer to the binary inside the chroot(),
it is not a path to the binary in the host tree (i.e. in this example
the binary executed is seen as
/srv/chroot/foobar/usr/bin/foobard from the host OS). Before
the daemon is started a shell script setup-foobar-chroot.sh
is invoked, whose purpose it is to set up the chroot environment as
necessary, i.e. mount /proc and similar file systems into it,
depending on what the service might need. With the
RootDirectoryStartOnly= switch we ensure that only the daemon
as specified in ExecStart= is chrooted, but not the
ExecStartPre= script which needs to have access to the full
OS hierarchy so that it can bind mount directories from there. (For
more information on these switches see the respective man
pages.)

If you place a unit file like this in
/etc/systemd/system/foobar.service you can start your
chroot()ed service by typing systemctl start foobar.service. You may then introspect it with systemctl status foobar.service. It is accessible to the administrator like
any other service: unlike on SysV, the fact that it is chroot()ed does
not alter how your monitoring and control tools interact with
it.
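The unit file above refers to /usr/local/bin/setup-foobar-chroot.sh without showing it. The following is a minimal sketch of what such a script might contain.

It assumes foobard needs /proc, /sys and a bind-mounted /dev inside its chroot(). That list is an assumption for illustration, as are the ROOT variable and the DRY_RUN switch; none of them belong to any shipped tool.

```bash
#!/usr/bin/env bash
# Hypothetical /usr/local/bin/setup-foobar-chroot.sh (a sketch only).
# It runs in the host tree before foobard is chroot()ed, because the unit
# sets RootDirectoryStartOnly=yes. Which file systems foobard needs is an
# assumption; adjust the list to the daemon in question.
set -eu

# Must match RootDirectory= in foobar.service.
ROOT="${ROOT:-/srv/chroot/foobar}"

# Run a command, or only print it when DRY_RUN=1 (handy for previewing).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_chroot() {
  local d
  for d in proc sys dev; do
    run mkdir -p "$ROOT/$d"
  done
  # Skip anything already mounted, so that restarting the service does
  # not stack another mount on top of the previous one.
  mountpoint -q "$ROOT/proc" || run mount -t proc proc "$ROOT/proc"
  mountpoint -q "$ROOT/sys"  || run mount -t sysfs sysfs "$ROOT/sys"
  mountpoint -q "$ROOT/dev"  || run mount --bind /dev "$ROOT/dev"
}

# The real script would simply end with:
#   setup_chroot
# To preview the commands without root privileges:
#   ROOT=/tmp/foobar-test DRY_RUN=1 setup_chroot
```

The final call is left as a comment so the function can be tried out safely with DRY_RUN=1 first.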

Newer Linux kernels support file system namespaces. These are
similar to chroot() but a lot more powerful, and they do not
suffer from the same security problems as chroot(). systemd
exposes a subset of what you can do with file system namespaces right
in the unit files themselves. Often these are a useful and simpler
alternative to setting up a full chroot() environment in a
subdirectory. With the switches ReadOnlyDirectories= and
InaccessibleDirectories= you may set up a file system
namespace jail for your service. Initially, it will be identical to
your host OS’ file system namespace. By listing directories in these
directives you may then mark certain directories or mount points of
the host OS as read-only or even completely inaccessible to the
daemon. Example:

[Unit]
Description=A Service With No Access to /home

[Service]
ExecStart=/usr/bin/foobard
InaccessibleDirectories=/home

This service will have access to the entire file system tree of the
host OS with one exception: /home will not be visible to it, thus
protecting the user’s data from potential exploiters. (See the
man page for details on these options.)
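The two switches can also be combined. For example, a unit that also makes /usr and /etc read-only for the daemon might look like the sketch below.

The choice of directories is purely illustrative. Each directive takes a space-separated list of paths. Newer systemd releases document these options as ReadOnlyPaths= and InaccessiblePaths=.

```
[Unit]
Description=A Service With Read-Only /usr and /etc and No Access to /home

[Service]
ExecStart=/usr/bin/foobard
ReadOnlyDirectories=/usr /etc
InaccessibleDirectories=/home
```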

File system namespaces are in fact a better replacement for
chroot()s in many many ways. Eventually Avahi and RealtimeKit
should probably be updated to make use of namespaces replacing
chroot()s.

So much about the security usecase. Now, let’s look at the other
use case: setting up and controlling OS images for debugging, testing,
building, installing or recovering.

chroot() environments are relatively simple things: they only
virtualize the file system hierarchy. By chroot()ing into a
subdirectory a process still has complete access to all system calls,
can kill all processes and shares just about everything else with the
host it is running on. Running an OS (or a small part of an OS) inside a
chroot() is hence a dangerous affair: the isolation between host and
guest is limited to the file system, everything else can be freely
accessed from inside the chroot(). For example, if you upgrade a
distribution inside a chroot(), and the package scripts send a SIGTERM
to PID 1 to trigger a reexecution of the init system, this will
actually take place in the host OS! On top of that, SysV shared
memory, abstract namespace sockets and other IPC primitives are shared
between host and guest. While a completely secure isolation for
testing, debugging, building, installing or recovering an OS is
probably not necessary, a basic isolation to avoid accidental
modifications of the host OS from inside the chroot() environment is
desirable: you never know what code package scripts execute which
might interfere with the host OS.

For this use case, systemd offers you a couple of features to deal
with chroot() setups:

First of all, systemctl detects when it is run in a
chroot. If so, most of its operations will become NOPs, with the
exception of systemctl enable and systemctl disable. If a package installation script hence calls these two
commands, services will be enabled in the guest OS. However, should a
package installation script include a command like systemctl restart as part of the package upgrade process this will have no
effect at all when run in a chroot() environment.
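The text does not say how systemctl notices that it runs in a chroot(). One common heuristic compares the device and inode numbers of / with those of PID 1's root directory, as exposed in /proc/1/root. If they differ, the caller lives in a different root than the init system.

The Bash sketch below illustrates that heuristic; it is not systemctl's actual implementation. The helper names same_dir and in_chroot are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch of one chroot() detection heuristic; not systemctl's actual code.

# Succeed (0) if both paths are the same directory, i.e. they have the
# same device and inode numbers; fail (1) if they differ; 2 on error.
same_dir() {
  local a b
  a=$(stat -L -c '%d:%i' -- "$1" 2>/dev/null) || return 2
  b=$(stat -L -c '%d:%i' -- "$2" 2>/dev/null) || return 2
  [ "$a" = "$b" ]
}

# Succeed (0) if our / differs from the root directory of PID 1, which
# suggests we are running inside a chroot(). Reading /proc/1/root usually
# requires root privileges; if it cannot be read, return 2 ("don't know").
in_chroot() {
  local rc=0
  same_dir / /proc/1/root || rc=$?
  case $rc in
    0) return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}

# Example:
#   if in_chroot; then echo "running in a chroot"; fi
```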

More importantly however systemd comes out-of-the-box with the systemd-nspawn
tool which acts as chroot(1) on steroids: it makes use of file system
and PID namespaces to boot a simple lightweight container on a file
system tree. It can be used almost like chroot(1), except that the
isolation from the host OS is much more complete, a lot more secure
and even easier to use. In fact, systemd-nspawn is capable of
booting a complete systemd or sysvinit OS in a container with a single
command. Since it virtualizes PIDs, the init system in the container
can act as PID 1 and thus do its job as normal. In contrast to
chroot(1), this tool will implicitly mount /proc and
/sys for you.

Here’s an example of how, in three commands, you can boot a Debian OS on
your Fedora machine inside an nspawn container:

# yum install debootstrap
# debootstrap --arch=amd64 unstable debian-tree/
# systemd-nspawn -D debian-tree/

This will bootstrap the OS directory tree and then simply invoke a
shell in it. If you want to boot a full system in the container, use a
command like this:

# systemd-nspawn -D debian-tree/ /sbin/init

And after a quick bootup you should have a shell prompt, inside a
complete OS, booted in your container. The container will not be able
to see any of the processes outside of it. It will share the network
configuration, but not be able to modify it. (Expect a couple of
EPERMs during boot for that, which however should not be
fatal). Directories like /sys and /proc/sys are
available in the container, but mounted read-only in order to prevent
the container from modifying kernel or hardware configuration. Note
however that this protects the host OS only from accidental
changes of its parameters. A process in the container can manually
remount the file systems read-writeable and then change whatever it
wants to change.
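To check from inside the container which of these file systems are actually mounted read-only, look at the options column of /proc/mounts.

Below is a small Bash sketch. The helper names are made up. The optional second argument exists only so that a saved copy of the file can be inspected instead of the live one.

```bash
#!/usr/bin/env bash
# Sketch: tell whether a mount point is mounted read-only, based on the
# fourth (options) field of /proc/mounts. If a path is mounted more than
# once, the last entry (the mount currently on top) wins.

# Print the mount options of a mount point.
mount_opts() {
  awk -v mp="$1" '$2 == mp { opts = $4 } END { print opts }' "${2:-/proc/mounts}"
}

# Succeed if the "ro" option is present.
is_readonly() {
  case ",$(mount_opts "$1" "${2:-/proc/mounts}")," in
    *,ro,*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Example, run inside the container:
#   is_readonly /sys && echo "/sys is read-only"
```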

So, what’s so great about systemd-nspawn again?

- It’s really easy to use. No need to manually mount /proc and /sys into your chroot() environment. The tool will do it for you and the kernel automatically cleans it up when the container terminates.
- The isolation is much more complete, protecting the host OS from accidental changes from inside the container.
- It’s so good that you can actually boot a full OS in the container, not just a single lonesome shell.
- It’s actually tiny and installed wherever systemd is installed. No complicated installation or setup.

systemd itself has been modified to work very well in such a
container. For example, when shutting down and detecting that it is
run in a container, it just calls exit(), instead of reboot() as last
step.

Note that systemd-nspawn is not a full container
solution. If you need that, LXC is the better choice for
you. It uses the same underlying kernel technology but offers a lot
more, including network virtualization. If you will,
systemd-nspawn is the GNOME 3 of container solutions:
slick and trivially easy to use — but with few configuration
options. LXC OTOH is more like KDE: more configuration options than lines of
code. I wrote systemd-nspawn specifically to cover testing,
debugging, building, installing, recovering. That’s what you should use
it for and what it is really good at, and where it is a much much nicer
alternative to chroot(1).

So, let’s wrap this up; this was already long enough. Here’s what to take home from
this little blog story:

- Secure chroot()s are best done natively in the C sources of your program.
- ReadOnlyDirectories=, InaccessibleDirectories= might be suitable alternatives to a full chroot() environment.
- RootDirectory= is your friend if you want to chroot() a specific service.
- systemd-nspawn is made of awesome.
- chroot()s are lame, file system namespaces are totally l33t.

All of this is readily available on your Fedora 15 system.

And that’s it for today. See you again for the next installment.

GNOME 3.0 Is Out!

2011-04-07 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/gnome3.html

The next generation desktop has arrived. I am running it as I type this, and so should you. So, go, get it!

If you are in Berlin on Friday you should also attend our GNOME
3.0 Release Party. It’s at the world famous c-base, in the remains of an alien spaceship
that crashed into Berlin 4.5 billion years ago (no kidding!). We’ve got
Ubuntu’s Daniel Holbach as DJ, and a
few folks from the GNOME community will do a talk or two (including that
annoying dude who created Avahi, PulseAudio and systemd). We even got Mirko Boehm
from the KDE side to say a few things. And there are going to be GNOME 3
goodies! How awesome is that? See the
wiki page for further details.

And here’s your homework until Friday: Try out GNOME 3.0!

The GNOME 3.0 Live CD

2011-04-01 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/live-cd.html

The Fedora GNOME 3.0 Live CD is made of awesome. Not just because it showcases
the awesomeness that is GNOME 3, but also because it’s built on an awesome
systemd-based OS. Double awesome!

So, get it, play with it. It’s the future of computing: GNOME and systemd
and Linux. Triple awesome!

And did I mention that F15 is going to be the awesomest OS release ever?

Nope, there’s no April 1st joke in here. It’s really honestly just …
awesome!

Final Reminder

2011-03-23 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/final-reminder.html

Citizens! GNOMErs! Only two days are left before the GUADEC/Desktop Summit CFP
closes (the end date is Friday). Submit your
presentation proposal now, or it will be too late. Read the CFP.

Oh, and regarding the need for a KDE identity account: due to limited
manpower we decided to reuse existing infrastructure instead of setting up a
completely new one. We do acknowledge that this is not ideal and we’d like to
ask for your understanding. (Creating a KDE identity account is unrestricted,
and you can easily create one even if you never had anything to do with KDE in
your life.)

Note that we are looking for both lightning talks and full-length
presentations. If you are interested in doing a lightning talk (and we can only
encourage you to), please use the same form to make your submission.

Desktop Summit/GUADEC 2011 CFP ends in one Week

2011-03-17 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/cfp-ends-in-one-week.html

I’d like to remind everybody that only one week is left until the Desktop Summit (aka GUADEC 2011) Call for Participation ends. We want your talk proposals, and that quickly, before it’s too late!

Berlin in summer is fantastic. You wouldn’t want to miss that, would you?

So, read the CFP again, and then submit something.

The CFP ends next Friday. So hurry!

Thank you,
Lennart

systemd for Administrators, Part V

2011-03-02 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/three-levels-of-off.html

It has been a while since the last installment of my systemd for Administrators series, but now, with the release of Fedora 15 based on systemd looming, here’s a new episode:

The Three Levels of “Off”

In systemd,
there are three levels of turning off a service (or other unit). Let’s
have a look at what those are:

You can stop a service. That simply terminates the
running instance of the service and does little else. If due to some
form of activation (such as manual activation, socket activation, bus
activation, activation by system boot or activation by hardware plug)
the service is requested again afterwards, it will be started. Stopping
a service is hence a very simple, temporary and superficial
operation. Here’s an example of how to do this for the NTP service:
```
$ systemctl stop ntpd.service
```
This is roughly equivalent to the following traditional command, which is available on most SysV-inspired systems:
```
$ service ntpd stop
```
In fact, on Fedora 15, if you execute the latter command it will be transparently converted to the former.
You can disable a service. This unhooks a service from
its activation triggers. That means that, depending on your service, it
will no longer be activated on boot, by socket or bus activation or by
hardware plug (or any other trigger that applies to it). However, you
can still start it manually if you wish. If there is already a started
instance, disabling a service will not have the effect of stopping
it. Here’s an example of how to disable a service:
```
$ systemctl disable ntpd.service
```
On traditional Fedora systems, this is roughly equivalent to the following command:
```
$ chkconfig ntpd off
```
And here too, on Fedora 15, the latter command will be
transparently converted to the former, if necessary.

Often you want to combine stopping and disabling a service, to
get rid of the current instance and make sure it is not started again (except when manually triggered):
```
$ systemctl disable ntpd.service
$ systemctl stop ntpd.service
```
Commands like these are used, for example, during package
removal of systemd services on Fedora.

Disabling a service is a permanent change; until you undo it, it
will be kept, even across reboots.
You can mask a service. This is like disabling a service,
but on steroids. It not only makes sure that the service is no longer started
automatically, but also ensures that it cannot be
started manually either. This is a bit of a hidden feature in
systemd, since it is not commonly useful and might confuse the
user. But here’s how you do it:
```
$ ln -s /dev/null /etc/systemd/system/ntpd.service
$ systemctl daemon-reload
```
By symlinking a service file to /dev/null you tell
systemd to never start the service in question and completely block
its execution. Unit files stored in /etc/systemd/system
override those from /lib/systemd/system that carry the same
name. The former directory is administrator territory, the latter
the territory of your package manager. By installing your symlink in
/etc/systemd/system/ntpd.service you hence make sure that
systemd will never read the upstream shipped service file
/lib/systemd/system/ntpd.service.

systemd will recognize units symlinked to /dev/null and
show them as masked. If you try to start such a service manually (via
systemctl start for example) this will fail with an
error.
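To make the precedence rule concrete, here is a minimal Python sketch of how a lookup across the two directories might behave. It is an illustration under stated assumptions, not systemd’s code. It assumes only the two directories named above are searched (real systemd consults more), and the helper `resolve_unit` is hypothetical.

```python
import os

# Assumption: only the two directories discussed above are searched, in
# this order; real systemd consults additional unit directories.
SEARCH_PATH = ("/etc/systemd/system", "/lib/systemd/system")


def resolve_unit(name, search_path=SEARCH_PATH):
    """Hypothetical helper: find the unit file that would be used for `name`.

    Returns ("masked", path), ("found", path) or ("not-found", None).
    The first directory with an entry for `name` wins, so an entry in
    /etc shadows the packaged file of the same name in /lib.
    """
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if not os.path.lexists(candidate):
            continue  # nothing here; fall through to the next directory
        # A symlink pointing at /dev/null marks the unit as masked.
        if os.path.islink(candidate) and os.path.realpath(candidate) == os.devnull:
            return ("masked", candidate)
        return ("found", candidate)
    return ("not-found", None)
```

Because the administrator directory is consulted first, removing the symlink with rm makes the lookup fall through to the packaged file again. That is the same symmetry used for undoing a mask.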

A similar trick does not (officially) exist on SysV systems. However,
there are a few unofficial hacks, such as editing the init script and
placing an exit 0 at the top, or removing its execution
bit. These solutions have various drawbacks, though; for example, they
interfere with the package manager.

Masking a service is a permanent change, much like disabling a service.

Now that we learned how to turn off services on three levels,
there’s only one question left: how do we turn them on again? Well,
it’s quite symmetric. Use systemctl start to undo
systemctl stop. Use systemctl enable to undo
systemctl disable, and use rm to undo
ln.

And that’s all for now. Thank you for your attention!

Desktop Summit 2011 Call For Participation

2011-03-02 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/guadec-cfp-2011.html

In case you haven’t noticed yet: the Call For Participation for the Desktop
Summit 2011 (aka GUADEC 2011, aka Akademy 2011) in Berlin, Germany has been open
since yesterday. Submissions will be accepted until March 25th, so make sure to
submit your proposals quickly.

FOSDEM Talk on Video

2011-02-18 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/fosdem2011-video.html

If you have already watched my presentation on
systemd I gave at linux.conf.au 2011 then this video of my talk on
the same topic which I gave at FOSDEM
2011 in Brussels, Belgium will probably not be all new to you, but the
questions from the audience (and hopefully my responses) might answer a
question or two you might still have. So do watch it:

Hmm, seems p.g.o strips the video from the blog post. So either read the original blog story or watch it directly on YouTube.

Oh, and FOSDEM rocked, like every year!

LCA Talk on Video

2011-01-31 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/lca2011-video.html

I won’t spare you the video of my talk about systemd at linux.conf.au 2011 in Brisbane, Australia last week:

Hmm, seems p.g.o strips the video from the blog post. So either read the original blog story or watch it directly on blip.tv.

LCA was fantastic and especially impressive given the circumstances of the recent floods in Queensland. Really good conference, and congratulations to the organizers!

FOSDEM Interview with Yours Truly

2011-01-26 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/fosdem2011.html

The FOSDEM organizers just published a brief interview with yours
truly regarding the presentation
about systemd
I will be giving there
on Sat. Feb. 5th,
3pm. If you come to Brussels make sure to drop by! And even
if you don’t, have a look at the interview!

If you don’t make it to Brussels, there are two more stops in my
little systemd World Tour in the next weeks: today
(Wed. Jan. 26th,
2:30pm) I will be speaking at linux.conf.au in Brisbane,
Australia. And
on Fri. Feb. 11th,
1:20pm I’ll be speaking at the Red Hat Developer Conference in
Brno, Czech Republic.

Chorin

2010-11-21 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/photos/chorin.html

Chorin Abbey Church, Brandenburg, Germany. Yes, indeed, that’s a crane.

systemd for Administrators, Part IV

2010-11-19 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/systemd-for-admins-4.html

Here’s the fourth installment of my ongoing
series about
systemd
for administrators.

Killing Services

Killing a system daemon is easy, right? Or is it?

Sure, as long as your daemon consists of only a single process this might
actually be somewhat true. You type killall rsyslogd and the syslog
daemon is gone. However, this is a bit dirty, given that it
will kill all processes which happen to have that name, including those an
unlucky user might have named that way by accident. A slightly more correct
version would be to read the .pid file, i.e. kill `cat /var/run/syslogd.pid`. That already gets us much further, but still, is
this really what we want?

More often than not it actually isn’t. Consider a service like Apache, or
crond, or atd, which as part of their usual operation spawn child processes.
Arbitrary, user configurable child processes, such as cron or at jobs, or CGI
scripts, even full application servers. If you kill the main apache/crond/atd
process this might or might not pull down the child processes too, and it’s up
to those processes whether they want to stay around or go down as well.
Basically that means that terminating Apache might very well cause its CGI
scripts to stay around, reassigned to be children of init, and difficult to
track down.
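For comparison, the PID-file approach looks roughly like the Python sketch below. It is only an illustration. `kill_from_pidfile` is a hypothetical helper, and the injectable `kill` parameter exists only so the sketch can be exercised without signalling real processes. Note that exactly one process is reached. Children such as the CGI scripts mentioned above are left alone.

```python
import os
import signal


def read_pidfile(path):
    """Return the PID stored in a traditional .pid file."""
    with open(path) as f:
        return int(f.read().strip())


def kill_from_pidfile(path, sig=signal.SIGTERM, kill=os.kill):
    """Hypothetical equivalent of: kill `cat /var/run/syslogd.pid`.

    Only the single process named in the file is signalled; any
    children it spawned are not touched.
    """
    pid = read_pidfile(path)
    kill(pid, sig)
    return pid
```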

systemd to
the rescue: With systemctl kill you can easily send a signal to all
processes of a service. Example:

```
# systemctl kill crond.service
```

This will ensure that SIGTERM is delivered to all processes of the crond
service, not just the main process. Of course, you can also send a different
signal if you wish. For example, if you are bad-ass you might want to go for
SIGKILL right away:

```
# systemctl kill -s SIGKILL crond.service
```

And there you go, the service will be brutally slaughtered in its entirety,
regardless of how many times it forked, and whether it tried to escape supervision by
double forking or fork bombing.
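systemd can do this because every service runs in its own control group, so the set of processes belonging to a service is simply the contents of that group. The following Python sketch illustrates the idea. It is not systemd’s implementation. It assumes a cgroup directory containing a `cgroup.procs` file with one PID per line; the exact cgroupfs layout and paths systemd uses are not shown here, and `kill_cgroup` is a hypothetical name. The loop re-reads the group after each pass so that processes forked in the meantime are caught as well.

```python
import os
import signal


def read_cgroup_pids(cgroup_dir):
    """Return the set of PIDs listed in <cgroup_dir>/cgroup.procs (assumed layout)."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return {int(line) for line in f if line.strip()}


def kill_cgroup(cgroup_dir, sig=signal.SIGTERM, kill=os.kill):
    """Hypothetical helper: send `sig` to every process in the group.

    Re-reads the group after each pass so that children forked while we
    were signalling are reached too; stops once a pass finds nobody new.
    Returns the set of PIDs that were signalled.
    """
    signalled = set()
    while True:
        new = read_cgroup_pids(cgroup_dir) - signalled
        if not new:
            return signalled
        for pid in sorted(new):
            try:
                kill(pid, sig)
            except ProcessLookupError:
                pass  # process exited between listing and signalling
        signalled |= new
```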

Sometimes all you need is to send a specific signal to the main process of a
service, maybe because you want to trigger a reload via SIGHUP. Instead of going via the
PID file, here’s an easier way to do this:

```
# systemctl kill -s HUP --kill-who=main crond.service
```

So again, what is so new and fancy about killing services in systemd? Well,
for the first time on Linux we can actually do that properly. Previous
solutions always depended on the daemons cooperating to bring
down everything they spawned when they themselves terminate. However, if
you want to use SIGTERM or SIGKILL, it is usually precisely because they do
not cooperate properly with you.

How does this relate to systemctl stop? kill goes directly
and sends a signal to every process in the group, whereas stop goes
through the official configured way to shut down a service, i.e. invokes the
stop command configured with ExecStop= in the service file. Usually
stop should be sufficient. kill is the tougher version, for
cases where you either don’t want the official shutdown command of a service to
run, or when the service is hosed and hung in other ways.

(It’s up to you BTW whether to specify signal names with or without the SIG prefix
on the -s switch. Both work.)
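Accepting both spellings is easy to mirror. Here is a small Python sketch using the standard `signal` module. The helper name `parse_signal` is hypothetical, and this is not how systemctl parses the switch internally.

```python
import signal


def parse_signal(name):
    """Hypothetical helper: accept "HUP" or "SIGHUP" and return the signal.

    Mirrors the rule described above: the SIG prefix is optional.
    """
    if not name.startswith("SIG"):
        name = "SIG" + name
    try:
        return signal.Signals[name]
    except KeyError:
        raise ValueError(f"unknown signal: {name}") from None
```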

It’s a bit surprising that we have come so far on Linux without even being
able to properly kill services. systemd for the first time enables you to do
this properly.

systemd Status Update

2010-11-19 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/systemd-update-2.html

It has been a
while since my last status update on systemd. Here’s another short,
non-exhaustive status update on what we have worked on for systemd since then.

Fedora 15 (Rawhide) now includes a split-up
/etc/init.d/rc.sysinit (Bill Nottingham). This allows us to keep only
a minimal compatibility set of shell scripts around, and otherwise boot a
system without any shell scripts at all. In fact, shell scripts during early
boot are only used in exceptional cases, i.e. when you enabled autoswapping
(bad idea anyway), when a full SELinux relabel is necessary, during the first
boot after initialization, if you have static kernel modules to load (which are
not configured via the systemd-native way to do that), if you boot from a
read-only NFS server, or when you rely on LVM/RAID/Multipath. If none of
this applies to you, you can easily disable these parts of early boot and
save several seconds on boot. I will describe how to do this in a later blog
story.
We have a fully C-coded shutdown logic that kills all remaining processes,
unmounts all remaining file systems, detaches all loop devices and DM volumes
and does that in the right way to ensure that all these things are properly
torn down even if they depend on each other in arbitrary ways. This is not
only considerably faster than the traditional shell hackery for this, but also
a lot safer, since we try to unmount/remount the remaining file systems with a
little bit of brains. This feature is available to the administrator via systemctl --force poweroff. The --force switch controls whether the
usual shutdown of all services is run or whether this is skipped and we
immediately enter this final C shutdown logic. Using --force
hence is a much safer replacement for the old /sbin/reboot -f and does
not leave dirty file systems behind. (Thanks to Fabiano Fidencio and his
colleagues from ProFUSION for this.)
systemd now includes a minimalistic readahead implementation, based on
fanotify(), fadvise() and mincore(). It supports btrfs defragmentation and both
SSD and HDD disks. While the effect on boots that are already fast (such as most
setups involving SSDs) is minimal, slower and older machines benefit from this
more substantially.
We now control fsck and quota during early boot with a C tool that ensures
maximum parallelization while properly implementing the necessary high-level
administration logic.
Every service, every user and every user session now gets its own cgroup in
the ‘cpu’ hierarchy thus creating better fairness between the logged in users
and their sessions.
We now provide /dev/log logging from early boot to late shutdown.
If no syslog daemon is running the output is passed on to kmsg. As soon as a
proper syslog daemon starts up the kmsg buffer is flushed to syslog, and hence
we will have complete log coverage in syslog even for early boot.
systemctl kill was introduced, an easy command to send a signal to
all processes of a service. Expect a blog story with more details about this
shortly.
systemd gained the ability to load the SELinux policy if necessary, thus
supporting non-initrd boots and initrd boots from the same binary with no
duplicate work. This is in fact (and surprisingly) a first among Linux init
systems.
We now initialize and set the system locale inside PID 1 to be inherited by
all services and users.
systemd has native support for /etc/crypttab and can activate
encrypted LUKS/dm-crypt disks both at boot-up and during runtime. A minimal
password querying infrastructure is available, where multiple agents can be
used to present the password to the user. During boot the password is queried
either via Plymouth or directly on the console. If a system crypto disk is
plugged in after boot you are queried for the password via a GNOME agent, or a
wall(1) agent. Finally, while you run systemctl start (or a similar
command) a minimal TTY password agent is available which asks you for passwords
right-away if this is necessary. The password querying logic is very simple,
additional agents can be implemented in a trivial amount of code (Yupp, KDE folks, you
can add an agent for this, too). Note that the password querying logic in
systemd is only for non-user passwords, i.e. passwords that have no relation to
a specific user, but rather to specific hardware or system software. In the future
we hope to extend this so that this can be used to query the password of SSL
certificates when Apache or other servers start.
We offer a minimal interface that external projects can use to extend the
dependency graph systemd manages. In fact, the cryptsetup logic mentioned above
is implemented via this ‘plugin’-like system. Since we did not want to add code
that deals with cryptographic disks into the systemd process itself we
introduced this interface (after all, cryptographic volumes are not an essential
feature of a minimal OS, and unnecessary in most embedded uses; also, the future
might bring us STC, which might make this at least partially obsolete). Simply
by dropping a generator binary into
/lib/systemd/system-generators, which should write out systemd unit
files into a temporary directory, third-party packages may extend the systemd
dependency tree dynamically. This could be useful, for example, to automatically
create a systemd service for each KVM machine or LXC container. With that in
place those containers/machines could be managed and supervised with the same
tools as the usual system services.
We integrated automatic clean-up of directories such as /tmp into
the tmpfiles logic we already had in place that recreates files and
directories on volatile file systems such as /var/run,
/var/lock or /tmp.
We now always measure the system startup time and write it to the log files,
broken up into how much time was spent on the kernel, the initrd and
the initialization of userspace.
We now safely destroy all user sessions before going down. This is a feature
long missing on Linux: since user processes were not killed until the very last
moment, the unhealthy situation of user code still running at a time when no
other daemons remained was a normal part of shutdown.
systemd now understands an ‘extreme’ form of disabling a service: if you
symlink a service name in /etc/systemd/system to /dev/null
then systemd will mark it as masked and completely refuse starting it,
regardless of whether this is requested manually or automatically. Normally it should
be sufficient to simply call systemctl disable to disable a service
which still allows manual activation but no automatic activation. Masking a
service goes one step further.
There’s now a simple condition syntax in places which allows
skipping or enabling units depending on the existence of a file, whether a
directory is empty or whether a kernel command line option is set.
In addition to normal shutdowns for reboot, halt or poweroff we now
similarly support a kexec reboot, which reboots the machine without going through
the BIOS code again.
We have bash completion support for systemctl. (Ran Benita)
Andrew Edmunds contributed basic support to boot Ubuntu with systemd.
Michael Biebl and Tollef Fog Heen have worked on the systemd integration
into Debian to a level that it is now possible to boot a system without having
the old initscripts package installed. For more details see the Debian Wiki. Michael even
tested this integration on an Ubuntu Natty system and as it turns out this
works almost equally well on Ubuntu already. If you are interested in playing
around with this, ping Michael.
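The condition syntax mentioned in the list above can be pictured as a set of simple predicates. The Python sketch below evaluates three of them, including the leading "!" for negation. The predicate names mirror systemd’s `ConditionPathExists=`, `ConditionDirectoryNotEmpty=` and `ConditionKernelCommandLine=` settings. The helpers `check_condition` and `kernel_cmdline_has`, and their simplified matching rules, are assumptions made for illustration; this is not systemd’s parser.

```python
import os


def kernel_cmdline_has(option, cmdline):
    """Simplified matching (assumption): "quiet" matches "quiet" or
    "quiet=..."; an option containing "=" must match a word exactly."""
    for word in cmdline.split():
        if word == option:
            return True
        if "=" not in option and word.startswith(option + "="):
            return True
    return False


def check_condition(kind, value, cmdline=None):
    """Hypothetical evaluator for one condition; a leading "!" negates it."""
    negate = value.startswith("!")
    if negate:
        value = value[1:]
    if kind == "ConditionPathExists":
        result = os.path.exists(value)
    elif kind == "ConditionDirectoryNotEmpty":
        result = os.path.isdir(value) and bool(os.listdir(value))
    elif kind == "ConditionKernelCommandLine":
        if cmdline is None:  # read the running kernel's command line
            with open("/proc/cmdline") as f:
                cmdline = f.read()
        result = kernel_cmdline_has(value, cmdline)
    else:
        raise ValueError(f"unsupported condition: {kind}")
    return result != negate
```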

And that’s it for now. There’s a lot of other stuff in the git commits, but
most of it is smaller and I will thus spare you the details.

We have come quite far in the last year. systemd is about a year old now,
and we are now able to boot a system without legacy shell scripts remaining,
something that appeared to be a task for the distant future.

All of this is available in systemd 13 and in F15/Rawhide as I type
this. If you want to play around with this then consider installing Rawhide
(it’s fun!).

27C3 Fudfest

2010-11-13 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/ccc-nervt.html

I really wonder why on earth the 27C3 accepted a nonsensical
paper like this into their programme. So .. stupid. You read half the
proposal and it’s already kinda obvious that the presenter has no idea what he
is talking about. Fundamental errors, obvious misinterpretations, outdated issues:
this is just FUD.

And apparently this talk is even anonymous? Such a coward! FUDing around
anonymously is acceptable at the CCC?

Linux Plumbers Conference/Gnome Summit Recap

2010-11-09 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/lpc2010-recap.html

Last week LPC and GS 2010 took place in Cambridge,
MA. Like the last years, LPC showed again that — at least for me — it is one of
the most relevant Linux conferences in existence, if not the single most
relevant one.

Here’s a terse, non-exhaustive report of the different discussions I took
part in with various folks at the conference, in no particular order:

The Boot
and Init track led by Kay Sievers (Suse) was a great success. We had
exciting talks which I think helped quite a bit in clearing a few things up,
and hopefully helps us in consolidating the full Linux boot process among all
the components involved. We had talks covering everything from the BIOS boot,
to initrds, graphical boot splashes and systemd. Kay
Sievers and I spoke about systemd, also covering the state of it in the Fedora
and openSUSE distributions. Gustavo Barbieri (ProFUSION, Gentoo) and Michael
Biebl (Debian) gave interesting talks about systemd adoption in their
respective distributions. I was particularly interested in the various
statistics Michael showed about SysV/LSB init script usage in Debian, because
this gives an idea how much work we have in front of us in the long run. A
longer discussion about the future of initrds and the logic necessary to find
the root file system on boot was quite enlightening. I think this track was
helpful to increase the unification and consolidation of the way Linux systems
boot up and are maintained during runtime.

Kay and I and some other folks sat down with Arjan van de Ven (Intel), to
talk about the prospects of systemd in Meego. The discussions were very
positive. In particular Arjan had some great suggestions regarding use of the
Simple
Boot Flag in systemd (expect this in one of the next versions) and
readahead. Before systemd can be adopted in Meego we’d have to add a small
number of features to systemd first; most of them should be easy to add.

Similarly, I sat down with Martin Pitt and James Hunt (both Canonical) and
discussed systemd in relation to Ubuntu. I think we managed to clear a lot of
things up, and have a good chance to improve cooperation between Ubuntu and
systemd in relation to APIs and maybe even more.

We talked to Thomas Gleixner regarding userspace notifications when the
wallclock time jumps relative to the monotonic clock. This is important to
systemd so that we can schedule calendar jobs similar to cron, but without
having to wake up periodically to check whether the wallclock time changed
relative to the monotonic clock so that we can recalculate the next
point in time a calendar event is triggered. There has been previous work in
this area in the kernel world, but nothing got merged. Thomas’ suggestion how to
add this facility should be much easier than anything proposed so far.

I also tried to talk Andreas Grünbacher into supporting file system
user extended attributes in various virtual file systems such as procfs,
cgroupfs, sysfs and tmpfs. I hope I convinced him that this would be a good
idea, since this would allow setting externally accessible attributes to all
kinds of kernel objects, such as processes and devices. This would not only
have uses in systemd (where we could easily store all meta information systemd
needs to know about a service in the cgroupfs via xattrs, so that systemd could
even crash or go away at any time and we still can read all runtime information
necessary beyond mere cgrouping from the file system when systemd comes back to
life) but also in the desktop environments, so that we could for example
attach the human readable application name, an icon or a desktop file to the
processes currently running, in a simple way where the data we attach follows
the lifecycle of the process itself.

The Audio track
went really well, too. I was particularly excited about Pierre-Louis Bossart’s
(Intel) plans regarding AC3 (and other codecs) support in PulseAudio, and the simplicity of his
approach. Also great was hearing about Laurent Pinchart’s project to expose
audio and video device routing to userspace. Finally, I really enjoyed David
Henningsson’s and Luke Yelavich’s (both Canonical) talk regarding tracking down audio bugs on
Ubuntu. I was really impressed by the elaborate tools they created to test
audio drivers on users’ machines. Pretty cool stuff. Maybe this can be extended
into a test suite for driver writers, because the current approach for driver
writers (i.e. “If PulseAudio works correctly, your driver is correct”) doesn’t
really scale (although I like the idea and take it as a compliment…). I also
liked the timechart profiling results Pierre showed me that he generated for
PulseAudio. Seems PulseAudio is behaving quite nicely these days.

Together with Harald Hoyer I got a demo of David Zeuthen’s disk assembly
daemon (stc), which makes RAID/MD/LVM assembly more dynamic. Great stuff, and I
think we convinced him to leave actual mounting of file systems to systemd
instead of doing it himself.

Harald and I also hashed out a few things to make integration between dracut
and systemd nicer (i.e. passing along profiling information between the two,
and information regarding the root fsck).

I also hope I convinced Ray Strode to make Plymouth actively listen to udev
for notifications about DRM devices, so that further synchronization between
udev and plymouth won’t be necessary, which both makes things more robust and a
little bit faster.

Kay and I talked to Greg Kroah-Hartman regarding the brokenness of
VT_WAITEVENT in the kernel TTY layer, and discussed what to do about this. After returning from the US, Kay
did the necessary hacking work to provide a minimal sysfs-based solution that
allows userspace to query which TTYs /dev/console and
/dev/tty0 currently point to, and to get notifications when this changes.
This should allow us to greatly simplify ConsoleKit and make it possible to
add console-triggered activation to systemd (think: getty gets started the
moment you switch to its virtual terminal, not already at boot).

I also spent some time discussing the upcoming deadline scheduling kernel
logic with Dario, Dhaval and Tommaso regarding its possible use in PulseAudio.
I believe deadline scheduling is a useful tool to hand out real-time scheduling
to applications securely. As an easy path to supporting deadline scheduling in
PulseAudio I suggested patching RealtimeKit to optionally use deadline
scheduling for its clients. This would magically teach PA (and other clients) to
use deadline scheduling without further patching in the clients.

At GNOME Summit I sat down with Ryan Lortie and Will Thompson to discuss
the future of the D-Bus session bus and how we can move to a machine/user bus
instead in a nice way. We managed to come to a nice agreement here, and this
should enable us to introduce systemd for session management soonishly. Now we
only need to convince the other folks having stakes in D-Bus that what we
discussed is actually a good idea; expect more about this soon on dbus-devel.
Ryan and I also hashed out our remaining differences regarding the exact
semantics of XDG_RUNTIME_DIR, the result of which you can already
see on the XDG mailing list. Ryan already did the GLib work to introduce
XDG_RUNTIME_DIR, and systemd has already supported this unofficially for a few
versions.

I quite appreciate how Michael Meeks quoted me in his final
keynote. 😉

There was a lot of other stuff going on at the conference, and what I
wrote above is in no way complete. And of course, besides all the technical
stuff, it was great meeting all the good Linux folks again, especially my
colleagues from Red Hat.

I am still amazed how systemd is received so positively and with open arms
all across the board. It’s particularly amazing that systemd at this point in
time has already been adopted by various companies in the automotive and
aviation industry.

Off to LPC 2010, Boston

2010-11-01 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/lpc2010.html

Later this week the Linux Plumbers Conference 2010 will take place at the
Hyatt Regency in Cambridge.

Together with Mark Brown I’ll be running the conference track about Audio,
and I believe we managed to put together quite a nice schedule with various
interesting talks covering many areas of what Audio on Linux is about.

I’ll also be around at the Boot
and Init Systems track which Kay Sievers is running. Together with Kay I’ll
do a session about systemd,
everybody’s favourite system and session manager. We also managed to convince a
number of distribution maintainers of systemd to do short presentations about
the state of systemd adoption in their respective distributions: Michael Biebl
from Debian, Gustavo Barbieri from Gentoo, Kay for openSUSE and yours truly for
Fedora.

Because there never can be enough systemd coverage at a conference I’ll do
another talk about systemd, in Vincent Untz’ Desktop
track, this time focussing less on how to boot and maintain a system, but more
on doing the same for desktop sessions, in particular GNOME.

I’ll also stick around for the first two days of the GNOME Boston Summit.

See you in Cambridge!

FOSS.in CFP Deadline Approaching!

2010-10-07 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/foss-in-2010.html

I just submitted my paper^[1] for FOSS.in 2010 in
Bangalore/India. Don’t forget to submit yours! The CFP closes on 10th of
October. That’s this Sunday! Hurry up, before it is too late!

FOSS.in is one of the most amazing Free Software conferences this world has
to offer (hey, and I think I can say that because I have presented at quite a few). A dedicated
audience, flawless organization, magic hospitality, and all this in incredible
India! Both the technical programme and everything around it are impressive. Which
other conference can offer you a concert of one of
India’s greatest acts as part of the schedule? Which other international
conference host city can be such a positive attack on your senses as Bangalore (see
that endless sea of flowers below)? And where else do they serve pure silver as part of the conference catering?

Read the CFP! Or, go straight to submitting a paper.

Footnotes

^[1] About systemd.

systemd for Administrators, Part III

2010-10-01 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/systemd-for-admins-3.html

Here’s the third installment of my ongoing
series about systemd
for administrators.

How Do I Convert A SysV Init Script Into A systemd Service File?

Traditionally, Unix and Linux services (daemons) are started
via SysV init scripts. These are Bourne Shell scripts, usually
residing in a directory such as /etc/rc.d/init.d/ which when
called with one of a few standardized arguments (verbs) such as
start, stop or restart controls,
i.e. starts, stops or restarts the service in question. For starts
this usually involves invoking the daemon binary, which then forks a
background process (more precisely daemonizes). Shell scripts
tend to be slow, needlessly hard to read, very verbose and
fragile. Although they are immensly flexible (after all, they are just
code) some things are very hard to do properly with shell scripts,
such as ordering parallized execution, correctly supervising processes
or just configuring execution contexts in all detail. systemd provides
compatibility with these shell scripts, but due to the shortcomings
pointed out it is recommended to install native systemd service files
for all daemons installed. Also, in contrast to SysV init scripts
which have to be adjusted to the distribution systemd service files
are compatible with any kind of distribution running systemd (which
become more and more these days…). What follows is a terse guide how
to take a SysV init script and translate it into a native systemd
service file. Ideally, upstream projects should ship and install
systemd service files in their tarballs. If you have successfully
converted a SysV script according to the guidelines it might hence be
a good idea to submit the file as a patch to upstream. How to prepare a
patch like that will be discussed in a later installment; suffice it to
say at this point that the daemon(7)
manual page shipping with systemd contains a lot of useful information
regarding this.

So, let’s jump right in. As an example we’ll convert the init
script of the ABRT daemon into a systemd service file. ABRT is a
standard component of every Fedora install, and is an acronym for
Automatic Bug Reporting Tool, which pretty much describes what it
does, i.e. it is a service for collecting crash dumps. Its SysV script I have uploaded
here.

The first step when converting such a script is to read it
(surprise surprise!) and distill the useful information from the
usually pretty long script. In almost all cases the script consists of
mostly boilerplate code that is identical or at least very similar in
all init scripts, and usually copied and pasted from one to the
other. So, let’s extract the interesting information from the script
linked above:

A description string for the service is “Daemon to detect
crashing apps”. As it turns out, the header comments include a
redundant number of description strings, some of which describe not so much
the actual service as the init script that starts it. systemd services
include a description too, and it should describe the service and not
the service file.
The LSB header^[1] contains dependency
information. Due to its design around socket-based activation, systemd
usually needs no (or very little) manually configured
dependencies. (For details regarding socket activation see the original
announcement blog post.) In this case the dependency on
$syslog (which encodes that abrtd requires a syslog daemon),
is the only valuable information. While the header lists another
dependency ($local_fs) this one is redundant with systemd as
normal system services are always started with all local file systems
available.
The LSB header suggests that this service should be started in
runlevels 3 (multi-user) and 5 (graphical).
The daemon binary is /usr/sbin/abrtd.

And that’s already it. The entire remaining content of this
115-line shell script is simply boilerplate or otherwise redundant
code: code that deals with synchronizing and serializing startup
(i.e. the code regarding lock files) or that outputs status messages
(i.e. the code calling echo), or simply parsing of the verbs (i.e. the
big case block).

From the information extracted above we can now write our systemd service file:

```
[Unit]
Description=Daemon to detect crashing apps
After=syslog.target

[Service]
ExecStart=/usr/sbin/abrtd
Type=forking

[Install]
WantedBy=multi-user.target
```

A little explanation of the contents of this file: The
[Unit] section contains generic information about the
service. systemd not only manages system services, but also devices,
mount points, timers, and other components of the system. The generic
term for all these objects in systemd is a unit, and the
[Unit] section encodes information about it that might be
applicable not only to services but also to the other unit types
systemd maintains. In this case we set the following unit settings: we
set the description string and configure that the daemon shall be
started after Syslog^[2], similar to what is encoded in the
LSB header of the original init script. For this Syslog dependency we
create a dependency of type After= on a systemd unit
syslog.target. The latter is a special target unit in systemd
and is the standardized name to pull in a syslog implementation. For
more information about these standardized names see systemd.special(7). Note
that a dependency of type After= only encodes the suggested
ordering, but does not actually cause syslog to be started when abrtd
is. This is exactly what we want, since abrtd actually works
fine even without syslog being around. However, if both are started
(and usually they are), then the order in which they are started is
controlled with this dependency.
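For contrast, here is a sketch of the [Unit] section of a hypothetical daemon that, unlike abrtd, should always have syslog running alongside it. It is not part of the abrtd service file. After= alone only orders the two units; a Wants= (or the stricter Requires=) dependency is what actually causes the other unit to be started, which is why the two are usually combined:

```ini
[Unit]
Description=Hypothetical daemon that wants syslog running
# Pull in syslog whenever this unit is started ...
Wants=syslog.target
# ... and order this unit after it, so syslog comes up first.
After=syslog.target
```

With only the After= line, as in the abrtd file, starting this unit would not start syslog; it would merely wait for syslog if syslog happened to be starting in the same transaction.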

The next section is [Service] which encodes information
about the service itself. It contains all those settings that apply
only to services, and not the other kinds of units systemd maintains
(mount points, devices, timers, …). Two settings are used here:
ExecStart= takes the path to the binary to execute when the
service shall be started up. And with Type= we configure how
the service notifies the init system that it finished starting up. Since
traditional Unix daemons do this by returning to the parent process
after having forked off and initialized the background daemon, we set
the type to forking here. That tells systemd to wait until
the start-up binary returns and then to consider the processes still
running afterwards to be the daemon processes.
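For Type=forking services the systemd.service(5) documentation recommends also setting PIDFile=, so that systemd can reliably tell which of the processes left running after the fork is the main one. A sketch of the [Service] section with that addition follows; the PID file path is a hypothetical placeholder and only helps if the daemon really writes its PID to that location:

```ini
[Service]
ExecStart=/usr/sbin/abrtd
Type=forking
# Hypothetical path: must match wherever the daemon actually writes its PID.
PIDFile=/var/run/abrtd.pid
```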

The final section is [Install]. It encodes information
about what the suggested installation should look like, i.e. under
which circumstances and by which triggers the service shall be
started. In this case we simply say that this service shall be started
when the multi-user.target unit is activated. This is a
special unit (see above) that basically takes the role of the classic
SysV runlevel 3^[3]. The setting WantedBy= has
little effect on the daemon during runtime. It is only read by the
systemctl enable command, which is the recommended way to
enable a service in systemd. This command will simply ensure that our
little service gets automatically activated as soon as
multi-user.target is requested, which it is on all normal
boots^[4].

And that’s it. Now we already have a minimal working systemd
service file. To test it, we copy it to
/etc/systemd/system/abrtd.service and invoke systemctl daemon-reload. This will make systemd take notice of it, and
we can now start the service with systemctl start abrtd.service. We can verify its status via systemctl status abrtd.service, and we can stop it again via systemctl stop abrtd.service. Finally, we can enable it, so that it is activated
by default on future boots, with systemctl enable abrtd.service.

The service file above, while sufficient and basically a 1:1
translation (feature-wise and otherwise) of the SysV init script, still has room for
improvement. Here is a slightly updated version:

[Unit]
Description=ABRT Automated Bug Reporting Tool
After=syslog.target

[Service]
Type=dbus
BusName=com.redhat.abrt
ExecStart=/usr/sbin/abrtd -d -s

[Install]
WantedBy=multi-user.target

So, what did we change? Two things: we improved the description
string a bit. More importantly, however, we changed the type of the
service to dbus and configured the D-Bus bus name of the
service. Why did we do this? As mentioned, classic SysV services
daemonize after startup, which usually involves double forking
and detaching from any terminal. While this is useful and necessary
when daemons are invoked via a script, it is unnecessary (and slow)
as well as counterproductive when a proper process babysitter such as
systemd is used. The reason is that the forked-off daemon
process usually has little relation to the original process started by
systemd (after all, the whole idea of the daemonizing scheme is to remove
this relation), and hence it is difficult for systemd to figure out
after the fork is finished which process belonging to the service is
actually the main process and which processes might just be
auxiliary. But that information is crucial to implement advanced
babysitting, i.e. supervising the process, automatic respawning on
abnormal termination, collecting crash and exit code information and
suchlike. In order to make it easier for systemd to figure out the
main process of the daemon we changed the service type to
dbus. The semantics of this service type are appropriate for
all services that take a name on the D-Bus system bus as the last step of
their initialization^[5]. ABRT is one of those. With this setting systemd
will spawn the ABRT process, which will no longer fork (this is
configured via the -d -s switches to the daemon), and systemd
will consider the service fully started up as soon as
com.redhat.abrt appears on the bus. This way the process
spawned by systemd is the main process of the daemon, systemd has a
reliable way to figure out when the daemon is fully started up, and
systemd can easily supervise it.
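The same reasoning applies to daemons that can be told not to fork but do not take a name on the bus. For such a daemon a sketch of the [Service] section would use Type=simple, under which systemd considers the service started as soon as it has forked off the process. The binary and its no-fork switch below are hypothetical placeholders, not a real program:

```ini
[Service]
# The process spawned by systemd is the main process; systemd does not
# wait for any readiness signal before considering the service started.
Type=simple
# Hypothetical daemon and switch that keep it in the foreground.
ExecStart=/usr/sbin/exampled --no-fork
```

The trade-off compared to dbus is that systemd still knows the main process, but gets no indication of when initialization has actually finished.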

And that’s all there is to it. We have a simple systemd service
file now that encodes in 10 lines more information than the original
SysV init script encoded in 115. And even now there’s a lot of room
left for further improvement utilizing more features systemd
offers. For example, we could set Restart=always to
tell systemd to automatically restart this service when it dies. Or,
we could use OOMScoreAdjust=-500 to ask the kernel to please
leave this process around when the OOM killer wreaks havoc. Or, we
could use CPUSchedulingPolicy=idle to ensure that abrtd
processes crash dumps in background only, always allowing the kernel
to give preference to whatever else might be running and needing CPU
time.
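Putting those three suggestions together, the complete file might look like the sketch below. The values are simply the ones mentioned above rather than tuned recommendations, and the restart setting is written as Restart=always, the spelling current systemd versions accept:

```ini
[Unit]
Description=ABRT Automated Bug Reporting Tool
After=syslog.target

[Service]
Type=dbus
BusName=com.redhat.abrt
ExecStart=/usr/sbin/abrtd -d -s
# Restart the daemon whenever it exits, unless it was stopped deliberately
# (for example with systemctl stop).
Restart=always
# Make the kernel's OOM killer less likely to pick this process.
OOMScoreAdjust=-500
# Use the idle scheduling policy, so crash dump processing only gets
# CPU time that nothing else wants.
CPUSchedulingPolicy=idle

[Install]
WantedBy=multi-user.target
```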

For more information about the configuration options mentioned
here, see the respective man pages systemd.unit(5),
systemd.service(5),
systemd.exec(5). Or,
browse all of
systemd’s man pages.

Of course, not all SysV scripts are as easy to convert as this
one. Fortunately, as it turns out, the vast majority actually are.

That’s it for today, come back soon for the next installment in our series.

Footnotes

[1] The LSB header of init scripts is a convention of
including metadata about the service in comment blocks at the top of
SysV init scripts and is
defined by the Linux Standard Base. This was intended to
standardize init scripts between distributions. While most
distributions have adopted this scheme, the handling of the headers
varies greatly between the distributions, and in fact still makes it
necessary to adjust init scripts for every distribution. As such, the LSB spec
never kept the promise it made.

[2] Strictly speaking, this dependency does not even have to
be encoded here, as it is redundant in a system where the Syslog
daemon is socket activatable. Modern syslog systems (for example
rsyslog v5) have been patched upstream to be socket-activatable. If
such a syslog implementation is used, configuring the
After=syslog.target dependency is redundant, as the ordering is
implicit. However, to maintain compatibility with syslog services that
have not been updated we include this dependency here.

[3] At least how it used to be defined on Fedora.

[4] Note that in systemd the graphical bootup
(graphical.target, taking the role of SysV runlevel 5) is an
implicit superset of the console-only bootup
(multi-user.target, i.e. like runlevel 3). That means hooking
a service into the latter will also hook it into the
former.

[5] Actually the majority of services of the default Fedora
install now take a name on the bus after startup.

Xiph Video

2010-09-24 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/video.html

Don’t miss Monty’s awesome video.
