Tag Archives: Projects

Plumbers Conference 2011

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/lpc2011.html

The Linux Plumbers Conference 2011 in Santa Rosa, CA, USA
is coming nearer (Sep. 7-9).
Together with Kay Sievers I am running the Boot&Init track, and together with
Mark Brown the Audio track.

For both tracks we still need proposals. So if you haven’t submitted
anything yet, please consider doing so. And that quickly. i.e. if you can
arrange for it, last sunday would be best, since that was actually the final
deadline. However, the submission form is still open, so if you submit
something really, really quickly we’ll ignore the absence of time travel and the calendar for a bit. So, go,
submit something. Now.

What are we looking for? Well, here’s what I just posted on the audio
related mailing lists:

So, please consider submitting something if you haven't done so yet. We
are looking for all kinds of technical talks covering everything audio
plumbing related: audio drivers, audio APIs, sound servers, pro audio,
consumer audio. If you can propose something audio related -- like talks
on media controller routing, on audio for ASOC/Embedded, submit
something! If you care for low-latency audio, submit something. If you
care about the Linux audio stack in general, submit something.

LPC is probably the most relevant technical conference on the general
Linux platform, so be sure that if you want your project, your work,
your ideas to be heard then this is the right forum for everything
related to the Linux stack. And the Audio track covers everything in our
Audio Stack, regardless whether it is pro or consumer audio.

And here’s what I posted to the init related lists:

So, please consider submitting something if you haven't done so yet. We
are looking for all kinds of technical talks covering everything from
the BIOS (e.g. CoreBoot and friends), over boot loaders (e.g. GRUB and
friends), to initramfs (e.g. Dracut and friends) and init systems
(e.g. systemd and friends). If you have something smart to say about any
of these areas or maybe about related tools (e.g. you wrote a fancy new
tool to measure boot performance) or fancy boot schemes in your
favourite Linux based OS (e.g. the new Meego zero second boot ;-)) then
don't hesitate to submit something on the LPC web site, in the Boot&Init
track!

And now, quickly, go to the LPC website
and post your session proposal in the Audio or the Boot&Init track! Thank you!

Thanks

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/thanks.html

As some of you might know Fedora 15 went Gold a couple of days ago. The
first big distribution based on systemd will be released 2011-05-24. Mark the date!

In a little over a year systemd went from nowhere to become a core piece of
Fedora. This wouldn’t have been possible without the numerous folks who worked with us on
getting systemd right, supplied patches, chased bugs, tested releases and posted
comments and generally made sure everything was in shape for the big
release.

At this point we’d like to thank everybody who contributed and a few folks in
particular:

A. Costa
Adrian Spinu
Alexey Shabalin
Andreas Jaeger
Andrew Edmunds
Andrey Borzenkov
Bill Nottingham
Brandon Philips
Brendan Jones
Brett Witherspoon
Chris E Ferron
Christian Ruppert
Conrad Meyer
Daniel J Walsh
Dave Reisner
Eric Paris
Fabian Henze
Fabiano Fidêncio
Florian Kriener
Franz Dietrich
Greg Kroah-Hartman
Gustavo Sverzut Barbieri
Harald Hoyer
James Laska
Jan Engelhardt
Jeff Mahoney
Jesse Zhang
Jóhann B. Guðmundsson
Karel Zak
Koen Kooi
Lucas De Marchi
Ludwig Nussel
Luis Felipe Strano Moraes
Maarten Lankhorst
Malcolm Studd
Marc-Antoine Perennou
Martin Mikkelsen
Matthew Miller
Matthias Clasen
Matthias Schiffer
Michael Biebl
Michael Olbrich
Michael Tremer
Michał Piotrowski
Michal Schmidt
Mike Kazantsev
Mike Kelly
Miklos Vajna
Milan Broz
Ozan Çağlayan
Paul Menzel
Pavol Rusnak
Rahul Sundaram
Rainer Gerhards
Ran Benita
Ray Strode
Robert Gerus
Sedat Dilek
Tero Roponen
Thierry Reding
Tollef Fog Heen
Tomasz Torcz
Tom Callaway
Tom Gundersen
Toshio Kuratomi
William Jon McCann
Wulf C. Krueger
Zbigniew Jędrzejewski-Szmek

And everybody else who I (or git shortlog) forgot.

Thank you!

Lennart and Kay

BTW, the interface stability promise is valid now.

systemd for Developers I

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/socket-activation.html

systemd not only brings improvements for administrators and users, it also
brings a (small) number of new APIs with it. In this blog story (which might
become the first of a series) I hope to shed some light on one of the
most important new APIs in systemd:

Socket Activation

In the original blog
story about systemd
I tried to explain why socket activation is a
wonderful technology to spawn services. Let’s reiterate the background
here a bit.

The basic idea of socket activation is not new. The inetd
superserver was a standard component of most Linux and Unix systems
since time began: instead of spawning all local Internet services
already at boot, the superserver would listen on behalf of the
services and whenever a connection would come in an instance of the
respective service would be spawned. This allowed relatively weak
machines with few resources to offer a big variety of services at the
same time. However it quickly got a reputation for being somewhat
slow: since daemons would be spawned for each incoming connection a
lot of time was spent on forking and initialization of the services
— once for each connection, instead of once for them all.

Spawning one instance per connection was how inetd was primarily
used, even though inetd actually understood another mode: on the first
incoming connection it would notice this via poll() (or
select()) and spawn a single instance for all future
connections. (This was controllable with the
wait/nowait options.) That way the first connection
would be slow to set up, but subsequent ones would be as fast as with
a standalone service. In this mode inetd would work in a true
on-demand mode: a service would be made available lazily when it was
required.
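
To make this second mode a bit more concrete, here is a minimal sketch in C of the "notice the first connection, but don't accept it" step. This is not inetd's actual code: the function names (abstract_addr(), make_listener(), connect_client(), connection_pending()) are made up for illustration, and the Linux-specific abstract AF_UNIX namespace is used only so that no socket file has to be cleaned up afterwards.

```c
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in an abstract-namespace AF_UNIX address (Linux-specific, an
 * assumption of this sketch). Returns the address length. */
socklen_t abstract_addr(struct sockaddr_un *un, const char *name) {
        memset(un, 0, sizeof(*un));
        un->sun_family = AF_UNIX;
        /* sun_path[0] stays 0, which selects the abstract namespace */
        strncpy(un->sun_path + 1, name, sizeof(un->sun_path) - 2);
        return (socklen_t) (offsetof(struct sockaddr_un, sun_path) + 1 +
                            strlen(un->sun_path + 1));
}

/* The superserver's part: listen on behalf of the service. */
int make_listener(const char *name) {
        struct sockaddr_un un;
        socklen_t len = abstract_addr(&un, name);
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0)
                return -1;
        if (bind(fd, (struct sockaddr*) &un, len) < 0 ||
            listen(fd, SOMAXCONN) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}

/* A client connecting to the socket created above. */
int connect_client(const char *name) {
        struct sockaddr_un un;
        socklen_t len = abstract_addr(&un, name);
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0)
                return -1;
        if (connect(fd, (struct sockaddr*) &un, len) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}

/* Returns 1 if a connection is waiting on the listening socket, 0 on
 * timeout, -1 on error. The connection is NOT accepted here. */
int connection_pending(int listen_fd, int timeout_ms) {
        struct pollfd p = { .fd = listen_fd, .events = POLLIN };
        int r = poll(&p, 1, timeout_ms);

        if (r < 0)
                return -1;
        return (r > 0 && (p.revents & POLLIN)) ? 1 : 0;
}
```

Once connection_pending() reports 1, a superserver working in this mode would spawn the service a single time and hand it the listening socket, leaving accept() for this and all later connections to the service itself.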

inetd’s focus was clearly on AF_INET (i.e. Internet) sockets. As
time progressed and Linux/Unix left the server niche and became
increasingly relevant on desktops, mobile and embedded environments
inetd was somehow lost in the troubles of time. Its reputation for
being slow, and the fact that Linux’ focus shifted away from only
Internet servers made a Linux machine running inetd (or one of its newer
implementations, like xinetd) the exception, not the rule.

When Apple engineers worked on optimizing the MacOS boot time they
found a new way to make use of the idea of socket activation: they
shifted the focus away from AF_INET sockets towards AF_UNIX
sockets. And they noticed that on-demand socket activation was only
part of the story: much more powerful is socket activation when used
for all local services including those which need to be started
anyway on boot. They implemented these ideas in launchd, a central building
block of modern MacOS X systems, and probably the main reason why
MacOS is so fast booting up.

But, before we continue, let’s have a closer look at what exactly the
benefits of socket activation for non-on-demand, non-Internet services
are. Consider the four services Syslog, D-Bus, Avahi and the
Bluetooth daemon. D-Bus logs to Syslog, hence on traditional Linux
systems it would get started after Syslog. Similarly, Avahi requires
Syslog and D-Bus, hence would get started after both. Finally
Bluetooth is similar to Avahi and also requires Syslog and D-Bus but
does not interface at all with Avahi. Since in a traditional
SysV-based system only one service can be in the process of getting
started at a time, the following serialization of startup would take
place: Syslog → D-Bus → Avahi → Bluetooth. (Of course, Avahi and
Bluetooth could be started in the opposite order too, but we have to
pick one here, so let’s simply go alphabetically.) To illustrate
this, here’s a plot showing the order of startup beginning with system
startup (at the top).

Parallelization plot

Certain distributions tried to improve this strictly serialized
start-up: since Avahi and Bluetooth are independent from each other,
they can be started simultaneously. The parallelization is increased,
the overall startup time slightly smaller. (This is visualized in the
middle part of the plot.)

Socket activation makes it possible to start all four services
completely simultaneously, without any kind of ordering. Since the
creation of the listening sockets is moved outside of the daemons
themselves we can start them all at the same time, and they are able
to connect to each other’s sockets right-away. I.e. in a single step
the /dev/log and /run/dbus/system_bus_socket sockets
are created, and in the next step all four services are spawned
simultaneously. When D-Bus then wants to log to syslog, it just writes
its messages to /dev/log. As long as the socket buffer does
not run full it can go on immediately with what else it wants to do
for initialization. As soon as the syslog service catches up it will
process the queued messages. And if the socket buffer runs full then
the client logging will temporarily block until the socket is writable
again, and continue the moment it can write its log messages. That
means the scheduling of our services is entirely done by the kernel:
from the userspace perspective all services are run at the same time,
and when one service cannot keep up the others needing it will
temporarily block on their request but go on as soon as these
requests are dispatched. All of this is completely automatic and
invisible to userspace. Socket activation hence allows us to
drastically parallelize start-up, enabling simultaneous start-up of
services which previously were thought to strictly require
serialization. Most Linux services use sockets as communication
channel. Socket activation allows starting of clients and servers of
these channels at the same time.
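
The queueing behaviour described above can be tried out in a few lines of C. The following is only a sketch under a couple of assumptions: the names abstract_addr() and queue_before_accept() are invented for this example, it uses a SOCK_STREAM socket in the Linux abstract namespace rather than the real /dev/log socket, and "manager", "client" and "service" all live in one process to keep things short.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in an abstract-namespace AF_UNIX address (Linux-specific, an
 * assumption of this sketch). Returns the address length. */
socklen_t abstract_addr(struct sockaddr_un *un, const char *name) {
        memset(un, 0, sizeof(*un));
        un->sun_family = AF_UNIX;
        strncpy(un->sun_path + 1, name, sizeof(un->sun_path) - 2);
        return (socklen_t) (offsetof(struct sockaddr_un, sun_path) + 1 +
                            strlen(un->sun_path + 1));
}

/* Writes msg to a listening socket before anybody accept()s, then lets
 * the "service" catch up. Returns the number of bytes the service read
 * into buf, or -1 on error. */
ssize_t queue_before_accept(const char *name, const char *msg,
                            char *buf, size_t size) {
        struct sockaddr_un un;
        socklen_t len = abstract_addr(&un, name);
        int listener = -1, client = -1, conn = -1;
        ssize_t n = -1;

        /* Step 1: the listening socket is created up front. */
        listener = socket(AF_UNIX, SOCK_STREAM, 0);
        if (listener < 0 ||
            bind(listener, (struct sockaddr*) &un, len) < 0 ||
            listen(listener, SOMAXCONN) < 0)
                goto finish;

        /* Step 2: the client connects and writes right away. The kernel
         * queues the data, the client does not wait for the service. */
        client = socket(AF_UNIX, SOCK_STREAM, 0);
        if (client < 0 ||
            connect(client, (struct sockaddr*) &un, len) < 0)
                goto finish;
        if (write(client, msg, strlen(msg)) != (ssize_t) strlen(msg))
                goto finish;

        /* Step 3: only now does the service pick up the connection and
         * the data that was queued in the meantime. */
        conn = accept(listener, NULL, NULL);
        if (conn < 0)
                goto finish;
        n = read(conn, buf, size);

finish:
        if (conn >= 0)
                close(conn);
        if (client >= 0)
                close(client);
        if (listener >= 0)
                close(listener);
        return n;
}
```

Calling queue_before_accept("demo", "hello", buf, sizeof(buf)) should hand back the five bytes the client wrote, even though nothing was accepting connections at the time of the write.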

But it’s not just about parallelization. It offers a number of
other benefits:

  • We no longer need to configure dependencies explicitly. Since the
    sockets are initialized before all services they are simply available,
    and no userspace ordering of service start-up needs to take place
    anymore. Socket activation hence drastically simplifies configuration
    and development of services.
  • If a service dies its listening socket stays around, not losing a
    single message. After a restart of the crashed service it can continue
    right where it left off.
  • If a service is upgraded we can restart the service while keeping
    around its sockets, thus ensuring the service is continuously
    responsive. Not a single connection is lost during the upgrade.
  • We can even replace a service during runtime in a way that is
    invisible to the client. For example, all systems running systemd
    start up with a tiny syslog daemon at boot which passes all log
    messages written to /dev/log on to the kernel message
    buffer. That way we provide reliable userspace logging starting from
    the first instant of boot-up. Then, when the actual rsyslog daemon is
    ready to start we terminate the mini daemon and replace it with the
    real daemon. And all that while keeping around the original logging
    socket and sharing it between the two daemons and not losing a single
    message. Since rsyslog flushes the kernel log buffer to disk after
    start-up all log messages from the kernel, from early-boot and from
    runtime end up on disk.

For another explanation of this idea consult the original blog
story about systemd.

Socket activation has been available in systemd since its
inception. On Fedora 15 a number of services have been modified to
implement socket activation, including Avahi, D-Bus and rsyslog (to continue with the example above).

systemd’s socket activation is quite comprehensive. Not only classic
sockets are supported but related technologies as well:

  • AF_UNIX sockets, in the flavours SOCK_DGRAM, SOCK_STREAM and SOCK_SEQPACKET; both in the filesystem and in the abstract namespace
  • AF_INET sockets, i.e. TCP/IP and UDP/IP; both IPv4 and IPv6
  • Unix named pipes/FIFOs in the filesystem
  • AF_NETLINK sockets, to subscribe to certain kernel features. This
    is currently used by udev, but could be useful for other
    netlink-related services too, such as audit.
  • Certain special files like /proc/kmsg or device nodes like /dev/input/*.
  • POSIX Message Queues

A service capable of socket activation must be able to receive its
preinitialized sockets from systemd, instead of creating them
internally. For most services this requires (minimal)
patching. However, since systemd actually provides inetd compatibility
a service working with inetd will also work with systemd — which is
quite useful for services like sshd for example.

So much for the background of socket activation. Let’s now have a
look at how to patch a service to make it socket activatable. Let’s start
with a theoretical service foobard. (In a later blog post we’ll focus on
a real-life example.)

Our little (theoretic) service includes code like the following for
creating sockets (most services include code like this in one way or
another):

/* Source Code Example #1: ORIGINAL, NOT SOCKET-ACTIVATABLE SERVICE */
...
union {
        struct sockaddr sa;
        struct sockaddr_un un;
} sa;
int fd;

fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (fd < 0) {
        fprintf(stderr, "socket(): %m\n");
        exit(1);
}

memset(&sa, 0, sizeof(sa));
sa.un.sun_family = AF_UNIX;
strncpy(sa.un.sun_path, "/run/foobar.sk", sizeof(sa.un.sun_path));

if (bind(fd, &sa.sa, sizeof(sa)) < 0) {
        fprintf(stderr, "bind(): %m\n");
        exit(1);
}

if (listen(fd, SOMAXCONN) < 0) {
        fprintf(stderr, "listen(): %m\n");
        exit(1);
}
...

A socket activatable service may use the following code instead:

/* Source Code Example #2: UPDATED, SOCKET-ACTIVATABLE SERVICE */
...
#include "sd-daemon.h"
...
int fd;

if (sd_listen_fds(0) != 1) {
        fprintf(stderr, "No or too many file descriptors received.\n");
        exit(1);
}

fd = SD_LISTEN_FDS_START + 0;
...

systemd might pass you more than one socket (based on
configuration, see below). In this example we are interested in one
only. sd_listen_fds()
returns how many file descriptors are passed. We simply compare that
with 1, and fail if we got more or less. The file descriptors systemd
passes to us are inherited one after the other beginning with fd
#3. (SD_LISTEN_FDS_START is a macro defined as 3.) Our code hence just
takes possession of fd #3.
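
For the curious, here is a rough sketch of what a function in the spirit of sd_listen_fds() has to do. It is not the code from sd-daemon.c: the name my_listen_fds() is hypothetical, and the sketch assumes the environment variable protocol documented for sd_listen_fds(), where $LISTEN_PID names the process the descriptors are meant for and $LISTEN_FDS says how many were passed. The argument of the real function (0 in the example above), which to my knowledge controls whether these variables are unset afterwards, is left out for brevity.

```c
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define MY_LISTEN_FDS_START 3 /* same value as SD_LISTEN_FDS_START */

/* Hypothetical stand-in for sd_listen_fds(): returns the number of
 * passed file descriptors, 0 if none were passed to this process, or a
 * negative errno-style value on error. */
int my_listen_fds(void) {
        const char *e;
        char *end;
        long l;
        int fd;

        e = getenv("LISTEN_PID");
        if (!e)
                return 0;

        errno = 0;
        l = strtol(e, &end, 10);
        if (errno != 0 || end == e || *end != '\0' || l <= 0)
                return -EINVAL;

        /* The descriptors were meant for some other process. */
        if ((pid_t) l != getpid())
                return 0;

        e = getenv("LISTEN_FDS");
        if (!e)
                return 0;

        errno = 0;
        l = strtol(e, &end, 10);
        if (errno != 0 || end == e || *end != '\0' ||
            l < 0 || l > INT_MAX - MY_LISTEN_FDS_START)
                return -EINVAL;

        /* Make sure the descriptors are not leaked into processes we
         * might execute ourselves later on. */
        for (fd = MY_LISTEN_FDS_START; fd < MY_LISTEN_FDS_START + (int) l; fd++) {
                int flags = fcntl(fd, F_GETFD);

                if (flags < 0)
                        return -errno;
                if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0)
                        return -errno;
        }

        return (int) l;
}
```

The PID check is the interesting design detail here: environment variables are inherited by child processes, so without it a child of the service could wrongly conclude that it was handed sockets too.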

As you can see this code is actually much shorter than the
original. This of course comes at the price that our little service
with this change will no longer work in a non-socket-activation
environment. With minimal changes we can adapt our example to work nicely
both with and without socket activation:

/* Source Code Example #3: UPDATED, SOCKET-ACTIVATABLE SERVICE WITH COMPATIBILITY */
...
#include "sd-daemon.h"
...
int fd, n;

n = sd_listen_fds(0);
if (n > 1) {
        fprintf(stderr, "Too many file descriptors received.\n");
        exit(1);
} else if (n == 1)
        fd = SD_LISTEN_FDS_START + 0;
else {
        union {
                struct sockaddr sa;
                struct sockaddr_un un;
        } sa;

        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {
                fprintf(stderr, "socket(): %m\n");
                exit(1);
        }

        memset(&sa, 0, sizeof(sa));
        sa.un.sun_family = AF_UNIX;
        strncpy(sa.un.sun_path, "/run/foobar.sk", sizeof(sa.un.sun_path));

        if (bind(fd, &sa.sa, sizeof(sa)) < 0) {
                fprintf(stderr, "bind(): %m\n");
                exit(1);
        }

        if (listen(fd, SOMAXCONN) < 0) {
                fprintf(stderr, "listen(): %m\n");
                exit(1);
        }
}
...

With this simple change our service can now make use of socket
activation but still works unmodified in classic environments. Now,
let’s see how we can enable this service in systemd. For this we have
to write two systemd unit files: one describing the socket, the other
describing the service. First, here’s foobar.socket:

[Socket]
ListenStream=/run/foobar.sk

[Install]
WantedBy=sockets.target

And here’s the matching service file foobar.service:

[Service]
ExecStart=/usr/bin/foobard

If we place these two files in /etc/systemd/system we can
enable and start them:

# systemctl enable foobar.socket
# systemctl start foobar.socket

Now our little socket is listening, but our service is not running
yet. If we now connect to /run/foobar.sk the service will be
automatically spawned, for on-demand service start-up. With a
modification of foobar.service we can start our service
already at startup, thus using socket activation only for
parallelization purposes, not for on-demand auto-spawning anymore:

[Service]
ExecStart=/usr/bin/foobard

[Install]
WantedBy=multi-user.target

And now let’s enable this too:

# systemctl enable foobar.service
# systemctl start foobar.service

Now our little daemon will be started at boot and on-demand,
whichever comes first. It can be started fully in parallel with its
clients, and when it dies it will be automatically restarted when it
is used the next time.

A single .socket file can include multiple ListenXXX stanzas, which
is useful for services that listen on more than one socket. In this
case all configured sockets will be passed to the service in the exact
order they are configured in the socket unit file. Also,
you may configure various socket settings in the .socket
files.

In real life it’s a good idea to include description strings in
these unit files; to keep things simple we’ll leave this out of our
example. Speaking of real-life: our next installment will cover an
actual real-life example. We’ll add socket activation to the CUPS
printing server.

The sd_listen_fds() function call is defined in sd-daemon.h
and sd-daemon.c. These
two files are currently drop-in .c sources which projects should
simply copy into their source tree. Eventually we plan to turn this
into a proper shared library, however using the drop-in files allows
you to compile your project in a way that is compatible with socket
activation even without any compile time dependencies on
systemd. sd-daemon.c is liberally licensed, should compile
fine on the most exotic Unixes and the algorithms are trivial enough
to be reimplemented with very little code if the license should
nonetheless be a problem for your project. sd-daemon.c
contains a couple of other API functions besides
sd_listen_fds() that are useful when implementing socket
activation in a project. For example, there’s sd_is_socket()
which can be used to distinguish and identify particular sockets when
a service gets passed more than one.

Let me point out that the interfaces used here are in no way bound
directly to systemd. They are generic enough to be implemented in
other systems as well. We deliberately designed them as simple and
minimal as possible to make it possible for others to adopt similar
schemes.
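
As an illustration of how little is needed on the passing side, here is a sketch of what another service manager could do to hand a single listening socket to a service. It is deliberately simplified and is not systemd's code: spawn_with_socket() and wait_for_exit() are made-up names, and the sketch assumes the convention documented for sd_listen_fds(), where the socket ends up as fd #3 and the environment variables $LISTEN_PID and $LISTEN_FDS tell the service what it received.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PASSED_FD 3 /* first passed fd, same value as SD_LISTEN_FDS_START */

/* Forks and executes argv[0] with listen_fd passed as fd #3.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
pid_t spawn_with_socket(int listen_fd, char *const argv[]) {
        char pidbuf[32];
        pid_t pid = fork();

        if (pid != 0)
                return pid; /* parent, or fork() error */

        /* Child: move the socket to fd #3 and make sure it survives exec. */
        if (listen_fd != PASSED_FD) {
                if (dup2(listen_fd, PASSED_FD) < 0)
                        _exit(127);
                close(listen_fd);
        } else {
                int flags = fcntl(PASSED_FD, F_GETFD);

                if (flags < 0 ||
                    fcntl(PASSED_FD, F_SETFD, flags & ~FD_CLOEXEC) < 0)
                        _exit(127);
        }

        /* Announce the socket. The PID is that of the child, which stays
         * the same across execv(). */
        snprintf(pidbuf, sizeof(pidbuf), "%ld", (long) getpid());
        if (setenv("LISTEN_PID", pidbuf, 1) < 0 ||
            setenv("LISTEN_FDS", "1", 1) < 0)
                _exit(127);

        execv(argv[0], argv);
        _exit(127); /* only reached if execv() failed */
}

/* Waits for the child and returns its exit code, or -1 on error or if
 * it did not exit normally. */
int wait_for_exit(pid_t pid) {
        int status;

        if (waitpid(pid, &status, 0) < 0)
                return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real service manager of course does much more (keeping the socket open across restarts, passing several sockets, and so on), but the contract the service sees is no more than this.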

Stay tuned for the next installment. As mentioned, it will cover a
real-life example of turning an existing daemon into a
socket-activatable one: the CUPS printing service. However, I hope
this blog story might already be enough to get you started if you plan
to convert an existing service into a socket activatable one. We
invite everybody to convert upstream projects to this scheme. If you
have any questions join us on #systemd on freenode.

PulseAudio Saves Power

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/pa-and-power.html

D. Jansen has put up a blog story including some power saving results when
running PulseAudio on modern HDA drivers.
This shows off some work Pierre-Louis Bossart from Intel did on the HDA drivers
which now enables the timer-based scheduling code in PulseAudio I added quite
some time ago to come to its full potential. You can save half a Watt and
reduce wakeups while playing audio to 1 wakeup/s.

Previously there was little public profiling data available about the
benefits PA brings you for low-power devices. Thanks to Dennis’ data there’s now
public data available that hopefully explains why PA is the best choice for
low-power devices as well as desktops. Hopefully this clears up some misconceptions.

Pierre-Louis, thanks for your work!

Update: Arun Raghavan has posted a follow-up to this.

Why systemd?

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/why.html

systemd is still a young project, but it is not a baby anymore. I posted
the initial announcement precisely a year ago. Since then most of the
big distributions have decided to adopt it in one way or another, and many
smaller distributions have already switched. The first big
distribution with systemd by default will be Fedora 15, due end of
May. It is expected that the others will follow the lead a bit later
(with one exception). Many embedded developers have already adopted it
too, and there’s even a company specializing in engineering and
consulting services for systemd. In short: within one year
systemd became a really successful project.

However, there are still folks who we haven’t won over yet. If you
fall into one of the following categories, then please have a look at
the comparison of init systems below:

  • You are working on an embedded project and are wondering whether
    it should be based on systemd.
  • You are a user or administrator and wondering which distribution
    to pick, and are pondering whether it should be based on systemd or
    not.
  • You are a user or administrator and wondering why your favourite
    distribution has switched to systemd, if everything already worked so
    well before.
  • You are developing a distribution that hasn’t switched yet, and
    you are wondering whether to invest the work and go systemd.

And even if you don’t fall into any of these categories, you might still
find the comparison interesting.

We’ll be comparing the three most relevant init systems for Linux:
sysvinit, Upstart and systemd. Of course there are other init systems
in existence, but they play virtually no role in the big
picture. Unless you run Android (which is a completely different beast
anyway), you’ll almost definitely run one of these three init systems
on your Linux kernel. (OK, or busybox, but then you are basically not
running any init system at all.) Unless you have a soft spot for
exotic init systems there’s little need to look further. Also, I am
kinda lazy, and don’t want to spend the time on analyzing those other
systems in enough detail to be completely fair to them.

Speaking of fairness: I am of course one of the creators of
systemd. I will try my best to be fair to the other two contenders,
but in the end, take it with a grain of salt. I am sure though that
should I be grossly unfair or otherwise incorrect somebody will point
it out in the comments of this story, so consider having a look at
those, before you put too much trust in what I say.

We’ll look at the currently implemented features in a released
version. Grand plans don’t count.

General Features

Feature | sysvinit | Upstart | systemd
Interfacing via D-Bus | no | yes | yes
Shell-free bootup | no | no | yes
Modular C coded early boot services included | no | no | yes
Read-Ahead | no | no[1] | yes
Socket-based Activation | no | no[2] | yes
Socket-based Activation: inetd compatibility | no | no[2] | yes
Bus-based Activation | no | no[3] | yes
Device-based Activation | no | no[4] | yes
Configuration of device dependencies with udev rules | no | no | yes
Path-based Activation (inotify) | no | no | yes
Timer-based Activation | no | no | yes
Mount handling | no | no[5] | yes
fsck handling | no | no[5] | yes
Quota handling | no | no | yes
Automount handling | no | no | yes
Swap handling | no | no | yes
Snapshotting of system state | no | no | yes
XDG_RUNTIME_DIR Support | no | no | yes
Optionally kills remaining processes of users logging out | no | no | yes
Linux Control Groups Integration | no | no | yes
Audit record generation for started services | no | no | yes
SELinux integration | no | no | yes
PAM integration | no | no | yes
Encrypted hard disk handling (LUKS) | no | no | yes
SSL Certificate/LUKS Password handling, including Plymouth, Console, wall(1), TTY and GNOME agents | no | no | yes
Network Loopback device handling | no | no | yes
binfmt_misc handling | no | no | yes
System-wide locale handling | no | no | yes
Console and keyboard setup | no | no | yes
Infrastructure for creating, removing, cleaning up of temporary and volatile files | no | no | yes
Handling for /proc/sys sysctl | no | no | yes
Plymouth integration | no | yes | yes
Save/restore random seed | no | no | yes
Static loading of kernel modules | no | no | yes
Automatic serial console handling | no | no | yes
Unique Machine ID handling | no | no | yes
Dynamic host name and machine meta data handling | no | no | yes
Reliable termination of services | no | no | yes
Early boot /dev/log logging | no | no | yes
Minimal kmsg-based syslog daemon for embedded use | no | no | yes
Respawning on service crash without losing connectivity | no | no | yes
Gapless service upgrades | no | no | yes
Graphical UI | no | no | yes
Built-In Profiling and Tools | no | no | yes
Instantiated services | no | yes | yes
PolicyKit integration | no | no | yes
Remote access/Cluster support built into client tools | no | no | yes
Can list all processes of a service | no | no | yes
Can identify service of a process | no | no | yes
Automatic per-service CPU cgroups to even out CPU usage between them | no | no | yes
Automatic per-user cgroups | no | no | yes
SysV compatibility | yes | yes | yes
SysV services controllable like native services | yes | no | yes
SysV-compatible /dev/initctl | yes | no | yes
Reexecution with full serialization of state | yes | no | yes
Interactive boot-up | no[6] | no[6] | yes
Container support (as advanced chroot() replacement) | no | no | yes
Dependency-based bootup | no[7] | no | yes
Disabling of services without editing files | yes | no | yes
Masking of services without editing files | no | no | yes
Robust system shutdown within PID 1 | no | no | yes
Built-in kexec support | no | no | yes
Dynamic service generation | no | no | yes
Upstream support in various other OS components | yes | no | yes
Service files compatible between distributions | no | no | yes
Signal delivery to services | no | no | yes
Reliable termination of user sessions before shutdown | no | no | yes
utmp/wtmp support | yes | yes | yes
Easily writable, extensible and parseable service files, suitable for manipulation with enterprise management tools | no | no | yes

[1] Read-Ahead implementation for Upstart available in separate package ureadahead, requires non-standard kernel patch.

[2] Socket activation implementation for Upstart available as preview, lacks parallelization support hence entirely misses the point of socket activation.

[3] Bus activation implementation for Upstart posted as patch, not merged.

[4] udev device event bridge implementation for Upstart available as preview, forwards entire udev database into Upstart, not practical.

[5] Mount handling utility mountall for Upstart available in separate package, covers only boot-time mounts, very limited dependency system.

[6] Some distributions offer this implemented in shell.

[7] LSB init scripts support this, if they are used.

Available Native Service Settings

Setting | sysvinit | Upstart | systemd
OOM Adjustment | no | yes[1] | yes
Working Directory | no | yes | yes
Root Directory (chroot()) | no | yes | yes
Environment Variables | no | yes | yes
Environment Variables from external file | no | no | yes
Resource Limits | no | some[2] | yes
umask | no | yes | yes
User/Group/Supplementary Groups | no | no | yes
IO Scheduling Class/Priority | no | no | yes
CPU Scheduling Nice Value | no | yes | yes
CPU Scheduling Policy/Priority | no | no | yes
CPU Scheduling Reset on fork() control | no | no | yes
CPU affinity | no | no | yes
Timer Slack | no | no | yes
Capabilities Control | no | no | yes
Secure Bits Control | no | no | yes
Control Group Control | no | no | yes
High-level file system namespace control: making directories inaccessible | no | no | yes
High-level file system namespace control: making directories read-only | no | no | yes
High-level file system namespace control: private /tmp | no | no | yes
High-level file system namespace control: mount inheritance | no | no | yes
Input on Console | yes | yes | yes
Output on Syslog | no | no | yes
Output on kmsg/dmesg | no | no | yes
Output on arbitrary TTY | no | no | yes
Kill signal control | no | no | yes
Conditional execution: by identified CPU virtualization/container | no | no | yes
Conditional execution: by file existence | no | no | yes
Conditional execution: by security framework | no | no | yes
Conditional execution: by kernel command line | no | no | yes

[1] Upstart supports only the deprecated oom_adj mechanism, not the current oom_score_adj logic.

[2] Upstart lacks support for RLIMIT_RTTIME and RLIMIT_RTPRIO.

Note that some of these options are relatively easily added to SysV
init scripts, by editing the shell sources. The table above focusses
on easily accessible options that do not require source code
editing.

Miscellaneous

Aspect | sysvinit | Upstart | systemd
Maturity | > 15 years | 6 years | 1 year
Specialized professional consulting and engineering services available | no | no | yes
SCM | Subversion | Bazaar | git
Copyright-assignment-free contributing | yes | no | yes

Summary

As the tables above hopefully show in all clarity systemd
has left behind both sysvinit and Upstart in almost every
aspect. With the exception of the project’s age/maturity systemd wins
in every category. At this point in time it will be very hard for
sysvinit and Upstart to catch up with the features systemd provides
today. In one year we managed to push systemd forward much further
than Upstart has been pushed in six.

It is our intention to drive forward the development of the Linux
platform with systemd. In the next release cycle we will focus more
strongly on providing the same features and speed improvement we
already offer for the system to the user login session. This will
bring much closer integration with the other parts of the OS and
applications, making the most of the features the service manager
provides, and making it available to login sessions. Certain
components such as ConsoleKit will be made redundant by these
upgrades, and services relying on them will be updated. The
burden for maintaining these then obsolete components
will be passed on to the vendors who plan to continue to rely on
them.

If you are wondering whether or not to adopt systemd, then systemd
obviously wins when it comes to mere features. Of course that should
not be the only aspect to keep in mind. In the long run, sticking with
the existing infrastructure (such as ConsoleKit) comes at a price:
porting work needs to take place, and additional maintenance work for
bitrotting code needs to be done. Going it on your own means increased
workload.

That said, adopting systemd is also not free. Especially if you
made investments in the other two solutions adopting systemd means
work. The basic work to adopt systemd is relatively minimal for
porting over SysV systems (since compatibility is provided), but can
mean substantial work when coming from Upstart. If you plan to go for
a 100% systemd system without any SysV compatibility (recommended for
embedded, long run goal for the big distributions) you need to be
willing to invest some work to rewrite init scripts as simple systemd
unit files.

systemd is in the process of becoming a comprehensive, integrated
and modular platform providing everything needed to bootstrap and
maintain an operating system’s userspace. It includes C rewrites of
all basic early boot init scripts that are shipped with the various
distributions. Especially for the embedded case adopting systemd
provides you in one step with almost everything you need, and you can
pick the modules you want. The other two init systems are singular
individual components, which to be useful need a great number of
additional components with differing interfaces. systemd’s emphasis
on providing a platform instead of just a component allows for
closer integration, and cleaner APIs. Sooner or later this will
trickle up to the applications. Already, there are accepted XDG
specifications (e.g. XDG basedir spec, more specifically
XDG_RUNTIME_DIR) that are not supported on the other init systems.

systemd is also a big opportunity for Linux standardization. Since
it standardizes many interfaces of the system that previously have
been differing on every distribution, on every implementation,
adopting it helps to work against the balkanization of the Linux
interfaces. Choosing systemd means redefining more closely
what the Linux platform is about. This improves the lives of
programmers, users and administrators alike.

I believe that momentum is clearly with systemd. We invite you to
join our community and be part of that momentum.

systemd for Administrators, Part VIII

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/the-new-configuration-files.html

Another episode of my ongoing series on systemd for Administrators:

The New Configuration Files

One of the formidable new features of systemd is
that it comes with a complete set of modular early-boot services that are
written in simple, fast, parallelizable and robust C, replacing the
shell “novels” the various distributions featured before. Our little
Project Zero Shell[1] has been a full success. We currently
cover pretty much everything most desktop and embedded
distributions should need, plus a big part of the server needs:

  • Checking and mounting of all file systems
  • Updating and enabling quota on all file systems
  • Setting the host name
  • Configuring the loopback network device
  • Loading the SELinux policy and relabelling /run and /dev as necessary on boot
  • Registering additional binary formats in the kernel, such as Java, Mono and WINE binaries
  • Setting the system locale
  • Setting up the console font and keyboard map
  • Creating, removing and cleaning up of temporary and volatile files and directories
  • Applying mount options from /etc/fstab to pre-mounted API VFS
  • Applying sysctl kernel settings
  • Collecting and replaying readahead information
  • Updating utmp boot and shutdown records
  • Loading and saving the random seed
  • Statically loading specific kernel modules
  • Setting up encrypted hard disks and partitions
  • Spawning automatic gettys on serial kernel consoles
  • Maintenance of Plymouth
  • Machine ID maintenance
  • Setting of the UTC distance for the system clock

On a standard Fedora 15 install, only a few legacy and storage
services still require shell scripts during early boot. If you don’t
need those, you can easily disable them and enjoy your shell-free boot
(like I do every day). The shell-less boot systemd offers you is a
unique feature on Linux.

Many of these small components are configured via configuration
files in /etc. Some of these are fairly standardized among
distributions and hence supporting them in the C implementations was
easy and obvious. Examples include: /etc/fstab,
/etc/crypttab or /etc/sysctl.conf. However, for
others no standardized file or directory existed which forced us to add
#ifdef orgies to our sources to deal with the different
places the distributions we want to support store these things. All
these configuration files have in common that they are dead-simple and
there is simply no good reason for distributions to distinguish
themselves with them: they all do the very same thing, just
a bit differently.

To improve the situation and benefit from the unifying force that
systemd is we thus decided to read the per-distribution configuration
files only as fallbacks — and to introduce new configuration
files as primary source of configuration wherever applicable. Of
course, where possible these standardized configuration files should
not be new inventions but rather just standardizations of the best
distribution-specific configuration files previously used. Here’s a
little overview of these new common configuration files systemd
supports on all distributions:

  • /etc/hostname:
    the host name for the system. One of the most basic and trivial
    system settings. Nonetheless previously all distributions used
    different files for this. Fedora used /etc/sysconfig/network,
    OpenSUSE /etc/HOSTNAME. We chose to standardize on the
    Debian configuration file /etc/hostname.
  • /etc/vconsole.conf:
    configuration of the default keyboard mapping and console font.
  • /etc/locale.conf:
    configuration of the system-wide locale.
  • /etc/modules-load.d/*.conf:
    a drop-in directory for kernel modules to statically load at
    boot (for the very few that still need this).
  • /etc/sysctl.d/*.conf:
    a drop-in directory for kernel sysctl parameters, extending what you
    can already do with /etc/sysctl.conf.
  • /etc/tmpfiles.d/*.conf:
    a drop-in directory for configuration of runtime files that need to be
    removed/created/cleaned up at boot and during uptime.
  • /etc/binfmt.d/*.conf:
    a drop-in directory for registration of additional binary formats for
    systems like Java, Mono and WINE.
  • /etc/os-release:
    a standardization of the various distribution ID files like
    /etc/fedora-release and similar. Really every distribution
    introduced their own file here; writing a simple tool that just prints
    out the name of the local distribution usually means including a
    database of release files to check. The LSB tried to standardize
    something like this with the lsb_release
    tool, but quite frankly the idea of employing a shell script in this
    is not the best choice the LSB folks ever made. To rectify this we
    just decided to generalize this, so that everybody can use the same
    file here.
  • /etc/machine-id:
    a machine ID file, superseding D-Bus’ machine ID file. This file is
    guaranteed to be existing and valid on a systemd system, covering also
    stateless boots. By moving this out of the D-Bus logic it is hopefully
    interesting for a lot of additional uses as a unique and stable
    machine identifier.
  • /etc/machine-info:
    a new information file encoding meta data about a host, like a pretty
    host name and an icon name, replacing stuff like
    /etc/favicon.png and suchlike. This is maintained by systemd-hostnamed.

It is our definite intention to convince you to use these new
configuration files in your configuration tools: if your
configuration frontend writes these files instead of the old ones, it
automatically becomes more portable between Linux distributions, and
you are helping to standardize Linux. This makes things simpler to
understand and more obvious for users and administrators. Of course,
right now, only systemd-based distributions read these files, but that
already covers all important distributions in one way or another, except for one. And it’s a bit of a
chicken-and-egg problem: a standard becomes a standard by being
used. In order to gently push everybody to standardize on these files
we also want to make clear that sooner or later we plan to drop the
fallback support for the old configuration files from
systemd. That means adoption of this new scheme can happen slowly and piece
by piece. But the final goal of only having one set of configuration
files must be clear.

Many of these configuration files are relevant not only for
configuration tools but also (and sometimes even primarily) in
upstream projects. For example, we invite projects like Mono, Java, or
WINE to install a drop-in file in /etc/binfmt.d/ from their
upstream build systems. Per-distribution downstream support for binary
formats would then no longer be necessary and your platform would work
the same on all distributions. Something similar applies to all
software which needs creation/cleaning of certain runtime files and
directories at boot, for example beneath the /run hierarchy
(i.e. /var/run as it used to be known). These
projects should just drop in configuration files in
/etc/tmpfiles.d, also from the upstream build systems. This
also helps speeding up the boot process, as separate per-project SysV
shell scripts which implement trivial things like registering a binary
format or removing/creating temporary/volatile files at boot are no
longer necessary. Or another example, where upstream support would be
fantastic: projects like X11 could probably benefit from reading the
default keyboard mapping for its displays from
/etc/vconsole.conf.
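
As a rough sketch of what such an upstream drop-in could contain, the following renders a single tmpfiles.d line in the field order type, path, mode, user, group, age. The directory /run/foobar and the helper tmpfiles_line are hypothetical; consult the tmpfiles.d man page for the authoritative list of types and fields.

```python
def tmpfiles_line(kind, path, mode="-", user="-", group="-", age="-"):
    """Render one tmpfiles.d entry; '-' leaves a field at its default."""
    return " ".join((kind, path, mode, user, group, age))

# 'd' asks for a directory to be created if it is missing.
# /run/foobar is a made-up runtime directory for a made-up project.
entry = tmpfiles_line("d", "/run/foobar", "0755", "root", "root")
print(entry)
```

A build system would install the resulting line as something like /etc/tmpfiles.d/foobar.conf (file name again hypothetical) instead of shipping a shell script that runs mkdir at boot.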

Of course, I have no doubt that not everybody is happy with our
choice of names (and formats) for these configuration files. In the
end we had to pick something, and from all the choices these appeared
to be the most convincing. The file formats are as simple as they can
be, and usually easily written and read even from shell scripts. That
said, /etc/bikeshed.conf could of course also have been a
fantastic configuration file name!

So, help us standardize Linux! Use the new configuration files!
Adopt them upstream, adopt them downstream, adopt them all across the
distributions!

Oh, and in case you are wondering: yes, all of these files were
discussed in one way or another with various folks from the various
distributions. And there has even been some push towards supporting
some of these files even outside of systemd systems.

Footnotes

[1] Our slogan: “The only shell that should get started
during boot is gnome-shell!” Yes, the slogan needs a bit of
work, but you get the idea.

systemd for Administrators, Part VII

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/blame-game.html

Here’s yet another installment of my ongoing series on systemd for
Administrators:

The Blame Game

Fedora 15[1] is the first Fedora release to sport systemd. Our
primary goal for F15 was to get everything integrated and working
well. One focus for Fedora 16 will be to further polish and speed up
what we have in the distribution now. To prepare for this cycle we
have implemented a few tools (which are already available in F15),
which can help us pinpoint where exactly the biggest problems in our
boot-up remain. With this blog story I hope to shed some light on how
to figure out what to blame for your slow boot-up, and what to do
about it. We want to allow you to put the blame where the blame
belongs: on the system component responsible.

The first utility is a very simple one: systemd will automatically
write a log message with the time it needed to syslog/kmsg when it
finished booting up.

systemd[1]: Startup finished in 2s 65ms 924us (kernel) + 2s 828ms 195us (initrd) + 11s 900ms 471us (userspace) = 16s 794ms 590us.

And here’s how you read this: 2s have been spent for kernel
initialization, until the time where the initial RAM disk (initrd,
i.e. dracut) was started. A bit less than 3s have then been spent in
the initrd. Finally, a bit less than 12s have been spent after the
actual system init daemon (systemd) has been invoked by the initrd to
bring up userspace. Summing this up, the time that passed from when the
boot loader jumped into the kernel code until systemd was finished
doing everything it needed to do at boot was a bit less than 17s. This
number is nice and simple to understand — and also easy to
misunderstand: it does not include the time that is spent initializing
your GNOME session, as that is outside of the scope of the init
system. Also, in many cases this is just where systemd finished doing
everything it needed to do. Very likely some daemons are still busy
doing whatever they need to do to finish startup when this time
has elapsed. Hence: while the time logged here is a good indication of
the general boot speed, it is not the time the user might feel
the boot actually takes.
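
If you want to process this log line in a script, here is a minimal sketch that splits it into its phases and checks that they add up to the logged total. It assumes the exact message format shown above (spans built from s, ms and us, each phase followed by its name in parentheses); the helper names are mine, not part of systemd.

```python
import re

LOG_LINE = ("systemd[1]: Startup finished in 2s 65ms 924us (kernel) + "
            "2s 828ms 195us (initrd) + 11s 900ms 471us (userspace) = "
            "16s 794ms 590us.")

# Microseconds per unit suffix used in the message.
UNIT_US = {"s": 1_000_000, "ms": 1_000, "us": 1}

def to_microseconds(span):
    """Convert a span such as '2s 65ms 924us' to microseconds."""
    return sum(int(number) * UNIT_US[unit]
               for number, unit in re.findall(r"(\d+)(us|ms|s)", span))

def parse_startup(line):
    """Return {phase: microseconds} for every 'span (phase)' in the line."""
    phases = re.findall(r"((?:\d+(?:us|ms|s)\s*)+)\((\w+)\)", line)
    return {name: to_microseconds(span) for span, name in phases}

phases = parse_startup(LOG_LINE)
total = sum(phases.values())

for name, microseconds in phases.items():
    print(f"{name:10s} {microseconds / 1_000_000:.3f}s")
# The three phases add up to the total at the end of the log line.
print(total == to_microseconds("16s 794ms 590us"))
```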

Also, it is a pretty superficial value: it gives no insight into which
system component systemd was waiting for all the time. To break this
up, we introduced the tool systemd-analyze blame:

$ systemd-analyze blame
  6207ms udev-settle.service
  5228ms cryptsetup@luks\x2d9899b85d\x2df790\x2d4d2a\x2da650\x2d8b7d2fb92cc3.service
   735ms NetworkManager.service
   642ms avahi-daemon.service
   600ms abrtd.service
   517ms rtkit-daemon.service
   478ms fedora-storage-init.service
   396ms dbus.service
   390ms rpcidmapd.service
   346ms systemd-tmpfiles-setup.service
   322ms fedora-sysinit-unhack.service
   316ms cups.service
   310ms console-kit-log-system-start.service
   309ms libvirtd.service
   303ms rpcbind.service
   298ms ksmtuned.service
   288ms lvm2-monitor.service
   281ms rpcgssd.service
   277ms sshd.service
   276ms livesys.service
   267ms iscsid.service
   236ms mdmonitor.service
   234ms nfslock.service
   223ms ksm.service
   218ms mcelog.service
...

This tool lists which systemd unit needed how much time to finish
initialization at boot, the worst offenders listed first. What we can
see here is that on this boot two services required more than 1s of
boot time: udev-settle.service and
cryptsetup@luks\x2d9899b85d\x2df790\x2d4d2a\x2da650\x2d8b7d2fb92cc3.service. This
tool’s output is easily misunderstood as well: it does not shed any
light on why the services in question actually need this much time, it
just determines that they did. Also note that the times listed here
might be spent “in parallel”, i.e. two services might be initializing
at the same time and thus the time spent to initialize them both is
much less than the sum of both individual times combined.
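
The point about parallelism can be checked with a little arithmetic. The sketch below parses the first few lines of the listing, assuming the "NNNms unit" format shown in this post (other versions of the tool may print times differently), and compares the naive sum with the userspace time from the log message. The helper name parse_blame is made up.

```python
# First five entries of the listing above, copied verbatim.
BLAME_OUTPUT = r"""
  6207ms udev-settle.service
  5228ms cryptsetup@luks\x2d9899b85d\x2df790\x2d4d2a\x2da650\x2d8b7d2fb92cc3.service
   735ms NetworkManager.service
   642ms avahi-daemon.service
   600ms abrtd.service
"""

def parse_blame(text):
    """Return a list of (milliseconds, unit) tuples from blame-style output."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0].endswith("ms"):
            entries.append((int(fields[0][:-2]), fields[1]))
    return entries

entries = parse_blame(BLAME_OUTPUT)
slow = [unit for ms, unit in entries if ms > 1000]
naive_sum_ms = sum(ms for ms, _ in entries)

print(slow)
print(naive_sum_ms)
```

For these five entries the naive sum is 13412ms. Assuming the listing and the log message come from the same boot, that is already more than the roughly 11900ms of userspace time logged, which is only possible because the services were initializing in parallel.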

Let’s have a closer look at the worst offender on this boot: a
service by the name of udev-settle.service. So why does it
take that much time to initialize, and what can we do about it? This
service actually does very little: it just waits for the device
probing being done by udev to finish and then exits. Device probing
can be slow. In this instance for example, the reason for the device
probing to take more than 6s is the 3G modem built into the machine,
which when not having an inserted SIM card takes this long to respond
to software probe requests. The software probing is part of the logic
that makes ModemManager work and enables NetworkManager to offer easy
3G setup. An obvious reflex might now be to blame ModemManager for
having such a slow prober. But that’s actually ill-directed: hardware
probing quite frequently is this slow, and in the case of ModemManager
it’s a simple fact that the 3G hardware takes this long. It is an
essential requirement for a proper hardware probing solution that
individual probers can take this much time to finish probing. The
actual culprit is something else: the fact that we actually wait for
the probing, in other words: that udev-settle.service is part
of our boot process.

So, why is udev-settle.service part of our boot process?
Well, it actually doesn’t need to be. It is pulled in by the storage
setup logic of Fedora: to be precise, by the LVM, RAID and Multipath
setup script. These storage services have not been implemented in the
way hardware detection and probing work today: they expect to be
initialized at a point in time where “all devices have been probed”,
so that they can simply iterate through the list of available disks
and do their work on it. However, on modern machinery this is not how
things actually work: hardware can come and hardware can go all the
time, during boot and during runtime. For some technologies it is not
even possible to know when the device enumeration is complete
(example: USB, or iSCSI), thus waiting for all storage devices to show
up and be probed must necessarily include a fixed delay when it is
assumed that all devices that can show up have shown up, and have been
probed. In this case all this shows very negatively in the boot time: the
storage scripts force us to delay bootup until all potential devices
have shown up and all devices that did have been probed, and all that even
though we don’t actually need most devices for anything. In particular
since this machine actually does not make use of LVM, RAID or
Multipath![2]

Knowing what we know now we can go and disable
udev-settle.service for the next boots: since neither LVM,
RAID nor Multipath is used we can mask the services in question and
thus speed up our boot a little:

# ln -s /dev/null /etc/systemd/system/udev-settle.service
# ln -s /dev/null /etc/systemd/system/fedora-wait-storage.service
# ln -s /dev/null /etc/systemd/system/fedora-storage-init.service
# systemctl daemon-reload
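
Masking, as done above, is nothing more than a symlink to /dev/null in the unit directory. Here is a small sketch of the same idea, run against a scratch directory so it is safe to try; the helpers mask_unit and is_masked are illustrative names of mine, not systemd tools.

```python
import os
import tempfile

def mask_unit(unit_name, unit_dir):
    """Mask a unit by symlinking its name to /dev/null inside unit_dir."""
    link = os.path.join(unit_dir, unit_name)
    os.symlink("/dev/null", link)
    return link

def is_masked(unit_name, unit_dir):
    """True if the unit file in unit_dir is a symlink pointing at /dev/null."""
    link = os.path.join(unit_dir, unit_name)
    return os.path.islink(link) and os.readlink(link) == "/dev/null"

# A scratch directory stands in for /etc/systemd/system here.
demo_dir = tempfile.mkdtemp()
mask_unit("udev-settle.service", demo_dir)
print(is_masked("udev-settle.service", demo_dir))
```

On a real system you would still need systemctl daemon-reload afterwards, exactly as in the commands above.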

After restarting we can measure that the boot is now about 1s
faster. Why just 1s? Well, the second worst offender is cryptsetup
here: the machine in question has an encrypted
/home directory. For testing purposes I have stored the
passphrase in a file on disk, so that the boot-up is not delayed
because I as the user am a slow typist. The cryptsetup tool
unfortunately still takes more than 5s to set up the encrypted
partition. Being lazy, instead of trying to fix
cryptsetup[3] we’ll just tape over it here[4]:
systemd will normally wait for all file systems not marked with the
noauto option in /etc/fstab to show up, to be fscked and to
be mounted before proceeding with bootup and starting the usual system
services. In the case of /home (unlike for example
/var) we know that it is needed only very late (i.e. when the
user actually logs in). An easy fix is hence to make the mount point
available already during boot, but not actually wait until cryptsetup,
fsck and mount finished running for it. You ask how we can make a
mount point available before actually mounting the file system behind
it? Well, systemd possesses magic powers, in form of the
comment=systemd.automount mount option in
/etc/fstab. If you specify it, systemd will create an
automount point at /home, and if at the time of the first
access the mount point still isn’t backed by a proper file
system, systemd will wait for the device, fsck it and mount it.
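
For illustration, here is a minimal sketch of adding that option to an existing /etc/fstab entry. It assumes the usual six whitespace-separated fields with the mount options in the fourth one; the device path in the sample line and the helper name add_mount_option are made up.

```python
def add_mount_option(fstab_line, option):
    """Append option to the fourth (options) field of one fstab entry."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("not a complete fstab entry")
    options = fields[3].split(",")
    if option not in options:
        options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)

# Hypothetical entry; the device path is invented for this example.
line = "/dev/mapper/luks-home /home ext4 defaults 1 2"
new_line = add_mount_option(line, "comment=systemd.automount")
print(new_line)
```

Note that the sketch collapses runs of whitespace into single spaces, which is harmless for fstab but changes the column alignment of the file.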

And here’s the result with this change to /etc/fstab
made:

systemd[1]: Startup finished in 2s 47ms 112us (kernel) + 2s 663ms 942us (initrd) + 5s 540ms 522us (userspace) = 10s 251ms 576us.

Nice! With a few fixes we took almost 7s off our boot-time. And
these two changes are only fixes for the two most superficial
problems. With a bit of love and detail work there’s a lot of
additional room for improvements. In fact, on a different machine, a
more than two year old X300 laptop (which even back then wasn’t the
fastest machine on earth), with a bit of decrufting we have boot times
of around 4s (total) now, with a reasonably complete GNOME system. And there’s
still a lot of room in it.

systemd-analyze blame is a nice and simple tool for
tracking down slow services. However, it suffers from a big problem: it
does not visualize how the parallel execution of the services actually
diminishes the price one pays for slow starting services. For that we
have prepared systemd-analyze plot for you. Use it like
this:

$ systemd-analyze plot > plot.svg
$ eog plot.svg

It creates pretty graphs, showing the time services spent to start
up in relation to the other services. It currently doesn’t visualize
explicitly which services wait for which ones, but with a bit of guess
work this is easily seen nonetheless.

To see the effect of our two little optimizations here are two
graphs generated with systemd-analyze plot, the first before
and the other after our change:

Before After

(For the sake of completeness, here are the two complete outputs of
systemd-analyze blame for these two boots: before and after.)

The well-informed reader probably wonders how this relates to Michael Meeks’
bootchart. This plot and bootchart do show similar graphs, that is
true. Bootchart is by far the more powerful tool. It plots in all
detail what is happening during the boot, how much CPU and IO is
used. systemd-analyze plot shows more high-level data: which
service took how much time to initialize, and what needed to wait for
it. If you use them both together you’ll have a wonderful toolset to
figure out why your boot is not as fast as it could be.

Now, before you take these tools and start filing bugs against
the worst boot-up time offenders on your system: think twice. These
tools give you raw data, don’t misread it. As my optimization example
above hopefully shows, the blame for the slow bootup was not actually
with udev-settle.service, and not with the ModemManager
prober run by it either. It is with the subsystem that pulled this
service in in the first place. And that’s where the problem needs to
be fixed. So, file the bugs at the right places. Put the blame where
the blame belongs.

As mentioned, these three utilities are available on your Fedora 15
system out-of-the-box.

And here’s what to take home from this little blog story:

  • systemd-analyze is a wonderful tool and systemd comes
    with profiling built in.
  • Don’t misread the data these tools generate!
  • With two simple changes you might be able to speed up your system
    by 7s!
  • Fix your software if it can’t handle dynamic hardware
    properly!
  • The Fedora default of installing the OS on an enterprise-level
    storage managing system might be something to rethink.

And that’s all for now. Thank you for your interest.

Footnotes

[1] Also known as the greatest Free Software OS release
ever.

[2] The right fix here is to improve the services in
question to actively listen to hotplug events via libudev or similar
and act on the devices showing up as they show up, so that we can
continue with the bootup the instant everything we really need to go
on has shown up. To get a quick bootup we should wait for what we
actually need to proceed, not for everything. Also note that the
storage services are not the only services which do not cope well with
modern dynamic hardware, and assume that the device list is static and
stays unchanged. For example, in this example the reason the initrd is
actually as slow as it is is mostly due to the fact that Plymouth
expects to be executed when all video devices have shown up and have
been probed. For an unknown reason (at least unknown to me) loading
the video kernel modules for my Intel graphics cards takes multiple
seconds, and hence the entire boot is delayed unnecessarily. (Here too
I’d not put the blame on the probing but on the fact that we
wait for it to complete before going on.)

[3] Well, to be precise, I actually did try to get this
fixed. Most of the delay of cryptsetup stems from the (in my eyes)
unnecessarily high default values for --iter-time in
cryptsetup. I tried to convince our cryptsetup maintainers that 100ms
as a default here is not really less secure than 1s, but well, I
failed.

[4] Of course, it’s usually not our style to just tape over
problems instead of fixing them, but this is such a nice occasion to
show off yet another cool systemd feature…

systemd for Administrators, Part VI

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/changing-roots.html

Here’s another installment of my ongoing series on systemd for
Administrators:

Changing Roots

As an administrator or developer, sooner or later you’ll encounter chroot()
environments. The chroot() system call simply shifts what
a process and all its children consider the root directory /, thus
limiting what the process can see of the file hierarchy to a subtree
of it. Primarily chroot() environments have two uses:

  1. For security purposes: In this use a specific isolated daemon is
    chroot()ed into a private subdirectory, so that when exploited the
    attacker can see only the subdirectory instead of the full OS
    hierarchy: he is trapped inside the chroot() jail.
  2. To set up and control a debugging, testing, building, installation
    or recovery image of an OS: For this a whole guest operating
    system hierarchy is mounted or bootstrapped into a subdirectory of the
    host OS, and then a shell (or some other application) is started
    inside it, with this subdirectory turned into its /. To the shell it
    appears as if it was running inside a system that can differ greatly
    from the host OS. For example, it might run a different distribution
    or even a different architecture (Example: host x86_64, guest
    i386). The full hierarchy of the host OS it cannot see.

On a classic System-V-based operating system it is relatively easy
to use chroot() environments. For example, to start a specific daemon
for test or other reasons inside a chroot()-based guest OS tree, mount
/proc, /sys and a few other API file systems into
the tree, and then use chroot(1) to enter the chroot, and
finally run the SysV init script via /sbin/service from
inside the chroot.

On a systemd-based OS things are not that easy anymore. One of the
big advantages of systemd is that all daemons are guaranteed to be
invoked in a completely clean and independent context which is in no
way related to the context of the user asking for the service to be
started. While in sysvinit-based systems a large part of the execution
context (like resource limits, environment variables and suchlike) is
inherited from the user shell invoking the init script, in systemd the
user just notifies the init daemon, and the init daemon will then fork
off the daemon in a sane, well-defined and pristine execution context
and no inheritance of the user context parameters takes place. While
this is a formidable feature it actually breaks traditional approaches
to invoke a service inside a chroot() environment: since the actual
daemon is always spawned off PID 1 and thus inherits the chroot()
settings from it, it is irrelevant whether the client which asked for
the daemon to start is chroot()ed or not. On top of that, since
systemd actually places its local communications sockets in
/run/systemd a process in a chroot() environment will not even
be able to talk to the init system (which however is probably a good thing, and the
daring can work around this of course by making use of bind
mounts.)

This of course opens the question how to use chroot()s properly in
a systemd environment. And here’s what we came up with for you, which
hopefully answers this question thoroughly and comprehensively:

Let’s cover the first usecase first: locking a daemon into a
chroot() jail for security purposes. To begin with, chroot() as a
security tool is actually quite dubious, since chroot() is not a
one-way street. It is relatively easy to escape a chroot()
environment, as even the man page points out. Only in combination
with a few other techniques can it be made somewhat secure. Due to
that it usually
requires specific support in the applications to chroot() themselves
in a tamper-proof way. On top of that it usually requires a deep
understanding of the chroot()ed service to set up the chroot()
environment properly, for example to know which directories to bind mount from
the host tree, in order to make available all communication channels
in the chroot() the service actually needs. Putting this together,
chroot()ing software for security purposes is almost always done best
in the C code of the daemon itself. The developer knows best (or at
least should know best) how to properly lock down the
chroot(), and what the minimal set of files, file systems and
directories is the daemon will need inside the chroot(). These days a
number of daemons are capable of doing this, unfortunately however of
those running by default on a normal Fedora installation only two are
doing this: Avahi and
RealtimeKit. Both were apparently written by the same really smart
dude. Chapeau! 😉 (Verify this easily by running ls -l /proc/*/root
on your system.)
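
The same check can be scripted. The sketch below walks a /proc-style directory and reports every process whose root symlink is not /. To keep it runnable without root privileges it is demonstrated on a fake tree; the PIDs and the chroot path in that tree are invented, and the function name chrooted_processes is mine.

```python
import os
import tempfile

def chrooted_processes(proc_dir="/proc"):
    """Map PID -> root directory for processes whose root is not '/'."""
    found = {}
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue
        try:
            root = os.readlink(os.path.join(proc_dir, entry, "root"))
        except OSError:
            # The process vanished, or we lack permission to inspect it.
            continue
        if root != "/":
            found[int(entry)] = root
    return found

# Build a fake /proc with two made-up processes for demonstration.
fake_proc = tempfile.mkdtemp()
for pid, root in (("1", "/"), ("812", "/etc/avahi")):
    os.mkdir(os.path.join(fake_proc, pid))
    os.symlink(root, os.path.join(fake_proc, pid, "root"))

print(chrooted_processes(fake_proc))
```

Run as root against the real /proc, calling chrooted_processes() without arguments should list the daemons that have chroot()ed themselves.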

That all said, systemd of course does offer you a way to chroot()
specific daemons and manage them like any other with the usual
tools. This is supported via the RootDirectory= option in
systemd service files. Here’s an example:

[Unit]
Description=A chroot()ed Service

[Service]
RootDirectory=/srv/chroot/foobar
ExecStartPre=/usr/local/bin/setup-foobar-chroot.sh
ExecStart=/usr/bin/foobard
RootDirectoryStartOnly=yes

In this example, RootDirectory= configures where to
chroot() to before invoking the daemon binary specified with
ExecStart=. Note that the path specified in
ExecStart= needs to refer to the binary inside the chroot(),
it is not a path to the binary in the host tree (i.e. in this example
the binary executed is seen as
/srv/chroot/foobar/usr/bin/foobard from the host OS). Before
the daemon is started a shell script setup-foobar-chroot.sh
is invoked, whose purpose it is to set up the chroot environment as
necessary, i.e. mount /proc and similar file systems into it,
depending on what the service might need. With the
RootDirectoryStartOnly= switch we ensure that only the daemon
as specified in ExecStart= is chrooted, but not the
ExecStartPre= script which needs to have access to the full
OS hierarchy so that it can bind mount directories from there. (For
more information on these switches see the respective man
pages.)

If you place a unit file like this in
/etc/systemd/system/foobar.service you can start your
chroot()ed service by typing systemctl start foobar.service. You may
then introspect it with systemctl status foobar.service. It is
accessible to the administrator like any other service: the fact that
it is chroot()ed does not, unlike on SysV, alter how your monitoring
and control tools interact with it.
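
Since the ExecStart= path is interpreted inside the chroot(), it can be handy to compute where the binary lives as seen from the host. Here is a rough sketch using Python's configparser on the example unit; this is not a complete unit file parser (it does not handle repeated keys, for instance), and the helper name is made up.

```python
import configparser
import posixpath

UNIT_TEXT = """\
[Unit]
Description=A chroot()ed Service

[Service]
RootDirectory=/srv/chroot/foobar
ExecStartPre=/usr/local/bin/setup-foobar-chroot.sh
ExecStart=/usr/bin/foobard
RootDirectoryStartOnly=yes
"""

def host_path_of_exec_start(unit_text):
    """Return the ExecStart= binary path as seen from the host OS."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(unit_text)
    service = parser["Service"]
    root = service.get("RootDirectory", "/")
    binary = service["ExecStart"].split()[0]
    # Re-root the in-chroot path below RootDirectory=.
    return posixpath.join(root, binary.lstrip("/"))

host_path = host_path_of_exec_start(UNIT_TEXT)
print(host_path)
```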

Newer Linux kernels support file system namespaces. These are
similar to chroot() but a lot more powerful, and they do not
suffer from the same security problems as chroot(). systemd
exposes a subset of what you can do with file system namespaces right
in the unit files themselves. Often these are a useful and simpler
alternative to setting up a full chroot() environment in a
subdirectory. With the switches ReadOnlyDirectories= and
InaccessibleDirectories= you may set up a file system
namespace jail for your service. Initially, it will be identical to
your host OS’ file system namespace. By listing directories in these
directives you may then mark certain directories or mount points of
the host OS as read-only or even completely inaccessible to the
daemon. Example:

[Unit]
Description=A Service With No Access to /home

[Service]
ExecStart=/usr/bin/foobard
InaccessibleDirectories=/home

This service will have access to the entire file system tree of the
host OS with one exception: /home will not be visible to it, thus
protecting the user’s data from potential exploiters. (See the
man page for details on these options.)

File system namespaces are in fact a better replacement for
chroot()s in many, many ways. Eventually Avahi and RealtimeKit
should probably be updated to make use of namespaces replacing
chroot()s.

So much about the security usecase. Now, let’s look at the other
use case: setting up and controlling OS images for debugging, testing,
building, installing or recovering.

chroot() environments are relatively simple things: they only
virtualize the file system hierarchy. By chroot()ing into a
subdirectory a process still has complete access to all system calls,
can kill all processes and shares about everything else with the host
it is running on. To run an OS (or a small part of an OS) inside a
chroot() is hence a dangerous affair: the isolation between host and
guest is limited to the file system, everything else can be freely
accessed from inside the chroot(). For example, if you upgrade a
distribution inside a chroot(), and the package scripts send a SIGTERM
to PID 1 to trigger a reexecution of the init system, this will
actually take place in the host OS! On top of that, SysV shared
memory, abstract namespace sockets and other IPC primitives are shared
between host and guest. While a completely secure isolation for
testing, debugging, building, installing or recovering an OS is
probably not necessary, a basic isolation to avoid accidental
modifications of the host OS from inside the chroot() environment is
desirable: you never know what code package scripts execute which
might interfere with the host OS.

For this use case systemd offers you a couple of features to deal
with chroot() setups:

First of all, systemctl detects when it is run in a
chroot. If so, most of its operations will become NOPs, with the
exception of systemctl enable and systemctl disable. If a package
installation script hence calls these two commands, services will be
enabled in the guest OS. However, should a package installation script
include a command like systemctl restart as part of the package upgrade
process this will have no effect at all when run in a chroot()
environment.
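
If your own scripts need a similar check, one common heuristic is to compare the root directory of the current process with the root directory of PID 1. The sketch below assumes GNU stat and a mounted /proc. same_dir and in_chroot are hypothetical helper names, and this is not necessarily how systemctl itself decides:

```shell
# same_dir A B
# Returns 0 if A and B are the same directory (same device and inode),
# 1 if they differ, 2 if either one cannot be examined.
same_dir() {
    a=$(stat -L -c '%d:%i' "$1" 2>/dev/null) || return 2
    b=$(stat -L -c '%d:%i' "$2" 2>/dev/null) || return 2
    [ "$a" = "$b" ]
}

# in_chroot
# Returns 0 if our root differs from the root of PID 1, 1 if it is the
# same, 2 if we cannot tell (for example, not enough privileges to look
# at /proc/1/root).
in_chroot() {
    rc=0
    same_dir / /proc/1/root/ || rc=$?
    case "$rc" in
        0) return 1 ;;
        1) return 0 ;;
        *) return 2 ;;
    esac
}

# Example: in_chroot && echo "running in a chroot, skipping service restart"
```

Looking at /proc/1/root usually requires root privileges, which is why the sketch keeps "cannot tell" as a separate result instead of guessing.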

More importantly however systemd comes out-of-the-box with the systemd-nspawn
tool which acts as chroot(1) on steroids: it makes use of file system
and PID namespaces to boot a simple lightweight container on a file
system tree. It can be used almost like chroot(1), except that the
isolation from the host OS is much more complete, a lot more secure
and even easier to use. In fact, systemd-nspawn is capable of
booting a complete systemd or sysvinit OS in a container with a single
command. Since it virtualizes PIDs, the init system in the container
can act as PID 1 and thus do its job as normal. In contrast to
chroot(1) this tool will implicitly mount /proc and
/sys for you.

Here’s an example of how, in three commands, you can boot a Debian OS on
your Fedora machine inside an nspawn container:

# yum install debootstrap
# debootstrap --arch=amd64 unstable debian-tree/
# systemd-nspawn -D debian-tree/

This will bootstrap the OS directory tree and then simply invoke a
shell in it. If you want to boot a full system in the container, use a
command like this:

# systemd-nspawn -D debian-tree/ /sbin/init

And after a quick bootup you should have a shell prompt, inside a
complete OS, booted in your container. The container will not be able
to see any of the processes outside of it. It will share the network
configuration, but not be able to modify it. (Expect a couple of
EPERMs during boot for that, which however should not be
fatal.) Directories like /sys and /proc/sys are
available in the container, but mounted read-only to prevent the
container from modifying kernel or hardware configuration. Note
however that this protects the host OS only from accidental
changes of its parameters. A process in the container can manually
remount the file systems read-write and then change whatever it
wants to change.
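
If you want to check from inside the container whether one of these file systems is currently read-only, the mount table tells you. A small sketch, assuming the usual /proc/self/mounts layout (device, mount point, type, options); mount_mode is a hypothetical helper name, not part of systemd:

```shell
# mount_mode MOUNTPOINT [MOUNTS_FILE]
# Prints "ro" or "rw" for MOUNTPOINT and fails if it is not listed.
# MOUNTS_FILE defaults to /proc/self/mounts.
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 holds the comma-separated mount options.
            mode = "rw"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro")
                    mode = "ro"
            found = mode   # a later entry for the same mount point wins
        }
        END {
            if (found == "")
                exit 1
            print found
        }
    ' "${2:-/proc/self/mounts}"
}

mount_mode /sys || echo "/sys is not listed as a mount point here"
```

Since a process in the container can remount these file systems, the answer only describes the current state and is no security guarantee.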

So, what’s so great about systemd-nspawn again?

  1. It’s really easy to use. No need to manually mount /proc
    and /sys into your chroot() environment. The tool will do it
    for you and the kernel automatically cleans it up when the container
    terminates.
  2. The isolation is much more complete, protecting the host OS from
    accidental changes from inside the container.
  3. It’s so good that you can actually boot a full OS in the
    container, not just a single lonesome shell.
  4. It’s actually tiny and installed wherever systemd is
    installed. No complicated installation or setup.

systemd itself has been modified to work very well in such a
container. For example, when shutting down and detecting that it is
run in a container, it just calls exit(), instead of reboot(), as the
last step.

Note that systemd-nspawn is not a full container
solution. If you need that, LXC is the better choice for
you. It uses the same underlying kernel technology but offers a lot
more, including network virtualization. If you will,
systemd-nspawn is the GNOME 3 of container solutions:
slick and trivially easy to use — but with few configuration
options. LXC OTOH is more like KDE: more configuration options than lines of
code. I wrote systemd-nspawn specifically to cover testing,
debugging, building, installing, recovering. That’s what you should use
it for and what it is really good at, and where it is a much much nicer
alternative to chroot(1).

So, let’s get this finished, this was already long enough. Here’s what to take home from
this little blog story:

  1. Secure chroot()s are best done natively in the C sources of your program.
  2. ReadOnlyDirectories=, InaccessibleDirectories=
    might be suitable alternatives to a full chroot() environment.
  3. RootDirectory= is your friend if you want to chroot() a specific service.
  4. systemd-nspawn is made of awesome.
  5. chroot()s are lame, file system namespaces are totally l33t.

All of this is readily available on your Fedora 15 system.

And that’s it for today. See you again for the next installment.

GNOME 3.0 Is Out!

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/gnome3.html

The next generation desktop has arrived. I am running it as I type this, and so should you. So, go, get it!

If you are in Berlin on Friday you should also attend our GNOME
3.0 Release Party. It’s at the world famous c-base, in the remains of an alien spaceship
that crashed into Berlin 4.5 billion years ago (no kidding!). We’ve got
Ubuntu’s Daniel Holbach as DJ, and a
few folks from the GNOME community will do a talk or two (including that
annoying dude who created Avahi, PulseAudio and systemd). We even got Mirko Boehm
from the KDE side to say a few things. And there are going to be GNOME 3
goodies! How awesome is that? See the
wiki page for further details.

And here’s your homework until Friday: Try out GNOME 3.0!

The GNOME 3.0 Live CD

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/live-cd.html

The Fedora GNOME 3.0 Live CD is made of awesome. Not just because it showcases
the awesomeness that is GNOME 3, but also because it’s built on an awesome
systemd-based OS. Double awesome!

So, get it, play with it. It’s the future of computing: GNOME and systemd
and Linux. Triple awesome!

And did I mention that F15 is going to be the awesomest OS release ever?

Nope, there’s no April 1st joke in here. It’s really honestly just …
awesome!

Final Reminder

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/final-reminder.html

Citizens! GNOMErs! Only two days are left until the GUADEC/Desktop Summit CFP
is over (end date is Friday). Submit your presentation proposal now, or it
will be too late. Read the CFP.

Oh, and regarding the need for a KDE identity account: due to limited
manpower we decided to reuse existing infrastructure instead of setting up a
completely new one. We do acknowledge that this is not ideal and we’d like to
ask for your understanding. (Creating a KDE identity account is unrestricted,
and you can easily create one even if you never had anything to do with KDE in
your life.)

Note that we are looking for both lightning talks and full-length
presentations. If you are interested in doing a lightning talk (and we can only
encourage you to), please use the same form to make your submission.

Desktop Summit/GUADEC 2011 CFP ends in one Week

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/cfp-ends-in-one-week.html

I’d like to remind everybody that only one week is left until the Desktop Summit (aka GUADEC 2011) Call for Participation ends. We want your talk proposals, and that quickly, before it’s too late!

Berlin in summer is fantastic. You wouldn’t want to miss that, would you?

So, read the CFP again, and then submit something.

The CFP ends next Friday. So hurry!

Thank you,
      Lennart

systemd for Administrators, Part V

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/three-levels-of-off.html

It has been a while since the last installment of my systemd for
Administrators series, but now with the release of Fedora 15 based on
systemd looming, here’s a new episode:

The Three Levels of “Off”

In systemd, there are three levels of turning off a service (or other
unit). Let’s have a look at what those are:

  1. You can stop a service. That simply terminates the
    running instance of the service and does little else. If due to some
    form of activation (such as manual activation, socket activation, bus
    activation, activation by system boot or activation by hardware plug)
    the service is requested again afterwards it will be started. Stopping
    a service is hence a very simple, temporary and superficial
    operation. Here’s an example of how to do this for the NTP service:

    $ systemctl stop ntpd.service

    This is roughly equivalent to the following traditional command which is available on most SysV inspired systems:

    $ service ntpd stop

    In fact, on Fedora 15, if you execute the latter command it will be transparently converted to the former.

  2. You can disable a service. This unhooks a service from
    its activation triggers. That means that, depending on your service, it
    will no longer be activated on boot, by socket or bus activation or by
    hardware plug (or any other trigger that applies to it). However, you
    can still start it manually if you wish. If there is already a started
    instance, disabling a service will not have the effect of stopping
    it. Here’s an example of how to disable a service:

    $ systemctl disable ntpd.service

    On traditional Fedora systems, this is roughly equivalent to the following command:

    $ chkconfig ntpd off

    And here too, on Fedora 15, the latter command will be
    transparently converted to the former, if necessary.

    Often you want to combine stopping and disabling a service, to
    get rid of the current instance and make sure it is not started again (except when manually triggered):

    $ systemctl disable ntpd.service
    $ systemctl stop ntpd.service

    Commands like this are for example used during package
    deinstallation of systemd services on Fedora.

    Disabling a service is a permanent change; until you undo it, it
    will be kept, even across reboots.

  3. You can mask a service. This is like disabling a service,
    but on steroids. It not only makes sure that the service is not started
    automatically anymore, but even ensures that it cannot be
    started manually anymore. This is a bit of a hidden feature in
    systemd, since it is not commonly useful and might confuse the
    user. But here’s how you do it:

    $ ln -s /dev/null /etc/systemd/system/ntpd.service
    $ systemctl daemon-reload

    By symlinking a service file to /dev/null you tell
    systemd to never start the service in question and completely block
    its execution. Unit files stored in /etc/systemd/system
    override those from /lib/systemd/system that carry the same
    name. The former directory is administrator territory, the latter
    territory of your package manager. By installing your symlink in
    /etc/systemd/system/ntpd.service you hence make sure that
    systemd will never read the upstream shipped service file
    /lib/systemd/system/ntpd.service.

    systemd will recognize units symlinked to /dev/null and
    show them as masked. If you try to start such a service manually (via
    systemctl start for example) this will fail with an
    error.

    A similar trick on SysV systems does not (officially) exist. However,
    there are a few unofficial hacks, such as editing the init script and
    placing an exit 0 at the top, or removing its execution
    bit. However, these solutions have various drawbacks, for example they
    interfere with the package manager.

    Masking a service is a permanent change, much like disabling a service.

Now that we learned how to turn off services on three levels,
there’s only one question left: how do we turn them on again? Well,
it’s quite symmetric. Use systemctl start to undo
systemctl stop. Use systemctl enable to undo
systemctl disable and use rm to undo
ln.
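
The mask and unmask steps can be wrapped in a few small shell functions. This is only a sketch: mask_unit, is_masked and unmask_unit are hypothetical helper names, and the optional directory argument exists purely so that you can try them out on a scratch directory instead of /etc/systemd/system:

```shell
# mask_unit NAME [DIR]: symlink the unit to /dev/null.
# DIR defaults to /etc/systemd/system. Fails if a file of that name exists.
mask_unit() {
    ln -s /dev/null "${2:-/etc/systemd/system}/$1"
}

# is_masked NAME [DIR]: succeeds if the unit is a symlink to /dev/null.
is_masked() {
    [ "$(readlink "${2:-/etc/systemd/system}/$1")" = "/dev/null" ]
}

# unmask_unit NAME [DIR]: remove the symlink, but only if it really is a
# mask, so that an administrator's own unit file is never deleted.
unmask_unit() {
    is_masked "$1" "$2" && rm "${2:-/etc/systemd/system}/$1"
}

# Example (as root), followed by a reload so systemd notices the change:
#   mask_unit ntpd.service && systemctl daemon-reload
#   unmask_unit ntpd.service && systemctl daemon-reload
```

The check in unmask_unit is the one design choice worth pointing out: a plain rm would also delete a real unit file that happens to live under the same name.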

And that’s all for now. Thank you for your attention!

FOSDEM Talk on Video

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/fosdem2011-video.html

If you have already watched my presentation on systemd I gave at
linux.conf.au 2011 then this video of my talk on the same topic which I
gave at FOSDEM 2011 in Brussels, Belgium will probably not be all new to
you, but the questions from the audience (and hopefully my responses)
might answer a question or two you might still have. So do watch it:

Hmm, seems p.g.o strips the video from the blog post. So either read the original blog story or watch it directly on YouTube.

Oh, and FOSDEM rocked, like every year!

LCA Talk on Video

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/lca2011-video.html

I won’t spare you the video of my talk about systemd at linux.conf.au 2011 in Brisbane, Australia last week:

Hmm, seems p.g.o strips the video from the blog post. So either read the original blog story or watch it directly on blip.tv.

LCA was fantastic and especially impressive given the circumstances of the recent flooding in Queensland. Really good conference, and congratulations to the organizers!

FOSDEM Interview with Yours Truly

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/fosdem2011.html

The FOSDEM organizers just published a brief interview with yours
truly regarding the presentation about systemd I will be giving there
on Sat. Feb. 5th, 3pm. If you come to Brussels make sure to drop by! And
even if you don’t, have a look at the interview!

If you don’t make it to Brussels, there are two more stops in my
little systemd World Tour in the next weeks: today (Wed. Jan. 26th,
2:30pm) I will be speaking at linux.conf.au in Brisbane, Australia. And
on Fri. Feb. 11th, 1:20pm I’ll be speaking at the Red Hat Developer
Conference in Brno, Czech Republic.