Tag Archives: Projects

systemd for Administrators, Part IV

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/systemd-for-admins-4.html

Here’s the fourth installment of my ongoing
series
about
systemd

for administrators
.

Killing Services

Killing a system daemon is easy, right? Or is it?

Sure, as long as your daemon persists only of a single process this might
actually be somewhat true. You type killall rsyslogd and the syslog
daemon is gone. However it is a bit dirty to do it like that given that this
will kill all processes which happen to be called like this, including those an
unlucky user might have named that way by accident. A slightly more correct
version would be to read the .pid file, i.e. kill `cat
/var/run/syslogd.pid`
. That already gets us much further, but still, is
this really what we want?

More often than not it actually isn’t. Consider a service like Apache, or
crond, or atd, which as part of their usual operation spawn child processes.
Arbitrary, user configurable child processes, such as cron or at jobs, or CGI
scripts, even full application servers. If you kill the main apache/crond/atd
process this might or might not pull down the child processes too, and it’s up
to those processes whether they want to stay around or go down as well.
Basically that means that terminating Apache might very well cause its CGI
scripts to stay around, reassigned to be children of init, and difficult to
track down.

systemd to
the rescue: With systemctl kill you can easily send a signal to all
processes of a service. Example:

# systemctl kill crond.service

This will ensure that SIGTERM is delivered to all processes of the crond
service, not just the main process. Of course, you can also send a different
signal if you wish. For example, if you are bad-ass you might want to go for
SIGKILL right-away:

# systemctl kill -s SIGKILL crond.service

And there you go, the service will be brutally slaughtered in its entirety,
regardless how many times it forked, whether it tried to escape supervision by
double forking or fork bombing.

Sometimes all you need is to send a specific signal to the main process of a
service, maybe because you want to trigger a reload via SIGHUP. Instead of going via the
PID file, here’s an easier way to do this:

# systemctl kill -s HUP --kill-who=main crond.service

So again, what is so new and fancy about killing services in systemd? Well,
for the first time on Linux we can actually properly do that. Previous
solutions were always depending on the daemons to actually cooperate to bring
down everything they spawned if they themselves terminate. However, usually if
you want to use SIGTERM or SIGKILL you are doing that because they actually do
not cooperate properly with you.

How does this relate to systemctl stop? kill goes directly
and sends a signal to every process in the group, however stop goes
through the official configured way to shut down a service, i.e. invokes the
stop command configured with ExecStop= in the service file. Usually
stop should be sufficient. kill is the tougher version, for
cases where you either don’t want the official shutdown command of a service to
run, or when the service is hosed and hung in other ways.

(It’s up to you BTW to specify signal names with or without the SIG prefix
on the -s switch. Both works.)

It’s a bit surprising that we have come so far on Linux without even being
able to properly kill services. systemd for the first time enables you to do
this properly.

systemd Status Update

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/systemd-update-2.html

It has been a
while since my last status update on systemd
. Here’s another short,
incomprehensive status update on what we worked on for systemd since then.

  • Fedora F15 (Rawhide) now includes a split up
    /etc/init.d/rc.sysinit (Bill Nottingham). This allows us to keep only
    a minimal compatibility set of shell scripts around, and boot otherwise a
    system without any shell scripts at all. In fact, shell scripts during early
    boot are only used in exceptional cases, i.e. when you enabled autoswapping
    (bad idea anyway), when a full SELinux relabel is necessary, during the first
    boot after initialization, if you have static kernel modules to load (which are
    not configured via the systemd-native way to do that), if you boot from a
    read-only NFS server, or when you rely on LVM/RAID/Multipath. If nothing of
    this applies to you can easily disable these parts of early boot and
    save several seconds on boot. How to do this I will describe in a later blog
    story.
  • We have a fully C coded shutdown logic that kills all remaining processes,
    unmounts all remaining file systems, detaches all loop devices and DM volumes
    and does that in the right way to ensure that all these things are properly
    teared down even if they depend on each other in arbitrary ways. This is not
    only considerably faster then the traditional shell hackery for this, but also
    a lot safer, since we try to unmount/remount the remaining file systems with a
    little bit of brains. This feature is available via systemctl --force
    poweroff
    to the administrator. The --force controls whether the
    usual shutdown of all services is run or whether this is skipped and we
    immediately shall enter this final C shutdown logic. Using --force
    hence is a much safer replacement for the old /sbin/reboot -f and does
    not leave dirty file systems behind. (Thanks to Fabiano Fidencio has his
    colleagues from ProFUSION for this).
  • systemd now includes a minmalistic readahead implementation, based on
    fanotify(), fadvise() and mincore(). It supports btrfs defragmentation and both
    SSD and HDD disks. While the effect on boots that are anyway fast (such as most
    stuff involving SSD) is minimal, slower and older machines benefit from this
    more substantially.
  • We now control fsck and quota during early boot with a C tool that ensure
    maximum parallelization but properly implements the necessary high-level
    administration logic.
  • Every service, every user and every user session now gets its own cgroup in
    the ‘cpu’ hierarchy thus creating better fairness between the logged in users
    and their sessions.
  • We now provide /dev/log logging from early boot to late shutdown.
    If no syslog daemon is running the output is passed on to kmsg. As soon as a
    proper syslog daemon starts up the kmsg buffer is flushed to syslog, and hence
    we will have complete log coverage in syslog even for early boot.
  • systemctl kill was introduced, an easy command to send a signal to
    all processes of a service. Expect a blog story with more details about this
    shortly.
  • systemd gained the ability to load the SELinux policy if necessary, thus
    supporting non-initrd boots and initrd boots from the same binary with no
    duplicate work. This is in fact (and surprisingly) a first among Linux init
    systems.
  • We now initialize and set the system locale inside PID 1 to be inherited by
    all services and users.
  • systemd has native support for /etc/crypttab and can activate
    encrypted LUKS/dm-crypt disks both at boot-up and during runtime. A minimal
    password querying infrastructure is available, where multiple agents can be
    used to present the password to the user. During boot the password is queried
    either via Plymouth or directly on the console. If a system crypto disk is
    plugged in after boot you are queried for the password via a GNOME agent, or a
    wall(1) agent. Finally, while you run systemctl start (or a similar
    command) a minimal TTY password agent is available which asks you for passwords
    right-away if this is necessary. The password querying logic is very simple,
    additional agents can be implemented in a trivial amount of code (Yupp, KDE folks, you
    can add an agent for this, too). Note that the password querying logic in
    systemd is only for non-user passwords, i.e. passwords that have no relation to
    a specific user, but rather to specific hardware or system software. In future
    we hope to extend this so that this can be used to query the password of SSL
    certificates when Apache or other servers start.
  • We offer a minimal interface that external projects can use to extend the
    dependency graph systemd manages. In fact, the cryptsetup logic mentioned above
    is implemented via this ‘plugin’-like system. Since we did not want to add code
    that deals with cryptographic disks into the systemd process itself we
    introduced this interface (after all cryptographic volumes are not an essential
    feature of a minimal OS, and unncessary on most embedded uses; also the future
    might bring us STC which might make this at least partially obsolete). Simply
    by dropping a generator binary into
    /lib/systemd/system-generators which should write out systemd unit
    files into a temporary directory third-party packages may extend the systemd
    dependency tree dynamically. This could be useful for example to automatically
    create a systemd service for each KVM machine or LXC container. With that in
    place those containers/machines could be managed and supervised with the same
    tools as the usual system services.
  • We integrated automatic clean-up of directories such as /tmp into
    the tmpfiles logic we already had in place that recreates files and
    directories on volatile file systems such as /var/run,
    /var/lock or /tmp.
  • We now always measure and write to the log files the system startup time we
    measured, broken up into how many time was spent on the kernel, the initrd and
    the initialization of userspace.
  • We now safely destroy all user session before going down. This is a feature
    long missing on Linux: since user processes were not killed until the very last
    moment the unhealthy situation that user code was running at a time where no
    other daemon was remaining was a normal part of shutdown.
  • systemd now understands an ‘extreme’ form of disabling a service: if you
    symlink a service name in /etc/systemd/system to /dev/null
    then systemd will mark it as masked and completely refuse starting it,
    regardless if this is requested manually or automaticallly. Normally it should
    be sufficient to simply call systemctl disable to disable a service
    which still allows manual activation but no automatic activation. Masking a
    service goes one step further.
  • There’s now a simple condition syntax in places which allows
    skipping or enabling units depending on the existance of a file, whether a
    directory is empty or whether a kernel command line option is set.
  • In addition to normal shutdowns for reboot, halt or poweroff we now
    similarly support a kexec reboot, that reboots the machine without going though
    the BIOS code again.
  • We have bash completion support for systemctl. (Ran Benita)
  • Andrew Edmunds contributed basic support to boot Ubuntu with systemd.
  • Michael Biebl and Tollef Fog Heen have worked on the systemd integration
    into Debian to a level that it is now possible to boot a system without having
    the old initscripts packaged installed. For more details see the Debian Wiki. Michael even
    tested this integration on an Ubuntu Natty system and as it turns out this
    works almost equally well on Ubuntu already. If you are interesting in playing
    around with this, ping Michael.

And that’s it for now. There’s a lot of other stuff in the git commits, but
most of it is smaller and I will it thus spare you.

We have come quite far in the last year. systemd is about a year old now,
and we are now able to boot a system without legacy shell scripts remaining,
something that appeared to be a task for the distant future.

All of this is available in systemd 13 and in F15/Rawhide as I type
this. If you want to play around with this then consider installing Rawhide
(it’s fun!).

27C3 Fudfest

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/ccc-nervt.html

I really wonder why on earth the 27C3 accepted a nonsensical
paper like this
into their programme. So .. stupid. You read half the
proposal and it’s already kinda obvious that the presenter has no idea what he
is talking of. Fundamental errors, obvious misinterpretations, outdated issues:
this is just FUD.

And apparently this talk even is anonymous? Such a coward! FUDing around
anonymously is acceptable at the CCC?

Linux Plumbers Conference/Gnome Summit Recap

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/lpc2010-recap.html

Last week LPC and GS 2010 took place in Cambridge,
MA. Like the last years, LPC showed again that — at least for me — it is one of
the most relevant Linux conferences in existence, if not the single most
relevant one.

Here’s a terse, incomprehensive report of the different discussions I took
part in with various folks at the conference, in no particular order:

The Boot
and Init
track led by Kay Sievers (Suse) was a great success. We had
exciting talks which I think helped quite a bit in clearing a few things up,
and hopefully helps us in consolidating the full Linux boot process among all
the components involved. We had talks covering everything from the BIOS boot,
to initrds, graphical boot splashes and systemd. Kay
Sievers and I spoke about systemd, also covering the state of it in the Fedora
and openSUSE distributions. Gustavo Barbieri (ProFUSION, Gentoo) and Michael
Biebl (Debian) gave interesting talks about systemd adoption in their
respective distributions. I was particularly interested in the various
statistics Michael showed about SysV/LSB init script usage in Debian, because
this gives an idea how much work we have in front of us in the long run. A
longer discussion about the future of initrds and the logic necessary to find
the root file system on boot was quite enlightening. I think this track was
helpful to increase the unification and consolidation of the way Linux systems
boot up and are maintained during runtime.

Kay and I and some other folks sat down with Arjan van de Ven (Intel), to
talk about the prospects of systemd in Meego. The discussions were very
positive. In particular Arjan hat some great suggestions regarding use of the
Simple
Boot Flag
in systemd (expect this in one of the next versions) and
readahead. Before systemd can find adoption in Meego we’d have to add a short
number of features to systemd first, most of them should be easy to add.

Similarly, I sat down with Martin Pitt and James Hunt (both Canonical) and
discussed systemd in relation to Ubuntu. I think we managed to clear a lot of
things up, and have a good chance to improve cooperation between Ubuntu and
systemd in relation to APIs and maybe even more.

We talked to Thomas Gleixner regarding userspace notifications when the
wallclock time jumps relative to the monotonic clock. This is important to
systemd so that we can schedule calendar jobs similar to cron, but without
having to wake up periodically to check whether the wallclock time changed
relatively to the monotonic clock so that we can recalculate the next
point in time a calendar event is triggered. There has been previous work in
this area in the kernel world, but nothing got merged. Thomas’ suggestion how to
add this facility should be much easier than anything proposed so far.

I also tried to talk Andreas Grünbacher into supporting file system
user extended attributes in various virtual file systems such as procfs,
cgroupfs, sysfs and tmpfs. I hope I convinced him that this would be a good
idea, since this would allow setting externally accessible attributes to all
kinds of kernel objects, such as processes and devices. This would not only
have uses in systemd (where we could easily store all meta information systemd
needs to know about a service in the cgroupfs via xattrs, so that systemd could
even crash or go away at any time and we still can read all runtime information
necessary beyond mere cgrouping from the file system when systemd comes to live
again) but also in the desktop environments, so that we could for example
attach the human readable application name, an icon or a desktop file to the
processes currently running, in a simple way where the data we attach follows
the lifecycle of the process itself.

The Audio track
went really well, too. I was particularly excited about Pierre-Louis Bossart’s
(Intel) plans regarding AC3 (and other codecs) support in PulseAudio, and the simplicity of his
approach. Also great was hearing about Laurent Pinchart’s project to expose
audio and video device routing to userspace. Finally, I really enjoyed David
Henningsson’s and Luke Yelavich’s (both Canonical) talk regarding tracking down audio bugs on
Ubuntu. I was really impressed by the elaborate tools they created to test
audio drivers on users machines. Pretty cool stuff. Maybe this can be extended
into a test suite for driver writers, because the current approach for driver
writers (i.e. “If PulseAudio works correctly, your driver is correct”) doesn’t
really scale (although I like the idea and take it as a compliment…). I also
liked the timechart profiling results Pierre showed me that he generated for
PulseAudio. Seems PulseAudio is behaving quite nicely these days.

Together with Harald Hoyer I got a demo of David Zeuthen’s disk assembly
daemon (stc), which makes RAID/MD/LVM assembly more dynamic. Great stuff, and I
think we convinced him to leave actual mounting of file systems to systemd
instead of doing it himself.

Harald and I also hashed out a few things to make integration between dracut
and systemd nicer (i.e. passing along profiling information between the two,
and information regarding the root fsck).

I also hope I convinced Ray Strode to make Plymouth actively listen to udev
for notifications about DRM devices, so that further synchronization between
udev and plymouth won’t be necessary, which both makes things more robust and a
little bit faster.

Kay and I talked to Greg Kroah-Hartman regarding the brokeness of
VT_WAITEVENT in kernel TTY layer, and discussed what to do about this. After returning from the US Kay now
did the necessary hacking work to provide a minimal sysfs based solution that
allows userspace query to which TTYs /dev/console and
/dev/tty0 currently point, and get notifications when this changes.
This should allow us to greatly simplify ConsoleKit and make it possible to
add console-triggered activation to systemd (think: getty gets started the
moment you switch to its virtual terminal, not already at boot).

I also spent some time discussing the upcoming deadline scheduling kernel
logic with Dario, Dhaval and Tommaso regarding its possible use in PulseAudio.
I believe deadline schedule is a useful tool to hand out real-time scheduling
to applications securely. As an easy path to supporting deadline scheduling in
PulseAudio I suggested patching RealtimeKit to optionally use deadline
scheduling for its clients. This would magically teach PA (and other clients) to
use deadline scheduling without further patching in the clients.

At GNOME Summit I sat down with Ryan Lortie and Will Thompson to discuss the
the future of the D-Bus session bus and how we can move to a machine/user bus
instead in a nice way. We managed to come to a nice agreement here, and this
should enable us to introduce systemd for session management soonishly. Now we
only need to convince the other folks having stakes in D-Bus that what we
discussed is actually a good idea, expect more about this soon on dbus-devel.
Ryan and I also hashed out our remaining differences regarding the exact
semantics of XDG_RUNTIME_DIR, the result of which you can already
see on the XDG mailing list
. Ryan already did the GLib work to introduce
XDG_RUNTIME_DIR and systemd already supports this inofficially since a few
versions.

I quite appreciate how Michael Meeks quoted me in his final
keynote. 😉

There was a lot of other stuff going on at the conference, and what I
wrote above is in no way complete. And of course, besides all the technical
stuff, it was great meeting all the good Linux folks again, especially my
colleagues from Red Hat.

I am still amazed how systemd is received so positively and with open arms
all across the board. It’s particularly amazing that systemd at this point in
time has already been adopted by various companies in the automotive and
aviation industry.

Off to LPC 2010, Boston

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/lpc2010.html

Later this week the Linux Plumbers Conference 2010 will take place at the
Hyatt Regency in Cambridge.

Together with Mark Brown I’ll be running the conference track about Audio,
and I believe we managed to put together quite a nice schedule with various
interesting talks covering many areas of what Audio on Linux is about.

I’ll also be around at the Boot
and Init Systems
track which Kay Sievers is running. Together with Kay I’ll
do a session about systemd,
everybody’s favourite system and session manager. We also managed to convince a
number of distribution maintainers of systemd to do short presentations about
the state of systemd adoption in their respective distributions: Michael Biebl
from Debian, Gustavo Barbieri from Gentoo, Kay for openSUSE and yours truly for
Fedora.

Because there never can be enough systemd coverage at a conference I’ll do
another talk about systemd, in Vincent Untz’ Desktop
track, this time focussing less on how to boot and maintain a system, but more
on doing the same for desktop sessions, in particular GNOME.

I’ll also stick around for the the first two days of the GNOME Boston Summit.

See you in Cambridge!

FOSS.in CFP Deadline Approaching!

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/foss-in-2010.html

I just submitted my paper[1] for FOSS.in 2010 in
Bangalore/India. Don’t forget to submit yours! The CFP closes on 10th of
October. That’s this Sunday! Hurry up, before it is too late!

FOSS.in is one of the most amazing Free Software conferences this world has
to offer (hey, and I think I can say that because I have presented at quite a few). A dedicated
audience, flawless organization, magic hospitality, and all this in incredible
India! Both the technical programme and everything around it are impressive. Which
other conference can offer you a concert of one of
India’s greatest acts
as part of the schedule? Which other international
conference host city can be such a positive attack on your senses as Bangalore (see
that endless sea of flowers below)? And where else do they serve pure silver as part of the conference catering?

Bangalore Market

Read the CFP! Or, go straight to submitting a paper.

Footnotes

[1] About systemd.

systemd for Administrators, Part III

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/systemd-for-admins-3.html

Here’s the third installment of my ongoing
series about
systemd
for administrators
.

How Do I Convert A SysV Init Script Into A systemd Service File?

Traditionally, Unix and Linux services (daemons) are started
via SysV init scripts. These are Bourne Shell scripts, usually
residing in a directory such as /etc/rc.d/init.d/ which when
called with one of a few standardized arguments (verbs) such as
start, stop or restart controls,
i.e. starts, stops or restarts the service in question. For starts
this usually involves invoking the daemon binary, which then forks a
background process (more precisely daemonizes). Shell scripts
tend to be slow, needlessly hard to read, very verbose and
fragile. Although they are immensly flexible (after all, they are just
code) some things are very hard to do properly with shell scripts,
such as ordering parallized execution, correctly supervising processes
or just configuring execution contexts in all detail. systemd provides
compatibility with these shell scripts, but due to the shortcomings
pointed out it is recommended to install native systemd service files
for all daemons installed. Also, in contrast to SysV init scripts
which have to be adjusted to the distribution systemd service files
are compatible with any kind of distribution running systemd (which
become more and more these days…). What follows is a terse guide how
to take a SysV init script and translate it into a native systemd
service file. Ideally, upstream projects should ship and install
systemd service files in their tarballs. If you have successfully
converted a SysV script according to the guidelines it might hence be
a good idea to submit the file as patch to upstream. How to prepare a
patch like that will be discussed in a later installment, suffice to
say at this point that the daemon(7)
manual page shipping with systemd contains a lot of useful information
regarding this.

So, let’s jump right in. As an example we’ll convert the init
script of the ABRT daemon into a systemd service file. ABRT is a
standard component of every Fedora install, and is an acronym for
Automatic Bug Reporting Tool, which pretty much describes what it
does, i.e. it is a service for collecting crash dumps. Its SysV script I have uploaded
here.

The first step when converting such a script is to read it
(surprise surprise!) and distill the useful information from the
usually pretty long script. In almost all cases the script consists of
mostly boilerplate code that is identical or at least very similar in
all init scripts, and usually copied and pasted from one to the
other. So, let’s extract the interesting information from the script
linked above:

  • A description string for the service is “Daemon to detect
    crashing apps
    “. As it turns out, the header comments include a
    redundant number of description strings, some of them describing less
    the actual service but the init script to start it. systemd services
    include a description too, and it should describe the service and not
    the service file.
  • The LSB header[1] contains dependency
    information. systemd due to its design around socket-based activation
    usually needs no (or very little) manually configured
    dependencies. (For details regarding socket activation see the original
    announcement blog post.
    ) In this case the dependency on
    $syslog (which encodes that abrtd requires a syslog daemon),
    is the only valuable information. While the header lists another
    dependency ($local_fs) this one is redundant with systemd as
    normal system services are always started with all local file systems
    available.
  • The LSB header suggests that this service should be started in
    runlevels 3 (multi-user) and 5 (graphical).
  • The daemon binary is /usr/sbin/abrtd

And that’s already it. The entire remaining content of this
115-line shell script is simply boilerplate or otherwise redundant
code: code that deals with synchronizing and serializing startup
(i.e. the code regarding lock files) or that outputs status messages
(i.e. the code calling echo), or simply parsing of the verbs (i.e. the
big case block).

From the information extracted above we can now write our systemd service file:

[Unit]
Description=Daemon to detect crashing apps
After=syslog.target

[Service]
ExecStart=/usr/sbin/abrtd
Type=forking

[Install]
WantedBy=multi-user.target

A little explanation of the contents of this file: The
[Unit] section contains generic information about the
service. systemd not only manages system services, but also devices,
mount points, timer, and other components of the system. The generic
term for all these objects in systemd is a unit, and the
[Unit] section encodes information about it that might be
applicable not only to services but also in to the other unit types
systemd maintains. In this case we set the following unit settings: we
set the description string and configure that the daemon shall be
started after Syslog[2], similar to what is encoded in the
LSB header of the original init script. For this Syslog dependency we
create a dependency of type After= on a systemd unit
syslog.target. The latter is a special target unit in systemd
and is the standardized name to pull in a syslog implementation. For
more information about these standardized names see the systemd.special(7). Note
that a dependency of type After= only encodes the suggested
ordering, but does not actually cause syslog to be started when abrtd
is — and this is exactly what we want, since abrtd actually works
fine even without syslog being around. However, if both are started
(and usually they are) then the order in which they are is controlled
with this dependency.

The next section is [Service] which encodes information
about the service itself. It contains all those settings that apply
only to services, and not the other kinds of units systemd maintains
(mount points, devices, timers, …). Two settings are used here:
ExecStart= takes the path to the binary to execute when the
service shall be started up. And with Type= we configure how
the service notifies the init system that it finished starting up. Since
traditional Unix daemons do this by returning to the parent process
after having forked off and initialized the background daemon we set
the type to forking here. That tells systemd to wait until
the start-up binary returns and then consider the processes still
running afterwards the daemon processes.

The final section is [Install]. It encodes information
about how the suggested installation should look like, i.e. under
which circumstances and by which triggers the service shall be
started. In this case we simply say that this service shall be started
when the multi-user.target unit is activated. This is a
special unit (see above) that basically takes the role of the classic
SysV Runlevel 3[3]. The setting WantedBy= has
little effect on the daemon during runtime. It is only read by the
systemctl enable command, which is the recommended way to
enable a service in systemd. This command will simply ensure that our
little service gets automatically activated as soon as
multi-user.target is requested, which it is on all normal
boots[4].

And that’s it. Now we already have a minimal working systemd
service file. To test it we copy it to
/etc/systemd/system/abrtd.service and invoke systemctl
daemon-reload
. This will make systemd take notice of it, and now
we can start the service with it: systemctl start
abrtd.service
. We can verify the status via systemctl status
abrtd.service
. And we can stop it again via systemctl stop
abrtd.service
. Finally, we can enable it, so that it is activated
by default on future boots with systemctl enable
abrtd.service
.

The service file above, while sufficient and basically a 1:1
translation (feature- and otherwise) of the SysV init script still has room for
improvement. Here it is a little bit updated:

[Unit]
Description=ABRT Automated Bug Reporting Tool
After=syslog.target

[Service]
Type=dbus
BusName=com.redhat.abrt
ExecStart=/usr/sbin/abrtd -d -s

[Install]
WantedBy=multi-user.target

So, what did we change? Two things: we improved the description
string a bit. More importantly however, we changed the type of the
service to dbus and configured the D-Bus bus name of the
service. Why did we do this? As mentioned classic SysV services
daemonize after startup, which usually involves double forking
and detaching from any terminal. While this is useful and necessary
when daemons are invoked via a script, this is unnecessary (and slow)
as well as counterproductive when a proper process babysitter such as
systemd is used. The reason for that is that the forked off daemon
process usually has little relation to the original process started by
systemd (after all the daemonizing scheme’s whole idea is to remove
this relation), and hence it is difficult for systemd to figure out
after the fork is finished which process belonging to the service is
actually the main process and which processes might just be
auxiliary. But that information is crucial to implement advanced
babysitting, i.e. supervising the process, automatic respawning on
abnormal termination, collectig crash and exit code information and
suchlike. In order to make it easier for systemd to figure out the
main process of the daemon we changed the service type to
dbus. The semantics of this service type are appropriate for
all services that take a name on the D-Bus system bus as last step of
their initialization[5]. ABRT is one of those. With this setting systemd
will spawn the ABRT process, which will no longer fork (this is
configured via the -d -s switches to the daemon), and systemd
will consider the service fully started up as soon as
com.redhat.abrt appears on the bus. This way the process
spawned by systemd is the main process of the daemon, systemd has a
reliable way to figure out when the daemon is fully started up and
systemd can easily supervise it.

And that’s all there is to it. We have a simple systemd service
file now that encodes in 10 lines more information than the original
SysV init script encoded in 115. And even now there’s a lot of room
left for further improvement utilizing more features systemd
offers. For example, we could set Restart=restart-always to
tell systemd to automatically restart this service when it dies. Or,
we could use OOMScoreAdjust=-500 to ask the kernel to please
leave this process around when the OOM killer wreaks havoc. Or, we
could use CPUSchedulingPolicy=idle to ensure that abrtd
processes crash dumps in background only, always allowing the kernel
to give preference to whatever else might be running and needing CPU
time.

For more information about the configuration options mentioned
here, see the respective man pages systemd.unit(5),
systemd.service(5),
systemd.exec(5). Or,
browse all of
systemd’s man pages
.

Of course, not all SysV scripts are as easy to convert as this
one. But gladly, as it turns out the vast majority actually are.

That’s it for today, come back soon for the next installment in our series.

Footnotes

[1] The LSB header of init scripts is a convention of
including meta data about the service in comment blocks at the top of
SysV init scripts and is
defined by the Linux Standard Base
. This was intended to
standardize init scripts between distributions. While most
distributions have adopted this scheme, the handling of the headers
varies greatly between the distributions, and in fact still makes it
necessary to adjust init scripts for every distribution. As such the LSB spec
never kept the promise it made.

[2] Strictly speaking, this dependency does not even have to
be encoded here, as it is redundant in a system where the Syslog
daemon is socket activatable. Modern syslog systems (for example
rsyslog v5) have been patched upstream to be socket-activatable. If
such a init system is used configuration of the
After=syslog.target dependency is redundant and
implicit. However, to maintain compatibility with syslog services that
have not been updated we include this dependency here.

[3] At least how it used to be defined on Fedora.

[4] Note that in systemd the graphical bootup
(graphical.target, taking the role of SysV runlevel 5) is an
implicit superset of the console-only bootup
(multi-user.target, i.e. like runlevel 3). That means hooking
a service into the latter will also hook it into the
former.

[5] Actually the majority of services of the default Fedora
install now take a name on the bus after startup.

systemd for Administrators, Part II

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/systemd-for-admins-2.html

Here’s the second installment of my ongoing series about systemd for administrators.

Which Service Owns Which Processes?

On most Linux systems the number of processes that are running by
default is substantial. Knowing which process does what and where it
belongs to becomes increasingly difficult. Some services even maintain
a couple of worker processes which clutter the “ps” output with
many additional processes that are often not easy to recognize. This is
further complicated if daemons spawn arbitrary 3rd-party processes, as
Apache does with CGI processes, or cron does with user jobs.

A slight remedy for this is often the process inheritance tree, as
shown by “ps xaf“. However this is usually not reliable, as processes
whose parents die get reparented to PID 1, and hence all information
about inheritance gets lost. If a process “double forks” it hence loses
its relationships to the processes that started it. (This actually is
supposed to be a feature and is relied on for the traditional Unix
daemonizing logic.) Furthermore processes can freely change their names
with PR_SETNAME or by patching argv[0], thus making
it harder to recognize them. In fact they can play hide-and-seek with
the administrator pretty nicely this way.

In systemd we place every process that is spawned in a control
group
named after its service. Control groups (or cgroups)
at their most basic are simply groups of processes that can be
arranged in a hierarchy and labelled individually. When processes
spawn other processes these children are automatically made members of
the parents cgroup. Leaving a cgroup is not possible for unprivileged
processes. Thus, cgroups can be used as an effective way to label
processes after the service they belong to and be sure that the
service cannot escape from the label, regardless how often it forks or
renames itself. Furthermore this can be used to safely kill a service
and all processes it created, again with no chance of escaping.

In today’s installment I want to introduce you to two commands you
may use to relate systemd services and processes. The first one, is
the well known ps command which has been updated to show
cgroup information along the other process details. And this is how it
looks:

$ ps xawf -eo pid,user,cgroup,args
  PID USER     CGROUP                              COMMAND
    2 root     -                                   [kthreadd]
    3 root     -                                    \_ [ksoftirqd/0]
[...]
 4281 root     -                                    \_ [flush-8:0]
    1 root     name=systemd:/systemd-1             /sbin/init
  455 root     name=systemd:/systemd-1/sysinit.service /sbin/udevd -d
28188 root     name=systemd:/systemd-1/sysinit.service  \_ /sbin/udevd -d
28191 root     name=systemd:/systemd-1/sysinit.service  \_ /sbin/udevd -d
 1096 dbus     name=systemd:/systemd-1/dbus.service /bin/dbus-daemon --system --address=systemd: --nofork --systemd-activation
 1131 root     name=systemd:/systemd-1/auditd.service auditd
 1133 root     name=systemd:/systemd-1/auditd.service  \_ /sbin/audispd
 1135 root     name=systemd:/systemd-1/auditd.service      \_ /usr/sbin/sedispatch
 1171 root     name=systemd:/systemd-1/NetworkManager.service /usr/sbin/NetworkManager --no-daemon
 4028 root     name=systemd:/systemd-1/NetworkManager.service  \_ /sbin/dhclient -d -4 -sf /usr/libexec/nm-dhcp-client.action -pf /var/run/dhclient-wlan0.pid -lf /var/lib/dhclient/dhclient-7d32a784-ede9-4cf6-9ee3-60edc0bce5ff-wlan0.lease -
 1175 avahi    name=systemd:/systemd-1/avahi-daemon.service avahi-daemon: running [epsilon.local]
 1194 avahi    name=systemd:/systemd-1/avahi-daemon.service  \_ avahi-daemon: chroot helper
 1193 root     name=systemd:/systemd-1/rsyslog.service /sbin/rsyslogd -c 4
 1195 root     name=systemd:/systemd-1/cups.service cupsd -C /etc/cups/cupsd.conf
 1207 root     name=systemd:/systemd-1/mdmonitor.service mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid
 1210 root     name=systemd:/systemd-1/irqbalance.service irqbalance
 1216 root     name=systemd:/systemd-1/dbus.service /usr/sbin/modem-manager
 1219 root     name=systemd:/systemd-1/dbus.service /usr/libexec/polkit-1/polkitd
 1242 root     name=systemd:/systemd-1/dbus.service /usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wpa_supplicant.conf -B -u -f /var/log/wpa_supplicant.log -P /var/run/wpa_supplicant.pid
 1249 68       name=systemd:/systemd-1/haldaemon.service hald
 1250 root     name=systemd:/systemd-1/haldaemon.service  \_ hald-runner
 1273 root     name=systemd:/systemd-1/haldaemon.service      \_ hald-addon-input: Listening on /dev/input/event3 /dev/input/event9 /dev/input/event1 /dev/input/event7 /dev/input/event2 /dev/input/event0 /dev/input/event8
 1275 root     name=systemd:/systemd-1/haldaemon.service      \_ /usr/libexec/hald-addon-rfkill-killswitch
 1284 root     name=systemd:/systemd-1/haldaemon.service      \_ /usr/libexec/hald-addon-leds
 1285 root     name=systemd:/systemd-1/haldaemon.service      \_ /usr/libexec/hald-addon-generic-backlight
 1287 68       name=systemd:/systemd-1/haldaemon.service      \_ /usr/libexec/hald-addon-acpi
 1317 root     name=systemd:/systemd-1/abrtd.service /usr/sbin/abrtd -d -s
 1332 root     name=systemd:/systemd-1/[email protected]/tty2 /sbin/mingetty tty2
 1339 root     name=systemd:/systemd-1/[email protected]/tty3 /sbin/mingetty tty3
 1342 root     name=systemd:/systemd-1/[email protected]/tty5 /sbin/mingetty tty5
 1343 root     name=systemd:/systemd-1/[email protected]/tty4 /sbin/mingetty tty4
 1344 root     name=systemd:/systemd-1/crond.service crond
 1346 root     name=systemd:/systemd-1/[email protected]/tty6 /sbin/mingetty tty6
 1362 root     name=systemd:/systemd-1/sshd.service /usr/sbin/sshd
 1376 root     name=systemd:/systemd-1/prefdm.service /usr/sbin/gdm-binary -nodaemon
 1391 root     name=systemd:/systemd-1/prefdm.service  \_ /usr/libexec/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1 --force-active-vt
 1394 root     name=systemd:/systemd-1/prefdm.service      \_ /usr/bin/Xorg :0 -nr -verbose -auth /var/run/gdm/auth-for-gdm-f2KUOh/database -nolisten tcp vt1
 1495 root     name=systemd:/user/lennart/1             \_ pam: gdm-password
 1521 lennart  name=systemd:/user/lennart/1                 \_ gnome-session
 1621 lennart  name=systemd:/user/lennart/1                     \_ metacity
 1635 lennart  name=systemd:/user/lennart/1                     \_ gnome-panel
 1638 lennart  name=systemd:/user/lennart/1                     \_ nautilus
 1640 lennart  name=systemd:/user/lennart/1                     \_ /usr/libexec/polkit-gnome-authentication-agent-1
 1641 lennart  name=systemd:/user/lennart/1                     \_ /usr/bin/seapplet
 1644 lennart  name=systemd:/user/lennart/1                     \_ gnome-volume-control-applet
 1646 lennart  name=systemd:/user/lennart/1                     \_ /usr/sbin/restorecond -u
 1652 lennart  name=systemd:/user/lennart/1                     \_ /usr/bin/devilspie
 1662 lennart  name=systemd:/user/lennart/1                     \_ nm-applet --sm-disable
 1664 lennart  name=systemd:/user/lennart/1                     \_ gnome-power-manager
 1665 lennart  name=systemd:/user/lennart/1                     \_ /usr/libexec/gdu-notification-daemon
 1670 lennart  name=systemd:/user/lennart/1                     \_ /usr/libexec/evolution/2.32/evolution-alarm-notify
 1672 lennart  name=systemd:/user/lennart/1                     \_ /usr/bin/python /usr/share/system-config-printer/applet.py
 1674 lennart  name=systemd:/user/lennart/1                     \_ /usr/lib64/deja-dup/deja-dup-monitor
 1675 lennart  name=systemd:/user/lennart/1                     \_ abrt-applet
 1677 lennart  name=systemd:/user/lennart/1                     \_ bluetooth-applet
 1678 lennart  name=systemd:/user/lennart/1                     \_ gpk-update-icon
 1408 root     name=systemd:/systemd-1/console-kit-daemon.service /usr/sbin/console-kit-daemon --no-daemon
 1419 gdm      name=systemd:/systemd-1/prefdm.service /usr/bin/dbus-launch --exit-with-session
 1453 root     name=systemd:/systemd-1/dbus.service /usr/libexec/upowerd
 1473 rtkit    name=systemd:/systemd-1/rtkit-daemon.service /usr/libexec/rtkit-daemon
 1496 root     name=systemd:/systemd-1/accounts-daemon.service /usr/libexec/accounts-daemon
 1499 root     name=systemd:/systemd-1/systemd-logger.service /lib/systemd/systemd-logger
 1511 lennart  name=systemd:/systemd-1/prefdm.service /usr/bin/gnome-keyring-daemon --daemonize --login
 1534 lennart  name=systemd:/user/lennart/1        dbus-launch --sh-syntax --exit-with-session
 1535 lennart  name=systemd:/user/lennart/1        /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
 1603 lennart  name=systemd:/user/lennart/1        /usr/libexec/gconfd-2
 1612 lennart  name=systemd:/user/lennart/1        /usr/libexec/gnome-settings-daemon
 1615 lennart  name=systemd:/user/lennart/1        /usr/libexec/gvfsd
 1626 lennart  name=systemd:/user/lennart/1        /usr/libexec//gvfs-fuse-daemon /home/lennart/.gvfs
 1634 lennart  name=systemd:/user/lennart/1        /usr/bin/pulseaudio --start --log-target=syslog
 1649 lennart  name=systemd:/user/lennart/1         \_ /usr/libexec/pulse/gconf-helper
 1645 lennart  name=systemd:/user/lennart/1        /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=24
 1668 lennart  name=systemd:/user/lennart/1        /usr/libexec/im-settings-daemon
 1701 lennart  name=systemd:/user/lennart/1        /usr/libexec/gvfs-gdu-volume-monitor
 1707 lennart  name=systemd:/user/lennart/1        /usr/bin/gnote --panel-applet --oaf-activate-iid=OAFIID:GnoteApplet_Factory --oaf-ior-fd=22
 1725 lennart  name=systemd:/user/lennart/1        /usr/libexec/clock-applet
 1727 lennart  name=systemd:/user/lennart/1        /usr/libexec/wnck-applet
 1729 lennart  name=systemd:/user/lennart/1        /usr/libexec/notification-area-applet
 1733 root     name=systemd:/systemd-1/dbus.service /usr/libexec/udisks-daemon
 1747 root     name=systemd:/systemd-1/dbus.service  \_ udisks-daemon: polling /dev/sr0
 1759 lennart  name=systemd:/user/lennart/1        gnome-screensaver
 1780 lennart  name=systemd:/user/lennart/1        /usr/libexec/gvfsd-trash --spawner :1.9 /org/gtk/gvfs/exec_spaw/0
 1864 lennart  name=systemd:/user/lennart/1        /usr/libexec/gvfs-afc-volume-monitor
 1874 lennart  name=systemd:/user/lennart/1        /usr/libexec/gconf-im-settings-daemon
 1903 lennart  name=systemd:/user/lennart/1        /usr/libexec/gvfsd-burn --spawner :1.9 /org/gtk/gvfs/exec_spaw/1
 1909 lennart  name=systemd:/user/lennart/1        gnome-terminal
 1913 lennart  name=systemd:/user/lennart/1         \_ gnome-pty-helper
 1914 lennart  name=systemd:/user/lennart/1         \_ bash
29231 lennart  name=systemd:/user/lennart/1         |   \_ ssh tango
 2221 lennart  name=systemd:/user/lennart/1         \_ bash
 4193 lennart  name=systemd:/user/lennart/1         |   \_ ssh tango
 2461 lennart  name=systemd:/user/lennart/1         \_ bash
29219 lennart  name=systemd:/user/lennart/1         |   \_ emacs systemd-for-admins-1.txt
15113 lennart  name=systemd:/user/lennart/1         \_ bash
27251 lennart  name=systemd:/user/lennart/1             \_ empathy
29504 lennart  name=systemd:/user/lennart/1             \_ ps xawf -eo pid,user,cgroup,args
 1968 lennart  name=systemd:/user/lennart/1        ssh-agent
 1994 lennart  name=systemd:/user/lennart/1        gpg-agent --daemon --write-env-file
18679 lennart  name=systemd:/user/lennart/1        /bin/sh /usr/lib64/firefox-3.6/run-mozilla.sh /usr/lib64/firefox-3.6/firefox
18741 lennart  name=systemd:/user/lennart/1         \_ /usr/lib64/firefox-3.6/firefox
28900 lennart  name=systemd:/user/lennart/1             \_ /usr/lib64/nspluginwrapper/npviewer.bin --plugin /usr/lib64/mozilla/plugins/libflashplayer.so --connection /org/wrapper/NSPlugins/libflashplayer.so/18741-6
 4016 root     name=systemd:/systemd-1/sysinit.service /usr/sbin/bluetoothd --udev
 4094 smmsp    name=systemd:/systemd-1/sendmail.service sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
 4096 root     name=systemd:/systemd-1/sendmail.service sendmail: accepting connections
 4112 ntp      name=systemd:/systemd-1/ntpd.service /usr/sbin/ntpd -n -u ntp:ntp -g
27262 lennart  name=systemd:/user/lennart/1        /usr/libexec/mission-control-5
27265 lennart  name=systemd:/user/lennart/1        /usr/libexec/telepathy-haze
27268 lennart  name=systemd:/user/lennart/1        /usr/libexec/telepathy-logger
27270 lennart  name=systemd:/user/lennart/1        /usr/libexec/dconf-service
27280 lennart  name=systemd:/user/lennart/1        /usr/libexec/notification-daemon
27284 lennart  name=systemd:/user/lennart/1        /usr/libexec/telepathy-gabble
27285 lennart  name=systemd:/user/lennart/1        /usr/libexec/telepathy-salut
27297 lennart  name=systemd:/user/lennart/1        /usr/libexec/geoclue-yahoo

(Note that this output is shortened, I have removed most of the
kernel threads here, since they are not relevant in the context of
this blog story)

In the third column you see the cgroup systemd assigned to each
process. You’ll find that the udev processes are in the
name=systemd:/systemd-1/sysinit.service cgroup, which is
where systemd places all processes started by the
sysinit.service service, which covers early boot.

My personal recommendation is to set the shell alias psc
to the ps command line shown above:

alias psc='ps xawf -eo pid,user,cgroup,args'

With this service information of processes is just four keypresses
away!

A different way to present the same information is the
systemd-cgls tool we ship with systemd. It shows the cgroup
hierarchy in a pretty tree. Its output looks like this:

$ systemd-cgls
+    2 [kthreadd]
[...]
+ 4281 [flush-8:0]
+ user
| \ lennart
|   \ 1
|     +  1495 pam: gdm-password
|     +  1521 gnome-session
|     +  1534 dbus-launch --sh-syntax --exit-with-session
|     +  1535 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
|     +  1603 /usr/libexec/gconfd-2
|     +  1612 /usr/libexec/gnome-settings-daemon
|     +  1615 /ushr/libexec/gvfsd
|     +  1621 metacity
|     +  1626 /usr/libexec//gvfs-fuse-daemon /home/lennart/.gvfs
|     +  1634 /usr/bin/pulseaudio --start --log-target=syslog
|     +  1635 gnome-panel
|     +  1638 nautilus
|     +  1640 /usr/libexec/polkit-gnome-authentication-agent-1
|     +  1641 /usr/bin/seapplet
|     +  1644 gnome-volume-control-applet
|     +  1645 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=24
|     +  1646 /usr/sbin/restorecond -u
|     +  1649 /usr/libexec/pulse/gconf-helper
|     +  1652 /usr/bin/devilspie
|     +  1662 nm-applet --sm-disable
|     +  1664 gnome-power-manager
|     +  1665 /usr/libexec/gdu-notification-daemon
|     +  1668 /usr/libexec/im-settings-daemon
|     +  1670 /usr/libexec/evolution/2.32/evolution-alarm-notify
|     +  1672 /usr/bin/python /usr/share/system-config-printer/applet.py
|     +  1674 /usr/lib64/deja-dup/deja-dup-monitor
|     +  1675 abrt-applet
|     +  1677 bluetooth-applet
|     +  1678 gpk-update-icon
|     +  1701 /usr/libexec/gvfs-gdu-volume-monitor
|     +  1707 /usr/bin/gnote --panel-applet --oaf-activate-iid=OAFIID:GnoteApplet_Factory --oaf-ior-fd=22
|     +  1725 /usr/libexec/clock-applet
|     +  1727 /usr/libexec/wnck-applet
|     +  1729 /usr/libexec/notification-area-applet
|     +  1759 gnome-screensaver
|     +  1780 /usr/libexec/gvfsd-trash --spawner :1.9 /org/gtk/gvfs/exec_spaw/0
|     +  1864 /usr/libexec/gvfs-afc-volume-monitor
|     +  1874 /usr/libexec/gconf-im-settings-daemon
|     +  1882 /usr/libexec/gvfs-gphoto2-volume-monitor
|     +  1903 /usr/libexec/gvfsd-burn --spawner :1.9 /org/gtk/gvfs/exec_spaw/1
|     +  1909 gnome-terminal
|     +  1913 gnome-pty-helper
|     +  1914 bash
|     +  1968 ssh-agent
|     +  1994 gpg-agent --daemon --write-env-file
|     +  2221 bash
|     +  2461 bash
|     +  4193 ssh tango
|     + 15113 bash
|     + 18679 /bin/sh /usr/lib64/firefox-3.6/run-mozilla.sh /usr/lib64/firefox-3.6/firefox
|     + 18741 /usr/lib64/firefox-3.6/firefox
|     + 27251 empathy
|     + 27262 /usr/libexec/mission-control-5
|     + 27265 /usr/libexec/telepathy-haze
|     + 27268 /usr/libexec/telepathy-logger
|     + 27270 /usr/libexec/dconf-service
|     + 27280 /usr/libexec/notification-daemon
|     + 27284 /usr/libexec/telepathy-gabble
|     + 27285 /usr/libexec/telepathy-salut
|     + 27297 /usr/libexec/geoclue-yahoo
|     + 28900 /usr/lib64/nspluginwrapper/npviewer.bin --plugin /usr/lib64/mozilla/plugins/libflashplayer.so --connection /org/wrapper/NSPlugins/libflashplayer.so/18741-6
|     + 29219 emacs systemd-for-admins-1.txt
|     + 29231 ssh tango
|     \ 29519 systemd-cgls
\ systemd-1
  + 1 /sbin/init
  + ntpd.service
  | \ 4112 /usr/sbin/ntpd -n -u ntp:ntp -g
  + systemd-logger.service
  | \ 1499 /lib/systemd/systemd-logger
  + accounts-daemon.service
  | \ 1496 /usr/libexec/accounts-daemon
  + rtkit-daemon.service
  | \ 1473 /usr/libexec/rtkit-daemon
  + console-kit-daemon.service
  | \ 1408 /usr/sbin/console-kit-daemon --no-daemon
  + prefdm.service
  | + 1376 /usr/sbin/gdm-binary -nodaemon
  | + 1391 /usr/libexec/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1 --force-active-vt
  | + 1394 /usr/bin/Xorg :0 -nr -verbose -auth /var/run/gdm/auth-for-gdm-f2KUOh/database -nolisten tcp vt1
  | + 1419 /usr/bin/dbus-launch --exit-with-session
  | \ 1511 /usr/bin/gnome-keyring-daemon --daemonize --login
  + [email protected]
  | + tty6
  | | \ 1346 /sbin/mingetty tty6
  | + tty4
  | | \ 1343 /sbin/mingetty tty4
  | + tty5
  | | \ 1342 /sbin/mingetty tty5
  | + tty3
  | | \ 1339 /sbin/mingetty tty3
  | \ tty2
  |   \ 1332 /sbin/mingetty tty2
  + abrtd.service
  | \ 1317 /usr/sbin/abrtd -d -s
  + crond.service
  | \ 1344 crond
  + sshd.service
  | \ 1362 /usr/sbin/sshd
  + sendmail.service
  | + 4094 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
  | \ 4096 sendmail: accepting connections
  + haldaemon.service
  | + 1249 hald
  | + 1250 hald-runner
  | + 1273 hald-addon-input: Listening on /dev/input/event3 /dev/input/event9 /dev/input/event1 /dev/input/event7 /dev/input/event2 /dev/input/event0 /dev/input/event8
  | + 1275 /usr/libexec/hald-addon-rfkill-killswitch
  | + 1284 /usr/libexec/hald-addon-leds
  | + 1285 /usr/libexec/hald-addon-generic-backlight
  | \ 1287 /usr/libexec/hald-addon-acpi
  + irqbalance.service
  | \ 1210 irqbalance
  + avahi-daemon.service
  | + 1175 avahi-daemon: running [epsilon.local]
  + NetworkManager.service
  | + 1171 /usr/sbin/NetworkManager --no-daemon
  | \ 4028 /sbin/dhclient -d -4 -sf /usr/libexec/nm-dhcp-client.action -pf /var/run/dhclient-wlan0.pid -lf /var/lib/dhclient/dhclient-7d32a784-ede9-4cf6-9ee3-60edc0bce5ff-wlan0.lease -cf /var/run/nm-dhclient-wlan0.conf wlan0
  + rsyslog.service
  | \ 1193 /sbin/rsyslogd -c 4
  + mdmonitor.service
  | \ 1207 mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid
  + cups.service
  | \ 1195 cupsd -C /etc/cups/cupsd.conf
  + auditd.service
  | + 1131 auditd
  | + 1133 /sbin/audispd
  | \ 1135 /usr/sbin/sedispatch
  + dbus.service
  | +  1096 /bin/dbus-daemon --system --address=systemd: --nofork --systemd-activation
  | +  1216 /usr/sbin/modem-manager
  | +  1219 /usr/libexec/polkit-1/polkitd
  | +  1242 /usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wpa_supplicant.conf -B -u -f /var/log/wpa_supplicant.log -P /var/run/wpa_supplicant.pid
  | +  1453 /usr/libexec/upowerd
  | +  1733 /usr/libexec/udisks-daemon
  | +  1747 udisks-daemon: polling /dev/sr0
  | \ 29509 /usr/libexec/packagekitd
  + dev-mqueue.mount
  + dev-hugepages.mount
  \ sysinit.service
    +   455 /sbin/udevd -d
    +  4016 /usr/sbin/bluetoothd --udev
    + 28188 /sbin/udevd -d
    \ 28191 /sbin/udevd -d

(This too is shortened, the same way)

As you can see, this command shows the processes by their cgroup
and hence service, as systemd labels the cgroups after the
services. For example, you can easily see that the auditing service
auditd.service spawns three individual processes,
auditd, audisp and sedispatch.

If you look closely you will notice that a number of processes have
been assigned to the cgroup /user/1. At this point let’s
simply leave it at that systemd not only maintains services in cgroups,
but user session processes as well. In a later installment we’ll discuss in
more detail what this about.

So much for now, come back soon for the next installment!

systemd Status Update

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/systemd-update.html

It has been a while since my original
announcement of systemd
. Here’s a little status update, on what
happened since then. For simplicity’s sake I’ll just list here what we
worked on in a bulleted list, with no particular order and without
trying to cover this comprehensively:

  • systemd has been accepted as Feature for Fedora 14, and as it
    looks right now everything worked out nicely and we’ll ship F14 with
    systemd as init system.
  • We added a number of additional unit types: .timer for
    cron-style timer-based activation of services, .swap exposes
    swap files and partitions the same way we handle mount points, and
    .path can be used to activate units dependending on the
    existance/creation of files or fill status of spool directories.
  • We hooked systemd up to SELinux: systemd is now capabale of
    properly labelling directories, sockets and FIFOs it creates according
    to the SELinux policy for the services we maintain.
  • We hooked systemd up to the Linux auditing subsystem: as first
    init system at all systemd now generates auditing records for all
    services it starts/stops, including their failure status.
  • We hooked systemd up to TCP wrappers, for all socket connections
    it accepts.
  • We hooked systemd up to PAM, so that optionally, when systemd runs
    a service as a different user it initializes the usual PAM session
    setup and teardown hooks.
  • We hooked systemd up to D-Bus, so that D-Bus passes activation
    requests to systemd and systemd becomes the central point for all
    kinds of activation, thus greatly extending the control of the
    execution environment of bus activated services, and making them
    accessible through the same utilities as SysV services. Also, this
    enables us to do race-free parallelized start-up for D-Bus services
    and their clients, thus speeding up things even further.
  • systemd is now able to handle various Debian and OpenSUSE-specific
    extensions to the classic SysV init script formats natively, on top of
    the Fedora extensions we already parse.
  • The D-Bus coverage of the systemd interface is now complete,
    allowing both introspection of runtime data and of parsed
    configuration data. It’s fun now to introspect systemd with gdbus
    or d-feet.
  • We added a systemd
    PAM module
    , which assigns the processes of each user session to
    its own cgroup in the systemd cgroup tree. This also enables reliable
    killing of all processes associated with a session when the user logs
    out. This also manages a secure per-user /var/run-style directory
    which is supposed to be used for sockets and similar files that shall
    be cleaned up when the user logs out.
  • There’s a new tool systemd-cgls,
    which plots a pretty process tree based on the systemd cgroup
    hierarchy. It’s really pretty. Try it!
  • We now have our own cgroup hierarchy beneath
    /cgroup/systemd (though is will move to /sys/fs/
    before the F14 release).
  • We have pretty code that automatically spawns a getty on a serial
    port when the kernel console is redirected to a serial TTY.
  • systemctl got beefed up substantially (it can even draw
    dependency graphs now, via dot!), and the SysV compatiblity
    tools were extended to more completely and correctly support what was
    historically provided by SysV. For example, we’ll now warn the user
    when systemd service files have changed but systemd was not asked to
    reload its configuration. Also, you can now use systemd’s native
    client tools to reboot or shut-down an Upstart or sysvinit system, to
    facilitate upgrades.
  • We provide a reference
    implementation
    for the socket activation and other APIs for nicer
    interaction with systemd.
  • We have a pretty complete set of documentation
    now, some
    of it
    even extending to areas not directly related to systemd
    itself.
  • Quite a number of upstream packages now ship with systemd service
    files out-of-the-box now, that work across all distributions that have
    adopted systemd. It is our intention to unify the boot and service
    management between distributions with systemd, and this shows fruits
    already. Furthermore a number of upstream packages now ship our
    patches for socket-based activation.
  • Even more options that control the process execution environment
    or the sockets we create are now supported.
  • Earlier today I began my series of blog stories on systemd
    for administrators
    .
  • We reimplemented almost all boot-up and shutdown scripts of the
    standard Fedora install in much smaller, simpler and faster C
    utilities, or in systemd itself. Most of this will not be enabled in
    F14 however, even though it is shipped with systemd upstream. With
    this enabled the entire Linux system gains a completely new feeling as
    the number of shells we spawn approaches zero, and the PID of the
    first user terminal is way < 500 now, and the early boot-up is
    fully parallelized. We looked at the boot scripts of Fedora, OpenSUSE
    and Debian and distilled from this a list of functionality that makes
    up the early boot process and reimplemented this in C, if possible
    following the bahaviour of one of the existing implementations from
    these three distributions. This turned out to be much less effort than
    anticipated, and we are actually quite excited about this. Look
    forward to the fruits of this work in F15, when we might be able to
    present you a shell-less boot at least for standard desktop/laptop
    systems.
  • We spent some time reinvestigating the current syslog logic, and
    came up with an elegant and simple scheme to provide /dev/log
    compatible logging right from the time systemd is first initialized
    right until the time the kernel halts the machine. Through the wonders
    of socket based activation we first connect the /dev/log
    socket with a minimal bridge to the kernel log buffer (kmsg)
    and then, as soon as the real syslog is started up as part of the
    later bootup phase, we dynamically replace this minimal bridge by the
    real syslog daemon — without losing a single log message. Since one
    of the first things the real syslog daemon does is flushing the kernel
    log buffer into log files, all logged messages will sooner or later be
    stored on disk, regardless whether they have been generated during
    early boot, late boot or system runtime. On top of that if the syslog
    daemon terminates or is shut down during runtime, the bridge becomes
    active again and log output is written to kmsg again. The same applies
    when the system goes down. This provides a simple an robust way how we
    can ensure that no logs will ever be lost again, and logging is
    available from the beginning of boot-up to the end of
    shut-down. Plymouth will most likely adopt a similar scheme for initrd
    logging, thus ensuring that everything ever logged on the system will
    properly end up in the log files, whether it comes from the kernel,
    from the initrd, from early-boot, from runtime or shutdown. And if
    syslogd is not around, dmesg will provide you with access to
    the log messages. While this bridge is part of systemd upstream, we’ll
    most likely enable this bridge in Fedora only starting with F15. Also
    note that embedded systems that have no interest in shipping a full
    syslogd solution can simply use this syslog bridge during the entire
    runtime, and thus making the kernel log buffer the centralized log
    storage, with all the advantages this offers: zero disk IO at runtime,
    access to serial and netconsole logging, and remote debug access to
    the kernel log buffer.
  • We now install autofs units for many “API” kernel virtual file
    systems by default, such as binfmt_misc or
    hugetlbfs. That means that the file system access is readily
    available, client code no longer has to manually load the respective
    kernel modules, as they are autoloaded on first access of the file
    system. This has many advantages: it is not only faster to set up
    during boot, but also simpler for applications, as they can just
    assume the functionality is available. On top of that permission
    problems for the initialization go away, since manual module loading
    requires root privileges.
  • Many smaller fixes and enhancements, all across the board, which
    if mentioned here would make this blog story another blog
    novel. Suffice to say, we did a lot of polishing to ready systemd for
    F14.

All in all, systemd is progressing nicely, and the features we have
been working on in the last months are without exception features not
existing in any other of the init systems available on Linux and our
feature set already was far ahead of what the older init
implementations provide. And we have quite a bit planned for the
future. So, stay tuned!

Also note that I’ll speak about systemd at LinuxKongress
2010
in Nuremberg, Germany. Later this year I’ll also be speaking
at the Linux
Plumbers Conference
in Boston, MA. Make sure to drop by if you
want to learn about systemd or discuss exiciting new ideas or features
with us.

systemd for Administrators, Part 1

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/systemd-for-admins-1.html

As many of you know, systemd is the new
Fedora init system, starting with F14, and it is also on its way to being adopted in
a number of other distributions as well (for example, OpenSUSE). For administrators
systemd provides a variety of new features and changes and enhances the
administrative process substantially. This blog story is the first part of a
series of articles I plan to post roughly every week for the next months. In
every post I will try to explain one new feature of systemd. Many of these features
are small and simple, so these stories should be interesting to a broader audience.
However, from time to time we’ll dive a little bit deeper into the great new
features systemd provides you with.

Verifying Bootup

Traditionally, when booting up a Linux system, you see a lot of
little messages passing by on your screen. As we work on speeding up
and parallelizing the boot process these messages are becoming visible
for a shorter and shorter time only and be less and less readable —
if they are shown at all, given we use graphical boot splash
technology like Plymouth these days. Nonetheless the information of
the boot screens was and still is very relevant, because it shows you
for each service that is being started as part of bootup, wether it
managed to start up successfully or failed (with those green or red
[ OK ] or [ FAILED ] indicators). To improve the
situation for machines that boot up fast and parallelized and to make
this information more nicely available during runtime, we added a
feature to systemd that tracks and remembers for each service whether
it started up successfully, whether it exited with a non-zero exit
code, whether it timed out, or whether it terminated abnormally (by
segfaulting or similar), both during start-up and runtime. By simply
typing systemctl in your shell you can query the state of all
services, both systemd native and SysV/LSB services:

[root@lambda] ~# systemctl
UNIT                                          LOAD   ACTIVE       SUB          JOB             DESCRIPTION
dev-hugepages.automount                       loaded active       running                      Huge Pages File System Automount Point
dev-mqueue.automount                          loaded active       running                      POSIX Message Queue File System Automount Point
proc-sys-fs-binfmt_misc.automount             loaded active       waiting                      Arbitrary Executable File Formats File System Automount Point
sys-kernel-debug.automount                    loaded active       waiting                      Debug File System Automount Point
sys-kernel-security.automount                 loaded active       waiting                      Security File System Automount Point
sys-devices-pc...0000:02:00.0-net-eth0.device loaded active       plugged                      82573L Gigabit Ethernet Controller
[...]
sys-devices-virtual-tty-tty9.device           loaded active       plugged                      /sys/devices/virtual/tty/tty9
-.mount                                       loaded active       mounted                      /
boot.mount                                    loaded active       mounted                      /boot
dev-hugepages.mount                           loaded active       mounted                      Huge Pages File System
dev-mqueue.mount                              loaded active       mounted                      POSIX Message Queue File System
home.mount                                    loaded active       mounted                      /home
proc-sys-fs-binfmt_misc.mount                 loaded active       mounted                      Arbitrary Executable File Formats File System
abrtd.service                                 loaded active       running                      ABRT Automated Bug Reporting Tool
accounts-daemon.service                       loaded active       running                      Accounts Service
acpid.service                                 loaded active       running                      ACPI Event Daemon
atd.service                                   loaded active       running                      Execution Queue Daemon
auditd.service                                loaded active       running                      Security Auditing Service
avahi-daemon.service                          loaded active       running                      Avahi mDNS/DNS-SD Stack
bluetooth.service                             loaded active       running                      Bluetooth Manager
console-kit-daemon.service                    loaded active       running                      Console Manager
cpuspeed.service                              loaded active       exited                       LSB: processor frequency scaling support
crond.service                                 loaded active       running                      Command Scheduler
cups.service                                  loaded active       running                      CUPS Printing Service
dbus.service                                  loaded active       running                      D-Bus System Message Bus
[email protected]                            loaded active       running                      Getty on tty2
[email protected]                            loaded active       running                      Getty on tty3
[email protected]                            loaded active       running                      Getty on tty4
[email protected]                            loaded active       running                      Getty on tty5
[email protected]                            loaded active       running                      Getty on tty6
haldaemon.service                             loaded active       running                      Hardware Manager
[email protected]                            loaded active       running                      sda shock protection daemon
irqbalance.service                            loaded active       running                      LSB: start and stop irqbalance daemon
iscsi.service                                 loaded active       exited                       LSB: Starts and stops login and scanning of iSCSI devices.
iscsid.service                                loaded active       exited                       LSB: Starts and stops login iSCSI daemon.
livesys-late.service                          loaded active       exited                       LSB: Late init script for live image.
livesys.service                               loaded active       exited                       LSB: Init script for live image.
lvm2-monitor.service                          loaded active       exited                       LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
mdmonitor.service                             loaded active       running                      LSB: Start and stop the MD software RAID monitor
modem-manager.service                         loaded active       running                      Modem Manager
netfs.service                                 loaded active       exited                       LSB: Mount and unmount network filesystems.
NetworkManager.service                        loaded active       running                      Network Manager
ntpd.service                                  loaded maintenance  maintenance                  Network Time Service
polkitd.service                               loaded active       running                      Policy Manager
prefdm.service                                loaded active       running                      Display Manager
rc-local.service                              loaded active       exited                       /etc/rc.local Compatibility
rpcbind.service                               loaded active       running                      RPC Portmapper Service
rsyslog.service                               loaded active       running                      System Logging Service
rtkit-daemon.service                          loaded active       running                      RealtimeKit Scheduling Policy Service
sendmail.service                              loaded active       running                      LSB: start and stop sendmail
[email protected]:22-172.31.0.4:36368.service  loaded active       running                      SSH Per-Connection Server
sysinit.service                               loaded active       running                      System Initialization
systemd-logger.service                        loaded active       running                      systemd Logging Daemon
udev-post.service                             loaded active       exited                       LSB: Moves the generated persistent udev rules to /etc/udev/rules.d
udisks.service                                loaded active       running                      Disk Manager
upowerd.service                               loaded active       running                      Power Manager
wpa_supplicant.service                        loaded active       running                      Wi-Fi Security Service
avahi-daemon.socket                           loaded active       listening                    Avahi mDNS/DNS-SD Stack Activation Socket
cups.socket                                   loaded active       listening                    CUPS Printing Service Sockets
dbus.socket                                   loaded active       running                      dbus.socket
rpcbind.socket                                loaded active       listening                    RPC Portmapper Socket
sshd.socket                                   loaded active       listening                    sshd.socket
systemd-initctl.socket                        loaded active       listening                    systemd /dev/initctl Compatibility Socket
systemd-logger.socket                         loaded active       running                      systemd Logging Socket
systemd-shutdownd.socket                      loaded active       listening                    systemd Delayed Shutdown Socket
dev-disk-by\x1...x1db22a\x1d870f1adf2732.swap loaded active       active                       /dev/disk/by-uuid/fd626ef7-34a4-4958-b22a-870f1adf2732
basic.target                                  loaded active       active                       Basic System
bluetooth.target                              loaded active       active                       Bluetooth
dbus.target                                   loaded active       active                       D-Bus
getty.target                                  loaded active       active                       Login Prompts
graphical.target                              loaded active       active                       Graphical Interface
local-fs.target                               loaded active       active                       Local File Systems
multi-user.target                             loaded active       active                       Multi-User
network.target                                loaded active       active                       Network
remote-fs.target                              loaded active       active                       Remote File Systems
sockets.target                                loaded active       active                       Sockets
swap.target                                   loaded active       active                       Swap
sysinit.target                                loaded active       active                       System Initialization

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
JOB    = Pending job for the unit.

221 units listed. Pass --all to see inactive units, too.
[root@lambda] ~#

(I have shortened the output above a little, and removed a few lines not relevant for this blog post.)

Look at the ACTIVE column, which shows you the high-level state of
a service (or in fact of any kind of unit systemd maintains, which can
be more than just services, but we’ll have a look on this in a later
blog posting), whether it is active (i.e. running),
inactive (i.e. not running) or in any other state. If you look
closely you’ll see one item in the list that is marked maintenance
and highlighted in red. This informs you about a service that failed
to run or otherwise encountered a problem. In this case this is
ntpd. Now, let’s find out what actually
happened to ntpd, with the systemctl status
command:

[root@lambda] ~# systemctl status ntpd.service
ntpd.service - Network Time Service
	  Loaded: loaded (/etc/systemd/system/ntpd.service)
	  Active: maintenance
	    Main: 953 (code=exited, status=255)
	  CGroup: name=systemd:/systemd-1/ntpd.service
[root@lambda] ~#

This shows us that NTP terminated during runtime (when it ran as
PID 953), and tells us exactly the error condition: the process exited
with an exit status of 255.

In a later systemd version, we plan to hook this up to ABRT, as soon as
this enhancement request is fixed
. Then, if systemctl
status
shows you information about a service that crashed it will
direct you right-away to the appropriate crash dump in ABRT.

Summary: use systemctl and systemctl
status
as modern, more complete replacements for the traditional
boot-up status messages of SysV services. systemctl status
not only captures in more detail the error condition but also shows
runtime errors in addition to start-up errors.

That’s it for this week, make sure to come back next week, for the
next posting about systemd for administrators!

Dear Canonical,

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/sound-theme-canonical.html

#ignore yes

Today I came
across this blog post of your design team
. In context of the recent
criticism you had to endure regarding upstream contributions I am disappointed
that you have not bothered to ping anybody from the upstream freedesktop sound
theme (for example yours truly) about this in advance. No, you went to cook
your own soup. What really disappoints me is that we have asked multiple times
for help and support and contributions for the sound theme, to only very little
success, and I even asked some of the Canonical engineers about this topic and
in particular regarding some clarifications of the licensing of the old Ubuntu
sound theme. I am sorry, but if you had listened, or looked, or asked you would
have been aware that we were looking for somebody to maintain this actively,
upstream — and because we didn’t have the time to maintain this we only
did the absolute minimum work necessary and we only maintain this ourselves
because noone else wanted to.

It should be upstream first, downstream second.

I am sorry if I sound like an always complaining prick to you. But believe
me, I am not saying this because I wouldn’t like you or anything like that.
I am just saying this because I believe you could do things
oh so much better.

Please fix this. We want your contributions. Upstream.

Linux Plumbers Conference 2010 CFP Ending Soon!

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/plumbersconf-2010.html

#nocomments y

The Call
for Papers
for the Linux
Plumbers Conference (LPC)
in November in Cambridge, Massachusetts is ending
soon, on July 19th 2010 (That’s the upcoming monday!). It’s a conference
about the core infrastructure of Linux systems: the part of the system where
userspace and the kernel interface. It’s the only conference where the focus is
specifically on getting together the kernel people who work on the userspace
interfaces and the userspace people who have to deal with kernel interfaces.
It’s supposed to be a place where all the people doing infrastructure work sit
down and talk, so that both parties understand better what the requirements and
needs of the other are, and where we can work towards fixing the major problems
we currently have with our lower-level infrastructure and APIs.

The two previous LPCs were hugely successful (as reported on LWN on various
occasions), and this time we hope to repeat that.

Like the previous years, I will be running the Audio conference track of
LPC, this time together with Mark Brown. Audio infrastructure on Linux has been
steadily improving the last years all over the place, but there’s still a lot
to do. Join us at the LPC to discuss the next steps and help improving Linux
audio further! If you are doing audio infrastructure work on Linux, make
sure to attend and submit a paper!

Sign up soon!
Send
in your paper quickly!
Only three days left to the end of the
CFP!

Plumbers Logo

(I am also planning to do a presentation there about systemd, together
with Kay. Make sure to attend if you are interested in that topic.)

See you in Boston!

Addendum on the Brokenness of File Locking

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/locking2.html

I forgot to mention another central problem in my blog story about file locking
on Linux
:

Different machines have access to different features of the same file
system. Here’s an example: let’s say you have two machines in your home LAN.
You want them to share their $HOME directory, so that you (or your family) can
use either machine and have access to all your (or their) data. So you export
/home on one machine via NFS and mount it from the other machine.

So far so good. But what happens to file locking now? Programs on the first
machine see a fully-featured ext3 or ext4 file system, where all kinds of
locking works (even though the API might suck as mentioned in the earlier blog
story). But what about the other machine? If you set up lockd properly
then POSIX locking will work on both. If you didn’t one machine can use POSIX
locking properly, the other cannot. And it gets even worse: as mentioned recent
NFS implementations on Linux transparently convert client-side BSD locking into
POSIX locking on the server side. Now, if the same application uses BSD locking on both
the client and the server side from two instances they will end up with two
orthogonal locks and although both sides think they have properly acquired a
lock (and they actually did) they will overwrite each other’s data, because
those two locks are independent. (And one wonders why the NFS developers
implemented this brokenness nonetheless…).

This basically means that locking cannot be used unless it is verified that
everyone accessing a file system can make use of the same file system feature
set. If you use file locking on a file system you should do so only if you are
sufficiently sure that nobody using a broken or weird NFS implementation might
want to access and lock those files as well. And practically that is
impossible. Even if fpathconf() was improved so that it could inform
the caller whether it can successfully apply a file lock to a file, this would
still not give any hint if the same is true for everybody else accessing the
file. But that is essential when speaking of advisory (i.e. cooperative) file
locking.

And no, this isn’t easy to fix. So again, the recommendation: forget about
file locking on Linux, it’s nothing more than a useless toy.

Also read Jeremy
Allison’s
(Samba) take on POSIX file locking. It’s an interesting read.

On the Brokenness of File Locking

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/locking.html

It’s amazing how far Linux has come without providing for proper file
locking that works and is usable from userspace. A little overview why file
locking is still in a very sad state:

To begin with, there’s a plethora of APIs, and all of them are awful:

  • POSIX File locking as available with fcntl(F_SET_LK): the POSIX
    locking API is the most portable one and in theory works across NFS. It can do
    byte-range locking. So much on the good side. On the bad side there’s a lot
    more however: locks are bound to processes, not file descriptors. That means
    that this logic cannot be used in threaded environments unless combined with a
    process-local mutex. This is hard to get right, especially in libraries that do
    not know the environment they are run in, i.e. whether they are used in
    threaded environments or not. The worst part however is that POSIX locks are
    automatically released if a process calls close() on any (!) of
    its open file descriptors for that file. That means that when one part of a
    program locks a file and another by coincidence accesses it too for a short
    time, the first part’s lock will be broken and it won’t be notified about that.
    Modern software tends to load big frameworks (such as Gtk+ or Qt) into memory
    as well as arbitrary modules via mechanisms such as NSS, PAM, gvfs,
    GTK_MODULES, Apache modules, GStreamer modules where one module seldom can
    control what another module in the same process does or accesses. The effect of
    this is that POSIX locks are unusable in any non-trivial program where it
    cannot be ensured that a file that is locked is never accessed by
    any other part of the process at the same time. Example: a user managing
    daemon wants to write /etc/passwd and locks the file for that. At
    the same time in another thread (or from a stack frame further down)
    something calls getpwuid() which internally accesses
    /etc/passwd and causes the lock to be released, the first thread
    (or stack frame) not knowing that. Furthermore should two threads use the
    locking fcntl()s on the same file they will interfere with each other’s locks
    and reset the locking ranges and flags of each other. On top of that locking
    cannot be used on any file that is publicly accessible (i.e. has the R bit set
    for groups/others, i.e. more access bits on than 0600), because that would
    otherwise effectively give arbitrary users a way to indefinitely block
    execution of any process (regardless of the UID it is running under) that wants
    to access and lock the file. This is generally not an acceptable security risk.
    Finally, while POSIX file locks are supposedly NFS-safe they not always really
    are as there are still many NFS implementations around where locking is not properly
    implemented, and NFS tends to be used in heterogenous networks. The biggest
    problem about this is that there is no way to properly detect whether file
    locking works on a specific NFS mount (or any mount) or not.
  • The other API for POSIX file locks: lockf() is another API for the
    same mechanism and suffers by the same problems. One wonders why there are two
    APIs for the same messed up interface.
  • BSD locking based on flock(). The semantics of this kind of
    locking are much nicer than for POSIX locking: locks are bound to file
    descriptors, not processes. This kind of locking can hence be used safely
    between threads and can even be inherited across fork() and
    exec(). Locks are only automatically broken on the close()
    call for the one file descriptor they were created with (or the last duplicate
    of it). On the other hand this kind of locking does not offer byte-range
    locking and suffers by the same security problems as POSIX locking, and works
    on even less cases on NFS than POSIX locking (i.e. on BSD and Linux < 2.6.12
    they were NOPs returning success). And since BSD locking is not as portable as
    POSIX locking this is sometimes an unsafe choice. Some OSes even find it funny
    to make flock() and fcntl(F_SET_LK) control the same locks.
    Linux treats them independently — except for the cases where it doesn’t: on
    Linux NFS they are transparently converted to POSIX locks, too now. What a chaos!
  • Mandatory locking is available too. It’s based on the POSIX locking API but
    not portable in itself. It’s dangerous business and should generally be avoided
    in cleanly written software.
  • Traditional lock file based file locking. This is how things where done
    traditionally, based around known atomicity guarantees of certain basic file
    system operations. It’s a cumbersome thing, and requires polling of the file
    system to get notifications when a lock is released. Also, On Linux NFS < 2.6.5
    it doesn’t work properly, since O_EXCL isn’t atomic there. And of course the
    client cannot really know what the server is running, so again this brokeness
    is not detectable.

The Disappointing Summary

File locking on Linux is just broken. The broken semantics of POSIX locking
show that the designers of this API apparently never have tried to actually use
it in real software. It smells a lot like an interface that kernel people
thought makes sense but in reality doesn’t when you try to use it from
userspace.

Here’s a list of places where you shouldn’t use file locking due to the
problems shown above: If you want to lock a file in $HOME, forget about it as
$HOME might be NFS and locks generally are not reliable there. The same applies
to every other file system that might be shared across the network. If the file
you want to lock is accessible to more than your own user (i.e. an access mode
> 0700), forget about locking, it would allow others to block your
application indefinitely. If your program is non-trivial or threaded or uses a
framework such as Gtk+ or Qt or any of the module-based APIs such as NSS, PAM,
… forget about about POSIX locking. If you care about portability, don’t use
file locking.

Or to turn this around, the only case where it is kind of safe to use file locking
is in trivial applications where portability is not key and by using BSD
locking on a file system where you can rely that it is local and on files
inaccessible to others. Of course, that doesn’t leave much, except for private
files in /tmp for trivial user applications.

Or in one sentence: in its current state Linux file locking is unusable.

And that is a shame.

Update: Check out the follow-up story on this topic.

On IDs

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/ids.html

When programming software that cooperates with software running on behalf of
other users, other sessions or other computers it is often necessary to work with
unique identifiers. These can be bound to various hardware and software objects
as well as lifetimes. Often, when people look for such an ID to use they pick
the wrong one because semantics and lifetime or the IDs are not clear. Here’s a
little incomprehensive list of IDs accessible on Linux and how you should or
should not use them.

Hardware IDs

  1. /sys/class/dmi/id/product_uuid: The main board product UUID, as
    set by the board manufacturer and encoded in the BIOS DMI information. It may
    be used to identify a mainboard and only the mainboard. It changes when the
    user replaces the main board. Also, often enough BIOS manufacturers write bogus
    serials into it. In addition, it is x86-specific. Access for unprivileged users
    is forbidden. Hence it is of little general use.
  2. CPUID/EAX=3 CPU serial number: A CPU UUID, as set by the
    CPU manufacturer and encoded on the CPU chip. It may be used to identify a CPU
    and only a CPU. It changes when the user replaces the CPU. Also, most modern
    CPUs don’t implement this feature anymore, and older computers tend to disable
    this option by default, controllable via a BIOS Setup option. In addition, it
    is x86-specific. Hence this too is of little general use.
  3. /sys/class/net/*/address: One or more network MAC addresses, as
    set by the network adapter manufacturer and encoded on some network card
    EEPROM. It changes when the user replaces the network card. Since network cards
    are optional and there may be more than one the availability if this ID is not
    guaranteed and you might have more than one to choose from. On virtual machines
    the MAC addresses tend to be random. This too is hence of little general use.
  4. /sys/bus/usb/devices/*/serial: Serial numbers of various USB
    devices, as encoded in the USB device EEPROM. Most devices don’t have a serial
    number set, and if they have it is often bogus. If the user replaces his USB
    hardware or plugs it into another machine these IDs may change or appear in
    other machines. This hence too is of little use.

There are various other hardware IDs available, many of which you may
discover via the ID_SERIAL udev property of various devices, such hard disks
and similar. They all have in common that they are bound to specific
(replacable) hardware, not universally available, often filled with bogus data
and random in virtualized environments. Or in other words: don’t use them, don’t
rely on them for identification, unless you really know what you are doing and
in general they do not guarantee what you might hope they guarantee.

Software IDs

  1. /proc/sys/kernel/random/boot_id: A random ID that is regenerated
    on each boot. As such it can be used to identify the local machine’s current
    boot. It’s universally available on any recent Linux kernel. It’s a good and
    safe choice if you need to identify a specific boot on a specific booted
    kernel.
  2. gethostname(), /proc/sys/kernel/hostname: A non-random ID
    configured by the administrator to identify a machine in the network. Often
    this is not set at all or is set to some default value such as
    localhost and not even unique in the local network. In addition it
    might change during runtime, for example because it changes based on updated
    DHCP information. As such it is almost entirely useless for anything but
    presentation to the user. It has very weak semantics and relies on correct
    configuration by the administrator. Don’t use this to identify machines in a
    distributed environment. It won’t work unless centrally administered, which
    makes it useless in a globalized, mobile world. It has no place in
    automatically generated filenames that shall be bound to specific hosts. Just
    don’t use it, please. It’s really not what many people think it is.
    gethostname() is standardized in POSIX and hence portable to other
    Unixes.
  3. IP Addresses returned by SIOCGIFCONF or the respective Netlink APIs: These
    tend to be dynamically assigned and often enough only valid on local networks
    or even only the local links (i.e. 192.168.x.x style addresses, or even
    169.254.x.x/IPv4LL). Unfortunately they hence have little use outside of
    networking.
  4. gethostid(): Returns a supposedly unique 32-bit identifier for the
    current machine. The semantics of this is not clear. On most machines this
    simply returns a value based on a local IPv4 address. On others it is
    administrator controlled via the /etc/hostid file. Since the semantics
    of this ID are not clear and most often is just a value based on the IP address it is
    almost always the wrong choice to use. On top of that 32bit are not
    particularly a lot. On the other hand this is standardized in POSIX and hence
    portable to other Unixes. It’s probably best to ignore this value and if people
    don’t want to ignore it they should probably symlink /etc/hostid to
    /var/lib/dbus/machine-id or something similar.
  5. /var/lib/dbus/machine-id: An ID identifying a specific Linux/Unix
    installation. It does not change if hardware is replaced. It is not unreliable
    in virtualized environments. This value has clear semantics and is considered
    part of the D-Bus API. It is supposedly globally unique and portable to all
    systems that have D-Bus. On Linux, it is universally available, given that
    almost all non-embedded and even a fair share of the embedded machines ship
    D-Bus now. This is the recommended way to identify a machine, possibly with a
    fallback to the host name to cover systems that still lack D-Bus. If your
    application links against libdbus, you may access this ID with
    dbus_get_local_machine_id(), if not you can read it directly from the file system.
  6. /proc/self/sessionid: An ID identifying a specific Linux login
    session. This ID is maintained by the kernel and part of the auditing logic. It
    is uniquely assigned to each login session during a specific system boot,
    shared by each process of a session, even across su/sudo and cannot be changed
    by userspace. Unfortunately some distributions have so far failed to set things
    up properly for this to work (Hey, you, Ubuntu!), and this ID is always
    (uint32_t) -1 for them. But there’s hope they get this fixed
    eventually. Nonetheless it is a good choice for a unique session identifier on
    the local machine and for the current boot. To make this ID globally unique it
    is best combined with /proc/sys/kernel/random/boot_id.
  7. getuid(): An ID identifying a specific Unix/Linux user. This ID is
    usually automatically assigned when a user is created. It is not unique across
    machines and may be reassigned to a different user if the original user was
    deleted. As such it should be used only locally and with the limited validity
    in time in mind. To make this ID globally unique it is not sufficient to
    combine it with /var/lib/dbus/machine-id, because the same ID might be
    used for a different user that is created later with the same UID. Nonetheless
    this combination is often good enough. It is available on all POSIX systems.
  8. ID_FS_UUID: an ID that identifies a specific file system in the
    udev tree. It is not always clear how these serials are generated but this
    tends to be available on almost all modern disk file systems. It is not
    available for NFS mounts or virtual file systems. Nonetheless this is often a
    good way to identify a file system, and in the case of the root directory even
    an installation. However due to the weakly defined generation semantics the
    D-Bus machine ID is generally preferrable.

Generating IDs

Linux offers a kernel interface to generate UUIDs on demand, by reading from
/proc/sys/kernel/random/uuid. This is a very simple interface to
generate UUIDs. That said, the logic behind UUIDs is unnecessarily complex and
often it is a better choice to simply read 16 bytes or so from
/dev/urandom.

Summary

And the gist of it all: Use /var/lib/dbus/machine-id! Use
/proc/self/sessionid! Use /proc/sys/kernel/random/boot_id!
Use getuid()! Use /dev/urandom!
And forget about the
rest, in particular the host name, or the hardware IDs such as DMI. And keep in
mind that you may combine the aforementioned IDs in various ways to get
different semantics and validity constraints.