systemd Status Update

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/systemd-update-3.html

It has been way too long since my last status update on
systemd. Here’s another short, non-comprehensive status update on
what we have worked on for systemd since then.

We have been working hard to turn systemd into the most viable set
of components to build operating systems, appliances and devices from,
and make it the best choice for servers, for desktops and for embedded
environments alike. I think we have a really convincing set of
features now, but we are actively working on making it even
better.

Here’s a list of some more and some less interesting features, in
no particular order:

We added an automatic pager to systemctl (and related tools), similar
to how git has it.
systemctl learnt a new switch --failed, to show only
failed services.
You may now start services immediately, overriding all dependency
logic, by passing --ignore-dependencies to
systemctl. This is mostly a debugging tool and nothing people
should use in real life.
Sending SIGKILL as final part of the implicit shutdown
logic of services is now optional and may be configured with the
SendSIGKILL= option individually for each service.
We split off the Vala/Gtk tools into their own project, systemd-ui.
systemd-tmpfiles learnt file globbing and creating FIFO
special files as well as character and block device nodes, and
symlinks. It also is capable of relabelling certain directories at
boot now (in the SELinux sense).
Immediately before shutting down we will now invoke all binaries
found in /lib/systemd/system-shutdown/, which is useful for
debugging late shutdown.
You may now globally control where STDOUT/STDERR of services goes
(unless individual service configuration overrides it).
There’s a new ConditionVirtualization= option that makes
systemd skip a specific service if a certain virtualization technology
is found or not found. Similarly, we now have a new option to detect
whether a certain security technology (such as SELinux) is available,
called ConditionSecurity=. There’s also
ConditionCapability= to check whether a certain process
capability is in the capability bounding set of the system. There are
also the new options ConditionFileIsExecutable=,
ConditionPathIsMountPoint=,
ConditionPathIsReadWrite= and
ConditionPathIsSymbolicLink=.
The file system condition directives now support globbing.
Service conditions may now be “triggering” and “mandatory”, meaning that
they can be a necessary requirement to hold for a service to start, or
simply one trigger among many.
At boot time we now print warnings if /usr
is on a split-off partition but not already mounted by an initrd,
if /etc/mtab is not a symlink to /proc/mounts, or if CONFIG_CGROUPS
is not enabled in the kernel. We’ll also expose this as a
tainted flag on the bus.
You may now boot the same OS image on a bare metal machine and in
Linux namespace containers and will get a clean boot in both
cases. This is more complicated than it sounds since device management
with udev or write access to /sys, /proc/sys or
things like /dev/kmsg is not available in a container. This
makes systemd a first-class choice for managing thin container
setups. This is all tested with systemd’s own systemd-nspawn
tool but should work fine in LXC setups, too. Basically this means
that you do not have to adjust your OS manually to make it work in a
container environment; it will just work out of the box. It also
makes it easier to convert real systems into containers.
We now automatically spawn gettys on HVC ttys when booting in VMs.
We introduced /etc/machine-id as a generalization of
D-Bus machine ID logic. See this
blog story for more information. On stateless/read-only systems
the machine ID is initialized randomly at boot. In virtualized
environments it may be passed in from the machine manager (with qemu’s
-uuid switch, or via the container
interface).
All of the systemd-specific /etc/fstab mount options are
now in the x-systemd-xyz format.
To make it easy to find non-converted services we will now
implicitly prefix all LSB and SysV init script descriptions with the
strings “LSB:” and “SYSV:”, respectively.
We introduced /run and made it a hard dependency of
systemd. This directory is now widely accepted and implemented on all
relevant Linux distributions.
systemctl can now execute all its operations remotely too (-H switch).
We now ship systemd-nspawn,
a really powerful tool that can be used to start containers for
debugging, building and testing, much like chroot(1). It is useful to
just get a shell inside a build tree, but is good enough to boot up a
full system in it, too.
If we query the user for a hard disk password at boot he may hit
TAB to hide the asterisks we normally show for each key that is
entered, for extra paranoia.
We don’t enable udev-settle.service anymore, which is
only required for certain legacy software that still hasn’t been
updated to follow devices coming and going cleanly.
We now include a tool that can plot boot speed graphs, similar to
bootchartd, called systemd-analyze.
At boot, we now initialize the kernel’s binfmt_misc logic with the data from /etc/binfmt.d.
systemctl now recognizes if it is run in a chroot()
environment and will work accordingly (i.e. apply changes to the tree
it is run in, instead of talking to the actual PID 1 for this). It also has a new --root= switch to work on an OS tree from outside of it.
There’s a new unit dependency type OnFailureIsolate= that
allows entering a different target whenever a certain unit fails. For
example, this is interesting to enter emergency mode if file system
checks of crucial file systems failed.
Socket units may now listen on Netlink sockets, special files
from /proc and POSIX message queues, too.
There’s a new IgnoreOnIsolate= flag which may be used to
ensure certain units are left untouched by isolation requests. There’s
a new IgnoreOnSnapshot= flag which may be used to exclude
certain units from snapshot units when they are created.
There are now small mechanism services for
changing the local hostname and other host meta data, changing
the system locale and console settings and the system
clock.
We now limit the capability bounding set for a number of our
internal services by default.
Plymouth may now be disabled globally with
plymouth.enable=0 on the kernel command line.
We now disallocate VTs when a getty finishes running (and
optionally other tools run on VTs). This adds extra security since it
clears up the scrollback buffer so that subsequent users cannot get
access to a user’s session output.
In socket units there are now options to control the
IP_TRANSPARENT, SO_BROADCAST, SO_PASSCRED,
SO_PASSSEC socket options.
The receive and send buffers of socket units may now be set larger
than the default system settings if needed by using
SO_{RCV,SND}BUFFORCE.
We now set the hardware timezone as one of the first things in PID
1, in order to avoid time jumps during normal userspace operation, and
to guarantee sensible times on all generated logs. We also no longer
save the system clock to the RTC on shutdown, assuming that this is
done by the clock control tool when the user modifies the time, or
automatically by the kernel if NTP is enabled.
The SELinux directory got moved from /selinux to
/sys/fs/selinux.
We added a small service systemd-logind that keeps track
of logged in users and their sessions. It creates control groups for
them, implements the XDG_RUNTIME_DIR
specification for them, maintains seats and device node ACLs and
implements shutdown/idle inhibiting for clients. It auto-spawns gettys
on all local VTs when the user switches to them (instead of starting
six of them unconditionally), thus reducing the resource footprint by
default. It has a D-Bus interface as well as a
simple synchronous library interface. This mechanism obsoletes
ConsoleKit which is now deprecated and should no longer be used.
There’s now full, automatic multi-seat support, and this is
enabled in GNOME 3.4. Just by plugging in new seat hardware you get a
new login screen on your seat’s screen.
There is now an option ControlGroupModify= to allow
services to change the properties of their control groups dynamically,
and one to make control groups persistent in the tree
(ControlGroupPersistent=) so that they can be created and
maintained by external tools.
We now jump back into the initrd in shutdown, so that it can
detach the root file system and the storage devices backing it. This
allows us (for the first time!) to reliably undo complex storage setups
on shutdown and leave them in a clean state.
systemctl now supports presets, a way for distributions and
administrators to define their own policies on whether services should
be enabled or disabled by default on package installation.
systemctl now has high-level verbs for masking/unmasking
units. There’s also a new command (systemctl list-unit-files)
for determining the list of all installed unit files and whether
they are enabled or not.
We now apply sysctl variables to each new network device, as it
appears. This makes /etc/sysctl.d compatible with hot-plug
network devices.
There’s limited profiling for SELinux start-up performance built
into PID 1.
There’s a new switch PrivateNetwork=
to turn off any network access for a specific service.
Service units may now include configuration for control group
parameters. A few (such as MemoryLimit=) are exposed with
high-level options, and all others are available via the generic
ControlGroupAttribute= setting.
There’s now the option to mount certain cgroup controllers
jointly at boot. We do this now for cpu and
cpuacct by default.
We added the journal and turned it on by default.
All service output is now written to the Journal by default,
regardless of whether it is sent via syslog or simply written to
stdout/stderr. Both message streams end up in the same location and
are interleaved the way they should be. All log messages, even from the
kernel and from early boot, end up in the journal. Now, no service
output goes unnoticed, and all of it is saved and indexed at the same
location.
systemctl status will now show the last 10 log lines for
each service, directly from the journal.
We now show the progress of fsck at boot on the console,
again. We also show the much loved colorful [ OK ] status
messages at boot again, as known from most SysV implementations.
We merged udev into systemd.
We implemented and documented interfaces to container
managers and initrds for passing execution data to systemd. We also
implemented and documented an interface for storage daemons that are
required to back the root file system.
There are two new options in service files to propagate reload requests between several units.
systemd-cgls won’t show kernel threads by default anymore, or show empty control groups.
We added a new tool systemd-cgtop that shows resource
usage of whole services in a top(1) like fashion.
systemd may now supervise services in watchdog style. If enabled
for a service, the daemon has to ping PID 1 at regular intervals
or is otherwise considered failed (which might then result in
restarting it, or even rebooting the machine, as configured). Also,
PID 1 is capable of pinging a hardware watchdog. Putting this
together, the hardware watchdogs PID 1 and PID 1 then watchdogs
specific services. This is highly useful for high-availability servers
as well as embedded machines. Since watchdog hardware is nowadays
built into all modern chipsets (including desktop chipsets), this
should hopefully help to make this a more widely used
functionality.
We added support for a new kernel command line option
systemd.setenv= to set an environment variable
system-wide.
By default services which are started by systemd will have SIGPIPE
set to ignored. The Unix SIGPIPE logic is used to reliably implement
shell pipelines and when left enabled in services is usually just a
source of bugs and problems.
You may now configure the rate limiting that is applied to
restarts of specific services. Previously the rate limiting parameters
were hard-coded (similar to SysV).
There’s now support for loading the IMA integrity policy into the
kernel early in PID 1, similar to how we already did it with the
SELinux policy.
There’s now an official API to schedule and query scheduled shutdowns.
We changed the license from GPL2+ to LGPL2.1+.
We made systemd-detect-virt
an official tool in the tool set. Since we already had code to detect
certain VM and container environments we now added an official tool
for administrators to make use of in shell scripts and suchlike.
We documented numerous interfaces systemd introduced.
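
To make the watchdog item from the list above more concrete, here is a minimal sketch of the service side of that handshake, written in Python. It rests on assumptions this post does not spell out: that the service manager hands the daemon a datagram socket address in the NOTIFY_SOCKET environment variable, that the timeout is passed in microseconds in WATCHDOG_USEC, and that a keep-alive ping is the datagram "WATCHDOG=1" (the sd_notify convention). The function names are made up for illustration and are not part of systemd.

```python
import os
import socket


def notify_address(env=os.environ):
    """Return the notification socket address, or None if not supervised."""
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return None
    # A leading "@" denotes a Linux abstract-namespace socket (NUL-prefixed).
    if path.startswith("@"):
        return "\0" + path[1:]
    return path


def ping_interval_seconds(env=os.environ):
    """Ping at half the configured timeout; None if no watchdog is requested."""
    usec = env.get("WATCHDOG_USEC")
    if not usec:
        return None
    return int(usec) / 1_000_000 / 2


def send_state(state, env=os.environ):
    """Send one datagram such as "WATCHDOG=1"; return True if it was sent."""
    address = notify_address(env)
    if address is None:
        # Not started by a supervising manager: silently do nothing.
        return False
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), address)
    return True
```

A daemon's main loop would call send_state("WATCHDOG=1") roughly every ping_interval_seconds() seconds, and only while it is actually healthy. Pinging at half the timeout is a conservative choice made for this sketch, not a requirement stated in the post.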

Much of the stuff above is already available in Fedora 15 and 16,
or will be made available in the upcoming Fedora 17.

And that’s it for now. There’s a lot of other stuff in the git commits, but
most of it is smaller and I will thus spare you the details.

I’d like to thank everybody who contributed to systemd over the past years.

Thanks for your interest!
