systemd For Administrators, Part XXI

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/systemd-for-administrators-part-xxi.html

Container Integration

For a while now, containers have been one of the hot topics on
Linux. Container managers such as libvirt-lxc, LXC or Docker are
widely known and used these days. In this blog story I want to shed
some light on systemd‘s integration points with container managers, to
allow seamless management of services across container boundaries.

We’ll focus on OS containers here, i.e. the case where an init system
runs inside the container, and the container hence in most ways
appears like an independent system of its own. Much of what I describe
here is available on pretty much any container manager that implements
the logic described here, including libvirt-lxc. However, to make
things easy we’ll focus on
systemd-nspawn,
the mini-container manager that is shipped with systemd
itself. systemd-nspawn uses the same kernel interfaces as the other
container managers, but is less flexible, as it is designed to be a
container manager that is as simple to use as possible and “just
works”, rather than trying to be a generic tool you can configure in
every low-level detail. We use systemd-nspawn extensively when
developing systemd.

Anyway, so let’s get started with our run-through. Let’s start by
creating a Fedora container tree in a subdirectory:

# yum -y --releasever=20 --nogpg --installroot=/srv/mycontainer --disablerepo='*' --enablerepo=fedora install systemd passwd yum fedora-release vim-minimal

This downloads a minimal Fedora system and installs it in
/srv/mycontainer. This command line is Fedora-specific, but most
distributions provide similar functionality in one way or another. The
examples section in the systemd-nspawn(1) man page contains a list of
the various command lines for other distributions.
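
For other distributions this usually boils down to the distribution’s
own bootstrap tool. As a rough illustration for Debian-based hosts
(the suite and the package list here are assumptions, not taken from
the man page), debootstrap does the same job:

# debootstrap --include=systemd,dbus unstable /srv/mycontainer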

We now have the new container installed. Let’s set an initial root password:

# systemd-nspawn -D /srv/mycontainer
Spawning container mycontainer on /srv/mycontainer
Press ^] three times within 1s to kill container.
-bash-4.2# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
-bash-4.2# ^D
Container mycontainer exited successfully.
#

We use systemd-nspawn here to get a shell in the container, and then
use passwd to set the root password. After that the initial setup is done,
hence let’s boot it up and log in as root with our new password:

# systemd-nspawn -D /srv/mycontainer -b
Spawning container mycontainer on /srv/mycontainer.
Press ^] three times within 1s to kill container.
systemd 208 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
Detected virtualization 'systemd-nspawn'.

Welcome to Fedora 20 (Heisenbug)!

[  OK  ] Reached target Remote File Systems.
[  OK  ] Created slice Root Slice.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Created slice System Slice.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Reached target Slices.
[  OK  ] Listening on Delayed Shutdown Socket.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket.
         Starting Journal Service...
[  OK  ] Started Journal Service.
[  OK  ] Reached target Paths.
         Mounting Debug File System...
         Mounting Configuration File System...
         Mounting FUSE Control File System...
         Starting Create static device nodes in /dev...
         Mounting POSIX Message Queue File System...
         Mounting Huge Pages File System...
[  OK  ] Reached target Encrypted Volumes.
[  OK  ] Reached target Swap.
         Mounting Temporary Directory...
         Starting Load/Save Random Seed...
[  OK  ] Mounted Configuration File System.
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Mounted Temporary Directory.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Mounted Debug File System.
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Create static device nodes in /dev.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
         Starting Trigger Flushing of Journal to Persistent Storage...
         Starting Recreate Volatile Files and Directories...
[  OK  ] Started Recreate Volatile Files and Directories.
         Starting Update UTMP about System Reboot/Shutdown...
[  OK  ] Started Trigger Flushing of Journal to Persistent Storage.
[  OK  ] Started Update UTMP about System Reboot/Shutdown.
[  OK  ] Reached target System Initialization.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting Login Service...
         Starting Permit User Sessions...
         Starting D-Bus System Message Bus...
[  OK  ] Started D-Bus System Message Bus.
         Starting Cleanup of Temporary Directories...
[  OK  ] Started Cleanup of Temporary Directories.
[  OK  ] Started Permit User Sessions.
         Starting Console Getty...
[  OK  ] Started Console Getty.
[  OK  ] Reached target Login Prompts.
[  OK  ] Started Login Service.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.

Fedora release 20 (Heisenbug)
Kernel 3.18.0-0.rc4.git0.1.fc22.x86_64 on an x86_64 (console)

mycontainer login: root
Password:
-bash-4.2#

Now we have everything ready to play around with the container
integration of systemd. Let’s have a look at the first tool,
machinectl. When run without parameters it shows a list of all
locally running containers:

$ machinectl
MACHINE                          CONTAINER SERVICE
mycontainer                      container nspawn

1 machines listed.

The “status” subcommand shows details about the container:

$ machinectl status mycontainer
mycontainer:
       Since: Mi 2014-11-12 16:47:19 CET; 51s ago
      Leader: 5374 (systemd)
     Service: nspawn; class container
        Root: /srv/mycontainer
     Address: 192.168.178.38
              10.36.6.162
              fd00::523f:56ff:fe00:4994
              fe80::523f:56ff:fe00:4994
          OS: Fedora 20 (Heisenbug)
        Unit: machine-mycontainer.scope
              ├─5374 /usr/lib/systemd/systemd
              └─system.slice
                ├─dbus.service
                │ └─5414 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-act...
                ├─systemd-journald.service
                │ └─5383 /usr/lib/systemd/systemd-journald
                ├─systemd-logind.service
                │ └─5411 /usr/lib/systemd/systemd-logind
                └─console-getty.service
                  └─5416 /sbin/agetty --noclear -s console 115200 38400 9600

With this we see some interesting information about the container,
including its control group tree (with processes), IP addresses and
root directory.
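
machinectl also offers a machine-readable counterpart to this
human-friendly output. Assuming your version knows the show verb, it
dumps the machine’s properties as simple KEY=VALUE pairs:

# machinectl show mycontainer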

The “login” subcommand gets us a new login shell in the container:

# machinectl login mycontainer
Connected to container mycontainer. Press ^] three times within 1s to exit session.

Fedora release 20 (Heisenbug)
Kernel 3.18.0-0.rc4.git0.1.fc22.x86_64 on an x86_64 (pts/0)

mycontainer login:

The “reboot” subcommand reboots the container:

# machinectl reboot mycontainer

The “poweroff” subcommand powers the container off:

# machinectl poweroff mycontainer

So much for the machinectl tool. The tool knows a couple more
commands; please check the man page for details. Note again that even
though we use systemd-nspawn as the container manager here, the
concepts apply to any container manager that implements the logic
described here, including libvirt-lxc for example.

machinectl is not the only tool that is useful in conjunction with
containers. Many of systemd’s own tools have been updated to
explicitly support containers too! Let’s try this (after starting the
container up again first, repeating the systemd-nspawn command from
above):

# hostnamectl -M mycontainer set-hostname "wuff"

This uses
hostnamectl(1)
on the local container and sets its hostname.
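
To verify the change we can read the hostname right back; invoked
without a verb, hostnamectl prints its status output, which includes
the hostname:

# hostnamectl -M mycontainer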

Similarly, many other tools have been updated for connecting to local
containers. Here’s
systemctl(1)‘s -M switch
in action:

# systemctl -M mycontainer
UNIT                                 LOAD   ACTIVE SUB       DESCRIPTION
-.mount                              loaded active mounted   /
dev-hugepages.mount                  loaded active mounted   Huge Pages File System
dev-mqueue.mount                     loaded active mounted   POSIX Message Queue File System
proc-sys-kernel-random-boot_id.mount loaded active mounted   /proc/sys/kernel/random/boot_id
[...]
time-sync.target                     loaded active active    System Time Synchronized
timers.target                        loaded active active    Timers
systemd-tmpfiles-clean.timer         loaded active waiting   Daily Cleanup of Temporary Directories

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

49 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

As expected, this shows the list of active units on the specified
container, not the host. (Output is shortened here, the blog story is
already getting too long).

Let’s use this to restart a service within our container:

# systemctl -M mycontainer restart systemd-resolved.service
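
And just as on the host, we can check the result afterwards. This is
just a sketch, assuming the service actually exists in your container
image:

# systemctl -M mycontainer status systemd-resolved.service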

systemctl has more container support than just the -M switch,
though. With the -r switch it shows the units running on the host,
plus all units of all local, running containers:

# systemctl -r
UNIT                                        LOAD   ACTIVE SUB       DESCRIPTION
boot.automount                              loaded active waiting   EFI System Partition Automount
proc-sys-fs-binfmt_misc.automount           loaded active waiting   Arbitrary Executable File Formats File Syst
sys-devices-pci0000:00-0000:00:02.0-drm-card0-card0x2dLVDSx2d1-intel_backlight.device loaded active plugged   /sys/devices/pci0000:00/0000:00:02.0/drm/ca
[...]
timers.target                                                                                       loaded active active    Timers
mandb.timer                                                                                         loaded active waiting   Daily man-db cache update
systemd-tmpfiles-clean.timer                                                                        loaded active waiting   Daily Cleanup of Temporary Directories
mycontainer:-.mount                                                                                 loaded active mounted   /
mycontainer:dev-hugepages.mount                                                                     loaded active mounted   Huge Pages File System
mycontainer:dev-mqueue.mount                                                                        loaded active mounted   POSIX Message Queue File System
[...]
mycontainer:time-sync.target                                                                        loaded active active    System Time Synchronized
mycontainer:timers.target                                                                           loaded active active    Timers
mycontainer:systemd-tmpfiles-clean.timer                                                            loaded active waiting   Daily Cleanup of Temporary Directories

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

191 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

We can see here first the units of the host, followed by the
units of the one container we currently have running. The units of the
containers are prefixed with the container name, and a colon
(“:”). (The output is shortened again for brevity’s sake.)

The list-machines subcommand of systemctl shows a list of all
running containers, querying the system managers within the containers
about their state and health. More specifically, it shows whether
containers are properly booted up, and whether there are any failed
services:

# systemctl list-machines
NAME         STATE   FAILED JOBS
delta (host) running      0    0
mycontainer  running      0    0
miau         degraded     1    0
waldi        running      0    0

4 machines listed.

To make things more interesting we have started two more containers in
parallel. One of them has a failed service, which results in the
machine state being degraded.

Let’s have a look at
journalctl(1)‘s
container support. It too supports -M to show the logs of a specific
container:

# journalctl -M mycontainer -n 8
Nov 12 16:51:13 wuff systemd[1]: Starting Graphical Interface.
Nov 12 16:51:13 wuff systemd[1]: Reached target Graphical Interface.
Nov 12 16:51:13 wuff systemd[1]: Starting Update UTMP about System Runlevel Changes...
Nov 12 16:51:13 wuff systemd[1]: Started Stop Read-Ahead Data Collection 10s After Completed Startup.
Nov 12 16:51:13 wuff systemd[1]: Started Update UTMP about System Runlevel Changes.
Nov 12 16:51:13 wuff systemd[1]: Startup finished in 399ms.
Nov 12 16:51:13 wuff sshd[35]: Server listening on 0.0.0.0 port 24.
Nov 12 16:51:13 wuff sshd[35]: Server listening on :: port 24.

However, it also supports -m to show the combined log stream of the
host and all local containers:

# journalctl -m -e

(Let’s skip the output here completely, I figure you can extrapolate
how this looks.)

But it’s not only systemd’s own tools that understand containers
these days; procps sports support for them, too:

# ps -eo pid,machine,args
 PID MACHINE                         COMMAND
   1 -                               /usr/lib/systemd/systemd --switched-root --system --deserialize 20
[...]
2915 -                               emacs contents/projects/containers.md
3403 -                               [kworker/u16:7]
3415 -                               [kworker/u16:9]
4501 -                               /usr/libexec/nm-vpnc-service
4519 -                               /usr/sbin/vpnc --non-inter --no-detach --pid-file /var/run/NetworkManager/nm-vpnc-bfda8671-f025-4812-a66b-362eb12e7f13.pid -
4749 -                               /usr/libexec/dconf-service
4980 -                               /usr/lib/systemd/systemd-resolved
5006 -                               /usr/lib64/firefox/firefox
5168 -                               [kworker/u16:0]
5192 -                               [kworker/u16:4]
5193 -                               [kworker/u16:5]
5497 -                               [kworker/u16:1]
5591 -                               [kworker/u16:8]
5711 -                               sudo -s
5715 -                               /bin/bash
5749 -                               /home/lennart/projects/systemd/systemd-nspawn -D /srv/mycontainer -b
5750 mycontainer                     /usr/lib/systemd/systemd
5799 mycontainer                     /usr/lib/systemd/systemd-journald
5862 mycontainer                     /usr/lib/systemd/systemd-logind
5863 mycontainer                     /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
5868 mycontainer                     /sbin/agetty --noclear --keep-baud console 115200 38400 9600 vt102
5871 mycontainer                     /usr/sbin/sshd -D
6527 mycontainer                     /usr/lib/systemd/systemd-resolved
[...]

This shows a process list (shortened). The second column shows the
container a process belongs to. All processes shown with “-” belong to
the host itself.

But it doesn’t stop there. The new “sd-bus” D-Bus client library we
have been preparing in the systemd/kdbus context knows containers
too. While you use sd_bus_open_system() to connect to your local
host’s system bus, sd_bus_open_system_container() may be used to
connect to the system bus of any local container, so that you can
execute bus methods on it.
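
As a quick command-line analogue of that C API (not the API itself,
and assuming your busctl version already knows the -M switch), you can
poke at a container’s bus interactively with busctl, for example to
list the connected bus peers:

# busctl -M mycontainer list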

sd-login.h and machined’s bus interface
provide a number of APIs to add container support to other programs
too. They support enumeration of containers as well as retrieving the
machine name from a PID and similar.

systemd-networkd also has support for containers. When run inside a
container it will by default run a DHCP client and IPv4LL on any veth
network interface named host0 (this interface is special under the
logic described here). When run on the host, networkd will by default
provide a DHCP server and IPv4LL on any veth network interface named
“ve-” followed by the container name.
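
If networkd is not yet enabled on either side, the setup might look
like the following sketch, first on the host, then inside the (not yet
booted) container. This assumes the systemd-networkd service is
installed in both images:

# systemctl enable systemd-networkd.service
# systemd-nspawn -D /srv/mycontainer systemctl enable systemd-networkd.service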

Let’s have a look at one last facet of systemd’s container
integration: the hook-up with the name service switch. Recent systemd
versions contain a new NSS module nss-mymachines that makes the names
of all local containers resolvable via gethostbyname() and
getaddrinfo(). This only applies to containers that run within their
own network namespace. With the systemd-nspawn command shown above the
the container shares the network configuration with the host however;
hence let’s restart the container, this time with a virtual veth
network link between host and container:

# machinectl poweroff mycontainer
# systemd-nspawn -D /srv/mycontainer --network-veth -b

Now, (assuming that networkd is used in the container and outside) we
can already ping the container using its name, due to the simple magic
of nss-mymachines:

# ping mycontainer
PING mycontainer (10.0.0.2) 56(84) bytes of data.
64 bytes from mycontainer (10.0.0.2): icmp_seq=1 ttl=64 time=0.124 ms
64 bytes from mycontainer (10.0.0.2): icmp_seq=2 ttl=64 time=0.078 ms

Of course, name resolution not only works with ping, it works with
all other tools that use libc gethostbyname() or getaddrinfo()
too, among them venerable ssh.
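
You can also exercise the NSS hook-up directly. Assuming “mymachines”
is listed in the hosts line of your /etc/nsswitch.conf, a plain getent
lookup resolves the container name as well (the address shown here is
simply the one from the ping above):

# getent hosts mycontainer
10.0.0.2        mycontainer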

And this is pretty much all I want to cover for now. We briefly
touched on a variety of integration points, and there’s a lot more still
if you look closely. We are working on even more container integration
all the time, so expect more new features in this area with every
systemd release.

Note that the whole machine concept is actually not limited to
containers, but covers VMs too, to a certain degree. However, the
integration is not as close, as access to a VM’s internals is not as
easy as for containers, as it usually requires a network transport
instead of allowing direct syscall access.

Anyway, I hope this is useful. For further details, please have a look
at the linked man pages and other documentation.

On Passwords and Password Expiration

Post Syndicated from David original http://feedproxy.google.com/~r/DevilsAdvocateSecurity/~3/a-hYO-gQkP4/on-passwords-and-password-expiration.html

One of the things that I believe is an important part of my job is to answer user questions in a way that educates them about the topic they ask about in addition to providing the answer. At times, this can be frustrating, but it also challenges me to think about why I’m providing the answer that I do. It also means that I have to review the choices I, and my organization, make about policy, process, and the reasons for both.

I recently exchanged email with one of our users who questioned our password policy, which requires periodic changes of passwords. The user contended that periodic password changes encourage poor password choice, that users who are forced to choose new passwords (even on a relatively infrequent basis) will choose poor passwords, and that, in the end, password changes serve no purpose.

In my institution’s case, there are a number of reasons why password changes make sense, and I believe that these are a reasonable match for most companies, colleges, and other organizations – but not necessarily for your Amazon account, or your banking password. It is critical to understand the difference between a daily use password for institutional access that provides access to things like VPN access, email, licensed software, and the rest of the keys to the kingdom, and a single use password that accesses a service or site. Thinking about your password policy in the context of institutional risk while remaining aware of how your users will react is critical.

The reasons that help drive password change for my institution, in no particular order, are:

- Password changes help to prevent attackers who have breached accounts, but who have not used them, or who are quietly using them, from having continued access.
- Similarly, they can help prevent shared passwords from being useful for long term access.
- They can help prevent users from using the same password in multiple locations by driving changes that don’t match the previously set passwords elsewhere.
- They can help prevent brute forcing, although this is less common in environments where there are back-off algorithms in place. In many institutions, that central monitoring may not exist, or may not be easy to implement.
- Password changes continue to be recommended by most best practice documents (including PCI-DSS and others). Including password expiration in your password policy can be an element in proving due diligence as an organization.

When you read the list from a user perspective, it is difficult to see a compelling reason for them to change their passwords. There isn’t a big, disaster level threat that is immediately obvious, and the “what’s in it for me” is hard to communicate. When you read it from an organizational perspective, you will likely see a set of reasons that, taken as a whole, mean that a reasonable password expiration timeframe is useful at an organizational level. Here’s why: the environment in which most of us work now has two major external threats to passwords: malware and phishing.

With malware targeting browsers and browser plugins, and institutional policies that accept that users will visit at least common sites like CNN, ESPN, and other staples of our online lives, we have to acknowledge that malware compromises that gather our users’ passwords are likely. Similarly, despite the attempts we make at user education, phishing continues to seduce a portion of our user population into clicking that tempting link, or responding to the IT department that needs to know their password to ensure that their email isn’t turned off. Again, we know that passwords will be exposed.

Bulk compromises of passwords are likely to involve captured hashes, which most organizations have spent years designing infrastructure to avoid as tools like Rainbow Tables and faster cracking hardware became available. Thus, we worry more about access to our networks, and what individual accounts, or small groups of compromised accounts, can do. In the event of a large-scale breach of central authentication, the organization will require a password change from every user, typically with immediate expiration of all passwords.

In this environment, we will require our users to change their passwords when their account is compromised – but will we know to require that? We know that advanced persistent threats exist, and that some attackers are patient and will wait, gathering information and not abusing the accounts they collect. We can continue to fight those threats with periodic password changes for the accounts that provide access to our institutions.

It would, of course, be preferable to use biometrics, or tokens, or some other two factor authentication system. It is also expensive, and difficult to adapt into a diverse environment where credentials are used across a variety of systems that are glued (or duct taped, bubble gummed, and baling wired) together in a variety of ways. For now, passwords – or preferably passphrases – remain the way to make these heterogeneous systems authenticate and interoperate.

In the end, I learned a lot from my exchange with the user. Over the next few months, I’ll be adding additional information to our awareness program reminding users that password changes from “Password1” to “Password2” aren’t serving a real use, we’ll add additional information about tools like Password Safe to our posters and awareness materials, and I’ll be working with our identity and access management staff to see if we can leverage their tools to prevent similar poor password practices. In addition, I’ve been using it as a learning opportunity for my staff, and as a challenge for my student employee.

I’m aware that I won’t win with every user – I’ll still have the gentleman who resets his password once a day for as many days as our password history and minimum password age will allow so he can get back to his favorite password. I’ll still have the user who changes their password to “Password1!” and claims that yes, they have used a capital and a number and a symbol, and that thus they have met the requirements for a strong password. But I also know that our population continues to grow more security aware, and that many of our users do get the point.

If you’re interested in this topic, you may enjoy this Microsoft research about users, security advice, and why they choose to ignore it, and NIST’s password guidance, which provides a well reasoned explanation of everything from password choice to mnemonics and password guessing.

systemd for Administrators, Part X

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/instances.html

Here’s the tenth installment of my ongoing series on systemd for
Administrators:

Instantiated Services

Most services on Linux/Unix are singleton services: there’s
usually only one instance of Syslog, Postfix, or Apache running on a
specific system at the same time. On the other hand some select
services may run in multiple instances on the same host. For example,
an Internet service like the Dovecot IMAP service could run in
multiple instances on different IP ports or different local IP
addresses. A more common example that exists on all installations is
getty, the mini service that runs once for each TTY and
presents a login prompt on it. On most systems this service is
instantiated once for each of the first six virtual consoles
tty1 to tty6. On some servers, depending on
administrator configuration or boot-time parameters, an additional
getty is instantiated for a serial or virtualizer console. Another
common instantiated service in the systemd world is fsck, the
file system checker that is instantiated once for each block device
that needs to be checked. Finally, in systemd socket activated
per-connection services (think classic inetd!) are also implemented
via instantiated services: a new instance is created for each incoming
connection. In this installment I hope to explain a bit how systemd
implements instantiated services and how to take advantage of them as
an administrator.

If you followed the previous episodes of this series you are
probably aware that services in systemd are named according to the
pattern foobar.service, where foobar is an
identification string for the service, and .service simply a
fixed suffix that is identical for all service units. The definition files
for these services are searched for in /etc/systemd/system
and /lib/systemd/system (and possibly other directories) under this name. For
instantiated services this pattern is extended a bit: the service name becomes
foobar@quux.service where foobar is the
common service identifier, and quux the instance
identifier. Example: serial-getty@ttyS2.service is the serial
getty service instantiated for ttyS2.

Service instances can be created dynamically as needed. Without
further configuration you may easily start a new getty on a serial
port simply by invoking a systemctl start command for the new
instance:

# systemctl start serial-getty@ttyUSB0.service

If a command like the above is run systemd will first look for a
unit configuration file by the exact name you requested. If this
service file is not found (and usually it isn’t if you use
instantiated services like this) then the instance id is removed from
the name and a unit configuration file is searched for under the
resulting template name. In other words, in the above example,
if the precise serial-getty@ttyUSB0.service unit file cannot
be found, serial-getty@.service is loaded instead. This unit
template file will hence be common for all instances of this
service. For the serial getty we ship a template unit file in systemd
(/lib/systemd/system/serial-getty@.service) that looks
something like this:

[Unit]
Description=Serial Getty on %I
BindTo=dev-%i.device
After=dev-%i.device systemd-user-sessions.service

[Service]
ExecStart=-/sbin/agetty -s %I 115200,38400,9600
Restart=always
RestartSec=0

(Note that the unit template file we actually ship along with
systemd for the serial gettys is a bit longer. If you are interested,
have a look at the actual
file
which includes additional directives for compatibility with
SysV, to clear the screen and remove previous users from the TTY
device. To keep things simple I have shortened the unit file to the
relevant lines here.)

This file looks mostly like any other unit file, with one
distinction: the specifiers %I and %i are used at
multiple locations. At unit load time %I and %i are
replaced by systemd with the instance identifier of the service. In
our example above, if a service is instantiated as
serial-getty@ttyUSB0.service the specifiers %I and
%i will be replaced by ttyUSB0. If you introspect
the instantiated unit with systemctl status
serial-getty@ttyUSB0.service you will see these replacements
having taken place:

$ systemctl status serial-getty@ttyUSB0.service
serial-getty@ttyUSB0.service - Serial Getty on ttyUSB0
	  Loaded: loaded (/lib/systemd/system/serial-getty@.service; static)
	  Active: active (running) since Mon, 26 Sep 2011 04:20:44 +0200; 2s ago
	Main PID: 5443 (agetty)
	  CGroup: name=systemd:/system/serial-getty@.service/ttyUSB0
		  └ 5443 /sbin/agetty -s ttyUSB0 115200,38400,9600

And that is already the core idea of instantiated services in
systemd. As you can see systemd provides a very simple templating
system, which can be used to dynamically instantiate services as
needed. To make effective use of this, a few more notes:

You may instantiate these services on-the-fly via
.wants/ symbolic links in the file system. For example, to
make sure the serial getty on ttyUSB0 is started
automatically at every boot, create a symlink like this:

# ln -s /lib/systemd/system/serial-getty@.service /etc/systemd/system/getty.target.wants/serial-getty@ttyUSB0.service

systemd will instantiate the symlinked unit file with the
instance name specified in the symlink name.
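
On newer systemd versions, and assuming the template unit ships an
[Install] section pointing at getty.target, systemctl enable will
create the very same symlink for you:

# systemctl enable serial-getty@ttyUSB0.service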

You cannot instantiate a unit template without specifying an
instance identifier. In other words systemctl start
serial-getty@.service will necessarily fail since the instance
name was left unspecified.

Sometimes it is useful to opt out of the generic template
for one specific instance. For these cases make use of the fact that
systemd always searches first for the full instance file name before
falling back to the template file name: make sure to place a unit file
under the fully instantiated name in /etc/systemd/system and
it will override the generic templated version for this specific
instance.
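
As a concrete sketch of this (the instance and the tweak are
hypothetical), overriding the template for, say, ttyS1 boils down to:

# cp /lib/systemd/system/serial-getty@.service /etc/systemd/system/serial-getty@ttyS1.service
# vi /etc/systemd/system/serial-getty@ttyS1.service
# systemctl daemon-reload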

The unit file shown above uses %i at some places and
%I at others. You may wonder what the difference between
these specifiers is. %i is replaced by the exact characters
of the instance identifier. For %I on the other hand the
instance identifier is first passed through a simple unescaping
algorithm. In the case of a simple instance identifier like
ttyUSB0 there is no effective difference. However, if the
device name includes one or more slashes (“/“) this cannot be
part of a unit name (or Unix file name). Before such a device name can
be used as instance identifier it needs to be escaped so that “/”
becomes “-” and most other special characters (including “-“) are
replaced by “\xAB” where AB is the ASCII code of the character in
hexadecimal notation[1]. Example: to refer to a USB serial port by its
bus path we want to use a port name like
serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0. The
escaped version of this name is
serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0. %I
will then refer to the former, %i to the latter. Effectively this
means %i is useful wherever it is necessary to refer to other
units, for example to express additional dependencies. On the other
hand %I is useful for usage in command lines, or inclusion in
pretty description strings. Let’s check how this looks with the above unit file:

# systemctl start 'serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service'
# systemctl status 'serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service'
serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service - Serial Getty on serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0
	  Loaded: loaded (/lib/systemd/system/serial-getty@.service; static)
	  Active: active (running) since Mon, 26 Sep 2011 05:08:52 +0200; 1s ago
	Main PID: 5788 (agetty)
	  CGroup: name=systemd:/system/serial-getty@.service/serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0
		  └ 5788 /sbin/agetty -s serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0 115200 38400 9600

As we can see, while the instance identifier is the escaped
string, the command line and the description string actually use the
unescaped version, as expected.
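
(Newer systemd versions ship a small helper for this escaping, called
systemd-escape. Assuming your version includes it, something like the
following should print the fully escaped unit name used above:

$ systemd-escape --template=serial-getty@.service 'serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0'
serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service

At the time this article was originally written the escaping had to
be done by hand.)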

(Side note: there are more specifiers available than just
%i and %I, and many of them are actually
available in all unit files, not just templates for service
instances. For more details see the man page, which includes a full
list and terse explanations.)

And at this point this shall be all for now. Stay tuned for a
follow-up article on how instantiated services are used for
inetd-style socket activation.

Footnotes

[1] Yupp, this escaping algorithm doesn’t really result in
particularly pretty escaped strings, but then again, most escaping
algorithms don’t help readability. The algorithm we used here is
inspired by what udev does in a similar case, with one change. In the
end, we had to pick something. If you plan to comment on the
escaping algorithm please also mention where you live so that I can
come around and paint your bike shed yellow with blue stripes. Thanks!

systemd for Administrators, Part VII

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/blame-game.html

Here’s yet another installment of my ongoing series on systemd for
Administrators:

The Blame Game

Fedora 15[1] is the first Fedora release to sport systemd. Our
primary goal for F15 was to get everything integrated and working
well. One focus for Fedora 16 will be to further polish and speed up
what we have in the distribution now. To prepare for this cycle we
have implemented a few tools (which are already available in F15),
which can help us pinpoint where exactly the biggest problems in our
boot-up remain. With this blog story I hope to shed some light on how
to figure out what to blame for your slow boot-up, and what to do
about it. We want to allow you to put the blame where the blame
belongs: on the system component responsible.

The first utility is a very simple one: when it has finished booting
up, systemd automatically writes a log message to syslog/kmsg with
the time it needed.

systemd[1]: Startup finished in 2s 65ms 924us (kernel) + 2s 828ms 195us (initrd) + 11s 900ms 471us (userspace) = 16s 794ms 590us.

And here’s how you read this: 2s have been spent for kernel
initialization, until the time where the initial RAM disk (initrd,
i.e. dracut) was started. A bit less than 3s have then been spent in
the initrd. Finally, a bit less than 12s have been spent after the
actual system init daemon (systemd) has been invoked by the initrd to
bring up userspace. Summing this up the time that passed since the
boot loader jumped into the kernel code until systemd was finished
doing everything it needed to do at boot was a bit less than 17s. This
number is nice and simple to understand — and also easy to
misunderstand: it does not include the time that is spent initializing
your GNOME session, as that is outside of the scope of the init
system. Also, in many cases this is just where systemd finished doing
everything it needed to do. Very likely some daemons are still busy
doing whatever they need to do to finish startup when this time
is elapsed. Hence: while the time logged here is a good indication of
the general boot speed, it is not the time the user might feel
the boot actually takes.
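
(Side note: you can also query this figure at any time after boot by
invoking systemd-analyze without arguments, or explicitly as
systemd-analyze time, which prints the same breakdown:

$ systemd-analyze time

No need to dig the line out of the logs.)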

Also, it is a pretty superficial value: it gives no insight which
system component systemd was waiting for all the time. To break this
up, we introduced the tool systemd-analyze blame:

$ systemd-analyze blame
  6207ms udev-settle.service
  5228ms cryptsetup@luks\x2d9899b85d\x2df790\x2d4d2a\x2da650\x2d8b7d2fb92cc3.service
   735ms NetworkManager.service
   642ms avahi-daemon.service
   600ms abrtd.service
   517ms rtkit-daemon.service
   478ms fedora-storage-init.service
   396ms dbus.service
   390ms rpcidmapd.service
   346ms systemd-tmpfiles-setup.service
   322ms fedora-sysinit-unhack.service
   316ms cups.service
   310ms console-kit-log-system-start.service
   309ms libvirtd.service
   303ms rpcbind.service
   298ms ksmtuned.service
   288ms lvm2-monitor.service
   281ms rpcgssd.service
   277ms sshd.service
   276ms livesys.service
   267ms iscsid.service
   236ms mdmonitor.service
   234ms nfslock.service
   223ms ksm.service
   218ms mcelog.service
...

This tool lists which systemd unit needed how much time to finish
initialization at boot, the worst offenders listed first. What we can
see here is that on this boot two services required more than 1s of
boot time: udev-settle.service and
cryptsetup@luks\x2d9899b85d\x2df790\x2d4d2a\x2da650\x2d8b7d2fb92cc3.service. This
tool’s output is easily misunderstood as well: it does not shed any
light on why the services in question actually need this much time, it
just determines that they did. Also note that the times listed here
might be spent “in parallel”, i.e. two services might be initializing
at the same time and thus the time spent to initialize them both is
much less than the sum of both individual times combined.

Let’s have a closer look at the worst offender on this boot: a
service by the name of udev-settle.service. So why does it
take that much time to initialize, and what can we do about it? This
service actually does very little: it just waits for the device
probing being done by udev to finish and then exits. Device probing
can be slow. In this instance for example, the reason for the device
probing to take more than 6s is the 3G modem built into the machine,
which when not having an inserted SIM card takes this long to respond
to software probe requests. The software probing is part of the logic
that makes ModemManager work and enables NetworkManager to offer easy
3G setup. An obvious reflex might now be to blame ModemManager for
having such a slow prober. But that’s actually ill-directed: hardware
probing quite frequently is this slow, and in the case of ModemManager
it’s a simple fact that the 3G hardware takes this long. It is an
essential requirement for a proper hardware probing solution that
individual probers can take this much time to finish probing. The
actual culprit is something else: the fact that we actually wait for
the probing, in other words: that udev-settle.service is part
of our boot process.

So, why is udev-settle.service part of our boot process?
Well, it actually doesn’t need to be. It is pulled in by the storage
setup logic of Fedora: to be precise, by the LVM, RAID and Multipath
setup script. These storage services have not been implemented in the
way hardware detection and probing work today: they expect to be
initialized at a point in time where “all devices have been probed”,
so that they can simply iterate through the list of available disks
and do their work on it. However, on modern machinery this is not how
things actually work: hardware can come and hardware can go all the
time, during boot and during runtime. For some technologies it is not
even possible to know when the device enumeration is complete
(example: USB, or iSCSI), thus waiting for all storage devices to show
up and be probed must necessarily include a fixed delay, after which it
is assumed that all devices that can show up have shown up and got
probed. In this case all this shows up very negatively in the boot time:
the storage scripts force us to delay bootup until all potential devices
have shown up and all devices that did got probed — and all that even
though we don’t actually need most devices for anything. In particular
since this machine actually does not make use of LVM, RAID or
Multipath![2]

Knowing what we know now we can go and disable
udev-settle.service for the next boots: since neither LVM,
RAID nor Multipath is used we can mask the services in question and
thus speed up our boot a little:

# ln -s /dev/null /etc/systemd/system/udev-settle.service
# ln -s /dev/null /etc/systemd/system/fedora-wait-storage.service
# ln -s /dev/null /etc/systemd/system/fedora-storage-init.service
# systemctl daemon-reload
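
(On newer systemd versions the same can be achieved more conveniently
with systemctl mask, which creates exactly these /dev/null symlinks
for you:

# systemctl mask udev-settle.service fedora-wait-storage.service fedora-storage-init.service

At the time this article was written the symlinks had to be created
manually, as shown above.)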

After restarting we can measure that the boot is now about 1s
faster. Why just 1s? Well, the second worst offender is cryptsetup
here: the machine in question has an encrypted
/home directory. For testing purposes I have stored the
passphrase in a file on disk, so that the boot-up is not delayed
because I as the user am a slow typer. The cryptsetup tool
unfortunately still takes more than 5s to set up the encrypted
partition. Being lazy instead of trying to fix
cryptsetup[3] we’ll just tape over it here [4]:
systemd will normally wait for all file systems not marked with the
noauto option in /etc/fstab to show up, to be fscked and to
be mounted before proceeding bootup and starting the usual system
services. In the case of /home (unlike for example
/var) we know that it is needed only very late (i.e. when the
user actually logs in). An easy fix is hence to make the mount point
available already during boot, but not actually wait until cryptsetup,
fsck and mount finished running for it. You ask how we can make a
mount point available before actually mounting the file system behind
it? Well, systemd possesses magic powers, in the form of the
comment=systemd.automount mount option in
/etc/fstab. If you specify it, systemd will create an
automount point at /home, and if at the time of the first
access the file system isn’t mounted there yet, systemd will wait
for the device, fsck it and mount it.
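
A minimal sketch of what such a line in /etc/fstab might look like
(the device path here is hypothetical; on newer systemd versions the
option is spelled x-systemd.automount instead):

/dev/mapper/home  /home  ext4  defaults,comment=systemd.automount  0  2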

And here’s the result with this change to /etc/fstab
made:

systemd[1]: Startup finished in 2s 47ms 112us (kernel) + 2s 663ms 942us (initrd) + 5s 540ms 522us (userspace) = 10s 251ms 576us.

Nice! With a few fixes we took almost 7s off our boot-time. And
these two changes are only fixes for the two most superficial
problems. With a bit of love and detail work there’s a lot of
additional room for improvements. In fact, on a different machine, a
more than two year old X300 laptop (which even back then wasn't the
fastest machine on earth), with a bit of decrufting we now have boot times
of around 4s (total), with a reasonably complete GNOME system. And there’s
still a lot of room in it.

systemd-analyze blame is a nice and simple tool for
tracking down slow services. However, it suffers from a big problem: it
does not visualize how the parallel execution of the services actually
diminishes the price one pays for slow starting services. For that we
have prepared systemd-analyze plot for you. Use it like
this:

$ systemd-analyze plot > plot.svg
$ eog plot.svg

It creates pretty graphs, showing the time services spent to start
up in relation to the other services. It currently doesn’t visualize
explicitly which services wait for which ones, but with a bit of guess
work this is easily seen nonetheless.
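
(Later systemd versions also gained systemd-analyze critical-chain,
which prints the chain of units the default target, or any unit you
name, had to wait for, annotated with timing. If your version has it,
run it like this:

$ systemd-analyze critical-chain

It is a nice complement to the plot.)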

To see the effect of our two little optimizations here are two
graphs generated with systemd-analyze plot, the first before
and the other after our change:

[Plot: before] [Plot: after]

(For the sake of completeness, here are the two complete outputs of
systemd-analyze blame for these two boots: before and after.)

The well-informed reader probably wonders how this relates to Michael
Meeks’ bootchart. This plot and bootchart do show similar graphs, that is
true. Bootchart is by far the more powerful tool. It plots in all
detail what is happening during the boot, how much CPU and IO is
used. systemd-analyze plot shows more high-level data: which
service took how much time to initialize, and what needed to wait for
it. If you use them both together you’ll have a wonderful toolset to
figure out why your boot is not as fast as it could be.

Now, before you now take these tools and start filing bugs against
the worst boot-up time offenders on your system: think twice. These
tools give you raw data, don’t misread it. As my optimization example
above hopefully shows, the blame for the slow bootup was not actually
with udev-settle.service, and not with the ModemManager
prober run by it either. It is with the subsystem that pulled this
service in in the first place. And that’s where the problem needs to
be fixed. So, file the bugs at the right places. Put the blame where
the blame belongs.

As mentioned, these three utilities are available on your Fedora 15
system out-of-the-box.

And here’s what to take home from this little blog story:

  • systemd-analyze is a wonderful tool and systemd comes
    with profiling built in.
  • Don’t misread the data these tools generate!
  • With two simple changes you might be able to speed up your system
    by 7s!
  • Fix your software if it can’t handle dynamic hardware
    properly!
  • The Fedora default of installing the OS on an enterprise-level
    storage managing system might be something to rethink.

And that’s all for now. Thank you for your interest.

Footnotes

[1] Also known as the greatest Free Software OS release
ever.

[2] The right fix here is to improve the services in
question to actively listen to hotplug events via libudev or similar
and act on the devices showing up as they show up, so that we can
continue with the bootup the instant everything we really need to go
on has shown up. To get a quick bootup we should wait for what we
actually need to proceed, not for everything. Also note that the
storage services are not the only services which do not cope well with
modern dynamic hardware, and assume that the device list is static and
stays unchanged. For example, in this example the reason the initrd is
actually as slow as it is is mostly due to the fact that Plymouth
expects to be executed when all video devices have shown up and have
been probed. For an unknown reason (at least unknown to me) loading
the video kernel modules for my Intel graphics cards takes multiple
seconds, and hence the entire boot is delayed unnecessarily. (Here too
I’d not put the blame on the probing but on the fact that we
wait for it to complete before going on.)

[3] Well, to be precise, I actually did try to get this
fixed. Most of the delay of cryptsetup stems from the — in my eyes —
unnecessarily high default values for --iter-time in
cryptsetup. I tried to convince our cryptsetup maintainers that 100ms
as a default here are not really less secure than 1s, but well, I
failed.

[4] Of course, it’s usually not our style to just tape over
problems instead of fixing them, but this is such a nice occasion to
show off yet another cool systemd feature…

A Guide Through The Linux Sound API Jungle

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/guide-to-sound-apis.html

At the Audio MC at the Linux Plumbers Conference one
thing became very clear: it is very difficult for programmers to
figure out which audio API to use for which purpose and which API not
to use when doing audio programming on Linux. So here’s my attempt to
guide you through this jungle:

What do you want to do?

I want to write a media-player-like application!
Use GStreamer! (Unless your focus is only KDE, in which case Phonon might be an alternative.)

I want to add event sounds to my application!
Use libcanberra, install your sound files according to the XDG Sound Theming/Naming Specifications! (Unless your focus is only KDE, in which case KNotify might be an alternative, although it has a different focus.)

I want to do professional audio programming, hard-disk recording, music synthesizing, MIDI interfacing!
Use JACK and/or the full ALSA interface.

I want to do basic PCM audio playback/capturing!
Use the safe ALSA subset.

I want to add sound to my game!
Use the audio API of SDL for full-screen games, libcanberra for simple games with standard UIs such as Gtk+.

I want to write a mixer application!
Use the layer you want to support directly: if you want to support enhanced desktop software mixers, use the PulseAudio volume control APIs. If you want to support hardware mixers, use the ALSA mixer APIs.

I want to write audio software for the plumbing layer!
Use the full ALSA stack.

I want to write audio software for embedded applications!
For technical appliances the safe ALSA subset is usually a good choice; this, however, depends highly on your use-case.

You want to know more about the different sound APIs?

GStreamer
GStreamer is the de-facto
standard media streaming system for Linux desktops. It supports decoding and
encoding of audio and video streams. You can use it for a wide range of
purposes from simple audio file playback to elaborate network
streaming setups. GStreamer supports a wide range of CODECs and audio
backends. GStreamer is not particularly suited for basic PCM playback
or low-latency/realtime applications. GStreamer is portable and not
limited in its use to Linux. Among the supported backends are ALSA, OSS, PulseAudio. [Programming Manuals and References]
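
To get a feeling for how high-level GStreamer is, note that the
gst-launch utility shipped with GStreamer can play back a file
through the default audio sink with a one-liner like the following
(the exact binary name may carry a version suffix on your
distribution, and the file path is just an example):

$ gst-launch playbin uri=file:///tmp/example.ogg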

libcanberra
libcanberra
is an abstract event sound API. It implements the XDG Sound Theme
and Naming Specifications. libcanberra is a blessed
GNOME dependency, but itself has no dependency on GNOME/Gtk/GLib and can be
used with other desktop environments as well. In addition to an easy
interface for playing sound files, libcanberra provides caching
(which is very useful for networked thin clients) and allows passing
of various meta data to the underlying audio system which then can be
used to enhance user experience (such as positional event sounds) and
for improving accessibility. libcanberra supports multiple backends
and is portable beyond Linux. Among the supported backends are ALSA, OSS, PulseAudio, GStreamer. [API Reference]
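
To illustrate how easy the interface is, here is a minimal sketch in
C of playing an event sound via libcanberra. The event ID used is
from the XDG sound naming spec; error handling is omitted for
brevity, and since playback is asynchronous a real program would keep
the context alive instead of destroying it immediately:

#include <canberra.h>

int main(void) {
    ca_context *c = NULL;
    ca_context_create(&c);

    /* Trigger the themed event sound for an incoming message */
    ca_context_play(c, 0,
                    CA_PROP_EVENT_ID, "message-new-instant",
                    CA_PROP_EVENT_DESCRIPTION, "Message received",
                    NULL);

    /* In a real program: keep the context alive while sounds play */
    ca_context_destroy(c);
    return 0;
}

Compile with: gcc example.c $(pkg-config --cflags --libs libcanberra)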

JACK

JACK is a sound system for
connecting professional audio production applications and hardware
output. Its focus is low-latency and application interconnection. It
is not useful for normal desktop or embedded use. It is not an API
that is particularly useful if all you want to do is simple PCM
playback. JACK supports multiple backends, although ALSA is best
supported. JACK is portable beyond Linux. Among the supported backends are ALSA, OSS. [API Reference]

Full ALSA

ALSA is the Linux API
for doing PCM playback and recording. ALSA is very focused on
hardware devices, although other backends are supported as well (to a
limited degree, see below). ALSA as a name is used both for the Linux
audio kernel drivers and a user-space library that wraps these. ALSA — the library — is
comprehensive, and portable (to a limited degree). The full ALSA API
can appear very complex and is large. However it supports almost
everything modern sound hardware can provide. Some of the
functionality of the ALSA API is limited in its use to actual hardware
devices supported by the Linux kernel (in contrast to software sound
servers and sound drivers implemented in user-space such as those for
Bluetooth and FireWire audio — among others) and Linux specific
drivers. [API Reference]

Safe ALSA

Only a subset of the full ALSA API works on all backends ALSA
supports. It is highly recommended to stick to this safe subset
if you do ALSA programming to keep programs portable, future-proof and
compatible with sound servers, Bluetooth audio and FireWire audio. See
below for more details about which functions of ALSA are considered
safe. The safe ALSA API is a suitable abstraction for basic,
portable PCM playback and recording — not just for ALSA kernel driver
supported devices. Among the supported backends are ALSA kernel driver
devices, OSS, PulseAudio, JACK.

Phonon and KNotify

Phonon is high-level
abstraction for media streaming systems such as GStreamer, but goes a
bit further than that. It supports multiple backends. KNotify is a
system for “notifications”, which goes beyond mere event
sounds. However it does not support the XDG Sound Theming/Naming
Specifications at this point, and also doesn’t support caching or
passing of event meta-data to an underlying sound system. KNotify
supports multiple backends for audio playback via Phonon. Both APIs
are KDE/Qt specific and should not be used outside of KDE/Qt
applications. [Phonon API Reference] [KNotify API Reference]

SDL

SDL is a portable API
primarily used for full-screen game development. Among other stuff it
includes a portable audio interface. Among others SDL supports OSS,
PulseAudio, ALSA as backends. [API Reference]

PulseAudio

PulseAudio is a sound system
for Linux desktops and embedded environments that runs in user-space
and (usually) on top of ALSA. PulseAudio supports network
transparency, per-application volumes, spatial event sounds, allows
switching of sound streams between devices on-the-fly, policy
decisions, and many other high-level operations. PulseAudio adds a glitch-free
audio playback model to the Linux audio stack. PulseAudio is not
useful in professional audio production environments. PulseAudio is
portable beyond Linux. PulseAudio has a native API and also supports
the safe subset of ALSA, in addition to limited,
LD_PRELOAD-based OSS compatibility. Among others PulseAudio supports
OSS and ALSA as backends and provides connectivity to JACK. [API Reference]

OSS

The Open Sound System is a
low-level PCM API supported by a variety of Unixes including Linux. It
started out as the standard Linux audio system and is supported on
current Linux kernels in the API version 3 as OSS3. OSS3 is considered
obsolete and has been fully replaced by ALSA. A successor to OSS3
called OSS4 is available but plays virtually no role on Linux and is
not supported in standard kernels or by any of the relevant
distributions. The OSS API is very low-level, based around direct
kernel interfacing using ioctl()s. It is hence awkward to use and
can practically not be virtualized for usage on non-kernel audio
systems like sound servers (such as PulseAudio) or user-space sound
drivers (such as Bluetooth or FireWire audio). OSS3’s timing model
cannot properly be mapped to software sound servers at all, and is
also problematic on non-PCI hardware such as USB audio. Also, OSS does
not do sample type conversion, remapping or resampling if
necessary. This means that clients that properly want to support OSS
need to include a complete set of converters/remappers/resamplers for
the case when the hardware does not natively support the requested
sampling parameters. With modern sound cards it is very common to
support only S32LE samples at 48KHz and nothing else. If an OSS client
assumes it can always play back S16LE samples at 44.1KHz it will thus
fail. OSS3 is portable to other Unix-like systems, various differences
however apply. OSS also doesn’t support surround sound and other
functionality of modern sound systems properly. OSS should be
considered obsolete and not be used in new applications. ALSA and
PulseAudio have limited LD_PRELOAD-based compatibility with OSS. [Programming Guide]

All sound systems and APIs listed above are supported in all
relevant current distributions. For libcanberra support the newest
development release of your distribution might be necessary.

All sound systems and APIs listed above are suitable for
development for commercial (read: closed source) applications, since
they are licensed under LGPL or more liberal licenses or no client
library is involved.

You want to know why and when you should use a specific sound API?

GStreamer

GStreamer is best used for very high-level needs: i.e. you want to
play an audio file or video stream and do not care about all the tiny
details down to the PCM or codec level.

libcanberra

libcanberra is best used when adding sound feedback to user input
in UIs. It can also be used to play simple sound files for
notification purposes.

JACK

JACK is best used in professional audio production and where interconnecting applications is required.

Full ALSA

The full ALSA interface is best used for software on the “plumbing layer” or when you want to make use of very specific hardware features, which might be needed for audio production purposes.

Safe ALSA

The safe ALSA interface is best used for software that wants to output/record basic PCM data from hardware devices or software sound systems.

Phonon and KNotify

Phonon and KNotify should only be used in KDE/Qt applications, and only for high-level media playback and simple audio notifications, respectively.

SDL

SDL is best used in full-screen games.

PulseAudio

For now, the PulseAudio API should be used only for applications
that want to expose sound-server-specific functionality (such as
mixers) or when a PCM output abstraction layer is already available in
your application and it thus makes sense to add an additional backend
to it for PulseAudio to keep the stack of audio layers minimal.

OSS

OSS should not be used for new programs.

You want to know more about the safe ALSA subset?

Here’s a list of DOs and DON’Ts for the ALSA API if you care that
your application stays future-proof and works fine with
non-hardware backends or backends for user-space sound drivers such as
Bluetooth and FireWire audio. Some of these recommendations apply for
people using the full ALSA API as well, since some functionality
should be considered obsolete for all cases.

If your application’s code does not follow these rules, you must have
a very good reason for that. Otherwise your code should simply be considered
broken!

DON’Ts:

Do not use “async handlers”, e.g. via
snd_async_add_pcm_handler() and friends. Asynchronous
handlers are implemented using POSIX signals, which is a very
questionable use of them, especially from libraries and plugins. Even
when you don’t want to limit yourself to the safe ALSA subset
it is highly recommended not to use this functionality. Read
this for a longer explanation why signals for audio IO are
evil.

Do not parse the ALSA configuration file yourself or with
any of the ALSA functions such as snd_config_xxx(). If you
need to enumerate audio devices use snd_device_name_hint()
(and related functions). That
is the only API that also supports enumerating non-hardware audio
devices and audio devices with drivers implemented in userspace.

Do not parse any of the files from
/proc/asound/. Those files only include information about
kernel sound drivers — user-space plugins are not listed there. Also,
the set of kernel devices might differ from the way they are presented
in user-space (i.e. sub-devices are mapped in different ways to
actual user-space devices such as surround51 and suchlike).

Do not rely on stable device indexes from ALSA. Nowadays
they depend on the initialization order of the drivers during boot-up
time and are thus not stable.

Do not use the snd_card_xxx() APIs. For
enumerating use snd_device_name_hint() (and related
functions). snd_card_xxx() is obsolete. It will only list
kernel hardware devices. User-space devices such as sound servers,
Bluetooth audio are not included. snd_card_load() is
completely obsolete in these days.

Do not hard-code device strings, especially not
hw:0 or plughw:0 or even dmix — these devices define no channel
mapping and are mapped to raw kernel devices. It is highly recommended
to use exclusively default as device string. If specific
channel mappings are required the correct device strings should be
front for stereo, surround40 for Surround 4.0,
surround41, surround51, and so on. Unfortunately at
this point ALSA does not define standard device names with channel
mappings for non-kernel devices. This means default may only
be used safely for mono and stereo streams. You should probably prefix
your device string with plug: to make sure ALSA transparently
reformats/remaps/resamples your PCM stream for you if the
hardware/backend does not support your sampling parameters
natively.

Do not assume that any particular sample type is supported
except the following ones: U8, S16_LE, S16_BE, S32_LE, S32_BE,
FLOAT_LE, FLOAT_BE, MU_LAW, A_LAW.

Do not use snd_pcm_avail_update() for
synchronization purposes. It should be used exclusively to query the
amount of bytes that may be written/read right now. Do not use
snd_pcm_delay() to query the fill level of your playback
buffer. It should be used exclusively for synchronisation
purposes. Make sure you fully understand the difference, and note that
the two functions return values that are not necessarily directly
connected!

Do not assume that the mixer controls always know dB information.

Do not assume that all devices support MMAP style buffer access.

Do not assume that the hardware pointer inside the (possibly mmaped) playback buffer is the actual position of the sample in the DAC. There might be an extra latency involved.

Do not try to recover with your own code from ALSA error conditions such as buffer under-runs. Use snd_pcm_recover() instead.

Do not touch buffering/period metrics unless you have
specific latency needs. Develop defensively, handling correctly the
case when the backend cannot fulfill your buffering metrics
requests. Be aware that the buffering metrics of the playback buffer
only indirectly influence the overall latency in many
cases. i.e. setting the buffer size to a fixed value might actually result in
practical latencies that are much higher.

Do not assume that snd_pcm_rewind() is available and works and to which degree.

Do not assume that the time when a PCM stream can receive
new data is strictly dependent on the sampling and buffering
parameters and the resulting average throughput. Always make sure to
supply new audio data to the device when it asks for it by signalling
“writability” on the fd. (And similarly for capturing.)

Do not use the “simple” interface snd_spcm_xxx().

Do not use any of the functions marked as “obsolete”.

Do not use the timer, midi, rawmidi, hwdep subsystems.

DOS:

Use snd_device_name_hint() for enumerating audio devices.

Use snd_smixer_xx() instead of raw snd_ctl_xxx()

For synchronization purposes use snd_pcm_delay().

For checking buffer playback/capture fill level use snd_pcm_update_avail().

Use snd_pcm_recover() to recover from errors returned by any of the ALSA functions.

If possible use the largest buffer sizes the device supports to maximize power saving and drop-out safety. Use snd_pcm_rewind() if you need to react to user input quickly.

FAQ

What about ESD and NAS?

ESD and NAS are obsolete, both as API and as sound daemon. Do not develop for it any further.

ALSA isn’t portable!

That’s not true! Actually the user-space library is relatively portable, it even includes a backend for OSS sound devices. There is no real reason that would disallow using the ALSA libraries on other Unixes as well.

Portability is key to me! What can I do?

Unfortunately no truly portable (i.e. to Win32) PCM API is
available right now that I could truly recommend. The systems shown
above are more or less portable at least to Unix-like operating
systems. That does not mean however that there are suitable backends
for all of them available. If you care about portability to Win32 and
MacOS you probably have to find a solution outside of the
recommendations above, or contribute the necessary
backends/portability fixes. None of the systems (with the exception of
OSS) is truly bound to Linux or Unix-like kernels.

What about PortAudio?

I don’t think that PortAudio is very good API for Unix-like operating systems. I cannot recommend it, but it’s your choice.

Oh, why do you hate OSS4 so much?

I don’t hate anything or anyone. I just don’t think OSS4 is a
serious option, especially not on Linux. On Linux, it is also
completely redundant due to ALSA.

You idiot, you have no clue!

You are right, I totally don’t. But that doesn’t hinder me from recommending things. Ha!

Hey I wrote/know this tiny new project which is an awesome abstraction layer for audio/media!

Sorry, that’s not sufficient. I only list software here that is known to be sufficiently relevant and sufficiently well maintained.

Final Words

Of course these recommendations are very basic and are only intended to
lead into the right direction. For each use-case different necessities
apply and hence options that I did not consider here might become
viable. It’s up to you to decide how much of what I wrote here
actually applies to your application.

This summary only includes software systems that are considered
stable and universally available at the time of writing. In the
future I hope to introduce a more suitable and portable replacement
for the safe ALSA subset of functions. I plan to update this text
from time to time to keep things up-to-date.

If you feel that I forgot a use case or an important API, then
please contact me or leave a comment. However, I think the summary
above is sufficiently comprehensive and if an entry is missing I most
likely deliberately left it out.

(Also note that I am upstream for both PulseAudio and libcanberra and did some minor contributions to ALSA, GStreamer and some other of the systems listed above. Yes, I am biased.)

Oh, and please syndicate this, digg it. I’d like to see this guide to be well-known all around the Linux community. Thank you!

A Guide Through The Linux Sound API Jungle

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/guide-to-sound-apis.html

At the Audio MC at the Linux Plumbers Conference one
thing became very clear: it is very difficult for programmers to
figure out which audio API to use for which purpose and which API not
to use when doing audio programming on Linux. So here’s my attempt to
guide you through this jungle:

What do you want to do?

I want to write a media-player-like application!
Use GStreamer! (Unless your focus is only KDE in which case Phonon might be an alternative.)
I want to add event sounds to my application!
Use libcanberra, install your sound files according to the XDG Sound Theming/Naming Specifications! (Unless your focus is only KDE in which case KNotify might be an alternative although it has a different focus.)
I want to do professional audio programming, hard-disk recording, music synthesizing, MIDI interfacing!
Use JACK and/or the full ALSA interface.
I want to do basic PCM audio playback/capturing!
Use the safe ALSA subset.
I want to add sound to my game!
Use the audio API of SDL for full-screen games, libcanberra for simple games with standard UIs such as Gtk+.
I want to write a mixer application!
Use the layer you want to support directly: if you want to support enhanced desktop software mixers, use the PulseAudio volume control APIs. If you want to support hardware mixers, use the ALSA mixer APIs.
I want to write audio software for the plumbing layer!
Use the full ALSA stack.
I want to write audio software for embedded applications!
For technical appliances the safe ALSA subset is usually a good choice; this, however, depends highly on your use case.

You want to know more about the different sound APIs?

GStreamer
GStreamer is the de-facto
standard media streaming system for Linux desktops. It supports decoding and
encoding of audio and video streams. You can use it for a wide range of
purposes from simple audio file playback to elaborate network
streaming setups. GStreamer supports a wide range of CODECs and audio
backends. GStreamer is not particularly suited for basic PCM playback
or low-latency/realtime applications. GStreamer is portable and not
limited in its use to Linux. Among the supported backends are ALSA, OSS, PulseAudio. [Programming Manuals and References]
libcanberra
libcanberra
is an abstract event sound API. It implements the XDG
Sound Theme and Naming Specifications
. libcanberra is a blessed
GNOME dependency, but itself has no dependency on GNOME/Gtk/GLib and can be
used with other desktop environments as well. In addition to an easy
interface for playing sound files, libcanberra provides caching
(which is very useful for networked thin clients) and allows passing
of various meta data to the underlying audio system which then can be
used to enhance user experience (such as positional event sounds) and
for improving accessibility. libcanberra supports multiple backends
and is portable beyond Linux. Among the supported backends are ALSA, OSS, PulseAudio, GStreamer. (See the short usage sketch after this list.) [API Reference]
JACK
JACK is a sound system for
connecting professional audio production applications and hardware
output. Its focus is low latency and application interconnection. It
is not useful for normal desktop or embedded use. It is not an API
that is particularly useful if all you want to do is simple PCM
playback. JACK supports multiple backends, although ALSA is best
supported. JACK is portable beyond Linux. Among the supported backends are ALSA, OSS. [API Reference]
Full ALSA
ALSA is the Linux API
for doing PCM playback and recording. ALSA is very focused on
hardware devices, although other backends are supported as well (to a
limited degree, see below). ALSA as a name is used both for the Linux
audio kernel drivers and a user-space library that wraps these. ALSA — the library — is
comprehensive, and portable (to a limited degree). The full ALSA API
can appear very complex and is large. However it supports almost
everything modern sound hardware can provide. Some of the
functionality of the ALSA API is limited in its use to actual hardware
devices supported by the Linux kernel (in contrast to software sound
servers and sound drivers implemented in user-space such as those for
Bluetooth and FireWire audio — among others) and Linux specific
drivers. [API Reference]
Safe ALSA
Only a subset of the full ALSA API works on all backends ALSA
supports. It is highly recommended to stick to this safe subset
if you do ALSA programming to keep programs portable, future-proof and
compatible with sound servers, Bluetooth audio and FireWire audio. See
below for more details about which functions of ALSA are considered
safe. The safe ALSA API is a suitable abstraction for basic,
portable PCM playback and recording — not just for ALSA kernel driver
supported devices. Among the supported backends are ALSA kernel driver
devices, OSS, PulseAudio, JACK.
Phonon and KNotify
Phonon is a high-level
abstraction for media streaming systems such as GStreamer, but goes a
bit further than that. It supports multiple backends. KNotify is a
system for “notifications”, which goes beyond mere event
sounds. However it does not support the XDG Sound Theming/Naming
Specifications at this point, and also doesn’t support caching or
passing of event meta-data to an underlying sound system. KNotify
supports multiple backends for audio playback via Phonon. Both APIs
are KDE/Qt specific and should not be used outside of KDE/Qt
applications. [Phonon API Reference] [KNotify API Reference]
SDL
SDL is a portable API
primarily used for full-screen game development. Among other stuff it
includes a portable audio interface. Among others, SDL supports OSS,
PulseAudio, ALSA as backends. [API Reference]
PulseAudio
PulseAudio is a sound system
for Linux desktops and embedded environments that runs in user-space
and (usually) on top of ALSA. PulseAudio supports network
transparency, per-application volumes, spatial event sounds, allows
switching of sound streams between devices on-the-fly, policy
decisions, and many other high-level operations. PulseAudio adds a glitch-free
audio playback model to the Linux audio stack. PulseAudio is not
useful in professional audio production environments. PulseAudio is
portable beyond Linux. PulseAudio has a native API and also supports
the safe subset of ALSA, in addition to limited,
LD_PRELOAD-based OSS compatibility. Among others PulseAudio supports
OSS and ALSA as backends and provides connectivity to JACK. [API Reference]
OSS
The Open Sound System is a
low-level PCM API supported by a variety of Unixes including Linux. It
started out as the standard Linux audio system and is supported on
current Linux kernels in the API version 3 as OSS3. OSS3 is considered
obsolete and has been fully replaced by ALSA. A successor to OSS3
called OSS4 is available but plays virtually no role on Linux and is
not supported in standard kernels or by any of the relevant
distributions. The OSS API is very low-level, based around direct
kernel interfacing using ioctl()s. It is hence awkward to use and
can practically not be virtualized for usage on non-kernel audio
systems like sound servers (such as PulseAudio) or user-space sound
drivers (such as Bluetooth or FireWire audio). OSS3’s timing model
cannot properly be mapped to software sound servers at all, and is
also problematic on non-PCI hardware such as USB audio. Also, OSS does
not do sample type conversion, remapping or resampling if
necessary. This means that clients that properly want to support OSS
need to include a complete set of converters/remappers/resamplers for
the case when the hardware does not natively support the requested
sampling parameters. With modern sound cards it is very common to
support only S32_LE samples at 48 kHz and nothing else. If an OSS client
assumes it can always play back S16_LE samples at 44.1 kHz it will thus
fail. OSS3 is portable to other Unix-like systems, various differences
however apply. OSS also doesn’t support surround sound and other
functionality of modern sound systems properly. OSS should be
considered obsolete and not be used in new applications.
ALSA and
PulseAudio have limited LD_PRELOAD-based compatibility with OSS. [Programming Guide]
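
To give a flavor of the libcanberra API mentioned above, here is a minimal usage sketch. Treat it as an illustration, not a template: the event id "message-new-instant" comes from the XDG sound naming spec, and a real application would keep one ca_context around for its whole lifetime instead of sleeping.

#include <unistd.h>
#include <canberra.h>

int main(void) {
        ca_context *c = NULL;

        if (ca_context_create(&c) < 0)
                return 1;

        /* Play a themed event sound; the actual file is resolved
         * via the XDG Sound Theme and Naming Specifications. */
        ca_context_play(c, 0,
                        CA_PROP_EVENT_ID, "message-new-instant",
                        CA_PROP_EVENT_DESCRIPTION, "New instant message",
                        NULL);

        sleep(1); /* toy program: give the async playback time to finish */

        ca_context_destroy(c);
        return 0;
}

Compile with gcc -o event event.c $(pkg-config --cflags --libs libcanberra).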

All sound systems and APIs listed above are supported in all
relevant current distributions. For libcanberra support, the newest
development release of your distribution might be necessary.

All sound systems and APIs listed above are suitable for
development for commercial (read: closed source) applications, since
they are licensed under LGPL or more liberal licenses or no client
library is involved.

You want to know why and when you should use a specific sound API?

GStreamer
GStreamer is best used for very high-level needs: i.e. you want to
play an audio file or video stream and do not care about all the tiny
details down to the PCM or codec level. (See the short sketch after this list.)
libcanberra
libcanberra is best used when adding sound feedback to user input
in UIs. It can also be used to play simple sound files for
notification purposes.
JACK
JACK is best used in professional audio production and where interconnecting applications is required.
Full ALSA
The full ALSA interface is best used for software on the “plumbing layer” or when you want to make use of very specific hardware features, which might be needed for audio production purposes.
Safe ALSA
The safe ALSA interface is best used for software that wants to output/record basic PCM data from hardware devices or software sound systems.
Phonon and KNotify
Phonon and KNotify should only be used in KDE/Qt applications, and only for high-level media playback and simple audio notifications, respectively.
SDL
SDL is best used in full-screen games.
PulseAudio
For now, the PulseAudio API should be used only for applications
that want to expose sound-server-specific functionality (such as
mixers) or when a PCM output abstraction layer is already available in
your application and it thus makes sense to add an additional backend
to it for PulseAudio to keep the stack of audio layers minimal.
OSS
OSS should not be used for new programs.
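
To illustrate the GStreamer recommendation above, here is a minimal playbin sketch. It is only a sketch under assumptions: it targets the GStreamer 1.x API, and the file URI is a placeholder you would replace with your own.

#include <gst/gst.h>

int main(int argc, char *argv[]) {
        gst_init(&argc, &argv);

        /* playbin picks decoders and the audio sink automatically. */
        GstElement *play = gst_element_factory_make("playbin", "play");
        if (!play)
                return 1;

        /* Placeholder URI; any URI you have plugins for works. */
        g_object_set(play, "uri", "file:///usr/share/sounds/example.ogg", NULL);
        gst_element_set_state(play, GST_STATE_PLAYING);

        /* Block until playback finishes or fails. */
        GstBus *bus = gst_element_get_bus(play);
        GstMessage *msg = gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
                GST_MESSAGE_ERROR | GST_MESSAGE_EOS);

        if (msg)
                gst_message_unref(msg);
        gst_object_unref(bus);
        gst_element_set_state(play, GST_STATE_NULL);
        gst_object_unref(play);
        return 0;
}

Compile with gcc -o play play.c $(pkg-config --cflags --libs gstreamer-1.0).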

You want to know more about the safe ALSA subset?

Here’s a list of DOS and DONTS for the ALSA API if you care
that your application stays future-proof and works fine with
non-hardware backends or backends for user-space sound drivers such as
Bluetooth and FireWire audio. Some of these recommendations apply for
people using the full ALSA API as well, since some functionality
should be considered obsolete for all cases.

If your application’s code does not follow these rules, you must have
a very good reason for that. Otherwise your code should simply be considered
broken!

DONTS:

  • Do not use “async handlers”, e.g. via
    snd_async_add_pcm_handler() and friends. Asynchronous
    handlers are implemented using POSIX signals, which is a very
    questionable use of them, especially from libraries and plugins. Even
    when you don’t want to limit yourself to the safe ALSA subset
    it is highly recommended not to use this functionality. Read
    this for a longer explanation why signals for audio IO are
    evil.
  • Do not parse the ALSA configuration file yourself or with
    any of the ALSA functions such as snd_config_xxx(). If you
    need to enumerate audio devices use snd_device_name_hint()
    (and related functions). That
    is the only API that also supports enumerating non-hardware audio
    devices and audio devices with drivers implemented in userspace.
  • Do not parse any of the files from
    /proc/asound/. Those files only include information about
    kernel sound drivers — user-space plugins are not listed there. Also,
    the set of kernel devices might differ from the way they are presented
    in user-space (e.g. sub-devices are mapped in different ways to
    actual user-space devices such as surround51 and suchlike).
  • Do not rely on stable device indexes from ALSA. Nowadays
    they depend on the initialization order of the drivers during boot-up
    time and are thus not stable.
  • Do not use the snd_card_xxx() APIs. For
    enumerating use snd_device_name_hint() (and related
    functions). snd_card_xxx() is obsolete. It will only list
    kernel hardware devices. User-space devices such as sound servers or
    Bluetooth audio are not included. snd_card_load() is
    completely obsolete these days.
  • Do not hard-code device strings, especially not
    hw:0 or plughw:0 or even dmix — these devices define no channel
    mapping and are mapped to raw kernel devices. It is highly recommended
    to use exclusively default as device string. If specific
    channel mappings are required the correct device strings should be
    front for stereo, surround40 for Surround 4.0,
    surround41, surround51, and so on. Unfortunately at
    this point ALSA does not define standard device names with channel
    mappings for non-kernel devices. This means default may only
    be used safely for mono and stereo streams. You should probably prefix
    your device string with plug: to make sure ALSA transparently
    reformats/remaps/resamples your PCM stream for you if the
    hardware/backend does not support your sampling parameters
    natively. (See the playback sketch after this list.)
  • Do not assume that any particular sample type is supported
    except the following ones: U8, S16_LE, S16_BE, S32_LE, S32_BE,
    FLOAT_LE, FLOAT_BE, MU_LAW, A_LAW.
  • Do not use snd_pcm_avail_update() for
    synchronization purposes. It should be used exclusively to query the
    amount of data (in frames) that may be written/read right now. Do not use
    snd_pcm_delay() to query the fill level of your playback
    buffer. It should be used exclusively for synchronization
    purposes. Make sure you fully understand the difference, and note that
    the two functions return values that are not necessarily directly
    connected!
  • Do not assume that the mixer controls always know dB information.
  • Do not assume that all devices support MMAP style buffer access.
  • Do not assume that the hardware pointer inside the (possibly memory-mapped) playback buffer is the actual position of the sample in the DAC. There might be extra latency involved.
  • Do not try to recover with your own code from ALSA error conditions such as buffer under-runs. Use snd_pcm_recover() instead.
  • Do not touch buffering/period metrics unless you have
    specific latency needs. Develop defensively, handling correctly the
    case when the backend cannot fulfill your buffering metrics
    requests. Be aware that the buffering metrics of the playback buffer
    only indirectly influence the overall latency in many
    cases; e.g. setting the buffer size to a fixed value might actually result in
    practical latencies that are much higher.
  • Do not assume that snd_pcm_rewind() is available, that it works, or to what degree it works.
  • Do not assume that the time when a PCM stream can receive
    new data is strictly dependent on the sampling and buffering
    parameters and the resulting average throughput. Always make sure to
    supply new audio data to the device when it asks for it by signalling
    “writability” on the fd. (And similarly for capturing.)
  • Do not use the “simple” interface snd_spcm_xxx().
  • Do not use any of the functions marked as “obsolete”.
  • Do not use the timer, midi, rawmidi, hwdep subsystems.
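
To make the device-string, sample-format and error-recovery rules above concrete (as referenced in the device-string item), here is a minimal playback sketch. It is an assumption-laden illustration, not a normative implementation: it hard-codes a mono 48 kHz sine tone and uses the convenience call snd_pcm_set_params(), whose soft-resample flag gives you the plug-style conversions discussed above.

#include <stdio.h>
#include <math.h>
#include <alsa/asoundlib.h>

int main(void) {
        snd_pcm_t *pcm;
        static short buf[48000];
        snd_pcm_sframes_t n, delay;

        /* "default" is the only device string that is safe to hard-code. */
        if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
                return 1;

        /* S16_LE is in the safe set of sample types; the soft-resample
         * flag (1) asks ALSA to convert if the backend cannot do
         * mono/48 kHz natively; 500000 us of buffering requested. */
        if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                               SND_PCM_ACCESS_RW_INTERLEAVED,
                               1, 48000, 1, 500000) < 0)
                return 1;

        /* One second of a 440 Hz sine tone. */
        for (int i = 0; i < 48000; i++)
                buf[i] = (short) (20000 * sin(2 * M_PI * 440 * i / 48000.0));

        n = snd_pcm_writei(pcm, buf, 48000);
        if (n < 0)
                /* Never hand-roll under-run recovery; snd_pcm_recover()
                 * transparently handles -EPIPE/-ESTRPIPE. */
                n = snd_pcm_recover(pcm, (int) n, 0);

        /* For A/V synchronization query snd_pcm_delay(), never the fill
         * level: 'delay' is the time (in frames) until a newly written
         * sample becomes audible. */
        if (snd_pcm_delay(pcm, &delay) == 0)
                printf("current delay: %ld frames\n", (long) delay);

        snd_pcm_drain(pcm);
        snd_pcm_close(pcm);
        return 0;
}

Compile with gcc -o tone tone.c -lasound -lm.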

DOS:

  • Use snd_device_name_hint() for enumerating audio devices. (See the sketch after this list.)
  • Use the simple mixer API (snd_mixer_xxx(), snd_mixer_selem_xxx()) instead of the raw snd_ctl_xxx() interface.
  • For synchronization purposes use snd_pcm_delay().
  • For checking the buffer playback/capture fill level use snd_pcm_avail_update().
  • Use snd_pcm_recover() to recover from errors returned by any of the ALSA functions.
  • If possible use the largest buffer sizes the device supports to maximize power saving and drop-out safety. Use snd_pcm_rewind() if you need to react to user input quickly.
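
As referenced in the first item, here is a minimal enumeration sketch built on snd_device_name_hint(). It is a sketch under simple assumptions: it lists all PCM devices on all cards (card index -1) and prints name and description without filtering by direction.

#include <stdio.h>
#include <stdlib.h>
#include <alsa/asoundlib.h>

int main(void) {
        void **hints;

        /* Ask for hints about all PCM devices on all cards (-1); unlike
         * snd_card_xxx() this also covers user-space devices/plugins. */
        if (snd_device_name_hint(-1, "pcm", &hints) < 0)
                return 1;

        for (void **h = hints; *h; h++) {
                char *name = snd_device_name_get_hint(*h, "NAME");
                char *desc = snd_device_name_get_hint(*h, "DESC");
                /* The "IOID" hint (not queried here) distinguishes
                 * input from output devices. */

                if (name)
                        printf("%s\n\t%s\n", name, desc ? desc : "");

                free(name);
                free(desc);
        }

        snd_device_name_free_hint(hints);
        return 0;
}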

FAQ

What about ESD and NAS?
ESD and NAS are obsolete, both as APIs and as sound daemons. Do not develop for them any further.
ALSA isn’t portable!
That’s not true! Actually, the user-space library is relatively portable; it even includes a backend for OSS sound devices. There is no real reason that would prevent using the ALSA libraries on other Unixes as well.
Portability is key to me! What can I do?
Unfortunately, no truly portable (i.e. including Win32) PCM API that
I could genuinely recommend is available right now. The systems shown
above are more or less portable at least to Unix-like operating
systems. That does not mean, however, that suitable backends are
available for all of them. If you care about portability to Win32 and
MacOS you probably have to find a solution outside of the
recommendations above, or contribute the necessary
backends/portability fixes. None of the systems (with the exception of
OSS) is truly bound to Linux or Unix-like kernels.
What about PortAudio?
I don’t think that PortAudio is a very good API for Unix-like operating systems. I cannot recommend it, but it’s your choice.
Oh, why do you hate OSS4 so much?
I don’t hate anything or anyone. I just don’t think OSS4 is a
serious option, especially not on Linux. On Linux, it is also
completely redundant due to ALSA.
You idiot, you have no clue!
You are right, I totally don’t. But that doesn’t hinder me from recommending things. Ha!
Hey I wrote/know this tiny new project which is an awesome abstraction layer for audio/media!
Sorry, that’s not sufficient. I only list software here that is known to be sufficiently relevant and sufficiently well maintained.

Final Words

Of course these recommendations are very basic and are only intended to
point you in the right direction. For each use case different necessities
apply and hence options that I did not consider here might become
viable. It’s up to you to decide how much of what I wrote here
actually applies to your application.

This summary only includes software systems that are considered
stable and universally available at the time of writing. In the
future I hope to introduce a more suitable and portable replacement
for the safe ALSA subset of functions. I plan to update this text
from time to time to keep things up-to-date.

If you feel that I forgot a use case or an important API, then
please contact me or leave a comment. However, I think the summary
above is sufficiently comprehensive and if an entry is missing I most
likely deliberately left it out.

(Also note that I am upstream for both PulseAudio and libcanberra and did some minor contributions to ALSA, GStreamer and some other of the systems listed above. Yes, I am biased.)

Oh, and please syndicate this, digg it. I’d like to see this guide to be well-known all around the Linux community. Thank you!