Tag Archives: Projects

What Are We Breaking Now?

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/brno.html

End of February devconf.cz
took place in Brno, Czech Republic. At the conference Kay Sievers,
Harald Hoyer and I did two presentations about our work on systemd
and about the systemd Journal. These talks were taped and the
recordings are now available online.

First, here’s our talk about What Are We
Breaking Now?
, in which we try to give an overview on what we
are working on currently in the systemd context, and what we expect to
do in the next few months. We cover Predictable Network Interface
Names
, the Boot
Loader Spec
, kdbus, the Apps framework, and more.

And then, I did my second talk about The systemd
Journal
, with a focus on how to practically make use of
journalctl, as a day-to-day tool for administrators (these practical
bits start around 28:40). The commands demoed here are all explained in an earlier blog story of
mine
.

Unfortunately, the audience questions are sometimes hard or
impossible to understand from the videos, and sometimes the text on
the slides is hard to read, but I still believe that the two talks are
quite interesting.

systemd Hackfest!

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/hackfest.html

Hey, you, systemd hacker, Fedora hacker! Listen up! This Thu/Fri is the systemd
Hackfest
in Brno/Czech Rep, right before devconf.cz! On thursday we’ll talk about
(and hack on) all things systemd. And the hackfest friday is going to be a Fedora Activity Day,
so we’ll have a focus on systemd integration into Fedora.

You are invited!

See you in Brno!

The Biggest Myths

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/the-biggest-myths.html

Since we first proposed systemd
for inclusion in the distributions it has been frequently discussed in
many forums, mailing lists and conferences. In these discussions one
can often hear certain myths about systemd, that are repeated over and
over again, but certainly don’t gain any truth by constant
repetition. Let’s take the time to debunk a few of them:

  1. Myth: systemd is monolithic.

    If you build systemd with all configuration options enabled you
    will build 69 individual binaries. These binaries all serve different
    tasks, and are neatly separated for a number of reasons. For example,
    we designed systemd with security in mind, hence most daemons run at
    minimal privileges (using kernel capabilities, for example) and are
    responsible for very specific tasks only, to minimize their security
    surface and impact. Also, systemd parallelizes the boot more than any
    prior solution. This parallization happens by running more processes
    in parallel. Thus it is essential that systemd is nicely split up into
    many binaries and thus processes. In fact, many of these
    binaries[1] are separated out so nicely, that they are very
    useful outside of systemd, too.

    A package involving 69 individual binaries can hardly be called
    monolithic. What is different from prior solutions however,
    is that we ship more components in a single tarball, and maintain them
    upstream in a single repository with a unified release cycle.

  2. Myth: systemd is about speed.

    Yes, systemd is fast (A
    pretty complete userspace boot-up in ~900ms, anyone?
    ), but that’s
    primarily just a side-effect of doing things right. In fact, we
    never really sat down and optimized the last tiny bit of performance
    out of systemd. Instead, we actually frequently knowingly picked the
    slightly slower code paths in order to keep the code more
    readable. This doesn’t mean being fast was irrelevant for us, but
    reducing systemd to its speed is certainly quite a misconception,
    since that is certainly not anywhere near the top of our list of
    goals.

  3. Myth: systemd’s fast boot-up is irrelevant for
    servers.

    That is just completely not true. Many administrators actually are
    keen on reduced downtimes during maintenance windows. In High
    Availability setups it’s kinda nice if the failed machine comes back
    up really fast. In cloud setups with a large number of VMs or
    containers the price of slow boots multiplies with the number of
    instances. Spending minutes of CPU and IO on really slow boots of
    hundreds of VMs or containers reduces your system’s density
    drastically, heck, it even costs you more energy. Slow boots can be
    quite financially expensive. Then, fast booting of containers allows
    you to implement a logic such as socket
    activated containers
    , allowing you to drastically increase the
    density of your cloud system.

    Of course, in many server setups boot-up is indeed irrelevant, but
    systemd is supposed to cover the whole range. And yes, I am aware
    that often it is the server firmware that costs the most time at
    boot-up, and the OS anyways fast compared to that, but well, systemd
    is still supposed to cover the whole range (see above…), and no,
    not all servers have such bad firmware, and certainly not VMs and
    containers, which are servers of a kind, too.[2]

  4. Myth: systemd is incompatible with shell scripts.

    This is entirely bogus. We just don’t use them for the boot
    process, because we believe they aren’t the best tool for that
    specific purpose, but that doesn’t mean systemd was incompatible with
    them. You can easily run shell scripts as systemd services, heck, you
    can run scripts written in any language as systemd services,
    systemd doesn’t care the slightest bit what’s inside your
    executable. Moreover, we heavily use shell scripts for our own
    purposes, for installing, building, testing systemd. And you can stick
    your scripts in the early boot process, use them for normal services,
    you can run them at latest shutdown, there are practically no
    limits.

  5. Myth: systemd is difficult.

    This also is entire non-sense. A systemd platform is actually much
    simpler than traditional Linuxes because it unifies
    system objects and their dependencies as systemd units. The
    configuration file language is very simple, and redundant
    configuration files we got rid of. We provide uniform tools for much
    of the configuration of the system. The system is much less
    conglomerate than traditional Linuxes are. We also have pretty
    comprehensive documentation (all linked
    from the homepage
    ) about pretty much every detail of systemd, and
    this not only covers admin/user-facing interfaces, but also developer
    APIs.

    systemd certainly comes with a learning curve. Everything
    does. However, we like to believe that it is actually simpler to
    understand systemd than a Shell-based boot for most people. Surprised
    we say that? Well, as it turns out, Shell is not a pretty language to
    learn, it’s syntax is arcane and complex. systemd unit files are
    substantially easier to understand, they do not expose a programming
    language, but are simple and declarative by nature. That all said, if
    you are experienced in shell, then yes, adopting systemd will take a
    bit of learning.

    To make learning easy we tried hard to provide the maximum
    compatibility to previous solutions. But not only that, on many
    distributions you’ll find that some of the traditional tools will now
    even tell you — while executing what you are asking for — how you
    could do it with the newer tools instead, in a possibly nicer way.

    Anyway, the take-away is probably that systemd is probably as
    simple as such a system can be, and that we try hard to make it easy
    to learn. But yes, if you know sysvinit then adopting systemd will
    require a bit learning, but quite frankly if you mastered sysvinit,
    then systemd should be easy for you.

  6. Myth: systemd is not modular.

    Not true at all. At compile time you have a number of
    configure switches to select what you want to build, and what
    not. And we
    document
    how you can select in even more detail what you need,
    going beyond our configure switches.

    This modularity is not totally unlike the one of the Linux kernel,
    where you can select many features individually at compile time. If the
    kernel is modular enough for you then systemd should be pretty close,
    too.

  7. Myth: systemd is only for desktops.

    That is certainly not true. With systemd we try to cover pretty
    much the same range as Linux itself does. While we care for desktop
    uses, we also care pretty much the same way for server uses, and
    embedded uses as well. You can bet that Red Hat wouldn’t make it a
    core piece of RHEL7 if it wasn’t the best option for managing services
    on servers.

    People from numerous companies work on systemd. Car manufactureres
    build it into cars, Red Hat uses it for a server operating system, and
    GNOME uses many of its interfaces for improving the desktop. You find
    it in toys, in space telescopes, and in wind turbines.

    Most features I most recently worked on are probably relevant
    primarily on servers, such as container
    support
    , resource
    management
    or the security
    features
    . We cover desktop systems pretty well already, and there
    are number of companies doing systemd development for embedded, some
    even offer consulting services in it.

  8. Myth: systemd was created as result of the NIH syndrome.

    This is not true. Before we began working on systemd we were
    pushing for Canonical’s Upstart to be widely adopted (and Fedora/RHEL
    used it too for a while). However, we eventually came to the
    conclusion that its design was inherently flawed at its core (at least
    in our eyes: most fundamentally, it leaves dependency management to
    the admin/developer, instead of solving this hard problem in code),
    and if something’s wrong in the core you better replace it, rather
    than fix it. This was hardly the only reason though, other things that
    came into play, such as the licensing/contribution agreement mess
    around it. NIH wasn’t one of the reasons, though…[3]

  9. Myth: systemd is a freedesktop.org project.

    Well, systemd is certainly hosted at fdo, but freedesktop.org is
    little else but a repository for code and documentation. Pretty much
    any coder can request a repository there and dump his stuff there (as
    long as it’s somewhat relevant for the infrastructure of free
    systems). There’s no cabal involved, no “standardization” scheme, no
    project vetting, nothing. It’s just a nice, free, reliable place to
    have your repository. In that regard it’s a bit like SourceForge,
    github, kernel.org, just not commercial and without over-the-top
    requirements, and hence a good place to keep our stuff.

    So yes, we host our stuff at fdo, but the implied assumption of
    this myth in that there was a group of people who meet and then agree
    on how the future free systems look like, is entirely bogus.

  10. Myth: systemd is not UNIX.

    There’s certainly some truth in that. systemd’s sources do not
    contain a single line of code originating from original UNIX. However,
    we derive inspiration from UNIX, and thus there’s a ton of UNIX in
    systemd. For example, the UNIX idea of “everything is a file” finds
    reflection in that in systemd all services are exposed at runtime in a
    kernel file system, the cgroupfs. Then, one of the original
    features of UNIX was multi-seat support, based on built-in terminal
    support. Text terminals are hardly the state of the art how you
    interface with your computer these days however. With systemd we
    brought native multi-seat
    support back, but this time with full support for today’s hardware,
    covering graphics, mice, audio, webcams and more, and all that fully
    automatic, hotplug-capable and without configuration. In fact the
    design of systemd as a suite of integrated tools that each have their
    individual purposes but when used together are more than just the sum
    of the parts, that’s pretty much at the core of UNIX philosophy. Then,
    the way our project is handled (i.e. maintaining much of the core OS
    in a single git repository) is much closer to the BSD model (which is
    a true UNIX, unlike Linux) of doing things (where most of the core OS
    is kept in a single CVS/SVN repository) than things on Linux ever
    were.

    Ultimately, UNIX is something different for everybody. For us
    systemd maintainers it is something we derive inspiration from. For
    others it is a religion, and much like the other world religions there
    are different readings and understandings of it. Some define UNIX
    based on specific pieces of code heritage, others see it just as a set
    of ideas, others as a set of commands or APIs, and even others as a
    definition of behaviours. Of course, it is impossible to ever make all
    these people happy.

    Ultimately the question whether something is UNIX or not matters
    very little. Being technically excellent is hardly exclusive to
    UNIX. For us, UNIX is a major influence (heck, the biggest one), but
    we also have other influences. Hence in some areas systemd will be
    very UNIXy, and in others a little bit less.

  11. Myth: systemd is complex.

    There’s certainly some truth in that. Modern computers are complex
    beasts, and the OS running on it will hence have to be complex
    too. However, systemd is certainly not more complex than prior
    implementations of the same components. Much rather, it’s simpler, and
    has less redundancy (see above). Moreover, building a simple OS based
    on systemd will involve much fewer packages than a traditional Linux
    did. Fewer packages makes it easier to build your system, gets rid of
    interdependencies and of much of the different behaviour of every
    component involved.

  12. Myth: systemd is bloated.

    Well, bloated certainly has many different definitions. But in
    most definitions systemd is probably the opposite of bloat. Since
    systemd components share a common code base, they tend to share much
    more code for common code paths. Here’s an example: in a traditional
    Linux setup, sysvinit, start-stop-daemon, inetd, cron, dbus, all
    implemented a scheme to execute processes with various configuration
    options in a certain, hopefully clean environment. On systemd the code
    paths for all of this, for the configuration parsing, as well as the
    actual execution is shared. This means less code, less place for
    mistakes, less memory and cache pressure, and is thus a very good
    thing. And as a side-effect you actually get a ton more functionality
    for it…

    As mentioned above, systemd is also pretty modular. You can choose
    at build time which components you need, and which you don’t
    need. People can hence specifically choose the level of “bloat” they
    want.

    When you build systemd, it only requires three dependencies: glibc,
    libcap and dbus. That’s it. It can make use of more dependencies, but
    these are entirely optional.

    So, yeah, whichever way you look at it, it’s really not
    bloated.

  13. Myth: systemd being Linux-only is not nice to the BSDs.

    Completely wrong. The BSD folks are pretty much uninterested in
    systemd. If systemd was portable, this would change nothing, they
    still wouldn’t adopt it. And the same is true for the other Unixes in
    the world. Solaris has SMF, BSD has their own “rc” system, and they
    always maintained it separately from Linux. The init system is very
    close to the core of the entire OS. And these other operating systems
    hence define themselves among other things by their core
    userspace. The assumption that they’d adopt our core userspace if we
    just made it portable, is completely without any foundation.

  14. Myth: systemd being Linux-only makes it impossible for Debian to adopt it as default.

    Debian supports non-Linux kernels in their distribution. systemd
    won’t run on those. Is that a problem though, and should that hinder
    them to adopt system as default? Not really. The folks who ported
    Debian to these other kernels were willing to invest time in a massive
    porting effort, they set up test and build systems, and patched and
    built numerous packages for their goal. The maintainance of both a
    systemd unit file and a classic init script for the packaged services
    is a negligable amount of work compared to that, especially since
    those scripts more often than not exist already.

  15. Myth: systemd could be ported to other kernels if its maintainers just wanted to.

    That is simply not true. Porting systemd to other kernel is not
    feasible. We just use too many Linux-specific interfaces. For a few
    one might find replacements on other kernels, some features one might
    want to turn off, but for most this is nor really possible. Here’s a
    small, very incomprehensive list: cgroups, fanotify, umount2(),
    /proc/self/mountinfo
    (including notification), /dev/swaps (same),
    udev, netlink,
    the structure of /sys, /proc/$PID/comm,
    /proc/$PID/cmdline, /proc/$PID/loginuid, /proc/$PID/stat,
    /proc/$PID/session, /proc/$PID/exe, /proc/$PID/fd, tmpfs, devtmpfs,
    capabilities, namespaces of all kinds, various prctl()s, numerous
    ioctls,
    the mount() system call and its semantics, selinux, audit,
    inotify, statfs, O_DIRECTORY, O_NOATIME, /proc/$PID/root, waitid(),
    SCM_CREDENTIALS, SCM_RIGHTS, mkostemp(), /dev/input, ...

    And no, if you look at this list and pick out the few where you can
    think of obvious counterparts on other kernels, then think again, and
    look at the others you didn’t pick, and the complexity of replacing
    them.

  16. Myth: systemd is not portable for no reason.

    Non-sense! We use the Linux-specific functionality because we need
    it to implement what we want. Linux has so many features that
    UNIX/POSIX didn’t have, and we want to empower the user with
    them. These features are incredibly useful, but only if they are
    actually exposed in a friendly way to the user, and that’s what we do
    with systemd.

  17. Myth: systemd uses binary configuration files.

    No idea who came up with this crazy myth, but it’s absolutely not
    true. systemd is configured pretty much exclusively via simple text
    files. A few settings you can also alter with the kernel command line
    and via environment variables. There’s nothing binary in its
    configuration (not even XML). Just plain, simple, easy-to-read text
    files.

  18. Myth: systemd is a feature creep.

    Well, systemd certainly covers more ground that it used to. It’s
    not just an init system anymore, but the basic userspace building
    block to build an OS from, but we carefully make sure to keep most of
    the features optional. You can turn a lot off at compile time, and
    even more at runtime. Thus you can choose freely how much feature
    creeping you want.

  19. Myth: systemd forces you to do something.

    systemd is not the mafia. It’s Free Software, you can do with it
    whatever you want, and that includes not using it. That’s pretty much
    the opposite of “forcing”.

  20. Myth: systemd makes it impossible to run syslog.

    Not true, we carefully made sure when we introduced
    the journal
    that all data is also passed on to any syslog daemon
    running. In fact, if something changed, then only that syslog gets
    more complete data now than it got before, since we now cover early
    boot stuff as well as STDOUT/STDERR of any system service.

  21. Myth: systemd is incompatible.

    We try very hard to provide the best possible compatibility with
    sysvinit. In fact, the vast majority of init scripts should work just
    fine on systemd, unmodified. However, there actually are indeed a few
    incompatibilities, but we try to document
    these
    and explain what to do about them. Ultimately every system
    that is not actually sysvinit itself will have a certain amount of
    incompatibilities with it since it will not share the exect same code
    paths.

    It is our goal to ensure that differences between the various
    distributions are kept at a minimum. That means unit files usually
    work just fine on a different distribution than you wrote it on, which
    is a big improvement over classic init scripts which are very hard to
    write in a way that they run on multiple Linux distributions, due to
    numerous incompatibilities between them.

  22. Myth: systemd is not scriptable, because of its D-Bus use.

    Not true. Pretty much every single D-Bus interface systemd provides
    is also available in a command line tool, for example in systemctl,
    loginctl,
    timedatectl,
    hostnamectl,
    localectl
    and suchlike. You can easily call these tools from shell scripts, they
    open up pretty much the entire API from the command line with
    easy-to-use commands.

    That said, D-Bus actually has bindings for almost any scripting
    language this world knows. Even from the shell you can invoke
    arbitrary D-Bus methods with dbus-send
    or gdbus. If
    anything, this improves scriptability due to the good support of D-Bus
    in the various scripting languages.

  23. Myth: systemd requires you to use some arcane configuration
    tools instead of allowing you to edit your configuration files
    directly.

    Not true at all. We offer some configuration tools, and using them
    gets you a bit of additional functionality (for example, command line
    completion for all settings!), but there’s no need at all to use
    them. You can always edit the files in question directly if you wish,
    and that’s fully supported. Of course sometimes you need to explicitly
    reload configuration of some daemon after editing the configuration,
    but that’s pretty much true for most UNIX services.

  24. Myth: systemd is unstable and buggy.

    Certainly not according to our data. We have been monitoring the
    Fedora bug tracker (and some others) closely for a long long time. The
    number of bugs is very low for such a central component of the OS,
    especially if you discount the numerous RFE bugs we track for the
    project. We are pretty good in keeping systemd out of the list of
    blocker bugs of the distribution. We have a relatively fast
    development cycle with mostly incremental changes to keep quality and
    stability high.

  25. Myth: systemd is not debuggable.

    False. Some people try to imply that the shell was a good
    debugger. Well, it isn’t really. In systemd we provide you with actual
    debugging features instead. For example: interactive debugging,
    verbose tracing, the ability to mask any component during boot, and
    more. Also, we provide documentation
    for it
    .

    It’s certainly well debuggable, we needed that for our own
    development work, after all. But we’ll grant you one thing: it uses
    different debugging tools, we believe more appropriate ones for the
    purpose, though.

  26. Myth: systemd makes changes for the changes’ sake.

    Very much untrue. We pretty much exclusively have technical
    reasons for the changes we make, and we explain them in the various
    pieces of documentation, wiki pages, blog articles, mailing list
    announcements. We try hard to avoid making incompatible changes, and
    if we do we try to document the why and how in detail. And if you
    wonder about something, just ask us!

  27. Myth: systemd is a Red-Hat-only project, is private property
    of some smart-ass developers, who use it to push their views to the
    world.

    Not true. Currently, there are 16 hackers with commit powers to the
    systemd git tree. Of these 16 only six are employed by Red Hat. The 10
    others are folks from ArchLinux, from Debian, from Intel, even from
    Canonical, Mandriva, Pantheon and a number of community folks with
    full commit rights. And they frequently commit big stuff, major
    changes. Then, there are 374 individuals with patches in our tree, and
    they too came from a number of different companies and backgrounds,
    and many of those have way more than one patch in the tree. The
    discussions about where we want to take systemd are done in the open,
    on our IRC channel (#systemd on freenode, you are always
    weclome), on our mailing
    list
    , and on public hackfests (such
    as our next one in Brno
    , you are invited). We regularly attend
    various conferences, to collect feedback, to explain what we are doing
    and why, like few others do. We maintain blogs, engage in social
    networks (we actually
    have some pretty interesting content on Google+
    , and our Google+
    Community is pretty alive, too
    .), and try really hard to explain
    the why and the how how we do things, and to listen to feedback and
    figure out where the current issues are (for example, from that
    feedback we compiled this lists of often heard myths about
    systemd…).

    What most systemd contributors probably share is a rough idea how a
    good OS should look like, and the desire to make it happen. However,
    by the very nature of the project being Open Source, and rooted in the
    community systemd is just what people want it to be, and if it’s not
    what they want then they can drive the direction with patches and
    code, and if that’s not feasible, then there are numerous other
    options to use, too, systemd is never exclusive.

    One goal of systemd is to unify the dispersed Linux landscape a
    bit. We try to get rid of many of the more pointless differences of
    the various distributions in various areas of the core OS. As part of
    that we sometimes adopt schemes that were previously used by only one
    of the distributions and push it to a level where it’s the default of
    systemd, trying to gently push everybody towards the same set of basic
    configuration. This is never exclusive though, distributions can
    continue to deviate from that if they wish, however, if they end-up
    using the well-supported default their work becomes much easier and
    they might gain a feature or two. Now, as it turns out, more
    frequently than not we actually adopted schemes that where Debianisms,
    rather than Fedoraisms/Redhatisms as best supported scheme by
    systemd. For example, systems running systemd now generally store
    their hostname in /etc/hostname, something that used to be
    specific to Debian and now is used across distributions.

    One thing we’ll grant you though, we sometimes can be
    smart-asses. We try to be prepared whenever we open our mouth, in
    order to be able to back-up with facts what we claim. That might make
    us appear as smart-asses.

    But in general, yes, some of the more influental contributors of
    systemd work for Red Hat, but they are in the minority, and systemd is
    a healthy, open community with different interests, different
    backgrounds, just unified by a few rough ideas where the trip should
    go, a community where code and its design counts, and certainly not
    company affiliation.

  28. Myth: systemd doesn’t support /usr split from the root directory.

    Non-sense. Since its beginnings systemd supports the
    --with-rootprefix= option to its configure script
    which allows you to tell systemd to neatly split up the stuff needed
    for early boot and the stuff needed for later on. All this logic is
    fully present and we keep it up-to-date right there in systemd’s build
    system.

    Of course, we still don’t think that actually
    booting with /usr unavailable is a good idea
    , but we
    support this just fine in our build system. This won’t fix the
    inherent problems of the scheme that you’ll encounter all across the
    board, but you can’t blame that on systemd, because in systemd we
    support this just fine.

  29. Myth: systemd doesn’t allow your to replace its components.

    Not true, you can turn off and replace pretty much any part of
    systemd, with very few exceptions. And those exceptions (such as
    journald) generally allow you to run an alternative side by side to
    it, while cooperating nicely with it.

  30. Myth: systemd’s use of D-Bus instead of sockets makes it intransparent.

    This claim is already contradictory in itself: D-Bus uses sockets
    as transport, too. Hence whenever D-Bus is used to send something
    around, a socket is used for that too. D-Bus is mostly a standardized
    serialization of messages to send over these sockets. If anything this
    makes it more transparent, since this serialization is well
    documented, understood and there are numerous tracing tools and
    language bindings for it. This is very much unlike the usual
    homegrown protocols the various classic UNIX daemons use to
    communicate locally.

Hmm, did I write I just wanted to debunk a “few” myths? Maybe these
were more than just a few… Anyway, I hope I managed to clear up a
couple of misconceptions. Thanks for your time.

Footnotes

[1] For example, systemd-detect-virt,
systemd-tmpfiles,
systemd-udevd are.

[2] Also, we are trying to do our little part on maybe
making this better. By exposing boot-time performance of the firmware
more prominently in systemd’s boot output we hope to shame the
firmware writers to clean up their stuff.

[3] And anyways, guess which project includes a library “libnih” — Upstart or systemd?[4]

[4] Hint: it’s not systemd!

systemd for Administrators, Part XX

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/socket-activated-containers.html

This is no time for procrastination,
here
is
already the twentieth
installment
of

my ongoing series
on
systemd
for
Administrators:

Socket Activated Internet Services and OS Containers

Socket
Activation
is an important feature of systemd. When
we first
announced
systemd we already tried to make the point how great
socket activation is for increasing parallelization and robustness of
socket services, but also for simplifying the dependency logic of the
boot. In this episode I’d like to explain why socket activation is an
important tool for drastically improving how many services and even
containers you can run on a single system with the same resource
usage. Or in other words, how you can drive up the density of customer
sites on a system while spending less on new hardware.

Socket Activated Internet Services

First, let’s take a step back. What was socket activation again? —
Basically, socket activation simply means that systemd sets up
listening sockets (IP or otherwise) on behalf of your services
(without these running yet), and then starts (activates) the
services as soon as the first connection comes in. Depending on the
technology the services might idle for a while after having processed
the connection and possible follow-up connections before they exit on
their own, so that systemd will again listen on the sockets and
activate the services again the next time they are connected to. For
the client it is not visible whether the service it is interested in
is currently running or not. The service’s IP socket stays continously
connectable, no connection attempt ever fails, and all connects will
be processed promptly.

A setup like this lowers resource usage: as services are only
running when needed they only consume resources when required. Many
internet sites and services can benefit from that. For example, web
site hosters will have noticed that of the multitude of web sites that
are on the Internet only a tiny fraction gets a continous stream of
requests: the huge majority of web sites still needs to be available
all the time but gets requests only very unfrequently. With a scheme
like socket activation you take benefit of this. By hosting many of
these sites on a single system like this and only activating their
services as necessary allows a large degree of over-commit: you can
run more sites on your system than the available resources actually
allow. Of course, one shouldn’t over-commit too much to avoid
contention during peak times.

Socket activation like this is easy to use in systemd. Many modern
Internet daemons already support socket activation out of the box (and
for those which don’t yet it’s not
hard
to add). Together with systemd’s instantiated
units support
it is easy to write a pair of service and socket
templates that then may be instantiated multiple times, once for each
site. Then, (optionally) make use of some of the security
features
of systemd to nicely isolate the customer’s site’s
services from each other (think: each customer’s service should only
see the home directory of the customer, everybody else’s directories
should be invisible), and there you go: you now have a highly scalable
and reliable server system, that serves a maximum of securely
sandboxed services at a minimum of resources, and all nicely done with
built-in technology of your OS.

This kind of setup is already in production use in a number of
companies. For example, the great folks at Pantheon are running their
scalable instant Drupal system on a setup that is similar to this. (In
fact, Pantheon’s David Strauss pioneered this scheme. David, you
rock!)

Socket Activated OS Containers

All of the above can already be done with older versions of
systemd. If you use a distribution that is based on systemd, you can
right-away set up a system like the one explained above. But let’s
take this one step further. With systemd 197 (to be included in Fedora
19), we added support for socket activating not only individual
services, but entire OS containers. And I really have to say it
at this point: this is stuff I am really excited
about. 😉

Basically, with socket activated OS containers, the host’s systemd
instance will listen on a number of ports on behalf of a container,
for example one for SSH, one for web and one for the database, and as
soon as the first connection comes in, it will spawn the container
this is intended for, and pass to it all three sockets. Inside of the
container, another systemd is running and will accept the sockets and
then distribute them further, to the services running inside the
container using normal socket activation. The SSH, web and database
services will only see the inside of the container, even though they
have been activated by sockets that were originally created on the
host! Again, to the clients this all is not visible. That an entire OS
container is spawned, triggered by simple network connection is entirely
transparent to the client side.[1]

The OS containers may contain (as the name suggests) a full
operating system, that might even be a different distribution than is
running on the host. For example, you could run your host on Fedora,
but run a number of Debian containers inside of it. The OS containers
will have their own systemd init system, their own SSH instances,
their own process tree, and so on, but will share a number of other
facilities (such as memory management) with the host.

For now, only systemd’s own trivial container manager, systemd-nspawn
has been updated to support this kind of socket activation. We hope
that libvirt-lxc will
soon gain similar functionality. At this point, let’s see in more
detail how such a setup is configured in systemd using nspawn:

First, please use a tool such as debootstrap or yum’s
--installroot to set up a container OS
tree[2]. The details of that are a bit out-of-focus
for this story, there’s plenty of documentation around how to do
this. Of course, make sure you have systemd v197 installed inside
the container. For accessing the container from the command line,
consider using systemd-nspawn
itself. After you configured everything properly, try to boot it up
from the command line with systemd-nspawn’s -b switch.

Assuming you now have a working container that boots up fine, let’s
write a service file for it, to turn the container into a systemd
service on the host you can start and stop. Let’s create
/etc/systemd/system/mycontainer.service on the host:

[Unit]
Description=My little container

[Service]
ExecStart=/usr/bin/systemd-nspawn -jbD /srv/mycontainer 3
KillMode=process

This service can already be started and stopped via systemctl
start
and systemctl stop. However, there’s no nice way
to actually get a shell prompt inside the container. So let’s add SSH
to it, and even more: let’s configure SSH so that a connection to the
container’s SSH port will socket-activate the entire container. First,
let’s begin with telling the host that it shall now listen on the SSH
port of the container. Let’s create
/etc/systemd/system/mycontainer.socket on the host:

[Unit]
Description=The SSH socket of my little container

[Socket]
ListenStream=23

If we start this unit with systemctl start on the host
then it will listen on port 23, and as soon as a connection comes in
it will activate our container service we defined above. We pick port
23 here, instead of the usual 22, as our host’s SSH is already
listening on that. nspawn virtualizes the process list and the file
system tree, but does not actually virtualize the network stack, hence
we just pick different ports for the host and the various containers
here.

Of course, the system inside the container doesn’t yet know what to
do with the socket it gets passed due to socket activation. If you’d
now try to connect to the port, the container would start-up but the
incoming connection would be immediately closed since the container
can’t handle it yet. Let’s fix that!

All that’s necessary for that is teach SSH inside the container
socket activation. For that let’s simply write a pair of socket and
service units for SSH. Let’s create
/etc/systemd/system/sshd.socket in the container:

[Unit]
Description=SSH Socket for Per-Connection Servers

[Socket]
ListenStream=23
Accept=yes

Then, let’s add the matching SSH service file
/etc/systemd/system/[email protected] in the container:

[Unit]
Description=SSH Per-Connection Server for %I

[Service]
ExecStart=-/usr/sbin/sshd -i
StandardInput=socket

Then, make sure to hook sshd.socket into the
sockets.target so that unit is started automatically when the
container boots up:

ln -s /etc/systemd/system/sshd.socket /etc/systemd/system/sockets.target.wants/

And that’s it. If we now activate mycontainer.socket on
the host, the host’s systemd will bind the socket and we can connect
to it. If we do this, the host’s systemd will activate the container,
and pass the socket in to it. The container’s systemd will then take
the socket, match it up with sshd.socket inside the
container. As there’s still our incoming connection queued on it, it
will then immediately trigger an instance of [email protected],
and we’ll have our login.

And that’s already everything there is to it. You can easily add
additional sockets to listen on to
mycontainer.socket. Everything listed therein will be passed
to the container on activation, and will be matched up as good as
possible with all socket units configured inside the
container. Sockets that cannot be matched up will be closed, and
sockets that aren’t passed in but are configured for listening will be
bound be the container’s systemd instance.

So, let’s take a step back again. What did we gain through all of
this? Well, basically, we can now offer a number of full OS containers
on a single host, and the containers can offer their services without
running continously. The density of OS containers on the host can
hence be increased drastically.

Of course, this only works for kernel-based virtualization, not for
hardware virtualization. i.e. something like this can only be
implemented on systems such as libvirt-lxc or nspawn, but not in
qemu/kvm.

If you have a number of containers set up like this, here’s one
cool thing the journal allows you to do. If you pass -m to
journalctl on the host, it will automatically discover the
journals of all local containers and interleave them on
display. Nifty, eh?

With systemd 197 you have everything to set up your own socket
activated OS containers on-board. However, there are a couple of
improvements we’re likely to add soon: for example, right now even if
all services inside the container exit on idle, the container still
will stay around, and we really should make it exit on idle too, if
all its services exited and no logins are around. As it turns out we
already have much of the infrastructure for this around: we can reuse
the auto-suspend functionality we added for laptops: detecting when a
laptop is idle and suspending it then is a very similar problem to
detecting when a container is idle and shutting it down then.

Anyway, this blog story is already way too long. I hope I haven’t
lost you half-way already with all this talk of virtualization,
sockets, services, different OSes and stuff. I hope this blog story is
a good starting point for setting up powerful highly scalable server
systems. If you want to know more, consult the documentation and drop
by our IRC channel. Thank you!

Footnotes

[1] And BTW, this
is another reason
why fast boot times the way systemd offers them
are actually a really good thing on servers, too.

[2] To make it easy: you need a command line such as yum
--releasever=19 --nogpg --installroot=/srv/mycontainer/ --disablerepo='*'
--enablerepo=fedora install systemd passwd yum fedora-release vim-minimal

to install Fedora, and debootstrap --arch=amd64 unstable
/srv/mycontainer/
to install Debian. Also see the bottom of systemd-nspawn(1).
Also note that auditing is currently broken for containers, and if enabled in
the kernel will cause all kinds of errors in the container. Use
audit=0 on the host’s kernel command line to turn it off.

systemd for Administrators, Part XIX

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/detect-virt.html

Happy new year
2013! Here
is
now the nineteenth
installment
of

my ongoing series
on
systemd
for
Administrators:

Detecting Virtualization

When we started working on systemd
we had a closer look on what the various existing init scripts used on
Linux where actually doing. Among other things we noticed that a
number of them where checking explicitly whether they were running in
a virtualized environment (i.e. in a kvm, VMWare, LXC guest or
suchlike) or not. Some init scripts disabled themselves in such
cases[1], others enabled themselves only in such
cases[2]. Frequently, it would probably have been a better
idea to check for other conditions rather than explicitly checking for
virtualization, but after looking at this from all sides we came to
the conclusion that in many cases explicitly conditionalizing services
based on detected virtualization is a valid thing to do. As a result
we added a new configuration option to systemd that can be used to
conditionalize services this way: ConditionVirtualization;
we also added a small tool that can be used in shell scripts to detect
virtualization: systemd-detect-virt(1);
and finally, we added a minimal bus interface to query this from other
applications.

Detecting whether your code is run inside a virtualized environment
is
actually not that hard
. Depending on what precisely you want to
detect it’s little more than running the CPUID instruction and maybe
checking a few files in /sys and /proc. The
complexity is mostly about knowing the strings to look for, and
keeping this list up-to-date. Currently, the the virtualization
detection code in systemd can detect the following virtualization
systems:

  • Hardware virtualization (i.e. VMs):

    • qemu
    • kvm
    • vmware
    • microsoft
    • oracle
    • xen
    • bochs
  • Same-kernel virtualization (i.e. containers):

Let’s have a look how one may make use if this functionality.

Conditionalizing Units

Adding ConditionVirtualization
to the [Unit] section of a unit file is enough to
conditionalize it depending on which virtualization is used or whether
one is used at all. Here’s an example:

[Unit]
Name=My Foobar Service (runs only only on guests)
ConditionVirtualization=yes

[Service]
ExecStart=/usr/bin/foobard

Instead of specifiying “yes” or “no” it is possible
to specify the ID of a specific virtualization solution (Example:
kvm“, “vmware“, …), or either
container” or “vm” to check whether the kernel is
virtualized or the hardware. Also, checks can be prefixed with an exclamation mark (“!”) to invert a check. For further details see the manual page.

In Shell Scripts

In shell scripts it is easy to check for virtualized systems with
the systemd-detect-virt(1)
tool. Here’s an example:

if systemd-detect-virt -q ; then
        echo "Virtualization is used:" `systemd-detect-virt`
else
        echo "No virtualization is used."
fi

If this tool is run it will return with an exit code of zero
(success) if a virtualization solution has been found, non-zero
otherwise. It will also print a short identifier of the used
virtualization solution, which can be suppressed with
-q. Also, with the -c and -v parameters it is
possible to detect only kernel or only hardware virtualization
environments. For further details see the manual
page
.

In Programs

Whether virtualization is available is also exported on the system bus:

$ gdbus call --system --dest org.freedesktop.systemd1 --object-path /org/freedesktop/systemd1 --method org.freedesktop.DBus.Properties.Get org.freedesktop.systemd1.Manager Virtualization
(<'systemd-nspawn'>,)

This property contains the empty string if no virtualization is
detected. Note that some container environments cannot be detected
directly from unprivileged code. That’s why we expose this property on
the bus rather than providing a library — the bus implicitly solves
the privilege problem quite nicely.

Note that all of this will only ever detect and return information
about the “inner-most” virtualization solution. If you stack
virtualization (“We must go deeper!”) then these interfaces will
expose the one the code is most directly interfacing
with. Specifically that means that if a container solution is used
inside of a VM, then only the container is generally detected and
returned.

Footonotes

[1] For example: running certain device management service in a
container environment that has no access to any physical hardware makes little sense.

[2] For example: some VM solutions work best if certain
vendor-specific userspace components are running that connect the
guest with the host in some way.

foss.in Needs Your Funding!

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/fossin2012-2.html

One of the most exciting conferences in the Free Software world, foss.in in Bangalore, India has trouble finding enough
sponsoring
for this year’s edition. Many speakers from
all around the Free Software world
(including yours truly) have signed up
to present at the event, and the conference would appreciate any corporate
funding they can get!

Please check if
your company can help
and contact the
organizers
for details!

See you in Bangalore!

FOSS.IN

systemd for Developers III

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/journal-submit.html

Here’s the third episode of of my
systemd for Developers series.

Logging to the Journal

In a recent blog
story
intended for administrators I shed some light on how to use
the journalctl(1)
tool to browse and search the systemd journal. In this blog story for developers
I want to explain a little how to get log data into the systemd
Journal in the first place.

The good thing is that getting log data into the Journal is not
particularly hard, since there’s a good chance the Journal already
collects it anyway and writes it to disk. The journal collects:

  1. All data logged via libc syslog()
  2. The data from the kernel logged with printk()
  3. Everything written to STDOUT/STDERR of any system service

This covers pretty much all of the traditional log output of a
Linux system, including messages from the kernel initialization phase,
the initial RAM disk, the early boot logic, and the main system
runtime.

syslog()

Let’s have a quick look how syslog() is used again. Let’s
write a journal message using this call:

#include <syslog.h>

int main(int argc, char *argv[]) {
        syslog(LOG_NOTICE, "Hello World");
        return 0;
}

This is C code, of course. Many higher level languages provide APIs
that allow writing local syslog messages. Regardless which language
you choose, all data written like this ends up in the Journal.

Let’s have a look how this looks after it has been written into the
journal (this is the JSON
output
journalctl -o json-pretty generates):

{
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "_TRANSPORT" : "syslog",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
        "SYSLOG_FACILITY" : "1",
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "_PID" : "3068",
        "SYSLOG_IDENTIFIER" : "test-journal-submit",
        "MESSAGE" : "Hello World!",
        "_SOURCE_REALTIME_TIMESTAMP" : "1351126905014938"
}

This nicely shows how the Journal implicitly augmented our little
log message with various meta data fields which describe in more
detail the context our message was generated from. For an explanation
of the various fields, please refer to systemd.journal-fields(7)

printf()

If you are writing code that is run as a systemd service, generating journal
messages is even easier:

#include <stdio.h>

int main(int argc, char *argv[]) {
        printf("Hello World\n");
        return 0;
}

Yupp, that’s easy, indeed.

The printed string in this example is logged at a default log
priority of LOG_INFO[1]. Sometimes it is useful to change
the log priority for such a printed string. When systemd parses
STDOUT/STDERR of a service it will look for priority values enclosed
in < > at the beginning of each line[2], following the scheme
used by the kernel’s printk() which in turn took
inspiration from the BSD syslog network serialization of messages. We
can make use of this systemd feature like this:

#include <stdio.h>

#define PREFIX_NOTICE "<5>"

int main(int argc, char *argv[]) {
        printf(PREFIX_NOTICE "Hello World\n");
        return 0;
}

Nice! Logging with nothing but printf() but we still get
log priorities!

This scheme works with any programming language, including, of course, shell:

#!/bin/bash

echo "<5>Hellow world"

Native Messages

Now, what I explained above is not particularly exciting: the
take-away is pretty much only that things end up in the journal if
they are output using the traditional message printing APIs. Yaaawn!

Let’s make this more interesting, let’s look at what the Journal
provides as native APIs for logging, and let’s see what its benefits
are. Let’s translate our little example into the 1:1 counterpart
using the Journal’s logging API sd_journal_print(3):

#include <systemd/sd-journal.h>

int main(int argc, char *argv[]) {
        sd_journal_print(LOG_NOTICE, "Hello World");
        return 0;
}

This doesn’t look much more interesting than the two examples
above, right? After compiling this with `pkg-config --cflags
--libs libsystemd-journal`
appended to the compiler parameters,
let’s have a closer look at the JSON representation of the journal
entry this generates:

 {
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
        "CODE_FUNC" : "main",
        "_TRANSPORT" : "journal",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
        "CODE_FILE" : "src/journal/test-journal-submit.c",
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "MESSAGE" : "Hello World",
        "CODE_LINE" : "4",
        "_PID" : "3516",
        "_SOURCE_REALTIME_TIMESTAMP" : "1351128226954170"
}

This looks pretty much the same, right? Almost! I highlighted three new
lines compared to the earlier output. Yes, you guessed it, by using
sd_journal_print() meta information about the generating
source code location is implicitly appended to each
message[3], which is helpful for a developer to identify
the source of a problem.

The primary reason for using the Journal’s native logging APIs is a
not just the source code location however: it is to allow
passing additional structured log messages from the program into the
journal. This additional log data may the be used to search the
journal for, is available for consumption for other programs, and
might help the administrator to track down issues beyond what is
expressed in the human readable message text. Here’s and example how
to do that with sd_journal_send():

#include <systemd/sd-journal.h>
#include <unistd.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
        sd_journal_send("MESSAGE=Hello World!",
                        "MESSAGE_ID=52fb62f99e2c49d89cfbf9d6de5e3555",
                        "PRIORITY=5",
                        "HOME=%s", getenv("HOME"),
                        "TERM=%s", getenv("TERM"),
                        "PAGE_SIZE=%li", sysconf(_SC_PAGESIZE),
                        "N_CPUS=%li", sysconf(_SC_NPROCESSORS_ONLN),
                        NULL);
        return 0;
}

This will write a log message to the journal much like the earlier
examples. However, this times a few additional, structured fields are
attached:

{
        "__CURSOR" : "s=ac9e9c423355411d87bf0ba1a9b424e8;i=5930;b=5335e9cf5d954633bb99aefc0ec38c25;m=16544f875b;t=4ccd863cdc4f0;x=896defe53cc1a96a",
        "__REALTIME_TIMESTAMP" : "1351129666274544",
        "__MONOTONIC_TIMESTAMP" : "95903778651",
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
        "CODE_FUNC" : "main",
        "_TRANSPORT" : "journal",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
        "CODE_FILE" : "src/journal/test-journal-submit.c",
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "MESSAGE" : "Hello World!",
        "_PID" : "4049",
        "CODE_LINE" : "6",
        "MESSAGE_ID" : "52fb62f99e2c49d89cfbf9d6de5e3555",
        "HOME" : "/home/lennart",
        "TERM" : "xterm-256color",
        "PAGE_SIZE" : "4096",
        "N_CPUS" : "4",
        "_SOURCE_REALTIME_TIMESTAMP" : "1351129666241467"
}

Awesome! Our simple example worked! The five meta data fields we
attached to our message appeared in the journal. We used sd_journal_send()
for this which works much like sd_journal_print() but takes a
NULL terminated list of format strings each followed by its
arguments. The format strings must include the field name and a ‘=’
before the values.

Our little structured message included seven fields. The first three we passed are well-known fields:

  1. MESSAGE= is the actual human readable message part of the structured message.
  2. PRIORITY= is the numeric message priority value as known from BSD syslog formatted as an integer string.
  3. MESSAGE_ID= is a 128bit ID that identifies our specific
    message call, formatted as hexadecimal string. We randomly generated
    this string with journalctl --new-id128. This can be used by
    applications to track down all occasions of this specific
    message. The 128bit can be a UUID, but this is not a requirement or enforced.

Applications may relatively freely define additional fields as they
see fit (we defined four pretty arbitrary ones in our example). A
complete list of the currently well-known fields is available in systemd.journal-fields(7).

Let’s see how the message ID helps us finding this message and all
its occasions in the journal:

$ journalctl MESSAGE_ID=52fb62f99e2c49d89cfbf9d6de5e3555
-- Logs begin at Thu, 2012-10-18 04:07:03 CEST, end at Thu, 2012-10-25 04:48:21 CEST. --
Oct 25 03:47:46 epsilon test-journal-se[4049]: Hello World!
Oct 25 04:40:36 epsilon test-journal-se[4480]: Hello World!

Seems I already invoked this example tool twice!

Many messages systemd itself generates have
message IDs
. This is useful for example, to find all occasions
where a program dumped core (journalctl
MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1
), or when a user
logged in (journalctl
MESSAGE_ID=8d45620c1a4348dbb17410da57c60c66
). If your application
generates a message that might be interesting to recognize in the
journal stream later on, we recommend attaching such a message ID to
it. You can easily allocate a new one for your message with journalctl
--new-id128
.

This example shows how we can use the Journal’s native APIs to
generate structured, recognizable messages. You can do much more than
this with the C API. For example, you may store binary data in journal
fields as well, which is useful to attach coredumps or hard disk SMART
states to events where this applies. In order to make this blog story
not longer than it already is we’ll not go into detail about how to do
this, an I ask you to check out sd_journal_send(3)
for further information on this.

Python

The examples above focus on C. Structured logging to the Journal is
also available from other languages. Along with systemd itself we ship
bindings for Python. Here’s an example how to use this:

from systemd import journal
journal.send('Hello world')
journal.send('Hello, again, world', FIELD2='Greetings!', FIELD3='Guten tag')

Other binding exist for Node.js,
PHP, Lua.

Portability

Generating structured data is a very useful feature for services to
make their logs more accessible both for administrators and other
programs. In addition to the implicit structure the Journal
adds to all logged messages it is highly beneficial if the various
components of our stack also provide explicit structure
in their messages, coming from within the processes themselves.

Porting an existing program to the Journal’s logging APIs comes
with one pitfall though: the Journal is Linux-only. If non-Linux
portability matters for your project it’s a good idea to provide an
alternative log output, and make it selectable at compile-time.

Regardless which way to log you choose, in all cases we’ll forward
the message to a classic syslog daemon running side-by-side with the
Journal, if there is one. However, much of the structured meta data of
the message is not forwarded since the classic syslog protocol simply
has no generally accepted way to encode this and we shouldn’t attempt
to serialize meta data into classic syslog messages which might turn
/var/log/messages into an unreadable dump of machine
data. Anyway, to summarize this: regardless if you log with
syslog(), printf(), sd_journal_print() or
sd_journal_send(), the message will be stored and indexed by
the journal and it will also be forwarded to classic syslog.

And that’s it for today. In a follow-up episode we’ll focus on
retrieving messages from the Journal using the C API, possibly
filtering for a specific subset of messages. Later on, I hope to give
a real-life example how to port an existing service to the Journal’s
logging APIs. Stay tuned!

Footnotes

[1] This can be changed with the SyslogLevel= service
setting. See systemd.exec(5)
for details.

[2] Interpretation of the < > prefixes of logged lines
may be disabled with the SyslogLevelPrefix= service setting. See systemd.exec(5)
for details.

[3] Appending the code location to the log messages can be
turned off at compile time by defining
-DSD_JOURNAL_SUPPRESS_LOCATION.

systemd for Administrators, Part XVIII

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/resources.html

Hot
on
the
heels
of
the
previous
story
, here’s
now the eighteenth
installment
of

my ongoing series
on
systemd
for
Administrators:

Managing Resources

An important facet of modern computing is resource management: if
you run more than one program on a single machine you want to assign
the available resources to them enforcing particular policies. This is
particularly crucial on smaller, embedded or mobile systems where the
scarce resources are the main constraint, but equally for large
installations such as cloud setups, where resources are plenty, but
the number of programs/services/containers on a single node is
drastically higher.

Traditionally, on Linux only one policy was really available: all
processes got about the same CPU time, or IO bandwith, modulated a bit
via the process nice value. This approach is very simple and
covered the various uses for Linux quite well for a long
time. However, it has drawbacks: not all all processes deserve to be
even, and services involving lots of processes (think: Apache with a
lot of CGI workers) this way would get more resources than services
whith very few (think: syslog).

When thinking about service management for systemd, we quickly
realized that resource management must be core functionality of it. In
a modern world — regardless if server or embedded — controlling CPU,
Memory, and IO resources of the various services cannot be an
afterthought, but must be built-in as first-class service settings. And
it must be per-service and not per-process as the traditional nice
values or POSIX
Resource Limits
were.

In this story I want to shed some light on what you can do to
enforce resource policies on systemd services. Resource Management in
one way or another has been available in systemd for a while already,
so it’s really time we introduce this to the broader audience.

In an
earlier blog post
I highlighted the difference between Linux
Control Croups (cgroups) as a labelled, hierarchal grouping mechanism,
and Linux cgroups as a resource controlling subsystem. While systemd
requires the former, the latter is optional. And this optional latter
part is now what we can make use of to manage per-service
resources. (At this points, it’s probably a good idea to read up on cgroups before
reading on, to get at least a basic idea what they are and what they
accomplish. Even thought the explanations below will be pretty
high-level, it all makes a lot more sense if you grok the background a
bit.)

The main Linux cgroup controllers for resource management are cpu,
memory
and blkio. To
make use of these, they need to be enabled in the kernel, which many
distributions (including Fedora) do. systemd exposes a couple of high-level service
settings to make use of these controllers without requiring too much
knowledge of the gory kernel details.

Managing CPU

As a nice default, if the cpu controller is enabled in the
kernel, systemd will create a cgroup for each service when starting
it. Without any further configuration this already has one nice
effect: on a systemd system every system service will get an even
amount of CPU, regardless how many processes it consists off. Or in
other words: on your web server MySQL will get the roughly same amount
of CPU as Apache, even if the latter consists a 1000 CGI script
processes, but the former only of a few worker tasks. (This behavior can
be turned off, see DefaultControllers=
in /etc/systemd/system.conf.)

On top of this default, it is possible to explicitly configure the
CPU shares a service gets with the CPUShares=
setting. The default value is 1024, if you increase this number you’ll
assign more CPU to a service than an unaltered one at 1024, if you decrease it, less.

Let’s see in more detail, how we can make use of this. Let’s say we
want to assign Apache 1500 CPU shares instead of the default of
1024. For that, let’s create a new administrator service file for
Apache in /etc/systemd/system/httpd.service, overriding the
vendor supplied one in /usr/lib/systemd/system/httpd.service,
but let’s change the CPUShares= parameter:

.include /usr/lib/systemd/system/httpd.service

[Service]
CPUShares=1500

The first line will pull in the vendor service file. Now, lets’s
reload systemd’s configuration and restart Apache so that the new
service file is taken into account:

systemctl daemon-reload
systemctl restart httpd.service

And yeah, that’s already it, you are done!

(Note that setting CPUShares= in a unit file will cause the
specific service to get its own cgroup in the cpu hierarchy,
even if cpu is not included in
DefaultControllers=.)

Analyzing Resource usage

Of course, changing resource assignments without actually
understanding the resource usage of the services in questions is like
blind flying. To help you understand the resource usage of all
services, we created the tool systemd-cgtop,
that will enumerate all cgroups of the system, determine their
resource usage (CPU, Memory, and IO) and present them in a top-like fashion. Building
on the fact that systemd services are managed in cgroups this tool
hence can present to you for services what top shows you for
processes.

Unfortunately, by default cgtop will only be able to chart
CPU usage per-service for you, IO and Memory are only tracked as total
for the entire machine. The reason for this is simply that by default
there are no per-service cgroups in the blkio and
memory controller hierarchies but that’s what we need to
determine the resource usage. The best way to get this data for all
services is to simply add the memory and blkio
controllers to the aforementioned DefaultControllers= setting
in system.conf.

Managing Memory

To enforce limits on memory systemd provides the
MemoryLimit=, and MemorySoftLimit= settings for
services, summing up the memory of all its processes. These settings
take memory sizes in bytes that are the total memory limit for the
service. This setting understands the usual K, M, G, T suffixes for
Kilobyte, Megabyte, Gigabyte, Terabyte (to the base of 1024).

.include /usr/lib/systemd/system/httpd.service

[Service]
MemoryLimit=1G

(Analogue to CPUShares= above setting this option will cause
the service to get its own cgroup in the memory cgroup
hierarchy.)

Managing Block IO

To control block IO multiple settings are available. First of all
BlockIOWeight= may be used which assigns an IO weight
to a specific service. In behaviour the weight concept is not
unlike the shares concept of CPU resource control (see
above). However, the default weight is 1000, and the valid range is
from 10 to 1000:

.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=500

Optionally, per-device weights can be specified:

.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=/dev/disk/by-id/ata-SAMSUNG_MMCRE28G8MXP-0VBL1_DC06K01009SE009B5252 750

Instead of specifiying an actual device node you also specify any
path in the file system:

.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=/home/lennart 750

If the specified path does not refer to a device node systemd will
determine the block device /home/lennart is on, and assign
the bandwith weight to it.

You can even add per-device and normal lines at the same time,
which will set the per-device weight for the device, and the other
value as default for everything else.

Alternatively one may control explicit bandwith limits with the
BlockIOReadBandwidth= and BlockIOWriteBandwidth=
settings. These settings take a pair of device node and bandwith rate
(in bytes per second) or of a file path and bandwith rate:

.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOReadBandwith=/var/log 5M

This sets the maximum read bandwith on the block device backing
/var/log to 5Mb/s.

(Analogue to CPUShares= and MemoryLimit= using
any of these three settings will result in the service getting its own
cgroup in the blkio hierarchy.)

Managing Other Resource Parameters

The options described above cover only a small subset of the
available controls the various Linux control group controllers
expose. We picked these and added high-level options for them since we
assumed that these are the most relevant for most folks, and that they
really needed a nice interface that can handle units properly and
resolve block device names.

In many cases the options explained above might not be sufficient
for your usecase, but a low-level kernel cgroup setting might help. It
is easy to make use of these options from systemd unit files, without
having them covered with a high-level setting. For example, sometimes
it might be useful to set the swappiness of a service. The
kernel makes this controllable via the memory.swappiness
cgroup attribute, but systemd does not expose it as a high-level
option. Here’s how you use it nonetheless, using the low-level
ControlGroupAttribute= setting:

.include /usr/lib/systemd/system/httpd.service

[Service]
ControlGroupAttribute=memory.swappiness 70

(Analogue to the other cases this too causes the service to be
added to the memory hierarchy.)

Later on we might add more high-level controls for the
various cgroup attributes. In fact, please ping us if you frequently
use one and believe it deserves more focus. We’ll consider adding a
high-level option for it then. (Even better: send us a patch!)

Disclaimer: note that making use of the various resource
controllers does have a runtime impact on the system. Enforcing
resource limits comes at a price. If you do use them, certain
operations do get slower. Especially the memory controller
has (used to have?) a bad reputation to come at a performance
cost.

For more details on all of this, please have a look at the
documenation of the mentioned
unit settings
, and of the cpu,
memory
and blkio
controllers.

And that’s it for now. Of course, this blog story only focussed on
the per-service resource settings. On top this, you can also
set the more traditional, well-known per-process resource
settings, which will then be inherited by the various subprocesses,
but always only be enforced per-process. More specifically that’s
IOSchedulingClass=, IOSchedulingPriority=,
CPUSchedulingPolicy=, CPUSchedulingPriority=,
CPUAffinity=, LimitCPU= and related. These do not
make use of cgroup controllers and have a much lower performance
cost. We might cover those in a later article in more detail.

systemd for Administrators, Part XVII

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/journalctl.html

It’s
that
time again,
here’s
now the seventeenth
installment
of

my ongoing series
on
systemd
for
Administrators:

Using the Journal

A
while back I already
posted a blog story introducing some
functionality of the journal, and how it is exposed in
systemctl. In this episode I want to explain a few more uses
of the journal, and how you can make it work for you.

If you are wondering what the journal is, here’s an explanation in
a few words to get you up to speed: the journal is a component of systemd,
that captures Syslog messages, Kernel log messages, initial RAM disk
and early boot messages as well as messages written to STDOUT/STDERR
of all services, indexes them and makes this available to the user. It
can be used in parallel, or in place of a traditional syslog daemon,
such as rsyslog or syslog-ng. For more information, see the initial
announcement
.

The journal has been part of Fedora since F17. With Fedora 18 it
now has grown into a reliable, powerful tool to handle your logs. Note
however, that on F17 and F18 the journal is configured by default to
store logs only in a small ring-buffer in /run/log/journal,
i.e. not persistent. This of course limits its usefulness quite
drastically but is sufficient to show a bit of recent log history in
systemctl status. For Fedora 19, we plan to change this, and
enable persistent logging by default. Then, journal files will be
stored in /var/log/journal and can grow much larger, thus
making the journal a lot more useful.

Enabling Persistency

In the meantime, on F17 or F18, you can enable journald’s persistent storage manually:

# mkdir -p /var/log/journal

After that, it’s a good idea to reboot, to get some useful
structured data into your journal to play with. Oh, and since you have
the journal now, you don’t need syslog anymore (unless having
/var/log/messages as text file is a necessity for you.), so
you can choose to deinstall rsyslog:

# yum remove rsyslog

Basics

Now we are ready to go. The following text shows a lot of features
of systemd 195 as it will be included in Fedora 18[1], so
if your F17 can’t do the tricks you see, please wait for F18. First,
let’s start with some basics. To access the logs of the journal use
the journalctl(1)
tool. To have a first look at the logs, just type in:

# journalctl

If you run this as root you will see all logs generated on the
system, from system components the same way as for logged in
users. The output you will get looks like a pixel-perfect copy of the
traditional /var/log/messages format, but actually has a
couple of improvements over it:

  • Lines of error priority (and higher) will be highlighted red.
  • Lines of notice/warning priority will be highlighted bold.
  • The timestamps are converted into your local time-zone.
  • The output is auto-paged with your pager of choice (defaults to less).
  • This will show all available data, including rotated logs.
  • Between the output of each boot we’ll add a line clarifying that a new boot begins now.

Note that in this blog story I will not actually show you any of
the output this generates, I cut that out for brevity — and to give
you a reason to try it out yourself with a current image for F18’s
development version with systemd 195. But I do hope you get the idea
anyway.

Access Control

Browsing logs this way is already pretty nice. But requiring to be
root sucks of course, even administrators tend to do most of their
work as unprivileged users these days. By default, Journal users can
only watch their own logs, unless they are root or in the adm
group. To make watching system logs more fun, let’s add ourselves to
adm:

# usermod -a -G adm lennart

After logging out and back in as lennart I know have access
to the full journal of the system and all users:

$ journalctl

Live View

If invoked without parameters journalctl will show me the current
log database. Sometimes one needs to watch logs as they grow, where
one previously used tail -f /var/log/messages:

$ journalctl -f

Yes, this does exactly what you expect it to do: it will show you
the last ten logs lines and then wait for changes and show them as
they take place.

Basic Filtering

When invoking journalctl without parameters you’ll see the
whole set of logs, beginning with the oldest message stored. That of
course, can be a lot of data. Much more useful is just viewing the
logs of the current boot:

$ journalctl -b

This will show you only the logs of the current boot, with all the
aforementioned gimmicks mentioned. But sometimes even this is way too
much data to process. So what about just listing all the real issues
to care about: all messages of priority levels ERROR and worse, from
the current boot:

$ journalctl -b -p err

If you reboot only seldom the -b makes little sense,
filtering based on time is much more useful:

$ journalctl --since=yesterday

And there you go, all log messages from the day before at 00:00 in
the morning until right now. Awesome! Of course, we can combine this with
-p err or a similar match. But humm, we are looking for
something that happened on the 15th of October, or was it the 16th?

$ journalctl --since=2012-10-15 --until="2011-10-16 23:59:59"

Yupp, there we go, we found what we were looking for. But humm, I
noticed that some CGI script in Apache was acting up earlier today,
let’s see what Apache logged at that time:

$ journalctl -u httpd --since=00:00 --until=9:30

Oh, yeah, there we found it. But hey, wasn’t there an issue with
that disk /dev/sdc? Let’s figure out what was going on there:

$ journalctl /dev/sdc

OMG, a disk error![2] Hmm, let’s quickly replace the
disk before we lose data. Done! Next! — Hmm, didn’t I see that the vpnc binary made a booboo? Let’s
check for that:

$ journalctl /usr/sbin/vpnc

Hmm, I don’t get this, this seems to be some weird interaction with
dhclient, let’s see both outputs, interleaved:

$ journalctl /usr/sbin/vpnc /usr/sbin/dhclient

That did it! Found it!

Advanced Filtering

Whew! That was awesome already, but let’s turn this up a
notch. Internally systemd stores each log entry with a set of
implicit meta data. This meta data looks a lot like an
environment block, but actually is a bit more powerful: values can
take binary, large values (though this is the exception, and usually
they just contain UTF-8), and fields can have multiple values assigned
(an exception too, usually they only have one value). This implicit
meta data is collected for each and every log message, without user
intervention. The data will be there, and wait to be used by
you. Let’s see how this looks:

$ journalctl -o verbose -n
[...]
Tue, 2012-10-23 23:51:38 CEST [s=ac9e9c423355411d87bf0ba1a9b424e8;i=4301;b=5335e9cf5d954633bb99aefc0ec38c25;m=882ee28d2;t=4ccc0f98326e6;x=f21e8b1b0994d7ee]
        PRIORITY=6
        SYSLOG_FACILITY=3
        _MACHINE_ID=a91663387a90b89f185d4e860000001a
        _HOSTNAME=epsilon
        _TRANSPORT=syslog
        SYSLOG_IDENTIFIER=avahi-daemon
        _COMM=avahi-daemon
        _EXE=/usr/sbin/avahi-daemon
        _SYSTEMD_CGROUP=/system/avahi-daemon.service
        _SYSTEMD_UNIT=avahi-daemon.service
        _SELINUX_CONTEXT=system_u:system_r:avahi_t:s0
        _UID=70
        _GID=70
        _CMDLINE=avahi-daemon: registering [epsilon.local]
        MESSAGE=Joining mDNS multicast group on interface wlan0.IPv4 with address 172.31.0.53.
        _BOOT_ID=5335e9cf5d954633bb99aefc0ec38c25
        _PID=27937
        SYSLOG_PID=27937
        _SOURCE_REALTIME_TIMESTAMP=1351029098747042

(I cut out a lot of noise here, I don’t want to make this story
overly long. -n without parameter shows you the last 10 log
entries, but I cut out all but the last.)

With the -o verbose switch we enabled verbose
output. Instead of showing a pixel-perfect copy of classic
/var/log/messages that only includes a minimimal subset of
what is available we now see all the gory details the journal has
about each entry. But it’s highly interesting: there is user credential
information, SELinux bits, machine information and more. For a full
list of common, well-known fields, see the
man page
.

Now, as it turns out the journal database is indexed by all
of these fields, out-of-the-box! Let’s try this out:

$ journalctl _UID=70

And there you go, this will show all log messages logged from Linux
user ID 70. As it turns out one can easily combine these matches:

$ journalctl _UID=70 _UID=71

Specifying two matches for the same field will result in a logical
OR combination of the matches. All entries matching either will be
shown, i.e. all messages from either UID 70 or 71.

$ journalctl _HOSTNAME=epsilon _COMM=avahi-daemon

You guessed it, if you specify two matches for different field
names, they will be combined with a logical AND. All entries matching
both will be shown now, meaning that all messages from processes named
avahi-daemon and host epsilon.

But of course, that’s
not fancy enough for us. We are computer nerds after all, we live off
logical expressions. We must go deeper!

$ journalctl _HOSTNAME=theta _UID=70 + _HOSTNAME=epsilon _COMM=avahi-daemon

The + is an explicit OR you can use in addition to the implied OR when
you match the same field twice. The line above hence means: show me
everything from host theta with UID 70, or of host
epsilon with a process name of avahi-daemon.

And now, it becomes magic!

That was already pretty cool, right? Righ! But heck, who can
remember all those values a field can take in the journal, I mean,
seriously, who has thaaaat kind of photographic memory? Well, the
journal has:

$ journalctl -F _SYSTEMD_UNIT

This will show us all values the field _SYSTEMD_UNIT takes in the
database, or in other words: the names of all systemd services which
ever logged into the journal. This makes it super-easy to build nice
matches. But wait, turns out this all is actually hooked up with shell
completion on bash! This gets even more awesome: as you type your
match expression you will get a list of well-known field names, and of
the values they can take! Let’s figure out how to filter for SELinux
labels again. We remember the field name was something with SELINUX in
it, let’s try that:

$ journalctl _SE<TAB>

And yupp, it’s immediately completed:

$ journalctl _SELINUX_CONTEXT=

Cool, but what’s the label again we wanted to match for?

$ journalctl _SELINUX_CONTEXT=<TAB><TAB>
kernel                                                       system_u:system_r:local_login_t:s0-s0:c0.c1023               system_u:system_r:udev_t:s0-s0:c0.c1023
system_u:system_r:accountsd_t:s0                             system_u:system_r:lvm_t:s0                                   system_u:system_r:virtd_t:s0-s0:c0.c1023
system_u:system_r:avahi_t:s0                                 system_u:system_r:modemmanager_t:s0-s0:c0.c1023              system_u:system_r:vpnc_t:s0
system_u:system_r:bluetooth_t:s0                             system_u:system_r:NetworkManager_t:s0                        system_u:system_r:xdm_t:s0-s0:c0.c1023
system_u:system_r:chkpwd_t:s0-s0:c0.c1023                    system_u:system_r:policykit_t:s0                             unconfined_u:system_r:rpm_t:s0-s0:c0.c1023
system_u:system_r:chronyd_t:s0                               system_u:system_r:rtkit_daemon_t:s0                          unconfined_u:system_r:unconfined_t:s0-s0:c0.c1023
system_u:system_r:crond_t:s0-s0:c0.c1023                     system_u:system_r:syslogd_t:s0                               unconfined_u:system_r:useradd_t:s0-s0:c0.c1023
system_u:system_r:devicekit_disk_t:s0                        system_u:system_r:system_cronjob_t:s0-s0:c0.c1023            unconfined_u:unconfined_r:unconfined_dbusd_t:s0-s0:c0.c1023
system_u:system_r:dhcpc_t:s0                                 system_u:system_r:system_dbusd_t:s0-s0:c0.c1023              unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
system_u:system_r:dnsmasq_t:s0-s0:c0.c1023                   system_u:system_r:systemd_logind_t:s0
system_u:system_r:init_t:s0                                  system_u:system_r:systemd_tmpfiles_t:s0

Ah! Right! We wanted to see everything logged under PolicyKit’s security label:

$ journalctl _SELINUX_CONTEXT=system_u:system_r:policykit_t:s0

Wow! That was easy! I didn’t know anything related to SELinux could
be thaaat easy! 😉 Of course this kind of completion works with any
field, not just SELinux labels.

So much for now. There’s a lot more cool stuff in journalctl(1)
than this. For example, it generates JSON output for you! You can match
against kernel fields! You can get simple
/var/log/messages-like output but with relative timestamps!
And so much more!

Anyway, in the next weeks I hope to post more stories about all the
cool things the journal can do for you. This is just the beginning,
stay tuned.

Footnotes

[1] systemd 195 is currently still in Bodhi
but hopefully will get into F18 proper soon, and definitely before the
release of Fedora 18.

[2] OK, I cheated here, indexing by block device is not in
the kernel yet, but on its way due to Hannes’
fantastic work
, and I hope it will make appearence in
F18.

systemd for Administrators, Part XVI

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/serial-console.html

And,
yes,
here’s
now the sixteenth
installment
of

my ongoing series
on
systemd
for
Administrators:

Gettys on Serial Consoles (and Elsewhere)

TL;DR: To make use of a serial console, just use
console=ttyS0 on the kernel command line, and systemd will
automatically start a getty on it for you.

While physical RS232 serial ports
have become exotic in today’s PCs they play an important role in
modern servers and embedded hardware. They provide a relatively robust
and minimalistic way to access the console of your device, that works
even when the network is hosed, or the primary UI is unresponsive. VMs
frequently emulate a serial port as well.

Of course, Linux has always had good support for serial consoles,
but with systemd we
tried to make serial console support even simpler to use. In the
following text I’ll try to give an overview how serial console gettys on
systemd work, and how TTYs of any kind are handled.

Let’s start with the key take-away: in most cases, to get a login
prompt on your serial prompt you don’t need to do anything. systemd
checks the kernel configuration for the selected kernel console and
will simply spawn a serial getty on it. That way it is entirely
sufficient to configure your kernel console properly (for example, by
adding console=ttyS0 to the kernel command line) and that’s
it. But let’s have a look at the details:

In systemd, two template units are responsible for bringing up a
login prompt on text consoles:

  1. [email protected] is responsible for virtual
    terminal
    (VT) login prompts, i.e. those on your VGA screen as
    exposed in /dev/tty1 and similar devices.
  2. [email protected] is responsible for all other
    terminals, including serial ports such as /dev/ttyS0. It
    differs in a couple of ways from [email protected]: among other
    things the $TERM environment variable is set to
    vt102 (hopefully a good default for most serial terminals)
    rather than linux (which is the right choice for VTs only),
    and a special logic that clears the VT scrollback buffer (and only
    work on VTs) is skipped.
Virtual Terminals

Let’s have a closer look how [email protected] is started,
i.e. how login prompts on the virtual terminal (i.e. non-serial TTYs)
work. Traditionally, the init system on Linux machines was configured
to spawn a fixed number login prompts at boot. In most cases six
instances of the getty program were spawned, on the first six VTs,
tty1 to tty6.

In a systemd world we made this more dynamic: in order to make
things more efficient login prompts are now started on demand only. As
you switch to the VTs the getty service is instantiated to
[email protected], [email protected] and so
on. Since we don’t have to unconditionally start the getty processes
anymore this allows us to save a bit of resources, and makes start-up
a bit faster. This behaviour is mostly transparent to the user: if the
user activates a VT the getty is started right-away, so that the user
will hardly notice that it wasn’t running all the time. If he then
logs in and types ps he’ll notice however that getty
instances are only running for the VTs he so far switched to.

By default this automatic spawning is done for the VTs up to VT6
only (in order to be close to the traditional default configuration of
Linux systems)[1]. Note that the auto-spawning of gettys
is only attempted if no other subsystem took possession of the VTs
yet. More specifically, if a user makes frequent use of fast user
switching
via GNOME he’ll get his X sessions on the first six VTs,
too, since the lowest available VT is allocated for each session.

Two VTs are handled specially by the auto-spawning logic: firstly
tty1 gets special treatment: if we boot into graphical mode
the display manager takes possession of this VT. If we boot into
multi-user (text) mode a getty is started on it — unconditionally,
without any on-demand logic[2].

Secondly, tty6 is
especially reserved for auto-spawned gettys and unavailable to other
subsystems such as X[3]. This is done in order to ensure
that there’s always a way to get a text login, even if due to
fast user switching X took possession of more than 5 VTs.

Serial Terminals

Handling of login prompts on serial terminals (and all other kind
of non-VT terminals) is different from that of VTs. By default systemd
will instantiate one [email protected] on the main
kernel[4] console, if it is not a virtual terminal. The
kernel console is where the kernel outputs its own log messages and is
usually configured on the kernel command line in the boot loader via
an argument such as console=ttyS0[5]. This logic ensures that
when the user asks the kernel to redirect its output onto a certain
serial terminal, he will automatically also get a login prompt on it
as the boot completes[6]. systemd will also spawn a login
prompt on the first special VM console (that’s /dev/hvc0,
/dev/xvc0, /dev/hvsi0), if the system is run in a VM
that provides these devices. This logic is implemented in a generator
called systemd-getty-generator
that is run early at boot and pulls in the necessary services
depending on the execution environment.

In many cases, this automatic logic should already suffice to get
you a login prompt when you need one, without any specific
configuration of systemd. However, sometimes there’s the need to
manually configure a serial getty, for example, if more than one
serial login prompt is needed or the kernel console should be
redirected to a different terminal than the login prompt. To
facilitate this it is sufficient to instantiate
[email protected] once for each serial port you want it
to run on[7]:

# systemctl enable [email protected]
# systemctl start [email protected]

And that’s it. This will make sure you get the login prompt on the
chosen port on all subsequent boots, and starts it right-away
too.

Sometimes, there’s the need to configure the login prompt in even
more detail. For example, if the default baud rate configured by the
kernel is not correct or other agetty parameters need to
be changed. In such a case simply copy the default unit template to
/etc/systemd/system and edit it there:

# cp /usr/lib/systemd/system/[email protected] /etc/systemd/system/[email protected]
# vi /etc/systemd/system/[email protected]
 .... now make your changes to the agetty command line ...
# ln -s /etc/systemd/system/[email protected] /etc/systemd/system/getty.target.wants/
# systemctl daemon-reload
# systemctl start [email protected]

This creates a unit file that is specific to serial port
ttyS2, so that you can make specific changes to this port and
this port only.

And this is pretty much all there’s to say about serial ports, VTs
and login prompts on them. I hope this was interesting, and please
come back soon for the next installment of this series!

Footnotes

[1] You can easily modify this by changing
NAutoVTs= in logind.conf.

[2] Note that whether the getty on VT1 is started on-demand
or not hardly makes a difference, since VT1 is the default active VT
anyway, so the demand is there anyway at boot.

[3] You can easily change this special reserved VT by
modifying ReserveVT= in logind.conf.

[4] If multiple kernel consoles are used simultaneously, the
main console is the one listed first in
/sys/class/tty/console/active, which is the last one
listed on the kernel command line.

[5] See kernel-parameters.txt
for more information on this kernel command line
option.

[6] Note that agetty -s is used here so that the
baud rate configured at the kernel command line is not altered and
continued to be used by the login prompt.

[7] Note that this systemctl enable syntax only
works with systemd 188 and newer (i.e. F18). On older versions use
ln -s /usr/lib/systemd/system/[email protected]
/etc/systemd/system/getty.target.wants/[email protected] ; systemctl
daemon-reload
instead.

Berlin Open Source Meetup

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/berlin-open-source-meetup.html

Prater

Chris Kühl and I are organizing a Berlin
Open Source Meetup
on Aug 19th at the Prater Biergarten in Prenzlauer Berg.
If you live in Berlin (or are passing by) and are involved in or interested in
Open Source then you are invited!

There’s also a Google+ event for the meetup.

It’s a public event, so everybody is welcome, and please feel free to invite others!

See you at the Prater!

systemd for Administrators, Part XV

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/watchdog.html

Quickly
following the previous iteration
, here’s
now the fifteenth
installment
of

my ongoing series
on
systemd
for
Administrators:

Watchdogs

There are three big target audiences we try to cover with systemd:
the embedded/mobile folks, the desktop people and the server
folks. While the systems used by embedded/mobile tend to be
underpowered and have few resources are available, desktops tend to be
much more powerful machines — but still much less resourceful than
servers. Nonetheless there are surprisingly many features that matter
to both extremes of this axis (embedded and servers), but not the
center (desktops). On of them is support for watchdogs in
hardware and software.

Embedded devices frequently rely on watchdog hardware that resets
it automatically if software stops responding (more specifically,
stops signalling the hardware in fixed intervals that it is still
alive). This is required to increase reliability and make sure that
regardless what happens the best is attempted to get the system
working again. Functionality like this makes little sense on the
desktop[1]. However, on
high-availability servers watchdogs are frequently used, again.

Starting with version 183 systemd provides full support for
hardware watchdogs (as exposed in /dev/watchdog to
userspace), as well as supervisor (software) watchdog support for
invidual system services. The basic idea is the following: if enabled,
systemd will regularly ping the watchdog hardware. If systemd or the
kernel hang this ping will not happen anymore and the hardware will
automatically reset the system. This way systemd and the kernel are
protected from boundless hangs — by the hardware. To make the chain
complete, systemd then exposes a software watchdog interface for
individual services so that they can also be restarted (or some other
action taken) if they begin to hang. This software watchdog logic can
be configured individually for each service in the ping frequency and
the action to take. Putting both parts together (i.e. hardware
watchdogs supervising systemd and the kernel, as well as systemd
supervising all other services) we have a reliable way to watchdog
every single component of the system.

To make use of the hardware watchdog it is sufficient to set the
RuntimeWatchdogSec= option in
/etc/systemd/system.conf. It defaults to 0 (i.e. no hardware
watchdog use). Set it to a value like 20s and the watchdog is
enabled. After 20s of no keep-alive pings the hardware will reset
itself. Note that systemd will send a ping to the hardware at half the
specified interval, i.e. every 10s. And that’s already all there is to
it. By enabling this single, simple option you have turned on
supervision by the hardware of systemd and the kernel beneath
it.[2]

Note that the hardware watchdog device (/dev/watchdog) is
single-user only. That means that you can either enable this
functionality in systemd, or use a separate external watchdog daemon,
such as the aptly named watchdog.

ShutdownWatchdogSec= is another option that can be
configured in /etc/systemd/system.conf. It controls the
watchdog interval to use during reboots. It defaults to 10min, and
adds extra reliability to the system reboot logic: if a clean reboot
is not possible and shutdown hangs, we rely on the watchdog hardware
to reset the system abruptly, as extra safety net.

So much about the hardware watchdog logic. These two options are
really everything that is necessary to make use of the hardware
watchdogs. Now, let’s have a look how to add watchdog logic to
individual services.

First of all, to make software watchdog-supervisable it needs to be
patched to send out “I am alive” signals in regular intervals in its
event loop. Patching this is relatively easy. First, a daemon needs to
read the WATCHDOG_USEC= environment variable. If it is set,
it will contain the watchdog interval in usec formatted as ASCII text
string, as it is configured for the service. The daemon should then
issue sd_notify("WATCHDOG=1")
calls every half of that interval. A daemon patched this way should
transparently support watchdog functionality by checking whether the
environment variable is set and honouring the value it is set to.

To enable the software watchdog logic for a service (which has been
patched to support the logic pointed out above) it is sufficient to
set the WatchdogSec= to the desired failure latency. See systemd.service(5)
for details on this setting. This causes WATCHDOG_USEC= to be
set for the service’s processes and will cause the service to enter a
failure state as soon as no keep-alive ping is received within the
configured interval.

If a service enters a failure state as soon as the watchdog logic
detects a hang, then this is hardly sufficient to build a reliable
system. The next step is to configure whether the service shall be
restarted and how often, and what to do if it then still fails. To
enable automatic service restarts on failure set
Restart=on-failure for the service. To configure how many
times a service shall be attempted to be restarted use the combination
of StartLimitBurst= and StartLimitInterval= which
allow you to configure how often a service may restart within a time
interval. If that limit is reached, a special action can be
taken. This action is configured with StartLimitAction=. The
default is a none, i.e. that no further action is taken and
the service simply remains in the failure state without any further
attempted restarts. The other three possible values are
reboot, reboot-force and
reboot-immediate. reboot attempts a clean reboot,
going through the usual, clean shutdown logic. reboot-force
is more abrupt: it will not actually try to cleanly shutdown any
services, but immediately kills all remaining services and unmounts
all file systems and then forcibly reboots (this way all file systems
will be clean but reboot will still be very fast). Finally,
reboot-immediate does not attempt to kill any process or
unmount any file systems. Instead it just hard reboots the machine
without delay. reboot-immediate hence comes closest to a
reboot triggered by a hardware watchdog. All these settings are
documented in systemd.service(5).

Putting this all together we now have pretty flexible options to
watchdog-supervise a specific service and configure automatic restarts
of the service if it hangs, plus take ultimate action if that doesn’t
help.

Here’s an example unit file:

[Unit]
Description=My Little Daemon
Documentation=man:mylittled(8)

[Service]
ExecStart=/usr/bin/mylittled
WatchdogSec=30s
Restart=on-failure
StartLimitInterval=5min
StartLimitBurst=4
StartLimitAction=reboot-force

This service will automatically be restarted if it hasn’t pinged
the system manager for longer than 30s or if it fails otherwise. If it
is restarted this way more often than 4 times in 5min action is taken
and the system quickly rebooted, with all file systems being clean
when it comes up again.

And that’s already all I wanted to tell you about! With hardware
watchdog support right in PID 1, as well as supervisor watchdog
support for individual services we should provide everything you need
for most watchdog usecases. Regardless if you are building an embedded
or mobile applience, or if your are working with high-availability
servers, please give this a try!

(Oh, and if you wonder why in heaven PID 1 needs to deal with
/dev/watchdog, and why this shouldn’t be kept in a separate
daemon, then please read this again and try to understand that this is
all about the supervisor chain we are building here, where the hardware watchdog
supervises systemd, and systemd supervises the individual
services. Also, we believe that a service not responding should be
treated in a similar way as any other service error. Finally, pinging
/dev/watchdog is one of the most trivial operations in the OS
(basically little more than a ioctl() call), to the support for this
is not more than a handful lines of code. Maintaining this externally
with complex IPC between PID 1 (and the daemons) and this watchdog
daemon would be drastically more complex, error-prone and resource
intensive.)

Note that the built-in hardware watchdog support of systemd does
not conflict with other watchdog software by default. systemd does not
make use of /dev/watchdog by default, and you are welcome to
use external watchdog daemons in conjunction with systemd, if this
better suits your needs.

And one last thing: if you wonder whether your hardware has a
watchdog, then the answer is: almost definitely yes — if it is anything more
recent than a few years. If you want to verify this, try the wdctl
tool from recent util-linux, which shows you everything you need to
know about your watchdog hardware.

I’d like to thank the great folks from Pengutronix for contributing
most of the watchdog logic. Thank you!

Footnotes

[1] Though actually most desktops tend to include watchdog
hardware these days too, as this is cheap to build and available in
most modern PC chipsets.

[2] So, here’s a free tip for you if you hack on the core
OS: don’t enable this feature while you hack. Otherwise your system
might suddenly reboot if you are in the middle of tracing through PID
1 with gdb and cause it to be stopped for a moment, so that no
hardware ping can be done…

systemd for Administrators, Part XIV

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/self-documented-boot.html

And
here’s the fourteenth
installment
of

my ongoing series
on
systemd
for
Administrators:

The Self-Explanatory Boot

One complaint we often hear about systemd is
that its boot process was hard to understand, even
incomprehensible. In general I can only disagree with this sentiment, I
even believe in quite the opposite: in comparison to what we had
before — where to even remotely understand what was going on you had
to have a decent comprehension of the programming language that is
Bourne Shell[1] — understanding systemd’s boot process is
substantially easier. However, like in many complaints there is some
truth in this frequently heard discomfort: for a seasoned Unix
administrator there indeed is a bit of learning to do when the switch
to systemd is
made. And as systemd developers it is our duty to make the learning
curve shallow, introduce as few surprises as we can, and provide
good documentation where that is not possible.

systemd always had huge body of documentation as manual
pages
(nearly 100 individual pages now!), in the Wiki and
the various blog stories I posted. However, any amount of
documentation alone is not enough to make software easily
understood. In fact, thick manuals sometimes appear intimidating and
make the reader wonder where to start reading, if all he was
interested in was this one simple concept of the whole system.

Acknowledging all this we have now added a new, neat, little
feature to systemd: the self-explanatory boot process. What do we mean
by that? Simply that each and every single component of our boot comes
with documentation and that this documentation is closely linked to
its component, so that it is easy to find.

More specifically, all units in systemd (which are what
encapsulate the components of the boot) now include references to
their documentation, the documentation of their configuration files
and further applicable manuals. A user who is trying to understand the
purpose of a unit, how it fits into the boot process and how to
configure it can now easily look up this documentation with the
well-known systemctl status command. Here’s an example how
this looks for systemd-logind.service:

$ systemctl status systemd-logind.service
systemd-logind.service - Login Service
	  Loaded: loaded (/usr/lib/systemd/system/systemd-logind.service; static)
	  Active: active (running) since Mon, 25 Jun 2012 22:39:24 +0200; 1 day and 18h ago
	    Docs: man:systemd-logind.service(7)
	          man:logind.conf(5)
	          http://www.freedesktop.org/wiki/Software/systemd/multiseat
	Main PID: 562 (systemd-logind)
	  CGroup: name=systemd:/system/systemd-logind.service
		  └ 562 /usr/lib/systemd/systemd-logind

Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event2 (Power Button)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event6 (Video Bus)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event0 (Lid Switch)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event1 (Sleep Button)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event7 (ThinkPad Extra Buttons)
Jun 25 22:39:25 epsilon systemd-logind[562]: New session 1 of user gdm.
Jun 25 22:39:25 epsilon systemd-logind[562]: Linked /tmp/.X11-unix/X0 to /run/user/42/X11-display.
Jun 25 22:39:32 epsilon systemd-logind[562]: New session 2 of user lennart.
Jun 25 22:39:32 epsilon systemd-logind[562]: Linked /tmp/.X11-unix/X0 to /run/user/500/X11-display.
Jun 25 22:39:54 epsilon systemd-logind[562]: Removed session 1.

On the first look this output changed very little. If you look
closer however you will find that it now includes one new field:
Docs lists references to the documentation of this
service. In this case there are two man page URIs and one web URL
specified. The man pages describe the purpose and configuration of
this service, the web URL includes an introduction to the basic
concepts of this service.

If the user uses a recent graphical terminal implementation it is
sufficient to click on the URIs shown to get the respective
documentation[2]. With other words: it never has been that
easy to figure out what a specific component of our boot is about:
just use systemctl status to get more information about it
and click on the links shown to find the documentation.

The past days I have written man pages and added these references
for every single unit we ship with systemd. This means, with
systemctl status you now have a very easy way to find out
more about every single service of the core OS.

If you are not using a graphical terminal (where you can just click
on URIs), a man page URI in the middle of the output of systemctl
status
is not the most useful thing to have. To make reading the
referenced man pages easier we have also added a new command:

systemctl help systemd-logind.service

Which will open the listed man pages right-away, without the need
to click anything or copy/paste an URI.

The URIs are in the formats documented by the uri(7)
man page. Units may reference http and https URLs, as well as man and
info pages.

Of course all this doesn’t make everything self-explanatory, simply
because the user still has to find out about systemctl status
(and even systemctl in the first place so that he even knows
what units there are); however with this basic knowledge further
help on specific units is in very easy reach.

We hope that this kind of interlinking of runtime behaviour and the
matching documentation is a big step forward to make our boot easier
to understand.

This functionality is partially already available in Fedora 17, and
will show up in complete form in Fedora 18.

That all said, credit where credit is due: this kind of references
to documentation within the service descriptions is not new, Solaris’
SMF had similar functionality for quite some time. However, we believe
this new systemd feature is certainly a novelty on Linux, and with
systemd we now offer you the best documented and best self-explaining
init system.

Of course, if you are writing unit files for your own packages,
please consider also including references to the documentation of your
services and its configuration. This is really easy to do, just list
the URIs in the new Documentation= field in the
[Unit] section of your unit files. For details see systemd.unit(5). The
more comprehensively we include links to documentation in our OS
services the easier the work of administrators becomes. (To make sure
Fedora makes comprehensive use of this functionality I filed a bug on
FPC
).

Oh, and BTW: if you are looking for a rough overview of systemd’s
boot process here’s
another new man page we recently added
, which includes a pretty
ASCII flow chart of the boot process and the units involved.

Footnotes

[1] Which TBH is a pretty crufty, strange one on top.

[2] Well, a terminal
where this bug is fixed
(used together with a help
browser where this one is fixed
).

Presentation in Warsaw

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/warsaw.html

I recently had the chance to speak about systemd
and other projects, as well as the politics behind them at a Bar Camp in Warsaw,
organized by the fine people of OSEC. The presentation has been recorded,
and has now been posted online. It’s a very long recording (1:43h),
but it’s quite interesting (as I’d like to believe) and contains a bit
of background where we are coming from and where are going to. Anyway,
please have a look. Enjoy!

I’d like to thank the organizers for this great event and for
publishing the recording online.

systemd for Administrators, Part XIII

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/systemctl-journal.html

Here’s
the thirteenth installment
of

my ongoing series
on
systemd
for
Administrators:

Log and Service Status

This one is a short episode. One of the most commonly used commands
on a systemd
system is systemctl status which may be used to determine the
status of a service (or other unit). It always has been a valuable
tool to figure out the processes, runtime information and other meta
data of a daemon running on the system.

With Fedora 17 we introduced the
journal
, our new logging scheme that provides structured, indexed
and reliable logging on systemd systems, while providing a certain
degree of compatibility with classic syslog implementations. The
original reason we started to work on the journal was one specific
feature idea, that to the outsider might appear simple but without the
journal is difficult and inefficient to implement: along with the
output of systemctl status we wanted to show the last 10 log
messages of the daemon. Log data is some of the most essential bits of
information we have on the status of a service. Hence it it is an
obvious choice to show next to the general status of the
service.

And now to make it short: at the same time as we integrated the
journal into systemd and Fedora we also hooked up
systemctl with it. Here’s an example output:

$ systemctl status avahi-daemon.service
avahi-daemon.service - Avahi mDNS/DNS-SD Stack
	  Loaded: loaded (/usr/lib/systemd/system/avahi-daemon.service; enabled)
	  Active: active (running) since Fri, 18 May 2012 12:27:37 +0200; 14s ago
	Main PID: 8216 (avahi-daemon)
	  Status: "avahi-daemon 0.6.30 starting up."
	  CGroup: name=systemd:/system/avahi-daemon.service
		  ├ 8216 avahi-daemon: running [omega.local]
		  └ 8217 avahi-daemon: chroot helper

May 18 12:27:37 omega avahi-daemon[8216]: Joining mDNS multicast group on interface eth1.IPv4 with address 172.31.0.52.
May 18 12:27:37 omega avahi-daemon[8216]: New relevant interface eth1.IPv4 for mDNS.
May 18 12:27:37 omega avahi-daemon[8216]: Network interface enumeration completed.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for fd00::e269:95ff:fe87:e282 on eth1.*.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for 172.31.0.52 on eth1.IPv4.
May 18 12:27:37 omega avahi-daemon[8216]: Registering HINFO record with values 'X86_64'/'LINUX'.
May 18 12:27:38 omega avahi-daemon[8216]: Server startup complete. Host name is omega.local. Local service cookie is 3555095952.
May 18 12:27:38 omega avahi-daemon[8216]: Service "omega" (/services/ssh.service) successfully established.
May 18 12:27:38 omega avahi-daemon[8216]: Service "omega" (/services/sftp-ssh.service) successfully established.

This, of course, shows the status of everybody’s favourite
mDNS/DNS-SD daemon with a list of its processes, along with — as
promised — the 10 most recent log lines. Mission accomplished!

There are a couple of switches available to alter the output
slightly and adjust it to your needs. The two most interesting
switches are -f to enable follow mode (as in tail
-f
) and -n to change the number of lines to show (you
guessed it, as in tail -n).

The log data shown comes from three sources: everything any of the
daemon’s processes logged with libc’s syslog() call,
everything submitted using the native Journal API, plus everything any
of the daemon’s processes logged to STDOUT or STDERR. In short:
everything the daemon generates as log data is collected, properly
interleaved and shown in the same format.

And that’s it already for today. It’s a very simple feature, but an
immensely useful one for every administrator. One of the kind “Why didn’t
we already do this 15 years ago?”.

Stay tuned for the next installment!

Boot & Base OS Miniconf at Linux Plumbers Conference 2012, San Diego

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/lpc2012.html

Linux Plumbers Conference Logo

We are working on putting together a miniconf on
the topic of Boot & Base OS
for the Linux Plumbers Conference 2012 in San
Diego (Aug 29-31). And we need your submission!

Are you working on some exciting project related to Boot and Base OS and
would like to present your work? Then please submit something following
these guidelines
, but please CC Kay Sievers and Lennart Poettering.

I hope that at this point the Linux Plumbers Conference
needs little introduction, so I will spare any further prose on how great and
useful and the best conference ever it is for everybody who works on the plumbing
layer of Linux. However, there’s one conference that will be co-located with
LPC that is still little known, because it happens for the first time: The C Conference, organized by Brandon Philips
and friends. It covers all things C, and they are still looking for more
topics, in a reverse CFP. Please
consider submitting a proposal and registering to the conference!

C
Conference Logo

The Most Awesome, Least-Advertised Fedora 17 Feature

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/multi-seat.html

There’s one feature In the upcoming Fedora 17 release that is
immensly useful but very little known, since its feature page
‘ckremoval’
does not explicitly refer to it in its name: true
automatic multi-seat support for Linux.

A multi-seat computer is a system that offers not only one local
seat for a user, but multiple, at the same time. A seat refers to a
combination of a screen, a set of input devices (such as mice and
keyboards), and maybe an audio card or webcam, as individual local
workplace for a user. A multi-seat computer can drive an entire class
room of seats with only a fraction of the cost in hardware, energy,
administration and space: you only have one PC, which usually has way
enough CPU power to drive 10 or more workplaces. (In fact, even a
Netbook has fast enough to drive a couple of seats!) Automatic
multi-seat
refers to an entirely automatically managed seat setup:
whenever a new seat is plugged in a new login screen immediately
appears — without any manual configuration –, and when the seat is
unplugged all user sessions on it are removed without delay.

In Fedora 17 we added this functionality to the low-level user and
device tracking of systemd, replacing the previous ConsoleKit logic
that lacked support for automatic multi-seat. With all the ground work
done in systemd, udev and the other components of our plumbing layer
the last remaining bits were surprisingly easy to add.

Currently, the automatic multi-seat logic works best with the USB
multi-seat hardware from Plugable
you can buy cheaply on Amazon
(US)
. These devices require exactly zero configuration with the
new scheme implemented in Fedora 17: just plug them in at any time,
login screens pop up on them, and you have your additional
seats. Alternatively you can also assemble your seat manually with a
few easy loginctl
attach
commands, from any kind of hardware you might have lying
around. To get a full seat you need multiple graphics cards, keyboards
and mice: one set for each seat. (Later on we’ll probably have a graphical
setup utility for additional seats, but that’s not a pressing issue we
believe, as the plug-n-play multi-seat support with the Plugable
devices is so awesomely nice.)

Plugable provided us for free with hardware for testing
multi-seat. They are also involved with the upstream development of
the USB DisplayLink driver for Linux. Due to their positive
involvement with Linux we can only recommend to buy their
hardware. They are good guys, and support Free Software the way all
hardware vendors should! (And besides that, their hardware is also
nicely put together. For example, in contrast to most similar vendors
they actually assign proper vendor/product IDs to their USB hardware
so that we can easily recognize their hardware when plugged in to set
up automatic seats.)

Currently, all this magic is only implemented in the GNOME stack
with the biggest component getting updated being the GNOME Display
Manager. On the Plugable USB hardware you get a full GNOME Shell
session with all the usual graphical gimmicks, the same way as on any
other hardware. (Yes, GNOME 3 works perfectly fine on simpler graphics
cards such as these USB devices!) If you are hacking on a different
desktop environment, or on a different display manager, please have a
look at the
multi-seat documentation
we put together, and particularly at
our short piece about writing
display managers
which are multi-seat capable.

If you work on a major desktop environment or display manager and
would like to implement multi-seat support for it, but lack the
aforementioned Plugable hardware, we might be able to provide you with
the hardware for free. Please contact us directly, and we might be
able to send you a device. Note that we don’t have unlimited devices
available, hence we’ll probably not be able to pass hardware to
everybody who asks, and we will pass the hardware preferably to people
who work on well-known software or otherwise have contributed good
code to the community already. Anyway, if in doubt, ping us, and
explain to us why you should get the hardware, and we’ll consider you!
(Oh, and this not only applies to display managers, if you hack on some other
software where multi-seat awareness would be truly useful, then don’t
hesitate and ping us!)

Phoronix has this
story about this new multi-seat
support which is quite interesting and
full of pictures. Please have a look.

Plugable started a Pledge
drive
to lower the price of the Plugable USB multi-seat terminals
further. It’s full of pictures (and a video showing all this in action!), and uses the code we now make
available in Fedora 17 as base. Please consider pledging a few
bucks.

Recently David Zeuthen added
multi-seat support to udisks
as well. With this in place, a user
logged in on a specific seat can only see the USB storage plugged into
his individual seat, but does not see any USB storage plugged into any
other local seat. With this in place we closed the last missing bit of
multi-seat support in our desktop stack.

With this code in Fedora 17 we cover the big use cases of
multi-seat already: internet cafes, class rooms and similar
installations can provide PC workplaces cheaply and easily without any
manual configuration. Later on we want to build on this and make this
useful for different uses too: for example, the ability to get a login
screen as easily as plugging in a USB connector makes this not useful
only for saving money in setups for many people, but also in embedded
environments (consider monitoring/debugging screens made available via
this hotplug logic) or servers (get trivially quick local access to
your otherwise head-less server). To be truly useful in these areas we
need one more thing though: the ability to run a simply getty
(i.e. text login) on the seat, without necessarily involving a
graphical UI.

The well-known X successor Wayland already comes out of the box with multi-seat
support based on this logic.

Oh, and BTW, as Ubuntu appears to be “focussing” on “clarity” in the
cloud” now ;-), and chose Upstart instead of systemd, this feature
won’t be available in Ubuntu any time soon. That’s (one detail of) the
price Ubuntu has to pay for choosing to maintain it’s own (largely
legacy, such as ConsoleKit) plumbing stack.

Multi-seat has a long history on Unix. Since the earliest days Unix
systems could be accessed by multiple local terminals at the same
time. Since then local terminal support (and hence multi-seat)
gradually moved out of view in computing. The fewest machines these
days have more than one seat, the concept of terminals survived almost
exclusively in the context of PTYs (i.e. fully virtualized API
objects, disconnected from any real hardware seat) and VCs (i.e. a
single virtualized local seat), but almost not in any other way (well,
server setups still use serial terminals for emergency remote access,
but they almost never have more than one serial terminal). All what we
do in systemd is based on the ideas originally brought forward in
Unix; with systemd we now try to bring back a number of the good ideas
of Unix that since the old times were lost on the roadside. For
example, in true Unix style we already started to expose the concept
of a service in the file system (in
/sys/fs/cgroup/systemd/system/), something where on Linux the
(often misunderstood) “everything is a file” mantra previously
fell short. With automatic multi-seat support we bring back support
for terminals, but updated with all the features of today’s desktops:
plug and play, zero configuration, full graphics, and not limited to
input devices and screens, but extending to all kinds of devices, such
as audio, webcams or USB memory sticks.

Anyway, this is all for now; I’d like to thank everybody who was
involved with making multi-seat work so nicely and natively on the
Linux platform. You know who you are! Thanks a ton!